-
Notifications
You must be signed in to change notification settings - Fork 4.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
airbyte-ci: consolidate the use of Gradle #32877
airbyte-ci: consolidate the use of Gradle #32877
Conversation
The latest updates on your projects. Learn more about Vercel for Git ↗︎ 1 Ignored Deployment
|
Current dependencies on/for this PR:
This stack of pull requests is managed by Graphite. |
4b2b3a8
to
833288e
Compare
stdout = await (await executor.run_task("spotlessCheck")).stdout() | ||
return CommandResult(check.commands["java"], status=StepStatus.SUCCESS, stdout=stdout) | ||
except dagger.ExecError as e: | ||
return CommandResult(check.commands["java"], status=StepStatus.FAILURE, stderr=e.stderr, stdout=e.stdout, exc_info=e) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it worth factoring this CommandResult constructor boilerplate? It's basically get_check_command_result
but without the run_check
call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do
# Ensure that the .m2 directory exists. | ||
f"mkdir -p {local_maven_repository_path}", | ||
# Copy the gradle cache from the volume to the gradle home. | ||
f"(rsync -a --stats --mkpath {gradle_dep_cache_path}/ {gradle_home_path} || true)", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can't break up the with_exec
block formerly defined in with_whole_git_repo
; it won't work, because dagger's layer caching will get in the way. It's so painful to get right, trust me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, right.
So I should always chain rsync + gradle exec when running a gradle task right?
sh_dash_c(
[
# Ensure that the .m2 directory exists.
f"mkdir -p {self.LOCAL_MAVEN_REPOSITORY_PATH}",
# Load from the cache volume.
f"(rsync -a --stats --mkpath {self.GRADLE_DEP_CACHE_PATH}/ {self.GRADLE_HOME_PATH} || true)",
# Resolve all dependencies and write their checksums to './gradle/verification-metadata.dryrun.xml'.
self._get_gradle_command("help", *warm_dependency_cache_args),
# Build the CDK and publish it to the local maven repository.
self._get_gradle_command(":airbyte-cdk:java:airbyte-cdk:publishSnapshotIfNeeded"),
# Store to the cache volume.
f"(rsync -a --stats {self.GRADLE_HOME_PATH}/ {self.GRADLE_DEP_CACHE_PATH} || true)",
]
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More accurately: you want exactly that block above to run before running any other gradle task. Those subsequent tasks don't need to rsync because the S3 build cache will take over.
b676c6f
to
2718129
Compare
@postamar I believe I made the change you suggested.
I made a slight change to the
I changed it to:
Let me know if this change is impactful. So far the |
@property | ||
def dependency_cache_volume(self) -> dagger.CacheVolume: | ||
"""This cache volume is for sharing gradle dependencies (jars and poms) across all pipeline runs.""" | ||
return self.dagger_client.cache_volume(self.DEPENDENCY_CACHE_VOLUME_NAME) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious why do we want to have this cache volume as a property when we're never going to want to reference it outside the class.
""" | ||
|
||
rsync_cache_volume_to_gradle_home = f"(rsync -a --stats --mkpath {gradle_dep_cache_path}/ {gradle_home_path} || true)" | ||
rsync_gradle_home_to_cache_volume = f"(rsync -a --stats {self.GRADLE_HOME_PATH}/ {self.GRADLE_DEP_CACHE_PATH} || true)" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: does defining these strings as variables really improve readability? To me it's just more lines of code.
|
||
Returns: | ||
Callable: A function which takes a gradle container and returns a gradle container with the S3 build cache credentials set as environment variables. | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, nobody is ever going to call this, nor should we encourage them to. Is this docstring really useful? Same comment applies elsewhere.
|
||
rsync_cache_volume_to_gradle_home = f"(rsync -a --stats --mkpath {gradle_dep_cache_path}/ {gradle_home_path} || true)" | ||
rsync_gradle_home_to_cache_volume = f"(rsync -a --stats {self.GRADLE_HOME_PATH}/ {self.GRADLE_DEP_CACHE_PATH} || true)" | ||
# Running a gradle task like "help" with these arguments will trigger updating all dependencies. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, it's --write-verification-metadata --dry-run
which warms the cache. Gradle's CLI kind of sucks here.
Best move the warm_dependency_cache_args = ["--write-verification-metadata", "sha256", "--dry-run"] if not self.local_execution else ["--dry-run"]
in here, the last thing we want is people customizing this stuff. The whole point of this change is to re-use the same container for all gradle tasks, after all, and only the execution context should change things.
dagger.Container: The gradle container with the task executed. | ||
""" | ||
if "gradlew" not in await self.gradle_container.directory(".").entries(): | ||
raise Exception("gradlew not found in the current directory, please mount a directory with gradlew to the container.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gratuitous complexity? The failure will become apparent rather quickly, won't it?
""" | ||
if "gradlew" not in await self.gradle_container.directory(".").entries(): | ||
raise Exception("gradlew not found in the current directory, please mount a directory with gradlew to the container.") | ||
self.gradle_container = self.gradle_container.with_exec(sh_dash_c([self._get_gradle_command(task_name, *args)])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really don't think we want to mutate self.gradle_container
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tests aren't happy but I haven't looked into that. I only have one blocking comment: I don't think self.gradle_container
should be mutated.
Otherwise, I'd like to encourage you to trim down the complexity of this code by avoiding to anticipate future problems too much. The airbyte-ci codebase is dangerously large as it is.
build_and_publish_local_cdk = self._get_gradle_command(":airbyte-cdk:java:airbyte-cdk:publishSnapshotIfNeeded") | ||
|
||
gradle_commands = [resolve_deps_and_write_checksum] | ||
if build_cdk: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove this argument and always run the CDK build task. Gradle will do the right thing.
self.gradle_container = await ( | ||
self.get_base_gradle_container() | ||
.with_(self.set_s3_build_cache_credentials(self.s3_build_cache_access_key_id, self.s3_build_cache_secret_key)) | ||
.with_mounted_directory(self.workdir, self.gradle_project_dir) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not going to do what we want. You want to maintain the old behaviour where the whole repo is mounted to download all the dependencies. This way, in practice, that task is a no-op. Otherwise the cache volume gets polluted from one run to the next depending on which connector you build and you end up downloading stuff, which is slow.
Please, maintain the existing behaviour exactly. It took me several days to get right because it's so, so full of gotchas.
7f60e4f
to
2baf06b
Compare
2baf06b
to
f06d3a6
Compare
Sounds good! This certainly changes things. I'll get to it right now. |
…use_of_Gradle_btw_format_and_connector_test/build
OK, I understand why we mutate Anyway, here's what I came up with in light of the changes to airbyte-ci format. |
What
Closes #32557
The
format
command did not use any gradle caching logic which could make it faster.We have such a logic on the
GradleTask
step class.This PR is an attempt to consolidate a way to run Gradle tasks for multiple use cases.
How
GradleTaskExecutor
context manager class by copying the logic existing in theGradleTask(Step)
class.Using a context manager is helpful because it eases the gradle cache seeding on setup (
aenter
) and persisting the gralde cache on teardown (aexit
).format check/fix java
GradleTask(Step)
Benefits:
GradleTask(Step)