-
-
Notifications
You must be signed in to change notification settings - Fork 649
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
docker: infer dependencies on COPY of pex binaries. #12920
docker: infer dependencies on COPY of pex binaries. #12920
Conversation
Signed-off-by: Andreas Stenius <andreas.stenius@svenskaspel.se> # Rust tests and lints will be skipped. Delete if not intended. [ci skip-rust] # Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]
# Rust tests and lints will be skipped. Delete if not intended. [ci skip-rust] # Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]
Addresses, | ||
UnparsedAddressInputs( | ||
dockerfile.putative_target_addresses, | ||
owning_address=original_tgt.target.address, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps owning_address does not make sense to use here?
As the addresses from the Dockerfile should be relative to the project root, not relative to the docker_image target.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I don't think it makes sense, agreed.
def translate_to_address(self, value: str) -> str | None: | ||
# Translate something that resembles a packaged pex binary to its corresponding target | ||
# address. E.g. src.python.tool/bin.pex => src/python/tool:bin | ||
pex = re.match(_pex_target_regexp, value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a better way to resolve what target produced a certain output?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so. This is a little bit fragile though - if someone changes the address->path mapping this will break, and they won't know. So tests will be key! Testing this function separately, and aggressively, will be important.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do. Didn't spend time testing it until I was sure it was going to be used, in it's current form :)
I haven't forgotten about this! Will try and get to it tomorrow or Monday. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Neat stuff!! Really cool.
def translate_to_address(self, value: str) -> str | None: | ||
# Translate something that resembles a packaged pex binary to its corresponding target | ||
# address. E.g. src.python.tool/bin.pex => src/python/tool:bin | ||
pex = re.match(_pex_target_regexp, value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so. This is a little bit fragile though - if someone changes the address->path mapping this will break, and they won't know. So tests will be key! Testing this function separately, and aggressively, will be important.
from pants.engine.fs import DigestContents, GlobMatchErrorBehavior, PathGlobs | ||
from pants.engine.rules import Get, collect_rules, rule | ||
|
||
_pex_target_regexp = re.compile( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be worth compiling this on first use, inside the rule. The re
module maintains an internal cache of recently used compiled regexes, so performance is probably fine (and even if not, the cost of recompiling the regex is probably negligible compared to the rest of this).
The reason is: In pants v1 we noticed that a substantial amount of time on every pantsd restart was taken up by compiling module-level regexes on import.
Addresses, | ||
UnparsedAddressInputs( | ||
dockerfile.putative_target_addresses, | ||
owning_address=original_tgt.target.address, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I don't think it makes sense, agreed.
…n_pex_binaries [ci skip-rust]
Signed-off-by: Andreas Stenius <andreas.stenius@svenskaspel.se> # Rust tests and lints will be skipped. Delete if not intended. [ci skip-rust]
# Rust tests and lints will be skipped. Delete if not intended. [ci skip-rust] # Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]
dockerfile: InitVar[str] | ||
commands: tuple[Command, ...] = field(init=False) | ||
|
||
pex_target_regexp: str = r""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this a field and not a "compile-time" constant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was being moved around and changed quite a bit, so haven't actually spent that much time reflecting on its final resting place. As it came from a global compiled regexp, and moving it from there, I moved the entire thing, instead of turning it into a string that was later compiled.
This way it keeps the file cleaner, but true, it doesn't need to be a field, perhaps that is weird and gives the wrong impression of being there for purposes of customization or some such..
Can move it out to global scope again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, right now there would be a separate member for each instance, which doesn't really make sense.
(?P<name>(?:\w[.0-9_-]?)+) \.pex$ | ||
""" | ||
|
||
def __post_init__(self, dockerfile: str): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doing real work (such as parsing) in a ctor seems weird to me. How about a parse()
classmethod that does the parsing work and then creates the instance? Then the @dataclass
can be frozen=True
, which is actually necessary if this class ends up being in the signature of a @rule
(I know it isn't at the moment, but we might as well future-proof). It's generally best to have immutable objects where possible. Or at least what passes for immutable in Python...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or, if we rename to parsed, how about the ctor takes the list of parsed commands as input, and have a classmethod do the parsing, returning the instance as result?
@dataclass(frozen=True)
class ParsedDockerfile:
commands: ...
@classmethod
def from_dockerfile(cls, dockerfile: str) -> "ParsedDockerfile":
return cls(parse_string(dockerfile))
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this is pretty much what I had in mind!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Must have sped read your comment, I can see that it is what you practically spelled out :P
|
||
|
||
@dataclass | ||
class DockerfileParser: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is really more of a ParsedDockerfile
right? It represents the result of parsing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True.
Edit: or not. Mm.. as it takes the Dockerfile to parse as input to the constructor, so it does the parsing. Then it has an interface to present the parsed result.. but that is the effect of parsing, so...
Edit 2: if we go with a helper class method, from_dockerfile
, then the ParsedDockerfile
name makes sense to me :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, I like
yield command | ||
|
||
@staticmethod | ||
def translate_to_address(value: str, pex_target_regexp: Pattern) -> str | None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Passing the regex in is a bit weird. Probably better to encapsulate the compiled regex in an instance of this class, and not have this be a staticmethod after all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I was trying a bunch of different stuff for that, and at one point had it as a classmethod, but came to the conclusion that that would not improve load times at all, so, regular method it is.
# Rust tests and lints will be skipped. Delete if not intended. [ci skip-rust] # Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]
Signed-off-by: Andreas Stenius <andreas.stenius@svenskaspel.se> # Rust tests and lints will be skipped. Delete if not intended. [ci skip-rust] # Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry to go back and forth, but I think the regex thing got more complicated than it was worth. My bad!
pex_target_regexp: ClassVar[Pattern] | ||
|
||
def __post_init__(self): | ||
# Compile regexp when creating the first instance, and store it on the class. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not obvious to me that this is thread-safe. Assignment to a simple variable is atomic in Python, but there is no clear answer on whether assignment to a member (which involves the object's member dictionary) is atomic. In practice it probably depends on the implementation.
My apologies for making the regex issue more complicated than it needed to be, but I'm now thinking we should do the simple thing you did all along and compile at module load time, storing the compiled regex directly in the module. In v2 there are so few regexes that this is not going to be a performance issue. Again, sorry for spinning you in circles on this...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, now we've explored the alternatives and can motivate the (small) cost of load-time compiling the regexp. I can leave a short comment about it so it is not lost and forgotten.
Running in circles is common for me 🐶
ParsedDockerfile.pex_target_regexp = re.compile(_pex_target_regexp, re.VERBOSE) | ||
|
||
@classmethod | ||
def parse(cls, dockerfile: str) -> "ParsedDockerfile": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe dockerfile_contents
? So it's obvious that you pass the contents, not a path?
yield command | ||
|
||
def translate_to_address(self, value: str) -> str | None: | ||
# Technically this could be a classmethod, but we need at least one instance created first, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After simplifying the regex, this can be a staticmethod again...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great!
|
||
|
||
# We pay for the up front time it takes to compile this at load time, as the alternative complicates | ||
# the implementation. See PR #12920. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need for this comment TBH, since we're doing the normal, obvious thing. I think comments explaining why the code is like it is and not like X only really make sense when X is the obvious thing...
# Rust tests and lints will be skipped. Delete if not intended. [ci skip-rust] # Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]
🚀 |
* [internal] Run pyupgrade on src/python/pants/backend/python ([#13073](#13073)) * [internal] Re-enable some skipped JVM tests. ([#13074](#13074)) * [internal] Use `DownloadedExternalModules` when analyzing external Go packages ([#13076](#13076)) * [internal] Use `DownloadedExternalModules` during Go target generation ([#13070](#13070)) * [internal] Replace deprecated use of `[pytest] junit_xml_dir` with `[test] xml_dir. ([#13069](#13069)) * [internal] Add `DownloadedExternalModules` for Go ([#13068](#13068)) * [internal] Always use jars on the user classpath, and generalize transitive classpath building ([#13061](#13061)) * Add failing tests for Go external modules ([#13065](#13065)) * [internal] java: fix version in test ([#13064](#13064)) * [internal] Skip additional inference tests ([#13062](#13062)) * [internal] java: enable cycles for file-level targets generated by `java_sources` ([#13058](#13058)) * [internal] Add a `@logging` decorator for tests. ([#13060](#13060)) * [internal] Improve compatibility of nailgun with append only caches, and use them for Coursier ([#13046](#13046)) * [internal] Stop using `go.sum` when generating `_go_external_package` targets ([#13052](#13052)) * [internal] Rename `go_module` target to `go_mod` ([#13053](#13053)) * [internal] Refactor `go/util_rules/external_module.py` ([#13051](#13051)) * [internal] go: add analyzer and rules for test sources ([#13041](#13041)) * [Internal] Refactoring how we integrate with dockerfile ([#13027](#13027)) * [internal] Simplify `go/package_binary.py` ([#13045](#13045)) * [internal] Refactor `OwningGoMod` ([#13042](#13042)) * [internal] Refactor `go_mod.py` ([#13039](#13039)) * [internal] Record metadata on engine-aware params ([#13040](#13040)) * [internal] Test discovery of `go` binary ([#13038](#13038)) * [internal] Extract directory setup for terraform linters / formatters into a separate rule. ([#13037](#13037)) * [internal] java: register dependency inference rules ([#13035](#13035)) * [internal] Add `strutil.bullet_list()` to DRY formatting ([#13031](#13031)) * Minor cleanups for the autoflake linter / formatter. ([#13032](#13032)) * Ensure XML results recorded for both pytest and junit ([#13025](#13025)) * [internal] go: refactor compilation into separate rule ([#13019](#13019)) * [internal] go: refactor link step into separate rule ([#13022](#13022)) * [internal] go: enable plugin in repo and cleanup test project ([#13018](#13018)) * [internal] go: use colon to separate binary name and version ([#13020](#13020)) * [internal] tweak formatting of help text for sourcefile-validation subsystem. ([#13016](#13016)) * [internal] Use system-installed Go rather than installing via Pants ([#13007](#13007)) * Move the `process-execution-local-cleanup` hint to a more specific location. ([#13013](#13013)) * [internal] Split shell targets into atom vs generator ([#12957](#12957)) * Install Go in CI ([#13011](#13011)) * Refresh maintainers list. ([#13004](#13004)) * [internal] Refactor setup of GOROOT and `import_analysis.py` ([#13000](#13000)) * Infer dependencies on COPY of pex binaries for `docker_image`s. ([#12920](#12920)) * Prepare 2.7.0. ([#12995](#12995)) * [internal] jvm: skip JDK tests unless env var set ([#12994](#12994)) * [internal] jvm: limit caching of JDK setup processes ([#12992](#12992)) * [internal] Async-ify `NailgunPool::connect` and `nailgun::CommandRunner`. ([#12990](#12990)) * [internal] Replace `java_library` with `java_source` and `java_sources`, and add `java_test` ([#12976](#12976)) * Prepare 2.7.0rc5. ([#12987](#12987)) * [internal] terraform: refactor parser script into its own file ([#12985](#12985)) * [internal] jvm/java: ensure JDK downloaded in one process ([#12972](#12972)) * add JDK to GitHub Actions CI ([#12977](#12977)) * [internal] Re-enable the `clippy::used_underscore_binding` check. ([#12983](#12983)) * [internal] Use target generation for `_go_external_package` ([#12929](#12929)) * [internal] Bump CI token expiration threshold. ([#12974](#12974)) * [internal] Re-enable the Java backend. ([#12971](#12971)) * [internal] Implement `@union`s via `Query`s ([#12966](#12966)) * Remove `Enriched*Result` classes in favor of `EngineAwareReturnType.cacheable` ([#12970](#12970)) * [internal] Remove spurious `python_tests` directive ([#12968](#12968)) * [internal] Python coverage report generation uses precomputed addresses. ([#12965](#12965)) * Add PackageRootedDependencyMap for mapping inferred Java dependencies ([#12964](#12964))
Initial support for dependency inference based on Dockerfile instructions.
This first step looks for potential packaged pex binaries referenced in the Dockerfile by inspecting the
COPY
instructions.Plan is to follow up with detection for sources in
files
targets, and otherdocker_images
referenced inFROM
instructions.This implementation leverages dockerfile for parsing (which in turn relies on the official moby/buildkit go library), which seems to have turned out pretty well, after all. ;)