Hash copied files #2055
core/src/main/java/org/testcontainers/containers/GenericContainer.java
# Conflicts: # core/src/test/java/org/testcontainers/containers/ReusabilityUnitTests.java
checksum.update(MountableFile.getUnixFileMode(file.toPath()));
if (file.isDirectory()) {
    try (Stream<Path> stream = Files.walk(file.toPath())) {
        stream.filter(it -> !Files.isDirectory(it)).forEach(path -> {
We have the file mode included in the checksum at line 506 which is cool...
But here, when we walk the contents, are we only walking sub-files (i.e. direct child files and child files of subdirectories)?
If that's the case then we'd not capture the file mode of subdirectories - is that the correct interpretation?
Nice catch! Fixed.
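To illustrate the point raised above, here is a minimal sketch of a directory hash that folds in the mode of every entry, subdirectories included. It is not the code from this PR: the class name `DirectoryHashSketch`, the helper `unixMode` (a stand-in for `MountableFile.getUnixFileMode`), the use of `CRC32`, and the sorted walk are all assumptions made for illustration, and it assumes a POSIX file system.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.stream.Stream;
import java.util.zip.CRC32;

public class DirectoryHashSketch {

    // Hypothetical stand-in for MountableFile.getUnixFileMode; assumes a POSIX file system.
    static int unixMode(Path path) throws IOException {
        return (int) Files.getAttribute(path, "unix:mode");
    }

    // Hashes relative path, mode, and (for regular files) content of every entry under root.
    // Directories are NOT filtered out, so their modes contribute to the checksum too.
    static long hash(Path root) {
        CRC32 checksum = new CRC32();
        try (Stream<Path> stream = Files.walk(root)) {
            // Sort so the result does not depend on directory listing order.
            stream.sorted().forEach(path -> {
                try {
                    checksum.update(root.relativize(path).toString().getBytes(StandardCharsets.UTF_8));
                    checksum.update(ByteBuffer.allocate(4).putInt(unixMode(path)).array());
                    if (Files.isRegularFile(path)) {
                        checksum.update(Files.readAllBytes(path));
                    }
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return checksum.getValue();
    }

    // Usage example: changing only a subdirectory's permissions should change the hash.
    static boolean subdirectoryModeChangesHash() {
        try {
            Path root = Files.createTempDirectory("hash-sketch");
            Path sub = Files.createDirectory(root.resolve("sub"));
            Files.write(sub.resolve("file.txt"), "hello".getBytes(StandardCharsets.UTF_8));
            Files.setPosixFilePermissions(sub, PosixFilePermissions.fromString("rwx------"));
            long before = hash(root);
            Files.setPosixFilePermissions(sub, PosixFilePermissions.fromString("rwxr-x---"));
            long after = hash(root);
            return before != after;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With a `!Files.isDirectory(it)` filter in place, the permission change in `subdirectoryModeChangesHash` would go unnoticed, which is the gap the comment describes.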
  try {
-     return (int) Files.getAttribute(path, "unix:mode");
+     return (int) Files.readAttributes(path, "unix:mode").get("mode");
I was curious as to why we'd need to use `readAttributes` here - we're still only fetching the mode, so `getAttribute` perhaps doesn't really have a performance disadvantage.
However, now this makes me wonder if we should also be hashing basic file attributes like created/modified timestamps etc 😭
WDYT? I don't mind deferring this, TBH...
Since we checksum content & file mode (the only bits that actually go into the tar archive), I guess we can defer the basic file attributes until someone reports that the hashing works incorrectly for them.
I will also check what the Dockerfile builder uses when hashing COPY.
Cool, sounds sensible 👍
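On the `getAttribute` versus `readAttributes` question raised earlier in this thread, a small sketch comparing the two lookups may help. The class and method names are hypothetical, and it assumes a file system that supports the `unix` attribute view (e.g. Linux or macOS); it makes no claim about why the PR switched calls.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;

public class UnixModeLookupSketch {

    // Single-attribute lookup: returns the raw mode (file type bits plus permission bits).
    static int viaGetAttribute(Path path) throws IOException {
        return (int) Files.getAttribute(path, "unix:mode");
    }

    // Bulk lookup restricted to one attribute: the map is keyed by the bare attribute name.
    static int viaReadAttributes(Path path) throws IOException {
        Map<String, Object> attributes = Files.readAttributes(path, "unix:mode");
        return (int) attributes.get("mode");
    }

    // Usage example: both lookups should report the same mode for the same file.
    static boolean lookupsAgree() {
        try {
            Path file = Files.createTempFile("mode-sketch", ".txt");
            return viaGetAttribute(file) == viaReadAttributes(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Both calls go through the same attribute view, so for a single attribute the choice looks mostly stylistic.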
https://stackoverflow.com/a/59073724/1826422
> For the ADD and COPY instructions, the contents of the file(s) in the image are examined and a checksum is calculated for each file. The last-modified and last-accessed times of the file(s) are not considered in these checksums. During the cache lookup, the checksum is compared against the checksum in the existing images. If anything has changed in the file(s), such as the contents and metadata, then the cache is invalidated.
okay, looks pretty aligned with what we're doing
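The behaviour described in the quote (content and mode matter, timestamps do not) can be sketched as follows. This is an illustration under assumptions, not the PR's implementation: `TimestampIndependentHashSketch`, its methods, and the choice of `CRC32` are hypothetical, and a POSIX file system is assumed.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.zip.CRC32;

public class TimestampIndependentHashSketch {

    // Hashes only the file mode and the file content; timestamps are never read.
    static long hash(Path file) {
        try {
            CRC32 checksum = new CRC32();
            int mode = (int) Files.getAttribute(file, "unix:mode");
            checksum.update(ByteBuffer.allocate(4).putInt(mode).array());
            checksum.update(Files.readAllBytes(file));
            return checksum.getValue();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Usage example: changing the last-modified time should leave the hash unchanged.
    static boolean touchKeepsHash() {
        try {
            Path file = Files.createTempFile("touch-sketch", ".txt");
            Files.write(file, "v1".getBytes(StandardCharsets.UTF_8));
            long before = hash(file);
            Files.setLastModifiedTime(file, FileTime.fromMillis(0L));
            return before == hash(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Usage example: changing the content should change the hash.
    static boolean editChangesHash() {
        try {
            Path file = Files.createTempFile("edit-sketch", ".txt");
            Files.write(file, "v1".getBytes(StandardCharsets.UTF_8));
            long before = hash(file);
            Files.write(file, "v2".getBytes(StandardCharsets.UTF_8));
            return before != hash(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```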
depends on #2051