Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: use heuristic for determining comparison result #305

Closed

Conversation

wyattanderson
Copy link

@wyattanderson wyattanderson commented May 16, 2020

Note: there's more cleanup to be done and tests to be written/updated if folks are on board with this idea; this is just a proposal for the sake of discussion

I've been trying to debug a scenario where dive doesn't list files as modified, despite the layer size being upwards of 100MB. I started out following in the footsteps of #142, where I added ModTime to the FileInfo struct to handle a case such as this:

# base image
FROM alpine:latest
RUN dd if=/dev/zero of=zero.bin bs=1M count=1

# derived image
FROM base
RUN touch zero.bin

Currently, in dive, the layer list will show that the layer created by touch zero.bin is 1MB, but will not show the zero.bin file as a modified file. The file attributes that are currently tracked haven't changed, but the modification time (mtime) has, so Docker includes zero.bin in the layer created by touch zero.bin, and you're stuck carrying around 2MB instead of 1MB if that's what you desire.

The OCI image layer specification lists the file attributes that are tracked; this includes not only modification time, but extended attributes (xattrs). The latter, I found, is actually the root cause of my problem, because I have an image build process that modifies the ctime (changed time, ChangeTime in the TAR header).

I started to look in to comparing xattrs or ChangeTime in FileInfo.Compare, but ran into another problem: the Docker image as published contains the change time in the TAR header for the files in question, but exporting the Docker image from the engine (when dive starts) does not include these attributes. I don't know why that's the case, but it's definitely a lossy process.

So, I'm proposing redefining "modified" as "present in both the upper and lower layer", essentially leaving the decision up to whatever built the image instead of trying to make a comparison in dive itself. From a UX perspective, I could see value in the comparison logic for indicating why something was modified (the hash changed, the UID/GID changed, etc.), but regardless of the underlying reason, you're still carrying around the baggage if whatever built the image decided that something changed.

Copy link
Owner

@wagoodman wagoodman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting point, and I agree... ultimately if the file is there then it's probably been modified. Or put another way, if the path has been seen again semantically it has been modified in some way --we don't care how/why. Once the tests are in a passing state I'll merge away!

@wagoodman wagoodman deleted the branch wagoodman:master July 6, 2023 15:24
@wagoodman wagoodman closed this Jul 6, 2023
@wagoodman
Copy link
Owner

wagoodman commented Jul 6, 2023

Sorry! I didn't mean to close this, I renamed the master branch to main... I should have switched the target here first.

Edit: didn't think to just recreate the master branch temporarily... I'll walk this one across the finish line.

@wagoodman wagoodman reopened this Jul 7, 2023
@wagoodman wagoodman deleted the branch wagoodman:master July 7, 2023 14:06
@wagoodman wagoodman closed this Jul 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants