Proposal: use heuristic for determining comparison result #305
Note: there's more cleanup to be done and tests to be written/updated if folks are on board with this idea; this is just a proposal for the sake of discussion.

I've been trying to debug a scenario where `dive` doesn't list files as modified, despite the layer size being upwards of 100MB. I started out following in the footsteps of #142, where I added `ModTime` to the `FileInfo` struct to handle a case such as a layer that only runs `touch zero.bin` on a 1MB `zero.bin` added by an earlier layer.

Currently, in `dive`, the layer list will show that the layer created by `touch zero.bin` is 1MB, but will not show the `zero.bin` file as a modified file. The file attributes that are currently tracked haven't changed, but the modification time (`mtime`) has, so Docker includes `zero.bin` in the layer created by `touch zero.bin`, and you're stuck carrying around 2MB instead of 1MB, whether or not that's what you want.

The OCI image layer specification lists the file attributes that are tracked; this includes not only modification time, but also extended attributes (`xattrs`). The latter, I found, is actually the root cause of my problem, because I have an image build process that modifies the `ctime` (changed time, `ChangeTime` in the TAR header).

I started to look into comparing `xattrs` or `ChangeTime` in `FileInfo.Compare`, but ran into another problem: the Docker image as published contains the change time in the TAR header for the files in question, but exporting the Docker image from the engine (when `dive` starts) does not include these attributes. I don't know why that's the case, but it's definitely a lossy process.

So, I'm proposing redefining "modified" as "present in both the upper and lower layer", essentially leaving the decision up to whatever built the image instead of trying to make a comparison in `dive` itself. From a UX perspective, I could see value in the comparison logic for indicating why something was modified (the hash changed, the UID/GID changed, etc.), but regardless of the underlying reason, you're still carrying around the baggage if whatever built the image decided that something changed.
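
To make the proposed definition concrete, here is a rough sketch of the presence-based heuristic. It is not `dive`'s actual code: `DiffType`, `ClassifyPath`, and the path sets are hypothetical names used only for illustration, and the sketch assumes each layer can be reduced to the set of paths it contains.

```go
package main

import "fmt"

// DiffType is a hypothetical stand-in for a comparison result.
type DiffType int

const (
	Unmodified DiffType = iota // path is not in the upper layer
	Added                      // path is only in the upper layer
	Modified                   // path is in both the upper and lower layer
)

// String makes DiffType values readable when printed.
func (d DiffType) String() string {
	switch d {
	case Added:
		return "added"
	case Modified:
		return "modified"
	default:
		return "unmodified"
	}
}

// ClassifyPath applies the presence heuristic: no attribute or hash
// comparison is made. If the image builder put the path in the upper
// layer and it already existed in the lower layer, it counts as modified.
func ClassifyPath(path string, lower, upper map[string]struct{}) DiffType {
	_, inLower := lower[path]
	_, inUpper := upper[path]
	switch {
	case inUpper && inLower:
		return Modified
	case inUpper:
		return Added
	default:
		return Unmodified
	}
}

func main() {
	// Hypothetical layer contents: zero.bin was touched in the upper layer.
	lower := map[string]struct{}{"zero.bin": {}, "etc/hosts": {}}
	upper := map[string]struct{}{"zero.bin": {}, "new.txt": {}}

	for _, p := range []string{"zero.bin", "new.txt", "etc/hosts"} {
		fmt.Printf("%s: %s\n", p, ClassifyPath(p, lower, upper))
	}
}
```

The sketch deliberately ignores whiteout entries (deletions) and directories, which would need separate handling before anything like this could replace the existing comparison.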