Subsequent COPY instructions re-add all files in every layer instead of only the files that have changed #21950
Docker already does this. When docker generates a new layer, it calculates the difference. It doesn't store the entire filesystem. This can be demonstrated in the following example:
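(The example itself isn't captured in this view of the thread; the following is only a rough sketch of this kind of test, with made-up file names and tags, not the original commands.)

# ~90MB of files that never change, plus one 10MB file we will change later
mkdir ctx && cd ctx
dd if=/dev/urandom of=static bs=1M count=90
dd if=/dev/urandom of=changing bs=1M count=10

# Dockerfile: the "base" image, copying everything once
printf 'FROM busybox\nCOPY . /data\n' > Dockerfile

# Dockerfile.update: extends the base image and copies everything again
printf 'FROM test:base\nCOPY . /data\n' > Dockerfile.update

docker build -t test:base -f Dockerfile .
dd if=/dev/urandom of=changing bs=1M count=10    # change only the 10MB file
docker build -t test:update -f Dockerfile.update .
docker history test:update    # ideally only ~10MB is added by the last COPY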
Notice how the size of the last layer (first in the list since docker outputs reversed history) is 10mb, which is the size of the file we changed. All the other 90mb worth of files were unchanged, and thus not added to the layer. |
@phemmer Whoa, great news! But in what Docker version was this fixed? I am using 1.10.3 on OSX (latest available from https://www.docker.com/products/docker-toolbox) and getting:
Notice how the size of the last layer (first in the list since docker outputs reversed history) is 100mb, meaning that all files were re-added in the last layer. |
@motin the difference is that in the example @phemmer gave, he split the image up into a "base" image (using Dockerfile) and an image that extends the image (Dockerfile.update). If you only rebuild the second one, docker compares the change with the layers above it (which are in the base-image), and only adds files that are changed. The building is still done on the daemon, so docker will upload the whole build-context to the daemon on each build (#9553 is a proposal to make this smarter) |
@thaJeztah I too split up the image in the exact same way. Actually, I re-ran @phemmer's example from top to bottom and got the 100MB-sized layer outcome. Maybe there is some bug causing these differences in outcome? On what system and with what Docker version does it work? It sure does not work on 1.10.3 on OSX. |
hm, although, that shouldn't matter the first time you run it (it should just untag the image) |
@thaJeztah Thanks, it did not change the outcome however. I set up a gist and the following one-liner to replicate the issue locally:
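(The gist and one-liner are not reproduced in this view; a compact equivalent of the sketch shown earlier, using the Dockerfile.base / Dockerfile naming that appears later in the thread, might look roughly like this. It is an illustration, not necessarily the gist's exact contents.)

mkdir repro && cd repro \
  && dd if=/dev/urandom of=static bs=1M count=90 \
  && dd if=/dev/urandom of=changing bs=1M count=10 \
  && printf 'FROM busybox\nCOPY . /data\n' > Dockerfile.base \
  && printf 'FROM test:base\nCOPY . /data\n' > Dockerfile \
  && docker build -t test:base -f Dockerfile.base . \
  && dd if=/dev/urandom of=changing bs=1M count=10 \
  && docker build -t test:latest . \
  && docker history test:latest    # 100MB in the top layer reproduces the issue; ~10MB does not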
Here is my output from the one-liner |
hm, looking into this now; it seems that for some reason it's indeed including all files again in the last layer, not just the changed file |
@thaJeztah Thanks, I renamed the issue to better reflect the underlying issue. |
Yes, these were my steps to reproduce; Dockerfile.base;
Dockerfile
Then save the image (
So (testing on 1.11.0-rc4) |
@thaJeztah What is your output from the one-liner above, and on what system are you using Docker? |
Thanks for the clarification. The gist has been updated to use the same tags as in your example: https://gist.github.com/motin/cca880c647263eb5e98d9f1e0d60a3c5 Still the same 100MB-sized layer outcome regardless of the tags used. |
To answer the question about what version I'm using: 1.8.3 (the underlying OS is always Linux, even on Mac; though the client is Linux too, just in case that for some unlikely reason makes a difference). |
@phemmer Thanks. I now ran the one-liner on Docker 1.8.3 running on Debian, however the issue remains. Apparently, this has nothing to do with Docker version or if it is running in a native Linux environment or in OSX. What do you get if you run the one-liner on your box? |
@phemmer having the same on 1.9.1, now trying docker 1.8.3 |
I also ran the script on 1.7 (Debian), 1.10 (Ubuntu), and 1.11-rc4 (Mac), on Mac, via SSH, and on a swarm, all with the 100 MB / 100 MB outcome. [update] |
Can replicate on aufs, but on overlay I get 10MB/100MB. |
I was beginning to wonder if this was specific to the storage driver. I'm using btrfs. (and I get the proper behavior) |
looks to be aufs yes |
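(Side note for anyone comparing results: the storage driver in use can be checked with docker info. The --format flag may not exist on the older clients discussed here, hence the grep variant.)

docker info --format '{{.Driver}}'        # e.g. aufs, overlay, overlay2, btrfs
docker info | grep -i 'storage driver'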
Slightly OT, but how can one (easily) test other storage drivers? I looked through the docs and also tried to create machines with . And (bonus question): what's the current state or plan regarding docker and its preferred storage driver? |
@schmunk42 Docker for Mac can switch between aufs and overlay, although it is not yet documented, I see. Generally those are easy to switch between, as they do not involve any reformatting; you can just change the . In terms of advice, there is @jfrazelle's view here: https://blog.jessfraz.com/post/the-brutally-honest-guide-to-docker-graphdrivers/ - it heavily depends on what base distro you are using which choice you make. |
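(To make the elided "just change the ..." part concrete, a sketch for a Linux host; the overlay2 value is just an example, and images built under one driver are not visible while another driver is active.)

# option 1: start the daemon with an explicit driver flag
sudo dockerd --storage-driver=overlay2

# option 2: on versions with daemon.json support, set it in /etc/docker/daemon.json
{
  "storage-driver": "overlay2"
}
# ...and then restart the daemon, e.g.
sudo systemctl restart docker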
Confirmed to be working as mentioned (10 MB / 100 MB) on |
@schmunk42 This page is also great: https://github.com/docker/docker/blob/master/docs/userguide/storagedriver/selectadriver.md#future-proofing It says AUFS and devicemapper (direct-lvm) have "production-ready" status. |
I would agree with AUFS, but you've been warned about devicemapper. |
This is happening because of the different methods storage drivers use to compute layer differences. Everything but aufs uses |
3 years, is there any update on this? :) |
I was using dive to debug why some images are so large and came across this issue. In the Dockerfile I'm copying from a base image, and it results in multiple copies of the same file.
If anyone has a workaround to skip duplicate files during the COPY, that'd be great. Thanks. |
If the target stage already has those files, then depending on the storage driver, those files may be copied over it again (even if they are possibly the same). More details about that are in #35280 |
To join in on the fun, I'm running into this issue on aufs and overlay2 as well. Is there a more direct issue for the storage implementations that we should move the discussion to? I expect many other users are hitting this issue but don't realize the bloated size of their images. |
I'm having this issue on OSX Catalina, latest version. Guess this issue won't be fixed if it's already been open for several years now 🙈 😢 |
Is there no workaround for this? We are pushing 200MB each time, even if we only change one bit of a file. 🙄 |
@h0jeZvgoxFepBQ2C do you have BuildKit enabled ( |
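(For reference, on Docker versions where BuildKit is not already the default builder, it can be enabled per build or daemon-wide; a sketch, with a made-up image name.)

# per build
DOCKER_BUILDKIT=1 docker build -t myimage .

# or daemon-wide in /etc/docker/daemon.json, followed by a daemon restart
{
  "features": { "buildkit": true }
}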
@h0jeZvgoxFepBQ2C our workaround is to use the same base stage in the one that we are modifying and will copy from to the final layer, and perform a . Something like the following (note we're using buildkit to do some caching of packages):
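(The actual Dockerfile from this comment is not shown above; the following is a generic sketch of the described approach, with a shared base stage, the tool installed into its own prefix, and BuildKit cache mounts. The image, package, and path names are made up.)

# syntax=docker/dockerfile:1
FROM debian:bullseye-slim AS base
# packages needed by both the build stage and the final image
RUN --mount=type=cache,target=/var/cache/apt \
    apt-get update && apt-get install -y --no-install-recommends python3 python3-pip

FROM base AS tool
# install the tool into an isolated prefix
RUN --mount=type=cache,target=/root/.cache/pip \
    pip3 install --prefix=/opt/tool ansible

FROM base
# only the isolated prefix is copied; because this stage shares the same base
# layers, none of the unchanged base-image files are re-added by this COPY
COPY --from=tool /opt/tool /opt/tool
ENV PATH=/opt/tool/bin:$PATH
# the site-packages path depends on the Python version in the base image
ENV PYTHONPATH=/opt/tool/lib/python3.9/site-packages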
This is not ideal in that we're installing packages in the second stage that might end up being different versions compared to the libraries, due to caching of layers, but in practice we haven't been bitten by it yet. I subsequently found a better way, in that the python install of ansible is relocatable, so I can instead install using . But the idea should work in the meantime as a way of brute-forcing "ignore existing files". @thaJeztah to confirm: we bumped into this about 2 months ago, and have buildkit enabled by default for our builds. My storage is:
If there is anything you'd like me to test to add more data, let me know. |
Is this bug ever going to be fixed? |
I'm getting messages from the docker daemon that the overlay driver is deprecated and will be removed. Is there a different driver that has the correct behavior of not copying unchanged files? |
I just noticed this issue as well.
-- from https://docs.docker.com/storage/storagedriver/select-storage-driver/ |
We are still having this problem... For me on OSX it works fine now with the deprecated overlay driver. But my team members on Windows / Ubuntu WSL2 still have this problem, even with the overlay driver... Any hints? |
This is still an issue and it's really a pain to always send 600MB even if only one small text file changes. I cannot understand how. Today I switched back to overlay2 again and the issue is still persisting. @thaJeztah Our workflow with the old driver, which works fine, was to build a base image with all files included, and another image which is based on the base image but adds all files again. With |
I have tested one of the scenarios above. The drivers I used (vfs, btrfs, overlay, devicemapper, fuse-overlayfs, zfs) all behave the same way: they do not re-add all files. Only overlay2 has this issue. So if you do not want to rely on overlay (because it is deprecated, even though overlay2 cannot be considered a replacement), I would suggest trying btrfs or zfs, or "throwing" away docker for building and using buildkit directly. |
@Withy14 what do you mean by "use buildkit directly"? |
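(For context, "using buildkit directly" presumably means invoking BuildKit's own client, buildctl, against a running buildkitd instead of going through docker build; a rough sketch, with a made-up image name.)

buildctl build \
  --frontend dockerfile.v0 \
  --local context=. \
  --local dockerfile=. \
  --output type=image,name=example.com/myimage:latest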
Workaround for moby/moby#21950: somehow COPY of /usr/local wastes a lot of space (every unmodified file is added to the layer too, wasting space). This commit adds a workaround by selectively COPYing files from /usr/local.

Before: postgis   test   3120829531b9   53 seconds ago   475MB
After:  postgis   test   d5628d771874   19 minutes ago   425MB
I really don't understand why docker is not looking into this. I guess there are so many companies out there pushing thousands of GB all the time just because of this bug; they could save a really huge amount of bandwidth / traffic. |
@thaJeztah are you aware that this bug probably causes petabytes of unnecessary traffic for your company? |
@thaJeztah I really can't emphasize enough the importance of this issue, and I really can't understand why it has been ignored for such a long time. Is there anyone from the docker or moby team here who has taken a look? Docker is losing so much money on traffic because of this issue; it's really surprising to me. It would be great to get at least a statement. Also, why was the overlay(1) driver removed, since this has not been fixed at all? |
This may actually be resolved with the containerd image-store integration (https://docs.docker.com/storage/containerd/) enabled; also see this other ticket. Without the containerd image-store integration enabled, the example from #21950 (comment) above produces:

docker history test
IMAGE CREATED CREATED BY SIZE COMMENT
8842f6ab9dda 1 second ago COPY . / # buildkit 100MB buildkit.dockerfile.v0
<missing> 2 seconds ago COPY . / # buildkit 100MB buildkit.dockerfile.v0
<missing> 9 months ago BusyBox 1.36.1 (glibc), Debian 12 4.26MB
docker image ls
REPOSITORY TAG IMAGE ID CREATED SIZE
test latest 8842f6ab9dda About a minute ago 204MB
<none>       <none>    6af04fbd25b7   About a minute ago   104MB

With the containerd image-store integration enabled, the second copy only adds 10MB, not the full 100MB:

docker history test
IMAGE CREATED CREATED BY SIZE COMMENT
00842ea4df96 About a minute ago COPY . / # buildkit 10MB buildkit.dockerfile.v0
<missing> About a minute ago COPY . / # buildkit 100MB buildkit.dockerfile.v0
<missing> 9 months ago BusyBox 1.36.1 (glibc), Debian 12 4.31MB
docker image ls
REPOSITORY TAG IMAGE ID CREATED SIZE
test         latest    00842ea4df96   2 minutes ago   117MB

To verify: do you have an example reproducer? Also, if you have an environment to test in, it might be worth trying whether the containerd image-store integration resolves your issue (https://docs.docker.com/storage/containerd/). It's possible that additional changes are still needed in BuildKit (https://github.com/moby/buildkit), which is used as the default builder for building images. |
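(For anyone who wants to try this on a Linux engine: per the docs linked above, the containerd image store is enabled via a daemon feature flag; on Docker Desktop it is a settings checkbox instead. A sketch:)

# /etc/docker/daemon.json
{
  "features": {
    "containerd-snapshotter": true
  }
}

# then restart the daemon and verify which snapshotter is active
sudo systemctl restart docker
docker info --format '{{ .DriverStatus }}'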
@h0jeZvgoxFepBQ2C that's the checkbox that enables it, yes. But take note of the description under it; the containerd image store uses a store that's separate from the "graph driver" store; it won't delete those existing images and containers, but they will be "invisible" (and still take up space in your docker desktop storage). So if you don't have "pet" containers and images, you could choose to first remove your containers and images, or do a "purge data" or "factory reset" (from the "troubleshoot" page in Docker Desktop's UI) |
Sorry for the long delay in my reply. I tried to enable the checkbox, but then docker stops completely. In the logs I can only see:
|
Under certain circumstances, subsequent COPY instructions re-add all files in every layer instead of only the files that have changed.
ISSUE RENAMED: The original suggestion was to add a SYNC instruction, but the COPY instruction should already cover the use cases for which the SYNC instruction was intended. The original post was as follows:
Suggestion:
The SYNC <src> <dest> instruction compares <src> and <dest> and performs the necessary changes for <dest> to become identical to <src>.
This would alleviate a lot of the pain present when building images that contain application source code that changes incrementally and often, since the whole source code needs to be COPYed (COPY . /app) to the image again and again, leading to longer deploy cycles and tons of unnecessary data pushed to and stored in the registries. In extreme cases, the source code directory can reach gigabytes in size, and in order to publish some small changes in a few hundred files, all of those gigabytes need to be pushed and subsequently pulled by those who wish to get access to the changes, or by the servers that are to serve the new source code.
A SYNC command would instead allow for new image layers to include only the files that actually changed in comparison to the existing image contents, which probably won't amount to more than a few hundred kilobytes in most cases.
(Optionally, an ability to access the build context from the RUN instruction would make this particular instruction unnecessary, by allowing, for instance, rsync to compare the build context against the image contents.)
The only feasible workaround today seems to be to use rsync to analyze the differences between two images and then use the changelog output to craft a tar file containing the relevant changes.