Using --reproducible loads entire image into memory #862

Open
nathan-c opened this issue Nov 15, 2019 · 11 comments · May be fixed by #3347
Labels
area/performance (issues related to kaniko performance enhancement), kind/question (Further information is requested), priority/p3 (agreed that this would be good to have, but no one is available at the moment)

Comments

@nathan-c

Actual behavior
When running with --reproducible=true the entire image is loaded into memory.

Expected behavior
Memory profile should remain stable regardless of size of image being built.

To Reproduce
Steps to reproduce the behavior:

  1. Create following Dockerfile
    FROM ubuntu:latest
    RUN dd if=/dev/zero of=output.dat  bs=1M  count=1000
    
  2. Execute kaniko with the following parameters
    --context=/some-empty-directory
    --dockerfile=/some-empty-directory/Dockerfile
    --destination=<any destination>
    --cache=true
    --reproducible=true
    
  3. Watch as memory consumption climbs dramatically near the end of the build.

Additional Information

Triage Notes for the Maintainers

Description (Yes/No):

  • Please check if this is a new feature you are proposing
  • Please check if the build works in docker but not in kaniko
  • Please check if this error is seen when you use the --cache flag
  • Please check if your dockerfile is a multistage dockerfile
@cvgw cvgw added the area/performance (issues related to kaniko performance enhancement), kind/question (Further information is requested), and priority/p3 (agreed that this would be good to have, but no one is available at the moment) labels Nov 15, 2019
@nathan-c
Author

I should probably mention that this means we cannot use --reproducible for our builds: the containers running the build are being killed by k8s on our dev cluster, and increasing the node RAM capacity is not a cost-effective solution.

@macrotex

macrotex commented Jan 2, 2020

We are having this issue as well. We need to use reproducible builds so that our daily builds do not get added to our container registry if there is no "real" change to the docker image.

@darkdragon-001

@Phylu Will this also be closed in the 1.7.0 release as of #1722?

@Phylu
Contributor

Phylu commented Oct 19, 2021

@Phylu Will this also be closed in the 1.7.0 release as of #1722?

I am not sure; I did not try this flag with my changes. If you are lucky, that could be the case.

zx96 added a commit to zx96/kaniko that referenced this issue Jan 2, 2023
This change adds a new flag to zero timestamps in layer tarballs without
making a fully reproducible image.

My use case for this is maintaining a large image with build tooling.
I have a multi-stage Dockerfile that generates an image containing
several toolchains for cross-compilation, with each toolchain being
prepared in a separate stage before being COPY'd into the final image.
This is a very large image, and while it's incredibly convenient for
development, making a change as simple as adding one new tool tends to
invalidate caches and force the devs to download another 10+ GB image.

If timestamps were removed from each layer, these images would be mostly
unchanged with each minor update, greatly reducing disk space needed for
keeping old versions around and time spent downloading updated images.

I wanted to use Kaniko's --reproducible flag to help with this, but ran
into issues with memory consumption (GoogleContainerTools#862) and build time (GoogleContainerTools#1960).
Additionally, I didn't really care about reproducibility - I mainly
cared about the layers having identical contents so Docker could skip
pulling and storing redundant layers from a registry.

This solution works around these problems by stripping out timestamps as
the layer tarballs are built. It removes the need for a separate
postprocessing step, and preserves image metadata so we can still see
when the image itself was built.

An alternative solution would be to use mutate.Time much like Kaniko
currently uses mutate.Canonical to implement --reproducible, but that
would not be a satisfactory solution for me until
[issue 1168](google/go-containerregistry#1168)
is addressed by go-containerregistry. Given my lack of Go experience, I
don't feel comfortable tackling that myself, and this seems like a
simple and useful workaround in the meantime.

As a bonus, I believe that this change also fixes GoogleContainerTools#2005 (though that
should really be addressed in go-containerregistry itself).
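
For illustration, the creation-time approach this commit describes amounts to something like the following. This is only a minimal sketch with a hypothetical helper name (addFileWithZeroTime), not the actual kaniko code:

```go
package layers

import (
	"archive/tar"
	"io"
	"os"
	"time"
)

// addFileWithZeroTime copies one file from disk into the layer tarball
// while resetting its timestamps, so identical file contents always
// produce an identical tar entry regardless of when the build ran.
func addFileWithZeroTime(tw *tar.Writer, path, name string) error {
	f, err := os.Open(path)
	if err != nil {
		return err
	}
	defer f.Close()

	info, err := f.Stat()
	if err != nil {
		return err
	}
	hdr, err := tar.FileInfoHeader(info, "")
	if err != nil {
		return err
	}
	hdr.Name = name
	// Zero the timestamps as the header is written, instead of
	// rewriting the whole finished layer afterwards.
	hdr.ModTime = time.Unix(0, 0)
	hdr.AccessTime = time.Time{}
	hdr.ChangeTime = time.Time{}

	if err := tw.WriteHeader(hdr); err != nil {
		return err
	}
	_, err = io.Copy(tw, f)
	return err
}
```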
@bh-tt

bh-tt commented Apr 21, 2023

@zx96 do you plan to create a PR for the linked commit? It would save many people a lot of frustration if our image builds would stop crashing at the end of a build due to being killed.

@zx96

zx96 commented Apr 22, 2023

@zx96 do you plan to create a PR for the linked commit? It would save many people a lot of frustration if our image builds would stop crashing at the end of a build due to being killed.

I certainly can. I was planning on merging it shortly after I made the branch but ran into another bug (that also broke my use case) and got frustrated. I'll rebase my branch tomorrow and see if the other issue is still around - can't recall what issue number it was.

zx96 added a commit to zx96/kaniko that referenced this issue Apr 22, 2023 (same commit message as above)
zx96 added a commit to zx96/kaniko that referenced this issue Apr 22, 2023 (same commit message as above)
@zx96

zx96 commented Apr 22, 2023

After rebasing my branch onto the latest main, it looks like I'm able to build an image with a 16GB test file in 512MB of memory with --compressed-caching=false --zero-file-timestamps! 🥳

#2477 opened - thanks for reminding me about this, @bh-tt.

@philippe-granet

philippe-granet commented Dec 23, 2023

With kaniko-project/executor:v1.19.2-debug, building the same image:

  • with the --reproducible flag, the build took 5m 36s and used 7690 MB
  • without the --reproducible flag, the build took 1m 38s and used 350 MB

Activating profiling (https://github.com/GoogleContainerTools/kaniko#kaniko-builds---profiling), I see a lot of inflate/deflate traces with the --reproducible flag:
kaniko.zip

@peter-volkov

Still causes problems

patchew-importer pushed a commit to patchew-project/qemu that referenced this issue May 16, 2024
Enables caching from the qemu-project repository.

Uses a dedicated "$NAME-cache" tag for caching, to address limitations.
See issue "when using --cache=true, kaniko fail to push cache layer [...]":
GoogleContainerTools/kaniko#1459

Does not specify a context since no Dockerfile is using COPY or ADD instructions.

Does not enable reproducible builds as
that results in builds failing with an out of memory error.
See issue "Using --reproducible loads entire image into memory":
GoogleContainerTools/kaniko#862

Previous attempts, for the record:
  - Alex Bennée: https://lore.kernel.org/qemu-devel/20230330101141.30199-12-alex.bennee@linaro.org/
  - Camilla Conte (me): https://lore.kernel.org/qemu-devel/20230531150824.32349-6-cconte@redhat.com/

Signed-off-by: Camilla Conte <cconte@redhat.com>
Message-Id: <20240516165410.28800-3-cconte@redhat.com>
@bh-tt

bh-tt commented Oct 22, 2024

To make the code causing the problem a bit clearer (see the sketch below):

  • during the DoBuild method, once the build is finished and the --reproducible flag is set, kaniko calls mutate.Canonical
  • this function does two things: it strips out configuration (such as the build hostname and docker version) and sets all layer times to the epoch (through the Time function in the same file)
  • the Time function iterates over each layer and sets times to 0 through the layerTime helper function
  • this function is where the problem lies: it creates a bytes.Buffer, uncompresses the original layer, reads each tar entry, changes the timestamps and writes the entry to this buffer; a new layer is then returned, backed by the in-memory buffer
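
For reference, the layerTime pattern described above looks roughly like this. It is a paraphrase of the go-containerregistry mutate code; the exact upstream implementation differs in detail:

```go
package layers

import (
	"archive/tar"
	"bytes"
	"io"
	"time"

	v1 "github.com/google/go-containerregistry/pkg/v1"
	"github.com/google/go-containerregistry/pkg/v1/tarball"
)

// layerTime rewrites every tar header in layer so that it carries the
// timestamp t. The rewritten layer is accumulated in a bytes.Buffer, so
// the whole uncompressed layer ends up in memory.
func layerTime(layer v1.Layer, t time.Time) (v1.Layer, error) {
	rc, err := layer.Uncompressed()
	if err != nil {
		return nil, err
	}
	defer rc.Close()

	var buf bytes.Buffer // grows to the full uncompressed layer size
	tw := tar.NewWriter(&buf)
	tr := tar.NewReader(rc)
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			break
		}
		if err != nil {
			return nil, err
		}
		hdr.ModTime = t
		if err := tw.WriteHeader(hdr); err != nil {
			return nil, err
		}
		if _, err := io.Copy(tw, tr); err != nil {
			return nil, err
		}
	}
	if err := tw.Close(); err != nil {
		return nil, err
	}

	b := buf.Bytes()
	// The new layer is backed by the in-memory byte slice.
	return tarball.LayerFromOpener(func() (io.ReadCloser, error) {
		return io.NopCloser(bytes.NewReader(b)), nil
	})
}
```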

There are several options:

    1. @zx96's approach strips out this metadata during creation of the original tar, which is likely the most efficient, but it appears to have some issues with layer caching?
    2. replacing the memory-backed buffer with a file would solve #862 (Using --reproducible loads entire image into memory) but likely not help with #1960 (--reproducible flag massively increases build time), and may also decrease performance
    3. a minimum update would be to presize the in-memory buffer to the size of the original layer, which would likely reduce build time a little due to fewer buffer copies, but I'm not sure that size is available
    4. as an extra, we could wrap the buffer's writer in a gzip writer, which may reduce the size of the buffer somewhat (as we zip before writing to the buffer) and may give a small performance improvement, as tarball.ReadFromOpener may run gzip twice now, once for the hash and once for the compressed layer

I'm having some trouble running kaniko in my dev setup, so it is hard to measure the gains of options 2-4, but until the metadata is stripped out during the build I think implementing them could save a lot of memory (at the cost of disk usage) and yield a small performance gain.

bh-tt added a commit to bh-tt/kaniko that referenced this issue Oct 22, 2024
Fixes GoogleContainerTools#862, may mitigate GoogleContainerTools#1960

The layerTime function returns a layer backed by an in-memory buffer, which means during
the stripping of timestamps the entire image is loaded into memory. Additionally, it runs
gzip after the layer is created, resulting in an even larger in-memory blob.

This commit changes this method to use a temporary file instead of an in-memory buffer,
and to use gzip compression while writing to this layer file, instead of compressing during read.

Signed-off-by: bh <bert@transtrend.com>
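
A minimal sketch of that temp-file variant follows, under the assumption that tarball.LayerFromFile is used to read the compressed layer back; the function name layerTimeToFile is hypothetical, and temp-file cleanup is elided:

```go
package layers

import (
	"archive/tar"
	"compress/gzip"
	"io"
	"os"
	"time"

	v1 "github.com/google/go-containerregistry/pkg/v1"
	"github.com/google/go-containerregistry/pkg/v1/tarball"
)

// layerTimeToFile rewrites timestamps like layerTime, but streams the
// result through a gzip writer into a temporary file instead of holding
// the whole layer in a bytes.Buffer.
func layerTimeToFile(layer v1.Layer, t time.Time) (v1.Layer, error) {
	rc, err := layer.Uncompressed()
	if err != nil {
		return nil, err
	}
	defer rc.Close()

	tmp, err := os.CreateTemp("", "kaniko-layer-*.tar.gz")
	if err != nil {
		return nil, err
	}
	defer tmp.Close()

	gzw := gzip.NewWriter(tmp) // compress while writing, not while reading
	tw := tar.NewWriter(gzw)
	tr := tar.NewReader(rc)
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			break
		}
		if err != nil {
			return nil, err
		}
		hdr.ModTime = t
		if err := tw.WriteHeader(hdr); err != nil {
			return nil, err
		}
		if _, err := io.Copy(tw, tr); err != nil {
			return nil, err
		}
	}
	if err := tw.Close(); err != nil {
		return nil, err
	}
	if err := gzw.Close(); err != nil {
		return nil, err
	}

	// The new layer is backed by the gzip-compressed file on disk.
	return tarball.LayerFromFile(tmp.Name())
}
```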
@bh-tt bh-tt linked a pull request Oct 22, 2024 that will close this issue
@bh-tt

bh-tt commented Oct 23, 2024

I've found a 5th option: submitting a PR to go-containerregistry making the layerTime function a lazy transformation. Since the problem lies in a dependency, that may be the best way to achieve this result.
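
As an illustration of what a lazy transformation could look like (a sketch of the idea only, not go-containerregistry's actual API): rewrite the tar headers on the fly through an io.Pipe while the layer is being read, so the full layer never needs to be buffered.

```go
package layers

import (
	"archive/tar"
	"io"
	"time"
)

// zeroTimestamps streams the tar from r with every header timestamp reset
// to the epoch. The rewrite happens lazily as the caller reads, so the
// layer is never held in memory as a whole.
func zeroTimestamps(r io.Reader) io.ReadCloser {
	pr, pw := io.Pipe()
	go func() {
		tr := tar.NewReader(r)
		tw := tar.NewWriter(pw)
		for {
			hdr, err := tr.Next()
			if err == io.EOF {
				break
			}
			if err != nil {
				pw.CloseWithError(err)
				return
			}
			hdr.ModTime = time.Unix(0, 0)
			hdr.AccessTime = time.Time{}
			hdr.ChangeTime = time.Time{}
			if err := tw.WriteHeader(hdr); err != nil {
				pw.CloseWithError(err)
				return
			}
			if _, err := io.Copy(tw, tr); err != nil {
				pw.CloseWithError(err)
				return
			}
		}
		pw.CloseWithError(tw.Close())
	}()
	return pr
}
```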
