Using --reproducible loads entire image into memory #862

Open
nathan-c opened this issue Nov 15, 2019 · 11 comments · May be fixed by #3347
Labels
area/performance (issues related to kaniko performance enhancement), kind/question (Further information is requested), priority/p3 (agreed that this would be good to have, but no one is available at the moment)

Comments

@nathan-c

Actual behavior
When running with --reproducible=true the entire image is loaded into memory.

Expected behavior
Memory profile should remain stable regardless of size of image being built.

To Reproduce
Steps to reproduce the behavior:

  1. Create following Dockerfile
    FROM ubuntu:latest
    RUN dd if=/dev/zero of=output.dat  bs=1M  count=1000
    
  2. Execute kaniko with the following parameters
    --context=/some-empty-directory
    --dockerfile=/some-empty-directory/Dockerfile
    --destination=<any destination>
    --cache=true
    --reproducible=true
    
  3. Watch as memory consumption climbs dramatically near the end of the build.

Additional Information

Triage Notes for the Maintainers

Description (Yes/No):

  • Please check if this is a new feature you are proposing
  • Please check if the build works in docker but not in kaniko
  • Please check if this error is seen when you use the --cache flag
  • Please check if your dockerfile is a multistage dockerfile
@cvgw cvgw added the area/performance (issues related to kaniko performance enhancement), kind/question (Further information is requested), and priority/p3 (agreed that this would be good to have, but no one is available at the moment) labels Nov 15, 2019
@nathan-c
Author

I should probably mention that this means we cannot use --reproducible for our builds: the containers running the build are being killed by k8s on our dev cluster, and increasing the node RAM capacity is not a cost-effective solution.

@macrotex

macrotex commented Jan 2, 2020

We are having this issue as well. We need to use reproducible builds so that our daily builds do not get added to our container registry if there is no "real" change to the docker image.

@darkdragon-001

@Phylu Will this also be closed in the 1.7.0 release as of #1722?

@Phylu
Contributor

Phylu commented Oct 19, 2021

@Phylu Will this also be closed in the 1.7.0 release as of #1722?

I am not sure; I did not try this flag with my changes. If you are lucky, that could be the case.

zx96 added a commit to zx96/kaniko that referenced this issue Jan 2, 2023
This change adds a new flag to zero timestamps in layer tarballs without
making a fully reproducible image.

My use case for this is maintaining a large image with build tooling.
I have a multi-stage Dockerfile that generates an image containing
several toolchains for cross-compilation, with each toolchain being
prepared in a separate stage before being COPY'd into the final image.
This is a very large image, and while it's incredibly convenient for
development, making a change as simple as adding one new tool tends to
invalidate caches and force the devs to download another 10+ GB image.

If timestamps were removed from each layer, these images would be mostly
unchanged with each minor update, greatly reducing disk space needed for
keeping old versions around and time spent downloading updated images.

I wanted to use Kaniko's --reproducible flag to help with this, but ran
into issues with memory consumption (GoogleContainerTools#862) and build time (GoogleContainerTools#1960).
Additionally, I didn't really care about reproducibility - I mainly
cared about the layers having identical contents so Docker could skip
pulling and storing redundant layers from a registry.

This solution works around these problems by stripping out timestamps as
the layer tarballs are built. It removes the need for a separate
postprocessing step, and preserves image metadata so we can still see
when the image itself was built.

An alternative solution would be to use mutate.Time much like Kaniko
currently uses mutate.Canonical to implement --reproducible, but that
would not be a satisfactory solution for me until
[issue 1168](google/go-containerregistry#1168)
is addressed by go-containerregistry. Given my lack of Go experience, I
don't feel comfortable tackling that myself, and this seems like a
simple and useful workaround in the meantime.

As a bonus, I believe that this change also fixes GoogleContainerTools#2005 (though that
should really be addressed in go-containerregistry itself).
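
For illustration, the creation-time approach this commit describes amounts to something like the following. This is only a minimal sketch with a hypothetical helper name (addFileWithZeroTime), not the actual kaniko code:

```go
package layers

import (
	"archive/tar"
	"io"
	"os"
	"time"
)

// addFileWithZeroTime copies one file from disk into the layer tarball
// while resetting its timestamps, so identical file contents always
// produce an identical tar entry regardless of when the build ran.
func addFileWithZeroTime(tw *tar.Writer, path, name string) error {
	f, err := os.Open(path)
	if err != nil {
		return err
	}
	defer f.Close()

	info, err := f.Stat()
	if err != nil {
		return err
	}
	hdr, err := tar.FileInfoHeader(info, "")
	if err != nil {
		return err
	}
	hdr.Name = name
	// Zero the timestamps as the header is written, instead of
	// rewriting the whole finished layer afterwards.
	hdr.ModTime = time.Unix(0, 0)
	hdr.AccessTime = time.Time{}
	hdr.ChangeTime = time.Time{}

	if err := tw.WriteHeader(hdr); err != nil {
		return err
	}
	_, err = io.Copy(tw, f)
	return err
}
```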
@bh-tt

bh-tt commented Apr 21, 2023

@zx96 do you plan to create a PR for the linked commit? It would save many people a lot of frustration if our image builds would stop crashing at the end of a build due to being killed.

@zx96

zx96 commented Apr 22, 2023

@zx96 do you plan to create a PR for the linked commit? It would save many people a lot of frustration if our image builds would stop crashing at the end of a build due to being killed.

I certainly can. I was planning on merging it shortly after I made the branch but ran into another bug (that also broke my use case) and got frustrated. I'll rebase my branch tomorrow and see if the other issue is still around - can't recall what issue number it was.

zx96 added a commit to zx96/kaniko that referenced this issue Apr 22, 2023 (same commit message as above)
zx96 added a commit to zx96/kaniko that referenced this issue Apr 22, 2023 (same commit message as above)
@zx96

zx96 commented Apr 22, 2023

After rebasing my branch onto the latest main, it looks like I'm able to build an image with a 16GB test file in 512MB of memory with --compressed-caching=false --zero-file-timestamps! 🥳

#2477 opened - thanks for reminding me about this, @bh-tt.

@philippe-granet

philippe-granet commented Dec 23, 2023

With kaniko-project/executor:v1.19.2-debug, building the same image:

  • with the --reproducible flag, the build took 5m 36s and used 7690 MB
  • without the --reproducible flag, the build took 1m 38s and used 350 MB

Activating profiling (https://github.com/GoogleContainerTools/kaniko#kaniko-builds---profiling), I see a lot of inflate/deflate traces with the --reproducible flag:
kaniko.zip

@peter-volkov

Still causes problems

patchew-importer pushed a commit to patchew-project/qemu that referenced this issue May 16, 2024
Enables caching from the qemu-project repository.

Uses a dedicated "$NAME-cache" tag for caching, to address limitations.
See issue "when using --cache=true, kaniko fail to push cache layer [...]":
GoogleContainerTools/kaniko#1459

Does not specify a context since no Dockerfile is using COPY or ADD instructions.

Does not enable reproducible builds as
that results in builds failing with an out of memory error.
See issue "Using --reproducible loads entire image into memory":
GoogleContainerTools/kaniko#862

Previous attempts, for the record:
  - Alex Bennée: https://lore.kernel.org/qemu-devel/20230330101141.30199-12-alex.bennee@linaro.org/
  - Camilla Conte (me): https://lore.kernel.org/qemu-devel/20230531150824.32349-6-cconte@redhat.com/

Signed-off-by: Camilla Conte <cconte@redhat.com>
Message-Id: <20240516165410.28800-3-cconte@redhat.com>
@bh-tt

bh-tt commented Oct 22, 2024

To make the code causing the problem a bit clearer (see the sketch below):

  • during the DoBuild method, once the build is finished and the --reproducible flag is set, kaniko calls mutate.Canonical
  • this function does two things: it strips out configuration (such as the build hostname and docker version) and sets all layer times to the epoch (through the Time function in the same file)
  • the Time function iterates over each layer and sets times to 0 through the layerTime helper function
  • this function is where the problem lies: it creates a bytes.Buffer, uncompresses the original layer, reads each tar entry, changes the timestamps and writes the entry to this buffer; a new layer is then returned, backed by the in-memory buffer
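
For reference, the layerTime pattern described above looks roughly like this. It is a paraphrase of the go-containerregistry mutate code; the exact upstream implementation differs in detail:

```go
package layers

import (
	"archive/tar"
	"bytes"
	"io"
	"time"

	v1 "github.com/google/go-containerregistry/pkg/v1"
	"github.com/google/go-containerregistry/pkg/v1/tarball"
)

// layerTime rewrites every tar header in layer so that it carries the
// timestamp t. The rewritten layer is accumulated in a bytes.Buffer, so
// the whole uncompressed layer ends up in memory.
func layerTime(layer v1.Layer, t time.Time) (v1.Layer, error) {
	rc, err := layer.Uncompressed()
	if err != nil {
		return nil, err
	}
	defer rc.Close()

	var buf bytes.Buffer // grows to the full uncompressed layer size
	tw := tar.NewWriter(&buf)
	tr := tar.NewReader(rc)
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			break
		}
		if err != nil {
			return nil, err
		}
		hdr.ModTime = t
		if err := tw.WriteHeader(hdr); err != nil {
			return nil, err
		}
		if _, err := io.Copy(tw, tr); err != nil {
			return nil, err
		}
	}
	if err := tw.Close(); err != nil {
		return nil, err
	}

	b := buf.Bytes()
	// The new layer is backed by the in-memory byte slice.
	return tarball.LayerFromOpener(func() (io.ReadCloser, error) {
		return io.NopCloser(bytes.NewReader(b)), nil
	})
}
```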

There are several options:

    1. @zx96's approach strips out this metadata during creation of the original tar, which is likely the most efficient, but it appears to have some issues with layer caching?
    2. replacing the memory-backed buffer with a file would solve #862 (Using --reproducible loads entire image into memory) but likely not help with #1960 (--reproducible flag massively increases build time), and may also decrease performance
    3. a minimum update would be to presize the in-memory buffer to the size of the original layer, which would likely reduce build time a little due to fewer buffer copies, but I'm not sure that size is available
    4. as an extra, we could wrap the buffer's writer in a gzip writer, which may reduce the size of the buffer somewhat (as we zip before writing to the buffer) and may give a small performance improvement, as tarball.ReadFromOpener may run gzip twice now, once for the hash and once for the compressed layer

I'm having some trouble running kaniko in my dev setup, so it is hard to measure the gains of options 2-4, but until the metadata is stripped out during the build I think implementing them could save a lot of memory (at the cost of disk usage) and yield a small performance gain.

bh-tt added a commit to bh-tt/kaniko that referenced this issue Oct 22, 2024
Fixes GoogleContainerTools#862, may mitigate GoogleContainerTools#1960

The layerTime function returns a layer backed by an in-memory buffer, which means during
the stripping of timestamps the entire image is loaded into memory. Additionally, it runs
gzip after the layer is created, resulting in an even larger in-memory blob.

This commit changes this method to use a temporary file instead of an in-memory buffer,
and to use gzip compression while writing to this layer file, instead of compressing during read.

Signed-off-by: bh <bert@transtrend.com>
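
A minimal sketch of that temp-file variant follows, under the assumption that tarball.LayerFromFile is used to read the compressed layer back; the function name layerTimeToFile is hypothetical, and temp-file cleanup is elided:

```go
package layers

import (
	"archive/tar"
	"compress/gzip"
	"io"
	"os"
	"time"

	v1 "github.com/google/go-containerregistry/pkg/v1"
	"github.com/google/go-containerregistry/pkg/v1/tarball"
)

// layerTimeToFile rewrites timestamps like layerTime, but streams the
// result through a gzip writer into a temporary file instead of holding
// the whole layer in a bytes.Buffer.
func layerTimeToFile(layer v1.Layer, t time.Time) (v1.Layer, error) {
	rc, err := layer.Uncompressed()
	if err != nil {
		return nil, err
	}
	defer rc.Close()

	tmp, err := os.CreateTemp("", "kaniko-layer-*.tar.gz")
	if err != nil {
		return nil, err
	}
	defer tmp.Close()

	gzw := gzip.NewWriter(tmp) // compress while writing, not while reading
	tw := tar.NewWriter(gzw)
	tr := tar.NewReader(rc)
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			break
		}
		if err != nil {
			return nil, err
		}
		hdr.ModTime = t
		if err := tw.WriteHeader(hdr); err != nil {
			return nil, err
		}
		if _, err := io.Copy(tw, tr); err != nil {
			return nil, err
		}
	}
	if err := tw.Close(); err != nil {
		return nil, err
	}
	if err := gzw.Close(); err != nil {
		return nil, err
	}

	// The new layer is backed by the gzip-compressed file on disk.
	return tarball.LayerFromFile(tmp.Name())
}
```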
@bh-tt bh-tt linked a pull request Oct 22, 2024 that will close this issue
@bh-tt

bh-tt commented Oct 23, 2024

I've found a 5th option: submitting a PR to go-containerregistry making the layerTime function a lazy transformation. Since the problem lies in a dependency, that may be the best way to achieve this result.
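
As an illustration of what a lazy transformation could look like (a sketch of the idea only, not go-containerregistry's actual API): rewrite the tar headers on the fly through an io.Pipe while the layer is being read, so the full layer never needs to be buffered.

```go
package layers

import (
	"archive/tar"
	"io"
	"time"
)

// zeroTimestamps streams the tar from r with every header timestamp reset
// to the epoch. The rewrite happens lazily as the caller reads, so the
// layer is never held in memory as a whole.
func zeroTimestamps(r io.Reader) io.ReadCloser {
	pr, pw := io.Pipe()
	go func() {
		tr := tar.NewReader(r)
		tw := tar.NewWriter(pw)
		for {
			hdr, err := tr.Next()
			if err == io.EOF {
				break
			}
			if err != nil {
				pw.CloseWithError(err)
				return
			}
			hdr.ModTime = time.Unix(0, 0)
			hdr.AccessTime = time.Time{}
			hdr.ChangeTime = time.Time{}
			if err := tw.WriteHeader(hdr); err != nil {
				pw.CloseWithError(err)
				return
			}
			if _, err := io.Copy(tw, tr); err != nil {
				pw.CloseWithError(err)
				return
			}
		}
		pw.CloseWithError(tw.Close())
	}()
	return pr
}
```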
