Kaniko should exit immediately on multi-stage builds if --cache-copy-layers=true #2065

Closed
dradetsky opened this issue Apr 28, 2022 · 11 comments · Fixed by #2227

@dradetsky
Contributor

This is a Feature Request, I guess, but you don't have a template for that.

As I understand it, Kaniko does not properly support caching copy layers in multi-stage builds, and supporting this is not a priority. For example, #1244 has been open since May 2020, and in v1.2.0, Kaniko actually dropped support for caching copy layers, although it later added it back.

I'm not suggesting that supporting this should be a priority. However, given the severity of the issues that can result from this, I think it would be a really, really good idea to just add a startup check where if

  1. The Dockerfile we're trying to build is multi-stage
  2. opts.CacheCopyLayers is true

we immediately exit & print "Kaniko does not support caching copy layers in multi-stage builds" or something (a rough sketch of what I mean is below).
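
To be concrete, here is a minimal sketch of the kind of startup check I have in mind. It is not Kaniko's actual code: the isMultiStage helper and the opts struct are placeholders, and a real implementation would reuse Kaniko's own Dockerfile parsing instead of naively counting FROM lines.

package main

import (
    "bufio"
    "log"
    "os"
    "strings"
)

// isMultiStage reports whether the Dockerfile at path contains more than one
// FROM instruction, i.e. whether it is a multi-stage build. This is a naive
// line scan that ignores edge cases (line continuations, here-docs); a real
// check would use a proper Dockerfile parser.
func isMultiStage(path string) (bool, error) {
    f, err := os.Open(path)
    if err != nil {
        return false, err
    }
    defer f.Close()

    froms := 0
    scanner := bufio.NewScanner(f)
    for scanner.Scan() {
        line := strings.TrimSpace(scanner.Text())
        if strings.HasPrefix(strings.ToUpper(line), "FROM ") {
            froms++
        }
    }
    return froms > 1, scanner.Err()
}

func main() {
    // Placeholder for the relevant build options; in Kaniko these come from
    // the command-line flags.
    opts := struct {
        DockerfilePath  string
        CacheCopyLayers bool
    }{DockerfilePath: "Dockerfile", CacheCopyLayers: true}

    multi, err := isMultiStage(opts.DockerfilePath)
    if err != nil {
        log.Fatal(err)
    }
    if multi && opts.CacheCopyLayers {
        log.Fatal("Kaniko does not support caching copy layers (--cache-copy-layers=true) in multi-stage builds")
    }
}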

In my own case, we had tons of developers coming to us saying "I pushed new code, the new code got built, but it's still running the old code. What's going on?" My boss thought that Kaniko must just be a broken, immature project & told me to migrate us to buildkit ASAP. Fortunately, I figured out what was going on, which was good because buildkit had all sorts of other issues for our setup. I'd much rather be using Kaniko with --cache-copy-layers=false.

I realize that for a lot of potential users, caching copy layers isn't something that even makes sense in the first place, and they'll probably just leave it off if they notice it at all. However, some users who know that in their particular case they have a lot of expensive COPY operations (say, they get Dockerfiles from node.js developers who don't do any kind of bundling of their backend apps) will see that there's a flag --cache-copy-layers and assume it just works as advertised. They won't ask themselves if it works in combination with multi-stage builds, because those are just a Dockerfile best practice; mentally they will equate "works with multi-stage builds" with "works with Dockerfiles." They'll roll it out and nobody will notice any issues until weeks later, and it'll take even longer to make the connection. This is extremely poor developer UX.

Anyway, it's too late to save me & my organization from this pain, but maybe we can save others.

I'd contribute a PR, but

  1. Maybe people want to discuss it
  2. Maybe some core dev will just immediately fix & merge it, given how simple it is
  3. I'm a terrible go developer

But I might if this is favorably received & I'm not too busy at work.

@baznikin

Sad, we use this feature. I emulate COPY monorepo/**/package.json ./ with an intermediate stage:

FROM node:18 as PACKAGES
WORKDIR /app
COPY monorepo monorepo
RUN find monorepo/ \! -name "package.json" -type f -delete

FROM node:18 as BUILD
WORKDIR /app
COPY .yarn ./.yarn
COPY .yarnrc.yml .
COPY yarn.lock .
COPY package.json .
COPY --from=PACKAGES /app .
RUN yarn install --immutable

@dradetsky
Contributor Author

@baznikin Can you explain more?

It looks like you want to copy a bunch of files, but exclude the package.json files (or at least that's what the dockerfile ref says that does). Is that correct? But you don't want to literally do this; instead you want to emulate it somehow. Is that correct? If so, why do you want to emulate it? Am I even close to understanding what's going on here?

On a separate note, when you say "we use this feature" can you say more about how long you've been using this & at what scale? The reason is that I & others have experienced very severe bugs related to allowing cache-copy-layers in multistage builds (see above). Basically, unless you feel you really understand what's going on here, I would say "Just stop trying to cache copy layers and be glad you never ran into the issues I had."

Also, leaving the are-we-actually-getting-correct-builds issue aside, keep in mind that we're talking about caching here, so there's no reason you can't just stop using cache-copy-layers except that it might have some performance impact. So can you measure & report what that impact is using whatever kaniko version you were using before?

@pharaujo

@dradetsky what I believe @baznikin is saying is that because COPY doesn't support the ** arbitrary dir depth syntax, they use a multi-stage build to create a stage that deletes all files except the package.json files they want to keep, and then copy the result into the final stage.

That said, I agree with you that the correctness aspect is more important than any performance gain.

@baznikin

@pharaujo is right, it is because of the lack of ** support. However, I just realized I do not need an intermediate stage for this; the result will be the same.
We have been using this feature for a few months and have not run into any issues yet. Thanks for saving us :)

The full Dockerfile was like this; it is actually a 3-stage build. The repository is a monorepo used to build 2 different images, and the final stage is used to make a thinner image. Because of disabled COPY caching there were significant rebuilds every time. However, I do not see such behavior now with --cache-copy-layers=false; maybe there were some other improvements elsewhere in the kaniko code, or maybe I was seeing a great performance benefit from copy caching in some other repository. Anyway, I see ways to overcome this limitation (incorporate the code from the 1st stage directly into the 2nd; detach the 3rd stage into a completely separate build).

# emulate COPY monorepo/**/package.json ./
#
FROM node:18 as PACKAGES
WORKDIR /app
COPY monorepo monorepo
RUN find monorepo/ \! -name "package.json" -type f -delete

# build app here
#
FROM node:18 as BUILD
WORKDIR /app
ENV PATH /app/node_modules/.bin:$PATH
COPY .yarn ./.yarn
COPY .yarnrc.yml .
COPY yarn.lock .
COPY package.json .
COPY --from=PACKAGES /app .
RUN yarn install --immutable
COPY . .
RUN REACT_APP_API_URL=${REACT_APP_API_URL} yarn build-client
RUN npm prune --production

# final container
#
FROM node:18
RUN npm install -g \
    npm@8.2.0
WORKDIR /app
COPY --from=BUILD /app/monorepo monorepo
COPY --from=BUILD /app/build-client build-client
COPY --from=BUILD /app/node_modules node_modules
EXPOSE 3006
CMD ["node", "./build-client/index.js"]

@dradetsky
Contributor Author

@pharaujo @baznikin It sounds like the real issue is that Kaniko doesn't actually support the ** behavior of COPY as described in the dockerfile reference. If so, it might be worth making a separate issue noting this. That said, don't expect anyone to fix it for you anytime soon. It's just that it might be worth recording this deviation.

@baznikin FWIW, while I hate to say "well just don't do that" when someone says what he wants to do with his computer on his project, it's worth pointing out that while nodejs in containers does tend to create massive COPY operations that you might want to cache (by putting 10M files in node_modules), it also has some ways to mitigate this, such as bundling. Rather than copying all of your modules and local code from the build image to the runtime image, you could bundle all the files in the build image and just copy a single js file to the runtime image. I haven't actually done this lately, but I believe vercel/ncc, vercel/pkg, and even good old webpack are all plausible ways of doing this. Not only does this mean you aren't creating monster COPYs that you want to cache, but it also means you end up with a much smaller image at the end. Also, it's probably possible to bundle your dependencies and your own code separately, and copy a few files instead of just one, although you'll have to figure out how to make this work for your project yourself.

As I implied above, the people who make container tooling tend to be golang people, and in golang land, the normal thing to do is compile your code into one (or maybe a few) binaries, and then do something like

FROM scratch as runtime
COPY --from=build /path/to/build/binary /binary
ENTRYPOINT ["/binary"]

In this case, it makes no sense to cache the copy layer, since all it does is cause you to store the binary you're going to copy to your runtime image twice. It's not like it's any faster to copy the file from the last cache layer of the build image than from the copy layer of the runtime image. So basically, it makes sense that they didn't put a ton of effort into making copy layer caching work correctly. Obviously, for container tooling which is meant to be language-agnostic, this is a mistake, and copy layer caching does make sense for people developing for nodejs. But on the other hand, if as a nodejs developer you have the opportunity to make your build process look more like the typical golang build process (by using a bundler), this may lead to better results with less effort.

@dradetsky
Contributor Author

I should have made this more clear above, but: I don't actually know whether Kaniko supports ** in COPY or not, and haven't tested. In fact, I didn't even know what ** meant in COPY until I looked it up when replying to this initially. So confirming this would be useful.

Even better would be to contribute a (currently failing) integration test which asserts that kaniko does handle ** correctly, although I barely understand the kaniko integration test setup myself, so I wouldn't blame you for not doing this.

@dradetsky
Contributor Author

dradetsky commented Oct 15, 2022

Nevermind, the meaning of the ** syntax I was referring to is dockerignore syntax, not copy syntax. I got confused.

In light of that, the command COPY monorepo/**/package.json ./ makes no sense; it's like saying "find every package.json file under monorepo or its subdirs, and then copy them all to ./" (whereupon they will overwrite one another in turn, and the final file obtained will depend on how fs subdir recursion is implemented). However, if your dockerignore excludes everything under monorepo except the package.json files (monorepo/** followed by a !monorepo/**/package.json exception), then just COPY monorepo ./ should do what you want. I think. Something like that.

@baznikin

baznikin commented Oct 17, 2022 via email

@dradetsky
Contributor Author

I'm not sure I understand/agree, but either way if kaniko's COPY doesn't do what you want, that belongs in a separate issue.

@papb

papb commented Nov 30, 2022

but maybe we can save others.

Hi @dradetsky thank you very much for this. You did.

In this case, it makes no sense to cache the copy layer, since all it does is cause you to store the binary you're going to copy to your runtime image twice. It's not like it's any faster to copy the file from the last cache layer of the build image than from the copy layer of the runtime image. So basically, it makes sense that they didn't put a ton of effort into making copy layer caching work correctly. Obviously, for container tooling which is meant to be language-agnostic, this is a mistake, and copy layer caching does make sense for people developing for nodejs. But on the other hand, if as a nodejs developer you have the opportunity to make your build process look more like the typical golang build process (by using a bundler), this may lead to better results with less effort.

Thank you so much for this explanation, really. It makes a lot of sense now.

Just a follow-up: we agree that this is a bug, and you gave a good theory as to why fixing it has not been seen as a priority. However, I was wondering: is it hard to fix? Do you have any idea what the cause of the bug is? Maybe it's actually fixable fairly easily?

@mecampbellsoup

In this case, it makes no sense to cache the copy layer, since all it does is cause you to store the binary you're going to copy to your runtime image twice.

The issue is that all subsequent Dockerfile commands are not cached after a COPY misses the cache.
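
To illustrate why one miss cascades like that: layer caches of this kind (Kaniko's included) derive each command's cache key from the previous command's key plus the command itself and the digest of any files it consumes, so once the COPY key changes, every key after it changes too. Below is a simplified, self-contained sketch of that chaining; the compositeKey function and the digest string are illustrative, not Kaniko's actual implementation.

package main

import (
    "crypto/sha256"
    "fmt"
)

// compositeKey chains the previous command's key with the current command
// and a digest of the files it consumes (empty for commands that read no
// build-context files).
func compositeKey(prevKey, command, filesDigest string) string {
    h := sha256.New()
    h.Write([]byte(prevKey))
    h.Write([]byte(command))
    h.Write([]byte(filesDigest))
    return fmt.Sprintf("%x", h.Sum(nil))
}

func main() {
    // Same Dockerfile as before, but the sources feeding COPY changed
    // between builds.
    key := compositeKey("", "FROM node:18", "")
    key = compositeKey(key, "COPY . .", "digest-of-changed-sources") // cache miss here...
    key = compositeKey(key, "RUN yarn install --immutable", "")      // ...so this key changes too,
    key = compositeKey(key, "RUN yarn build-client", "")             // and so does every later key.
    fmt.Println(key)
}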
