Add initial "multi-stage" support in bashbrew #5929

Merged
merged 12 commits into from
Jun 13, 2019

tianon
Member

@tianon tianon commented May 17, 2019

This allows bashbrew to properly handle cross-repository and cross-tag dependencies even in the face of multiple FROM instructions or COPY --from=.

This also provides the scaffolding necessary to implement this in scripts using bashbrew cat.

As fallback behavior, the *DockerFrom functions should return the FROM of the last stage in the Dockerfile (which is essentially the FROM of the final image).

Also, the output of bashbrew from is now a space-separated list.
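
To make the behavior described above concrete, here is a minimal sketch of collecting every external image a multi-stage Dockerfile depends on, plus the last-stage fallback. It is not the code from this PR: `stageMeta`, `parseDockerfile`, and `lastStageFrom` are hypothetical names, and the parser deliberately ignores line continuations, `ARG` substitution, and case-insensitive stage names.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// stageMeta is a hypothetical structure for this sketch, not bashbrew's real type.
type stageMeta struct {
	StageFroms []string // the FROM value of every stage, in order
	StageNames []string // "AS name" aliases, in order (unnamed stages are skipped)
	Froms      []string // unique external images from FROM and COPY --from=
}

const sampleDockerfile = `
FROM golang:1.12 AS build
RUN go build -o /app .
FROM alpine:3.9
COPY --from=build /app /usr/local/bin/app
COPY --from=busybox:1.30 /bin/busybox /bin/busybox
`

// parseDockerfile scans the Dockerfile text one line at a time.
func parseDockerfile(dockerfile string) stageMeta {
	var meta stageMeta
	names := map[string]bool{} // stage aliases seen so far
	seen := map[string]bool{}  // external images already recorded
	addFrom := func(img string) {
		if !seen[img] {
			seen[img] = true
			meta.Froms = append(meta.Froms, img)
		}
	}
	scanner := bufio.NewScanner(strings.NewReader(dockerfile))
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		if len(fields) < 2 {
			continue
		}
		switch strings.ToUpper(fields[0]) {
		case "FROM":
			args := fields[1:]
			for len(args) > 0 && strings.HasPrefix(args[0], "--") {
				args = args[1:] // skip flags such as --platform=...
			}
			if len(args) == 0 {
				continue
			}
			meta.StageFroms = append(meta.StageFroms, args[0])
			if !names[args[0]] { // "FROM <earlier stage>" is not an external image
				addFrom(args[0])
			}
			if len(args) >= 3 && strings.EqualFold(args[1], "AS") {
				names[args[2]] = true
				meta.StageNames = append(meta.StageNames, args[2])
			}
		case "COPY":
			for _, f := range fields[1:] {
				if !strings.HasPrefix(f, "--from=") {
					continue
				}
				src := strings.TrimPrefix(f, "--from=")
				if _, err := strconv.Atoi(src); err == nil || names[src] {
					continue // a stage index or an earlier stage's alias
				}
				addFrom(src)
			}
		}
	}
	return meta
}

// lastStageFrom mirrors the fallback described above: the FROM of the final stage.
func lastStageFrom(meta stageMeta) string {
	if len(meta.StageFroms) == 0 {
		return ""
	}
	return meta.StageFroms[len(meta.StageFroms)-1]
}

func main() {
	meta := parseDockerfile(sampleDockerfile)
	fmt.Println(strings.Join(meta.Froms, " ")) // space-separated, like "bashbrew from"
	fmt.Println(lastStageFrom(meta))
}
```

For the sample Dockerfile this prints `golang:1.12 alpine:3.9 busybox:1.30` followed by `alpine:3.9`.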

This is the first step towards #3383 -- there's still a lot more shell script work to be done (which is frankly harder to find since we don't get compiler errors when they aren't updated), but this is the first step that makes that work possible to do. 👍

@@ -90,6 +92,8 @@ func cmdBuild(c *cli.Context) error {
 }
 defer archive.Close()
+
+// TODO use "meta.StageNames" to do "docker build --target" so we can tag intermediate stages too for cache (streaming "git archive" directly to "docker build" makes that a little hard to accomplish without re-streaming)
Member Author

IMO this catch is worth pointing out explicitly here too -- we don't ever have a "local" copy of the context, and instead stream git archive directly to docker build, so even if docker build --target=N were possible to do for a numbered stage, we'd still have an optimization problem implementing it (and I'd love to at least tag named stages that way, but the git archive | docker build optimization makes that a little harder to swallow).

The practical implication of this is that build cache for these untagged stages will still be considered "ripe" by our tooling and thus we will end up spending extra time rebuilding things after each cleaning of "ripe" images on our build servers.

For this reason, I think that for the official images, we should plan to still discourage multi-stage image use unless the gains are clear and/or the final artifacts are reproducible (and thus can avoid unnecessary image updates), and even then encourage use of explicitly named stages so that we can hopefully tag them in the future to help avoid this issue.
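
As a rough illustration of the trade-off above, the sketch below lists the `docker build` invocations that tagging each named stage would need. `--target`, `--tag`, and reading the context from stdin (`-`) are existing `docker build` options; the function `targetBuildArgs`, the `-stage-` tag suffix, and the `example:1.0` tag are assumptions made up for this example. Every entry needs its own copy of the context on stdin, which is the re-streaming cost mentioned in the TODO.

```go
package main

import (
	"fmt"
	"strings"
)

// targetBuildArgs returns one "docker build" argument list per named stage,
// followed by the build of the final image. The tag naming is hypothetical.
func targetBuildArgs(imageTag string, stageNames []string) [][]string {
	var cmds [][]string
	for _, name := range stageNames {
		cmds = append(cmds, []string{
			"docker", "build",
			"--target", name,
			"--tag", imageTag + "-stage-" + name,
			"-", // context (for example a "git archive" tarball) arrives on stdin
		})
	}
	// The final image is built without --target, again from a fresh stream.
	cmds = append(cmds, []string{"docker", "build", "--tag", imageTag, "-"})
	return cmds
}

func main() {
	for _, cmd := range targetBuildArgs("example:1.0", []string{"build"}) {
		fmt.Println(strings.Join(cmd, " "))
	}
}
```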

@tianon tianon force-pushed the bashbrew-multistage branch from 05a4747 to fa01936 Compare May 18, 2019 00:12
@@ -53,7 +53,7 @@ _arches() {
 _froms() {
 bashbrew cat --format '
 {{- range .TagEntries -}}
-{{- $.DockerFrom . -}}
+{{- $.DockerFroms . | join "\n" -}}
 {{- "\n" -}}
Member Author

For folks affected by the DockerFrom deprecation, in my experience porting my own scripts, this is the most common fix. (There was one place in the docs we needed to switch to .ArchLastStageFrom instead, but it's pretty rare.)
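
For anyone who wants to see the shape of that fix outside of bashbrew, here is a small self-contained sketch using Go's `text/template`. It is only an approximation: `tagEntry`, its `Froms` field, and the `join` helper defined here are stand-ins invented for the example and are not bashbrew's actual template functions.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// tagEntry is a hypothetical stand-in for an entry that has several FROM images.
type tagEntry struct {
	Froms []string
}

// Each entry prints all of its FROM images, one per line, instead of a single value.
const fromsTemplate = `{{- range .TagEntries -}}{{- .Froms | join "\n" -}}{{- "\n" -}}{{- end -}}`

func renderFroms(entries []tagEntry) (string, error) {
	tmpl, err := template.New("froms").Funcs(template.FuncMap{
		// The piped value becomes the last argument, so the separator comes first.
		"join": func(sep string, parts []string) string {
			return strings.Join(parts, sep)
		},
	}).Parse(fromsTemplate)
	if err != nil {
		return "", err
	}
	var out strings.Builder
	err = tmpl.Execute(&out, struct{ TagEntries []tagEntry }{entries})
	return out.String(), err
}

func main() {
	out, err := renderFroms([]tagEntry{
		{Froms: []string{"a:1", "b:2"}},
		{Froms: []string{"c:3"}},
	})
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```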

@@ -115,7 +115,7 @@ template='
 {{- "\n" -}}
 {{- range $.Entries -}}
 {{- $arch := .HasArchitecture arch | ternary arch (.Architectures | first) -}}
-{{- $from := $.ArchDockerFrom $arch . -}}
+{{- $froms := $.ArchDockerFroms $arch . -}}
Member Author

This one has me wondering if we should add an explicit .EnsureGitFetched function, since we use this function in a lot of places to mean that. 😅

(Room for future improvement -- I don't think we should block this on that.)

@yosifkit
Member

I approve, except that we first need a document that lays out the very limited set of cases in which we would accept a multi-stage build.

The general idea is that if it can reasonably be done without multi-stage, it should be (long RUN lines or auto-excluding build dependencies are not a good enough excuse).

@yosifkit
Member

Initial rough list of allowable images (to help minimize problems due to #5929 (comment) / cache):

  1. only has COPY --from=foo:tag ...
    • i.e. image that copies a built binary from another tagged official image
    • it is impossible or close-to-impossible to do within the image
    • there is only one FROM instruction in the Dockerfile
    • examples:
      • nanoserver copying from a windowsservercore-based image
        • rationale: nanoserver doesn't have System.Net.WebClient or Invoke-WebRequest to download things from the internet
      • tomcat:*jre variant that copies tomcat native from respective tomcat:*jdk image
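
A rough sketch of how the first rule could be checked mechanically: exactly one FROM, plus at least one `COPY --from=` that names an image rather than a stage index. `isCopyFromOnly` is a hypothetical helper, the `example:*` tags are placeholders, and the check ignores line continuations and `ARG` substitution.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// isCopyFromOnly reports whether a Dockerfile has a single FROM instruction
// and copies from at least one external image via COPY --from=image:tag.
func isCopyFromOnly(dockerfile string) bool {
	fromCount, externalCopies := 0, 0
	for _, line := range strings.Split(dockerfile, "\n") {
		fields := strings.Fields(line)
		if len(fields) < 2 {
			continue
		}
		switch strings.ToUpper(fields[0]) {
		case "FROM":
			fromCount++
		case "COPY":
			for _, f := range fields[1:] {
				if !strings.HasPrefix(f, "--from=") {
					continue
				}
				src := strings.TrimPrefix(f, "--from=")
				// With a single FROM, a non-numeric source can only be another image.
				if _, err := strconv.Atoi(src); err != nil {
					externalCopies++
				}
			}
		}
	}
	return fromCount == 1 && externalCopies > 0
}

func main() {
	allowed := "FROM example:jre\nCOPY --from=example:jdk /native /native\n"
	multiStage := "FROM example:jdk AS build\nFROM example:jre\nCOPY --from=build /native /native\n"
	fmt.Println(isCopyFromOnly(allowed), isCopyFromOnly(multiStage))
}
```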

Maybe:

  1. two stage build where one builds a static binary and the other contains it
    • this should only be done if the build dependencies could not be reasonably installed and removed within a single build layer (i.e. the compiler exists in a supported official image but doesn't exist in Debian/Alpine packages)
    • It would be better if the upstream project just published release binaries that could be used instead of compiling in the build image, since this will be subject to an unknown number of rebuilds when the "ripe" images (and thus the build cache) are cleaned out
  2. jlink to create a minimal jre for the specified application? (are there official images that could use this?)

Pitfalls:

tianon added 11 commits June 12, 2019 08:06
…e that's what's really necessary externally from that internal structure)
… of the given image (needed for docs generation)
… "bashbrew from" handling (especially in the case of no "--apply-constraints" flag)
…for a very dramatic speed increase (especially during dependency calculation)
@tianon tianon force-pushed the bashbrew-multistage branch from 98be20b to 22c68d5 Compare June 12, 2019 15:06
tianon added a commit to infosiftr/faq that referenced this pull request Jun 12, 2019
See docker-library/official-images#5929 (this is the initial supporting documentation for that PR).
@tianon
Member Author

tianon commented Jun 12, 2019

Initial documentation PR up at docker-library/faq#6. 👍 🎉 🍰

@yosifkit yosifkit merged commit 7a0b286 into docker-library:master Jun 13, 2019
@yosifkit yosifkit deleted the bashbrew-multistage branch June 13, 2019 23:27
tianon added a commit to docker-library/oi-janky-groovy that referenced this pull request Jun 13, 2019