Optimize Layering for Multi-Project Setups #1436

Closed
remmeier opened this issue Jan 19, 2019 · 15 comments

@remmeier

remmeier commented Jan 19, 2019

Both Gradle and Maven are well-suited for multi-project setups. In such a setup, an application consists of multiple JARs that are put into the image, and any of those JARs may change frequently. Jib only puts the JAR of the project it has been applied to into a dedicated layer; all other project dependencies are treated as third-party dependencies and put into a layer together with them. This makes Jib's layering rather inefficient for multi-project setups.

A multi-project setup may also produce multiple images. We typically have something like a "service-common" project from which we would ideally be able to create a base image for all the others.
This in turn is related to #403, where one would have more control over the third-party dependencies.

@briandealwis
Member

Thanks for the example, @remmeier. Note that we put snapshot releases in a separate layer too. Do your other service-common projects have the same groupId? Are they frequently released?

@loosebazooka
Member

Our current design handles this with the build systems' SNAPSHOT versioning. During development, we would generally expect frequently changing libraries to be versioned with SNAPSHOT until they are stabilized and a fixed version is released?

@remmeier
Author

remmeier commented Jan 26, 2019

We got rid of snapshot releases, following the "every commit is a release" approach, which would explain why it is not working for us. We always build a release version with the git commit hash or timestamp in the version. The build then runs through the testing-related pipeline steps; if it passes, it can be promoted to a "real" release without rebuilding.

  • group ids are the same
  • the computed version is used to tag the image
  • JARs themselves do not carry a version, to gain reproducibility (the version changes with every commit). In addition, there are a few other optimizations, such as removing zip modification timestamps.
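
To illustrate the last point, here is a minimal sketch of stripping zip modification timestamps so that two builds of the same content produce identical bytes. It only uses java.util.zip; the class name, the helper methods, and the fixed date are hypothetical and are not taken from the build described above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.time.LocalDateTime;
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class ReproducibleZip {
  // Arbitrary constant timestamp (an assumption of this sketch).
  static final LocalDateTime FIXED_TIME = LocalDateTime.of(2000, 1, 1, 0, 0);

  /** Rewrites an archive with entries sorted by name and a constant modification time. */
  static byte[] normalize(byte[] zipBytes) {
    try {
      // TreeMap gives a stable, name-sorted entry order.
      Map<String, byte[]> entries = new TreeMap<>();
      try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
        for (ZipEntry e = in.getNextEntry(); e != null; e = in.getNextEntry()) {
          entries.put(e.getName(), in.readAllBytes());
        }
      }
      ByteArrayOutputStream buffer = new ByteArrayOutputStream();
      try (ZipOutputStream out = new ZipOutputStream(buffer)) {
        for (Map.Entry<String, byte[]> e : entries.entrySet()) {
          ZipEntry entry = new ZipEntry(e.getKey());
          // Local date-time, so the stored value does not depend on the default time zone.
          entry.setTimeLocal(FIXED_TIME);
          out.putNextEntry(entry);
          out.write(e.getValue());
          out.closeEntry();
        }
      }
      return buffer.toByteArray();
    } catch (IOException ex) {
      throw new UncheckedIOException(ex);
    }
  }

  /** Demo helper: builds an archive from name/content pairs with the given entry time. */
  static byte[] zip(long timeMillis, String... namesAndContents) {
    try {
      ByteArrayOutputStream buffer = new ByteArrayOutputStream();
      try (ZipOutputStream out = new ZipOutputStream(buffer)) {
        for (int i = 0; i < namesAndContents.length; i += 2) {
          ZipEntry entry = new ZipEntry(namesAndContents[i]);
          entry.setTime(timeMillis);
          out.putNextEntry(entry);
          out.write(namesAndContents[i + 1].getBytes(StandardCharsets.UTF_8));
          out.closeEntry();
        }
      }
      return buffer.toByteArray();
    } catch (IOException ex) {
      throw new UncheckedIOException(ex);
    }
  }

  public static void main(String[] args) {
    // Same content, different entry order and timestamps.
    byte[] first = zip(1_000_000_000_000L, "b.txt", "B", "a.txt", "A");
    byte[] second = zip(1_500_000_000_000L, "a.txt", "A", "b.txt", "B");
    System.out.println("raw archives equal: " + Arrays.equals(first, second));
    System.out.println(
        "normalized archives equal: " + Arrays.equals(normalize(first), normalize(second)));
  }
}
```

With identical bytes, an unchanged JAR yields an unchanged layer digest even though every commit gets a new version.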

@remmeier
Author

remmeier commented Mar 19, 2019

I looked at the sources, JavaContainerBuild:

file.getFileName().toString().contains("SNAPSHOT")
              ? LayerType.SNAPSHOT_DEPENDENCIES
              : LayerType.DEPENDENCIES,

and GradleLayerConfigurations:

boolean isSnapshot = dependencyFile.getName().contains("SNAPSHOT");
LayerType layerType = isSnapshot ? LayerType.SNAPSHOT_DEPENDENCIES : LayerType.DEPENDENCIES;

The contains("SNAPSHOT") check seems duplicated, but the one below is in use for Gradle and provides a good angle to fix the issue. There seem to be three kinds of dependencies:

  • local project dependencies in a multi-project Gradle/Maven build (=> this issue)
  • third-party SNAPSHOT dependencies
  • third-party release dependencies

It seems hard to decide with the Gradle API whether a file is a project dependency. sourceSet.runtimeClasspath is a convenient API but does not specify this. The configurations object with the underlying dependencies does give the information, but it comes a bit early and is hard to translate to the runtime classpath. A second heuristic could solve the problem:

dependencyFile.absolutePath.contains(rootProject.projectDir.absolutePath)

=> external third-party dependencies will be hosted in the .m2, gradle cache or other directories
=> project dependencies will reside in the build/libs dir (could be a second check)
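
A minimal Java sketch of this heuristic, for illustration only: the class name and the PROJECT_DEPENDENCIES value are hypothetical and are not part of the Jib sources quoted above. It compares paths element by element with Path.startsWith instead of a substring check, so a sibling directory such as /work/app-other is not mistaken for /work/app.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

class ProjectDependencyHeuristic {
  // PROJECT_DEPENDENCIES is a hypothetical addition for this sketch.
  enum LayerType { PROJECT_DEPENDENCIES, SNAPSHOT_DEPENDENCIES, DEPENDENCIES }

  /** Classifies a dependency file by its location and, failing that, by its name. */
  static LayerType classify(Path dependencyFile, Path rootProjectDir) {
    Path file = dependencyFile.toAbsolutePath().normalize();
    Path root = rootProjectDir.toAbsolutePath().normalize();
    // Files below the root project directory are assumed to be built by a sub-project.
    if (file.startsWith(root)) {
      return LayerType.PROJECT_DEPENDENCIES;
    }
    // Everything else (.m2, Gradle cache, ...) falls back to the existing name check.
    return file.getFileName().toString().contains("SNAPSHOT")
        ? LayerType.SNAPSHOT_DEPENDENCIES
        : LayerType.DEPENDENCIES;
  }

  public static void main(String[] args) {
    Path root = Paths.get("/work/app");
    System.out.println(
        classify(Paths.get("/work/app/service-common/build/libs/service-common.jar"), root));
    System.out.println(
        classify(Paths.get("/home/user/.m2/repository/org/x/x-1.0-SNAPSHOT.jar"), root));
    System.out.println(classify(Paths.get("/home/user/.gradle/caches/guava-27.0.jar"), root));
  }
}
```

A dependency cache located inside the project directory would be misclassified by this check, which is where the second build/libs check mentioned above could help.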

This would allow proper layering for multi-project setups without assuming a SNAPSHOT setup. I could also provide a fix for this. Currently it causes a bit of an issue for us: we have a multi-project build with around 10 Docker images, none of which is properly layered because the local project JARs do not get a layer of their own, resulting in about a gigabyte rather than a few megabytes of updates, with a matching performance penalty.

@chanseokoh
Member

Thanks for the consideration.

> local project dependencies in a multi-project Gradle/Maven build (=> this issue)
> third-party SNAPSHOT dependencies
> third-party release dependencies

#403 is very closely related. (#403 got us to create the SNAPSHOT layer as the first step.) #403 is still open to have more control over fine-grained layers for dependencies. Perhaps some potential ideas suggested in #403 are more general and extensible, not just for multi-project setup. Your heuristic idea is nice as well, but I think fixing #403 will naturally cover this use case. Unfortunately, fixing #403 will require considerable design changes and a lot of internal discussions, and we've put it on a back burner for some time.

> The contains("SNAPSHOT") check seems duplicated

FYI, the plugins are not using JavaContainerBuild yet (#1373). We plan to eventually get rid of GradleLayerConfigurations (or most of its contents).

@remmeier
Author

remmeier commented Mar 25, 2019

Potentially #403 is not necessary when

> A multi project setup may also produce multiple images. we typically have something like a "service-common" project where we ideally would be able to create a base image for all the others.

is resolved. Or at least it is much less necessary and may feel a bit more natural in its usage. It seems simpler to organize layers according to the project structure, rather than adding more options to the plugin and forcing files manually into certain layers.

Maybe it would not even need much redesign, since most things remain unchanged: the logic is implemented through a second Jib instance providing a base layer rather than by making a single Jib instance much more flexible/customizable. All that would be necessary is to avoid adding JARs to upper layers that are already contained in the base layer.
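
As a rough sketch of that last step (not existing Jib behavior): given the JARs already shipped in a base image built from something like "service-common", only the remaining JARs would go into the upper layers. The class and method names are hypothetical, and matching by file name is an assumption; a real implementation would more likely compare content digests.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class BaseLayerFilter {
  /** Returns the classpath entries that the base image does not already contain. */
  static List<String> jarsForUpperLayers(List<String> runtimeClasspath, Set<String> baseImageJars) {
    List<String> remaining = new ArrayList<>();
    for (String jar : runtimeClasspath) {
      if (!baseImageJars.contains(jar)) {
        remaining.add(jar); // keeps the original classpath order
      }
    }
    return remaining;
  }

  public static void main(String[] args) {
    Set<String> base = Set.of("service-common.jar", "spring-boot.jar");
    List<String> classpath = List.of("service-common.jar", "spring-boot.jar", "orders.jar");
    System.out.println(jarsForUpperLayers(classpath, base));
  }
}
```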

But at some point, similar to #403, it would be nice to be able to force any kind of file into a predefined layer, just for the use cases where the plugin has no out-of-the-box solution. This also goes in the direction of #1020.

chanseokoh added this to the v1.2.0 milestone Mar 25, 2019

@hendrikhalkow

hendrikhalkow commented Apr 25, 2019

I think Jib should allow every user to build the layer structure they want. Instead of making too many assumptions, we should provide a DSL that allows me to customize the layering. We could use a DSL like this. It's a fully working Groovy example with the suggested DSL at the end:

#!/usr/bin/env groovy

class CustomLayerSpec {
  void include(String pattern) {
    println "    ${pattern}"
  }
}

class LayersSpec {
  void dependencies() {
    println "dependencies"
  }
  void snapshotDependencies() {
    println "snapshot dependencies"
  }
  void resources() {
    println "resources"
  }
  void classes() {
    println "classes"
  }

  void custom(String name, Closure customLayer) {
    println "custom layer `${name}`"
    def customLayerSpec = new CustomLayerSpec()
    def code = customLayer.rehydrate(customLayerSpec, this, this)
    code.resolveStrategy = Closure.DELEGATE_ONLY
    code()
  }
}

class JibSpec {
  void layers(Closure images) {
    def layersSpec = new LayersSpec()
    def code = images.rehydrate(layersSpec, this, this)
    code.resolveStrategy = Closure.DELEGATE_ONLY
    code()
  }
}

def jib(Closure cl) {
    def jib = new JibSpec()
    def code = cl.rehydrate(jib, this, this)
    code.resolveStrategy = Closure.DELEGATE_ONLY
    code()
}

jib {
  // actually, this should go into the jib.image.to block
  layers {
    custom('shared') {
      include 'spring-boot.jar'
      include 'commons.jar'
    }
    dependencies()
    snapshotDependencies()
    resources()
    classes()
    custom('extra') {
      include 'config.yaml'
    }
  }
}

@chanseokoh
Member

@hendrikhalkow The Jib plugins' aim is to hide underlying technologies, low-level concepts, and implementation details of container images, so that people who have never heard of or used Docker can containerize a Java app by simply adding the Jib plugin, without any knowledge of the various Docker concepts. The container "layer" is one such concept that we do not want users to have to deal with. Basically, the goal is to containerize a Java app in an optimal and opinionated way, to free people from having to think about tweaking low-level details. Although I admit that exposing layer configurations for advanced usage is certainly possible, we still want to keep Jib's config space simple and concise so as not to overwhelm first-time users of Jib.

But not all is lost actually. For out-of-scope or advanced usage such as yours, you could consider using our general-purpose image-building library jib-core. See the following examples to get the sense of it:

#654 (comment)
#1452 (comment)

@hendrikhalkow

Hi @chanseokoh I agree that you should be able to use this without knowing about these underlying things. However, this should be an optional advanced feature that the first-time container user doesn't need to know about.

If you omit that layers block, Jib should behave exactly as it behaves now. If you want to customize your image, I don't see any reason why one should not be able to do it.

Other container-specific things like entry point are customizable, too, so why not the layers?

I was just looking to solve this and all the related issues in a convenient way.

@remmeier
Author

How should we proceed with this ticket? Is there interest in having one Jib instance depend on another within the same project, to be able to establish a base image for multi-project setups as outlined above? I think it would be quite natural and aligned with the concepts of Gradle, without having to go into the details of Docker/layering. Or is it more something users are expected to do on their own with jib-core?

@chanseokoh
Member

chanseokoh commented Jun 21, 2019

As I understand, this issue is about putting "project dependencies" (dependency JARs from other sub-projects/modules in a multi-project/module setup) into a dedicated layer, as opposed to putting all (non-SNAPSHOT) dependency JARs into the same layer (the current behavior).

#1780 (Maven) and #1785 (Gradle) fix this, putting those "project dependencies" (regardless of whether they are SNAPSHOT or not) into a new layer of their own on top of the SNAPSHOT layer. Therefore, there will be three dependency layers starting from the next release, 1.4.0:

  • project dependencies (the new one)
  • SNAPSHOT dependencies
  • all other dependencies
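
The resulting grouping can be pictured roughly as follows. This is an illustrative sketch, not the code from #1780 or #1785: the class and enum names are hypothetical, and the set of project JAR names is simply passed in, since how the plugins detect project dependencies is not shown in this thread.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ThreeLayerSplit {
  enum Layer { PROJECT_DEPENDENCIES, SNAPSHOT_DEPENDENCIES, DEPENDENCIES }

  /** Groups JAR names into the three layers; project JARs win over the SNAPSHOT check. */
  static Map<Layer, List<String>> split(List<String> jars, Set<String> projectJars) {
    Map<Layer, List<String>> layers = new EnumMap<>(Layer.class);
    for (Layer layer : Layer.values()) {
      layers.put(layer, new ArrayList<>());
    }
    for (String jar : jars) {
      Layer layer;
      if (projectJars.contains(jar)) {
        layer = Layer.PROJECT_DEPENDENCIES; // regardless of SNAPSHOT in the name
      } else if (jar.contains("SNAPSHOT")) {
        layer = Layer.SNAPSHOT_DEPENDENCIES;
      } else {
        layer = Layer.DEPENDENCIES;
      }
      layers.get(layer).add(jar);
    }
    return layers;
  }

  public static void main(String[] args) {
    List<String> jars =
        List.of("service-common-1.0-SNAPSHOT.jar", "lib-2.0-SNAPSHOT.jar", "guava-27.0.jar");
    System.out.println(split(jars, Set.of("service-common-1.0-SNAPSHOT.jar")));
  }
}
```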

Closing the issue.

@remmeier
Author

thx! I created a follow-up with #1807 that goes a bit deeper into the topic.

@TadCordle
Contributor

@remmeier v1.4.0 has been released with this optimization!

@remmeier
Author

that is great, thx!

@chanseokoh
Member

@hendrikhalkow @remmeier

The Jib Extension Framework is now available with the latest Jib versions. You can easily extend and tailor the Jib plugins' behavior to your liking.

We've written a general-purpose layer-filter extension that enables fine-grained layer control, including deleting files and moving files into new layers.

For general information about using and writing extensions, take a look at the Jib Extensions repo.
