Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduces new mechanism to fully scan package dependencies #3934

Merged
merged 2 commits into from
Jul 1, 2024

Conversation

rene
Copy link
Contributor

@rene rene commented May 23, 2024

This PR aims to help batch upgrades of eve-alpine and/or other container tags, like the one provided by PR #3928

Introduction of get-deps

This PR introduces introduces the script get-deps tool, which scans the pkg directory and process each Dockerfile (or Dockerfile.in) file in order to find all build dependencies (containers) of each package. It can also check package dependencies for rootfs images. The final package dependency tree can be generated as a PNG image using dot tool or as a Makefile that can be included to the main Makefile.

The usage of the script is the following:

./tools/get-deps/get-deps [-r] <-i|-m> <output_file>

  -i string
        Generate a PNG image file
  -m string
        Generate a Makefile auxiliary file
  -r    Also generates dependencies for rootfs image

Examples:

    ./tools/get-deps/get-deps -i deptree.png
    ./tools/get-deps/get-deps -r -i deptree.png
    ./tools/get-deps/get-deps -m pkg-deps.mk

The generated dependency tree can be used to easily find conflicts while doing batch upgrades of eve-alpine and/or other container tags. The following image is an example of the output (for master branch):

deps

Scan package dependencies

Container packages might import other containers (build/run dependencies, etc) using FROM command inside their Dockerfile. However, these dependencies are not exposed to the main Makefile. Using the script tools/get-deps.sh the dependency graph can be generated into a file that can be included in the main Makefile in order to take these package dependencies into account. The script is also able to scan rootfs image container dependencies.

This commit changes the main Makefile to call get-deps tool in order to generate a file called pkg-deps.mk with the package dependency included, so we can ensure all dependencies from a package will be built (when changed) in the correct order to avoid build failures, as can be experienced specially when doing batch updates of the eve-alpine container
to multiple packages.

Unfortunately, rootfs images usually have a lot of package dependencies, leading to a lot of docker build calls when all dependencies are checked, which can increase substantially the build time. In order to void that, the main Makefile will scan rootfs image dependencies only when the ROOTFS_DEPS=y is specified, for example:

    make ROOTFS_DEPS=y rootfs

    make ROOTFS_DEPS=y pkgs

In summary:

    - ROOTFS_DEPS not defined: Only package dependencies are scanned. For
      example, if pkg/pillar depends on pkg/uefi and pkg/dnsmasq to be
      built and one or both of these packages are changed, then when doing
      make pkg/pillar, make will ensure the dependencies will be built
      before pkg/pillar.

    - ROOTFS_DEPS defined: The dependencies of all packages and the
      rootfs images will be scanned, which means that any dependency
      package that was changed will be built before the main target (pkgs,
      rootfs, live, etc).

@rene rene requested a review from deitch as a code owner May 23, 2024 11:59
@github-actions github-actions bot requested review from eriknordmark and jsfakian May 23, 2024 11:59
@rene rene requested a review from christoph-zededa May 23, 2024 12:00
@rene rene force-pushed the dep-script branch 2 times, most recently from 7dd83d8 to 2ce5893 Compare May 23, 2024 12:06
@rene rene marked this pull request as draft May 24, 2024 07:37
@rene
Copy link
Contributor Author

rene commented May 24, 2024

Converted to draft after our EVE's TSC call yesterday... I had a discussion with @christoph-zededa and @rucoder , we did some tests and it's possible to provide the dependency tree to the Makefile, so I will improve the script to generate a file to be included in the main Makefile, it should be possible to achieve full dependency check, so we will not need to build any package manually after changes....

@rene rene changed the title tools: Introduces get_deps.sh script Introduces new mechanism to fully scan package dependencies May 24, 2024
@rene rene marked this pull request as ready for review May 24, 2024 12:01
@rene
Copy link
Contributor Author

rene commented May 24, 2024

The PR is now ready for review.... your comments are really appreciated 😃

Makefile Outdated Show resolved Hide resolved
tools/get_deps.sh Outdated Show resolved Hide resolved
@rene
Copy link
Contributor Author

rene commented May 27, 2024

Updates in this PR:

  • Rebased on top of master
  • Addressed @christoph-zededa's comment
  • Changed the name of the script from get_deps.sh to get-deps.sh, since pretty much all scripts from tools/ follow this pattern

Copy link
Contributor

@rouming rouming left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yay, we missed proper dependencies detection!

@rene
Copy link
Contributor Author

rene commented May 27, 2024

The parser implemented through shell script (in get-deps.sh) has some limitations:

  • It cannot expand variables inside the Dockerfile
  • It cannot exclude "FROM lfedge/..." occurrences inside strings (although this is extremely unlikely to happen)
  • It doesn't support parameters from FROM command, like --platform

@christoph-zededa collaborated providing some libraries and sample code to parse Dockerfile and YML files accordingly in GO, so I'll convert get-deps.sh to a GO application in order to eliminate such limitations. Converting this PR again to DRAFT while the application is not ready. Notice that the PR is still 100% functional, it can be used locally.

@rene rene marked this pull request as draft May 27, 2024 19:51
@eriknordmark
Copy link
Contributor

@christoph-zededa collaborated providing some libraries and sample code to parse Dockerfile and YML files accordingly in GO, so I'll convert get-deps.sh to a GO application in order to eliminate such limitations. Converting this PR again to DRAFT while the application is not ready. Notice that the PR is still 100% functional, it can be used locally.

Check with @deitch as well - I think he found a dockerfile parser as part of the SBoM work a while back.

@deitch
Copy link
Contributor

deitch commented May 28, 2024

We have been doing a lot of this for sbom purposes for a while.

First, we use the buildkit sbom creator (which is based on syft) to scan the entire OCI image for each package every time it builds. We attach the sbom to the image itself using the sbom attestations mechanism used by buildkit (in-toto).

It is a requirement of ours that all dependencies are included in the SBoM.

Second, there are sometimes things that are added to an image that are not obviously scannable. For those, we use dockerfile-add-scanner which is part of lf-edge/eve-build-tools to scan Dockerfiles for ADD statements. That is used primarily for the kernel. Everything else in eve is fairly reliably caught by syft.

I guess the first question I would ask is, can our requirements be met by using what already is in place?

If the answer is no, can we use or extend the tools to get what we need?

If not, and we need to add a new tool, we should put it in lf-edge/eve-build-tools, as that is the common location for those things.

Finally, if none of the above works, and you have to actually scan Dockerfiles, use the buildkit libraries to do so. Dockerfiles are far more complex than you realize, and the ruleset to parsing it can be rough, to put it mildly. dockerfile-add-scanner already parses the dockerfile, so you will have a much easier time just expanding the capabilities of it.

In any case, if we can get what we need out of the already generated sbom, your life will be immensely easier. If the sbom is almost there, it is worth seeing if a small change can bring it there. Else go for the dockerfile scanner, or leverage it.

@rene rene force-pushed the dep-script branch 5 times, most recently from 9d5238a to ca4b485 Compare May 29, 2024 13:25
@christoph-zededa
Copy link
Contributor

We have been doing a lot of this for sbom purposes for a while.

First, we use the buildkit sbom creator (which is based on syft) to scan the entire OCI image for each package every time it builds. We attach the sbom to the image itself using the sbom attestations mechanism used by buildkit (in-toto).

You mean like this: rene@715e60b ?

Second, there are sometimes things that are added to an image that are not obviously scannable. For those, we use dockerfile-add-scanner which is part of lf-edge/eve-build-tools to scan Dockerfiles for ADD statements. That is used primarily for the kernel. Everything else in eve is fairly reliably caught by syft.

I guess the first question I would ask is, can our requirements be met by using what already is in place?

I don't think so.

If the answer is no, can we use or extend the tools to get what we need?

The code in dockerfile-add-scanner looks quite different to this code here, so I don't think it would fit.
IMHO ideally linuxkit should care about the dependencies, but I think that is beyond this PR.

If not, and we need to add a new tool, we should put it in lf-edge/eve-build-tools, as that is the common location for those things.

But those are: "Library and Tools to interact with Edge Virtualization Engine(EVE)" and this is not a library or tool to interact with EVE but a tool to interact with building EVE - the same as other tools that are in eve/tools, e.g. https://github.com/lf-edge/eve/tree/master/tools/compare-sbom-sources

Finally, if none of the above works, and you have to actually scan Dockerfiles, use the buildkit libraries to do so. Dockerfiles are far more complex than you realize, and the ruleset to parsing it can be rough, to put it mildly. dockerfile-add-scanner already parses the dockerfile, so you will have a much easier time just expanding the capabilities of it.

You mean like this: rene@715e60b ?

@deitch
Copy link
Contributor

deitch commented Jun 3, 2024

I guess the first question I would ask is, can our requirements be met by using what already is in place?

I don't think so.

Well, there's the answer, then.

The code in dockerfile-add-scanner looks quite different to this code here, so I don't think it would fit.

The one thing it does well is load up a dockerfile and parse it. It uses the buildkit libs, like you linked to above, specifically these lines to scan the file. Those lines particularly look for URLs, but the basics for parsing a file are there. Remember that you need to track for ARG and ENV as set, and interpolated vars inside the file.

IMHO ideally linuxkit should care about the dependencies, but I think that is beyond this PR.

I don't understand. Can you describe further?

You mean like this: rene@715e60b

Yes, with all of the above (arg, env, etc.).

@christoph-zededa
Copy link
Contributor

IMHO ideally linuxkit should care about the dependencies, but I think that is beyond this PR.

I don't understand. Can you describe further?

When I looked up how to actually parse those Dockerfile I found https://github.com/earthly/earthly - it is combining Makefiles and Dockerfiles into one.

But actually I thought instead of calling build-tools/bin/linuxkit pkg build for every package, one should call it for all packages at once and then linuxkit creates the dependency graph and builds one after the other.

You mean like this: rene@715e60b

Yes, with all of the above (arg, env, etc.).

Oh, I thought it would do this already; now I added https://github.com/rene/eve/pull/32/files#diff-189b815f942d48c913748f26946e4c9b28e003c0191cb81ac8452cb78ad1a72fR114 .

@deitch
Copy link
Contributor

deitch commented Jun 4, 2024

When I looked up how to actually parse those Dockerfile I found https://github.com/earthly/earthly - it is combining Makefiles and Dockerfiles into one

Oh that is interesting. I need to spend some time taking a look at it. I always am both excited and hesitant about new build systems. Excited, because they may solve issues; hesitant, because so many come and go, when they just end up reproducing existing problems with a new syntax. In the end, build dependency graphs are a hard problem.

But actually I thought instead of calling build-tools/bin/linuxkit pkg build for every package, one should call it for all packages at once and then linuxkit creates the dependency graph and builds one after the other.

Hmm, there is an interesting conversation to be had here. I don't think it belongs on PR comments, but you raising some really interesting higher-level ideas. I can lay the groundwork at least.

The only reason we build these packages as OCI/docker images, is because we consume them as such in our final build image (linuxkit build). We do that for 2 main reasons:

  • OCI images are an excellent, and fast becoming standard, packaging format, with digests and reliable reproducibility.
  • Many of them actually are run as containers (anything in onboot or services or onshutdown)

Given that, the reason we use lkt pkg build and not something else, is for a few reasons:

  1. we are building these packages user docker in a very opinionated fashion, i.e. specific docker options, specific tags/labels, specific runner patterns (we figure out if you have arch-specific places to run the build), etc.
  2. docker image storage currently does not support multi-arch indexes, nor storing the original layers (it expands them all). That support is coming down the pipe, currently experimental.

Ideally, pkg build would some day be replaced with docker build, if docker had some "config file" (other than Dockerfile) that allowed setting these options.

In terms of the dependency graph, what do you refer to? Mostly, our pkg/ do not depend on other pkg/. These all are standalone (with a few exceptions). What did you mean by the graph?

@christoph-zededa
Copy link
Contributor

In terms of the dependency graph, what do you refer to? Mostly, our pkg/ do not depend on other pkg/. These all are standalone (with a few exceptions). What did you mean by the graph?

I mean this graph: #3934 (comment)

@deitch
Copy link
Contributor

deitch commented Jun 4, 2024

Oh, OK I understand.

linuxkit itself has no notion of the dependencies graph, at least in principle. It just calls buildkit build on a passed directory pkg/. It is docker that then resolves the Dockerfile, and does everything.

linuxkit does have some knowledge. Since it manages its own cache, if a Dockerfile references foo/bar:1234, and that is in the linuxkit cache, it needs to find it and pass it. So I guess in theory it could build a dependency graph. But it really isn't part of its functionality. It just is told, "build this directory" and then "build this directory". What you are describing is a dependency tree among multiple dockerfiles, i.e. linking multiple package builds. I think it is cool, but it is not what docker (or buildkit) does; it is a collation level above it.

@rene rene force-pushed the dep-script branch 2 times, most recently from 0305ab9 to 6b820c1 Compare June 20, 2024 13:47
Copy link
Contributor

@christoph-zededa christoph-zededa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@rene rene marked this pull request as ready for review June 26, 2024 10:45
@rene
Copy link
Contributor Author

rene commented Jun 26, 2024

Updates in this PR:

  • Rebased on top of master
  • Addressed reviewers comments
  • Added @christoph-zededa as co-author since he provided the full Dockerfile parsers, test code and improvements as well
  • Marked as ready for review

rene and others added 2 commits June 27, 2024 11:22
This commit introduces the tool get-deps (under tools/get-deps/), which
scans the pkg directory and process each Dockerfile (or Dockerfile.in) file
in order to find all build dependencies (containers) of each package. It
can also check package dependencies for rootfs images. The final package
dependency tree can be generated as a PNG image (using dot tool) or as a
Makefile that can be included to the main Makefile.

The usage of the tool the following:

    ./tools/get-deps/get-deps [-r] <-i|-m> <output_file>

  -i string
    	Generate a PNG image file
  -m string
    	Generate a Makefile auxiliary file
  -r	Also generates dependencies for rootfs image

Examples:

    ./tools/get-deps/get-deps -i deptree.png
    ./tools/get-deps/get-deps -r -i deptree.png
    ./tools/get-deps/get-deps -m pkg-deps.mk

Co-authored-by: Christoph Ostarek <christoph@zededa.com>
Signed-off-by: Renê de Souza Pinto <rene@renesp.com.br>
Container packages might import other containers (build/run dependencies,
etc) using FROM command inside their Dockerfile. However, these
dependencies are not exposed to the main Makefile. Using the tool get-deps
(tools/get-deps/get-deps) the dependency graph can be generated into a file
that can be included in the main Makefile in order to take these package
dependencies into account. The script is also able to scan rootfs image
container dependencies.

This commit changes the main Makefile to call get-deps tool in order to
generate a file called pkg-deps.mk with the package dependency included, so
we can ensure all dependencies from a package will be built (when changed)
in the correct order to avoid build failures, as can be experienced
specially when doing batch updates of the eve-alpine container to multiple
packages.

Unfortunately, rootfs images usually have a lot of package dependencies,
leading to a lot of docker build calls when all dependencies are checked,
which can increase substantially the build time. In order to void that, the
main Makefile will scan rootfs image dependencies only when the
ROOTFS_DEPS=y is specified, for example:

    make ROOTFS_DEPS=y rootfs

    make ROOTFS_DEPS=y pkgs

In summary:

    - ROOTFS_DEPS not defined: Only package dependencies are scanned. For
      example, if pkg/pillar depends on pkg/uefi and pkg/dnsmasq to be
      built and one or both of these packages are changed, then when doing
      make pkg/pillar, make will ensure the dependencies will be built
      before pkg/pillar.

    - ROOTFS_DEPS defined: The dependencies of all packages and the
      rootfs images will be scanned, which means that any dependency
      package that was changed will be built before the main target (pkgs,
      rootfs, live, etc).

Signed-off-by: Renê de Souza Pinto <rene@renesp.com.br>
@rene
Copy link
Contributor Author

rene commented Jun 27, 2024

Updates in this PR:

  • Rebased on top of master in order to get the fixes for Eden tests

@eriknordmark eriknordmark merged commit 5894895 into lf-edge:master Jul 1, 2024
31 checks passed
@rene rene deleted the dep-script branch August 2, 2024 13:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants