[ANE-1967] Recursive jars in containers #1478

Merged
@spatten merged 13 commits into master from ANE-1967-recursive-jars-in-containers
Nov 1, 2024



@spatten commented Oct 31, 2024

Overview

Delivers ANE-1967 - recursively detect jars.

This PR updates Millhone so that when it encounters a jar, it unzips the jar and looks for other jars in it. It does this recursively.
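The recursion described above can be sketched as follows. This is a minimal illustration, not Millhone's code: the `Entry` type and `find_jars` function are hypothetical, and the archive is modeled as an in-memory tree so the sketch needs no zip library. The real implementation works on jar bytes and unzips them.

```rust
/// Hypothetical in-memory stand-in for an archive entry. The real code
/// works on bytes and unzips them; this model keeps the sketch
/// dependency-free.
#[derive(Debug, Clone)]
enum Entry {
    /// A regular file, identified by its path inside the archive.
    File(String),
    /// A jar at `path`, holding the entries found when it is unzipped.
    Jar { path: String, entries: Vec<Entry> },
}

/// Returns the path of every jar reachable from `entries`, prefixing each
/// with `parent` so nested jars record where they were found.
fn find_jars(entries: &[Entry], parent: &str) -> Vec<String> {
    let mut found = Vec::new();
    for entry in entries {
        if let Entry::Jar { path, entries: inner } = entry {
            let joined = if parent.is_empty() {
                path.clone()
            } else {
                format!("{parent}/{path}")
            };
            found.push(joined.clone());
            // Recurse into the jar and merge whatever it contains.
            found.extend(find_jars(inner, &joined));
        }
    }
    found
}
```

Regular files are skipped, and each jar contributes its own path plus the paths of everything nested inside it.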

Acceptance criteria

When you run fossa container analyze, it should find jars that are nested inside other jars.

Testing plan

First, try the test container that I created as part of this PR. This container contains top.jar, which contains middle.jar, which contains deepest.jar. It also has middle.jar and deepest.jar at the top level, so that we can confirm that the fingerprints of the nested jars match the fingerprints of the top-level jars.

make install-dev
fossa-dev container analyze test/App/Fossa/Container/testdata/nested_jars.tar --output | jq > /tmp/nested_jars.json

We should find 6 observations: three for top.jar and its nested jars, two for middle.jar and its nested jar, and one for deepest.jar.

Confirm that each nested jar has the same fingerprints as its top-level copy.
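The fingerprint comparison can also be done mechanically. The sketch below assumes the observations have already been reduced to (path, fingerprint) pairs. That shape, the `mismatched_jars` helper, and the placeholder fingerprint strings used to exercise it are all hypothetical and do not reflect the actual JSON that the command emits.

```rust
use std::collections::HashMap;

/// Groups (path, fingerprint) pairs by jar file name and returns the names
/// whose fingerprints disagree between locations.
fn mismatched_jars(observations: &[(&str, &str)]) -> Vec<String> {
    let mut by_name: HashMap<&str, Vec<&str>> = HashMap::new();
    for &(path, fingerprint) in observations {
        // The file name is the last path component.
        let name = path.rsplit('/').next().unwrap_or(path);
        by_name.entry(name).or_default().push(fingerprint);
    }
    let mut mismatched: Vec<String> = by_name
        .into_iter()
        .filter(|(_, fps)| fps.iter().any(|fp| *fp != fps[0]))
        .map(|(name, _)| name.to_string())
        .collect();
    mismatched.sort();
    mismatched
}
```

An empty result means every copy of a jar, nested or top-level, carries the same fingerprint.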

Then, run with the example from the ticket.

fossa-dev container analyze hazelcast/management-center:5.3.1 --output | jq > /tmp/hazelcast.json

Note that this now produces a large number of observations, where previously there was only one.

Also take a look at how the paths to the nested jars are constructed. For example, this one:

opt/hazelcast/management-center/hazelcast-management-center-5.3.1.jar/BOOT-INF/lib/jakarta.annotation-api-1.3.5.jar

This shows that we found a jar at opt/hazelcast/management-center/hazelcast-management-center-5.3.1.jar and, after unzipping it, found another jar inside it at /BOOT-INF/lib/jakarta.annotation-api-1.3.5.jar.
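When reading the output, it can help to split such a path back into one segment per jar. The helper below is a hypothetical reader-side utility, not part of the PR. It cuts after every path component ending in ".jar", which is only a heuristic: a directory whose name happens to end in ".jar" would be treated as a jar boundary too.

```rust
/// Splits a reported path into one segment per jar, cutting after every
/// component that ends in ".jar".
fn split_nested_jar_path(path: &str) -> Vec<String> {
    let mut segments = Vec::new();
    let mut current: Vec<&str> = Vec::new();
    for component in path.split('/').filter(|c| !c.is_empty()) {
        current.push(component);
        if component.ends_with(".jar") {
            segments.push(current.join("/"));
            current.clear();
        }
    }
    // Keep any trailing components that do not end in a jar.
    if !current.is_empty() {
        segments.push(current.join("/"));
    }
    segments
}
```

For the path above, this yields two segments: the outer jar's location in the container, and the inner jar's location inside the outer jar.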

Risks

We were a little worried that this might surface a lot more jars, causing a lot of load on Sparkle.

To understand the risk of this, I downloaded 116 of the most popular jars (as listed by https://mvnrepository.com/popular) and unzipped them.

I found only one nested jar among them: org.projectlombok:lombok-1.18.34 contains a jar called mavenEcjBootstrapAgent.jar.

https://repo1.maven.org/maven2/org/projectlombok/lombok/1.18.34/

So I think we're safe: jars from Maven Central are unlikely to contribute many nested jars.

I could imagine that we'll get more from custom jars made by other teams and embedded in their codebase, but we should definitely be getting those jars no matter what.

Metrics

References

Checklist

  • I added tests for this PR's change (or explained in the PR description why tests don't make sense).
  • If this PR introduced a user-visible change, I added documentation into docs/.
    - [ ] If this PR added docs, I added links as appropriate to the user manual's ToC in docs/README.md and gave consideration to how discoverable or not my documentation is.
  • If this change is externally visible, I updated Changelog.md. If this PR did not mark a release, I added my changes into an ## Unreleased section at the top.
    - [ ] If I made changes to .fossa.yml or fossa-deps.{json,yml}, I updated docs/references/files/*.schema.json AND I have updated the example files used by the fossa init command. You may also need to update these if you have added or removed a dependency type (e.g. pip) or analysis target type (e.g. poetry).
    - [ ] If I made changes to a subcommand's options, I updated docs/references/subcommands/<subcommand>.md.

@spatten changed the title from "Ane 1967 recursive jars in containers" to "[ANE-1967] Recursive jars in containers" on Oct 31, 2024
@spatten force-pushed the ANE-1967-recursive-jars-in-containers branch from 6c4d50f to 2a55401 on October 31, 2024 22:24

### Container JAR analysis

The container analyzer will try to find Java Archive (Jar) files inside each layer.
It will then report them to FOSSA which will try to match the Jar files to the project they are a build artifact from.

The container analyzer will also expand each Jar file that it encounters and report any Jar files that it finds in the expanded Jar file. This is done recursively.

@spatten commented Oct 31, 2024

This is the only non-automated change in this file

@spatten marked this pull request as ready for review on November 1, 2024 00:03
@spatten requested a review from a team as a code owner on November 1, 2024 00:03
Changelog.md (resolved)
docs/references/subcommands/container/scanner.md (outdated, resolved)
extlib/millhone/src/cmd/analyze_container.rs (outdated, resolved)
}

// recursively find more jars
let mut discovered_in_jars = recursive_jars_in_jars(&buffer, joined_path, depth + 1)
Contributor

[nit, Optional] It may be more efficient to turn recursive_jars_in_jars into something that returns vectors of vectors and then collects them into the result vector at the end. I'm not sure if there's some better way to do this in Rust.

I think it's probably not a huge deal here since I don't expect the recursion to be very deep.

Contributor Author

I think I'm going to leave it as is, as I don't think we'll ever see recursion beyond three or so levels deep. Good point, though. That probably would be more performant.
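For reference, one common Rust way to avoid building and merging a vector at every level is to pass a mutable accumulator down the recursion. This is a related alternative to the reviewer's vectors-of-vectors idea, not what this PR does. The `Jar` type and `collect_jar_paths` function below are hypothetical, and jars are modeled in memory rather than unzipped from bytes.

```rust
/// Hypothetical stand-in for a jar and the jars found inside it.
struct Jar {
    name: String,
    nested: Vec<Jar>,
}

/// Appends the path of `jar` and of every jar nested in it to `out`,
/// so no intermediate vectors are built and merged on the way back up.
fn collect_jar_paths(jar: &Jar, parent: &str, out: &mut Vec<String>) {
    let path = if parent.is_empty() {
        jar.name.clone()
    } else {
        format!("{parent}/{}", jar.name)
    };
    out.push(path.clone());
    for inner in &jar.nested {
        collect_jar_paths(inner, &path, out);
    }
}
```

With only a few levels of nesting, as expected here, the difference from the return-and-extend style is unlikely to be measurable.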

@spatten force-pushed the ANE-1967-recursive-jars-in-containers branch from 173ee5f to 6a2b41a on November 1, 2024 18:50
@spatten enabled auto-merge (squash) on November 1, 2024 18:51
@spatten disabled auto-merge on November 1, 2024 18:51
@spatten enabled auto-merge (squash) on November 1, 2024 18:51
@spatten merged commit e9e8ade into master on Nov 1, 2024
19 checks passed
@spatten deleted the ANE-1967-recursive-jars-in-containers branch on November 1, 2024 19:08