Ensure accurate java main artifact name retrieval for multi-JARs and refine fallback approach #3054

Merged
merged 1 commit into from
Aug 1, 2024

Conversation

dor-hayun
Contributor

@dor-hayun dor-hayun commented Jul 22, 2024

Improving Accuracy of Package Name Retrieval in Java Archives

This change improves how the main package artifact name and its corresponding SHA1 value are retrieved. It fixes cases where the wrong package name was returned, especially for JAR files that contain multiple internal JARs. The improvements include:

  • Correctly retrieving the main package name when the main POM file is present.
  • Addressing issues with incorrect package names for specific JAR files, such as those containing multiple JARs.
  • Ensuring accurate results by checking prefixes and suffixes, with a fallback mechanism if no exact matches are found.
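The selection order described in the list above can be sketched as two passes over the candidate POM entries: an exact artifact ID match first, then a prefix/suffix fallback. This is a minimal illustration, not the merged code: `pomProperties`, `pickMainArtifact`, and the version strings are hypothetical names and values made up for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// pomProperties is a hypothetical, trimmed-down stand-in for one parsed
// pom.properties entry found inside a JAR.
type pomProperties struct {
	ArtifactID string
	Version    string
}

// pickMainArtifact returns the candidate that best describes the archive
// whose base name (without version or extension) is fileName.
func pickMainArtifact(fileName string, candidates []pomProperties) (pomProperties, bool) {
	if fileName == "" {
		return pomProperties{}, false
	}
	// Pass 1: an exact artifact ID match always wins.
	for _, c := range candidates {
		if c.ArtifactID == fileName {
			return c, true
		}
	}
	// Pass 2: no exact match exists, so fall back to prefix/suffix checks.
	for _, c := range candidates {
		if c.ArtifactID == "" {
			continue
		}
		if strings.HasPrefix(c.ArtifactID, fileName) || strings.HasSuffix(fileName, c.ArtifactID) {
			return c, true
		}
	}
	return pomProperties{}, false
}

func main() {
	// Placeholder versions; the nested entry is listed first on purpose.
	candidates := []pomProperties{
		{ArtifactID: "jansi-win32", Version: "0.0.1"},
		{ArtifactID: "jansi", Version: "0.0.2"},
	}
	picked, ok := pickMainArtifact("jansi", candidates)
	fmt.Println(picked.ArtifactID, ok) // prints "jansi true"
}
```

Running the exact pass to completion before any fallback means the order in which nested entries are discovered does not affect the result.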

@dor-hayun dor-hayun changed the title from "fix: update 'guessMainPackageNameAndVersionFromPomInfo' and 'artifactIDMatchesFilename'" to "Ensure accurate main artifact name retrieval for multi-JARs and refine fallback approach" on Jul 22, 2024
@dor-hayun dor-hayun changed the title from "Ensure accurate main artifact name retrieval for multi-JARs and refine fallback approach" to "Ensure accurate java main artifact name retrieval for multi-JARs and refine fallback approach" on Jul 22, 2024
@spiffcs
Contributor

spiffcs commented Jul 30, 2024

Thanks for the contribution @dor-hayun!

I think for us to accept this we need to check on some of the downstream implications here and see if it changes any of our fixtures in vulnerability testing for the better.

Also, let me see about adding a test or two that can show the kind of behavior this prevents. Do you have any current cases you're running into that would serve as a good example?

@dor-hayun
Contributor Author

Hi @spiffcs,
Please try to run syft on the following public image:
public.ecr.aws/docker/library/gradle@sha256:70da12adf27e83bcc125af9d2bc6f9432590e89c96609625aa688135b27e75fb

Then check what happens for the 'jansi' package. It is the main artifact, but 10 pom properties are found inside the jar. My fix ensures that the main package name is retrieved correctly when the main POM file is present.

@kzantow
Contributor

kzantow commented Jul 31, 2024

Hey, @dor-hayun. It looks like I made a change that conflicts with this one a bit. Would you want to rebase this PR? Or I could push an update, if you don't mind.

@dor-hayun
Contributor Author

@kzantow I'm rebasing it.

…IDMatchesFilename'

- Correct retrieval of package name when main POM file exists
- Address issue where wrong package name was retrieved for certain jars
- Example case: 'jansi' jar containing multiple jars like 'jansi-win32'
- Ensure true is returned when filename matches the artifact ID, prevent random retrieval by checking prefix and suffix
- Use fallback check with suffix and prefix if no POM properties file matches the exact artifact name
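To illustrate why a prefix/suffix check alone can yield a "random" result, the sketch below compares a loose check with a stricter one that requires equality whenever the file name is itself a known artifact ID. The function names `looseMatch`, `strictMatch`, and `countMatches` are hypothetical and are not claimed to match the merged implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// looseMatch is a hypothetical prefix/suffix-only check.
func looseMatch(artifactID, fileName string) bool {
	return strings.HasPrefix(artifactID, fileName) || strings.HasSuffix(fileName, artifactID)
}

// strictMatch requires an exact match when fileName is one of the known
// artifact IDs, and only falls back to looseMatch otherwise.
func strictMatch(artifactID, fileName string, known map[string]bool) bool {
	if artifactID == "" || fileName == "" {
		return false
	}
	if known[fileName] {
		return artifactID == fileName
	}
	return looseMatch(artifactID, fileName)
}

// countMatches reports how many IDs satisfy the given predicate.
func countMatches(ids []string, match func(string) bool) int {
	n := 0
	for _, id := range ids {
		if match(id) {
			n++
		}
	}
	return n
}

func main() {
	ids := []string{"jansi", "jansi-win32"}
	known := map[string]bool{"jansi": true, "jansi-win32": true}

	loose := countMatches(ids, func(id string) bool { return looseMatch(id, "jansi") })
	strict := countMatches(ids, func(id string) bool { return strictMatch(id, "jansi", known) })

	// The loose check accepts both IDs, so the winner would depend on
	// iteration order; the strict check accepts exactly one.
	fmt.Println(loose, strict) // prints "2 1"
}
```

With the loose check both 'jansi' and 'jansi-win32' qualify for a file named 'jansi', which is the ambiguity the commit message describes.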

Signed-off-by: dor-hayun <dor.hayun@mend.io>
@spiffcs spiffcs merged commit 48f1e97 into anchore:main Aug 1, 2024
11 checks passed