[ANE-1967] Recursive jars in containers #1478

Merged: 13 commits, Nov 1, 2024
1 change: 1 addition & 0 deletions Cargo.lock

Some generated files are not rendered by default.

1 change: 1 addition & 0 deletions Changelog.md
@@ -13,6 +13,7 @@
- Microsoft SQL Server 2019 Developer, 2019 Evaluation, and 2019 Express
- Microsoft SQL Server 2022 Enterprise, Standard, Web
- Viskoe.dk Terms of Use
- Container scanning: Recursively find jars within jars ([#1478](https://github.com/fossas/fossa-cli/pull/1478))

## 3.9.37

54 changes: 28 additions & 26 deletions docs/references/subcommands/container/scanner.md
@@ -1,14 +1,14 @@
# FOSSA's container scanner

- [FOSSA's container scanner](#fossas-new-container-scanner)
- [What's new in this scanner?](#whats-new-in-this-scanner)
- [FOSSA's container scanner](#fossas-container-scanner)
- [What's supported in FOSSA's container scanner?](#whats-supported-in-fossas-container-scanner)
- [Documentation](#documentation)
- [Container image source](#container-image-source)
- [1) Exported docker archive](#1-exported-docker-archive)
- [2) From Docker Engine](#2-from-docker-engine)
- [3) From registries](#3-from-registries)
- [Container image analysis](#container-image-analysis)
- [Container Jar analysis](#container-jar-analysis)
- [Container JAR analysis](#container-jar-analysis)
- [Distroless Containers](#distroless-containers)
- [Supported Container Package Managers](#supported-container-package-managers)
- [View detected projects](#view-detected-projects)
@@ -19,7 +19,7 @@
- [How do I scan multi-platform container images with `fossa-cli`?](#how-do-i-scan-multi-platform-container-images-with-fossa-cli)
- [How can I only scan for system dependencies (alpine, dpkg, rpm)?](#how-can-i-only-scan-for-system-dependencies-alpine-dpkg-rpm)
- [How do I exclude specific projects from container scanning?](#how-do-i-exclude-specific-projects-from-container-scanning)
- [Limitations & Workarounds](#limitations--workarounds)
- [Limitations \& Workarounds](#limitations--workarounds)

## What's supported in FOSSA's container scanner?

@@ -50,9 +50,9 @@ To scan a container image with `fossa-cli`, use the `container analyze` command:
# This command uses the repository name as project name, and image digest as the revision.
# Like standard FOSSA analysis, the project name is customizable via `--project` and revision via `--revision`:
#
# >> fossa container analyze <IMAGE> --project <PROJECT-NAME> --revision <REVISION-VALUE>
# >> fossa container analyze <IMAGE> --project <PROJECT-NAME> --revision <REVISION-VALUE>
#
fossa container analyze <IMAGE>
fossa container analyze <IMAGE>

# Similar to the above, but instead of uploading the results they are instead written to the terminal in JSON format.
#
@@ -89,13 +89,13 @@ By default `fossa-cli` attempts to identify `<IMAGE>` source in the following or

```bash
docker save redis:alpine > redis_alpine.tar
fossa container analyze redis_alpine.tar
fossa container analyze redis_alpine.tar
```

### 2) From Docker Engine

```bash
fossa container analyze redis:alpine
fossa container analyze redis:alpine
```

For this image source to work, `fossa-cli` requires docker to be running and accessible.
@@ -118,7 +118,7 @@ curl --unix-socket /var/run/docker.sock -X GET "http://localhost/v1.28/images/re
### 3) From registries

```bash
fossa container analyze ghcr.io/fossas/haskell-dev-tools:9.0.2
fossa container analyze ghcr.io/fossas/haskell-dev-tools:9.0.2
```

This step works even if you do not have docker installed or have docker engine accessible.
@@ -138,17 +138,17 @@ If `<IMAGE>` is not a docker image archive and is not accessible via the docker
| `quay.io/org/image:tag` | `quay.io` | `org/image` | `tag` |

Note:
- When the domain is not present, `fossa-cli` defaults to the registry `index.docker.io`.
- When digest or tag is not present, `fossa-cli` defaults to the tag `latest`.
- When the registry is `index.docker.io`, and repository does not contain the literal `/`, `fossa-cli` infers that this is official image stored under `library/<image>`.
- When a multi-platform image is provided (e.g. `ghcr.io/graalvm/graalvm-ce:ol7-java11-21.3.3`), `fossa-cli` defaults to selecting image artifacts for current runtime platform.
- When the domain is not present, `fossa-cli` defaults to the registry `index.docker.io`.
- When digest or tag is not present, `fossa-cli` defaults to the tag `latest`.
- When the registry is `index.docker.io`, and repository does not contain the literal `/`, `fossa-cli` infers that this is official image stored under `library/<image>`.
- When a multi-platform image is provided (e.g. `ghcr.io/graalvm/graalvm-ce:ol7-java11-21.3.3`), `fossa-cli` defaults to selecting image artifacts for current runtime platform.
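The defaulting rules in the note above can be sketched roughly as follows. This is a minimal illustration, not `fossa-cli`'s parser: `ImageRef` and `parse_image_ref` are hypothetical names, the heuristic for recognising a registry domain (first path component contains `.` or `:`, or is `localhost`) is an assumption borrowed from common Docker conventions, and `user:secret@host` credentials are not handled.

```rust
/// Hypothetical parsed form of an image reference (not a fossa-cli type).
#[derive(Debug, PartialEq)]
struct ImageRef {
    registry: String,
    repository: String,
    /// Either a tag (`latest`) or a digest (`sha256:...`).
    reference: String,
}

fn parse_image_ref(image: &str) -> ImageRef {
    // A digest is introduced by '@'; otherwise a tag is a ':' after the last '/',
    // so a registry port such as `localhost:5000/img` is not mistaken for a tag.
    let (name, reference) = if let Some((name, digest)) = image.split_once('@') {
        (name, digest.to_string())
    } else {
        let last_slash = image.rfind('/').map(|i| i + 1).unwrap_or(0);
        match image[last_slash..].rfind(':') {
            Some(i) => (
                &image[..last_slash + i],
                image[last_slash + i + 1..].to_string(),
            ),
            // No tag or digest: default to `latest`.
            None => (image, "latest".to_string()),
        }
    };

    // Assumed heuristic: the first component is a registry domain if it looks like a host.
    let (registry, repository) = match name.split_once('/') {
        Some((first, rest))
            if first.contains('.') || first.contains(':') || first == "localhost" =>
        {
            (first.to_string(), rest.to_string())
        }
        // No domain: default to Docker Hub.
        _ => ("index.docker.io".to_string(), name.to_string()),
    };

    // Official Docker Hub images live under `library/<image>`.
    let repository = if registry == "index.docker.io" && !repository.contains('/') {
        format!("library/{}", repository)
    } else {
        repository
    };

    ImageRef { registry, repository, reference }
}
```

Under these assumptions, `parse_image_ref("redis:alpine")` resolves to registry `index.docker.io`, repository `library/redis`, and tag `alpine`.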

Analyzing the container image for a platform other than the one currently running is possible by specifying the digest for the image on a different platform.

For example, the following command analyzes the `arm64` platform image of `ghcr.io/graalvm/graalvm-ce@sha256` regardless of the platform running `fossa container analyze`:

```bash
fossa container analyze ghcr.io/graalvm/graalvm-ce@sha256:bdcba07acb11053fea0026b807ecf94550ace7df27b10596ca4c765165243cef
fossa container analyze ghcr.io/graalvm/graalvm-ce@sha256:bdcba07acb11053fea0026b807ecf94550ace7df27b10596ca4c765165243cef
```

**Private registries**
@@ -171,18 +171,18 @@ This is done in following steps:
}
```

If any of the steps above fail, `fossa-cli` defaults to connecting without user credentials.
If any of the steps above fail, `fossa-cli` defaults to connecting without user credentials.

To explicitly provide a username and password, use HTTP-style authentication in the image URL.
For this to work the host value must be present in the image URL:

```bash
fossa container analyze user:secret@quay.io/org/image:tag
fossa container analyze user:secret@quay.io/org/image:tag
```

**Retrieving image from registry**

`fossa-cli` uses `/v2/` registry api (per OCI distribution spec) for retrieving
`fossa-cli` uses `/v2/` registry api (per OCI distribution spec) for retrieving
image manifests, and image artifacts from registry. It does so in following manner:

1) `HEAD <repository>/manifests/<tag-or-digest>` (to see if the manifests exists)
@@ -194,20 +194,22 @@ image manifests, and image artifacts from registry. It does so in following mann
4) Download all blobs using `GET /v2/<repository>/blobs/<digest>` (if blobs are tar.gzip, they will be gzip extracted)
5) From artifacts downloaded representative image tarball will be created.

All `GET` request from step 2 to step 5, will make `HEAD` call prior to confirm existence of resource. If
All `GET` request from step 2 to step 5, will make a `HEAD` call prior to confirm existence of resource. If
401 status is received new access token will be generated using auth flow mentioned in step (1).
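The `HEAD`-before-`GET` behaviour with a token refresh on 401 can be sketched as below. This is only an illustration of the described control flow: the `Registry` trait, `fetch`, and `FakeRegistry` are hypothetical and do not correspond to `fossa-cli`'s actual client code, and real error handling is reduced to an `Option`.

```rust
/// Hypothetical transport abstraction; the real client's types are not shown in this diff.
trait Registry {
    /// Returns the HTTP status of `HEAD path`.
    fn head(&mut self, path: &str, token: Option<&str>) -> u16;
    /// Returns the body of `GET path`.
    fn get(&mut self, path: &str, token: Option<&str>) -> Vec<u8>;
    /// Runs the auth flow and returns a fresh access token.
    fn new_token(&mut self) -> String;
}

/// Confirms the resource exists with `HEAD`, refreshing the token once on 401,
/// and only then issues the `GET`.
fn fetch(reg: &mut impl Registry, path: &str, token: &mut Option<String>) -> Option<Vec<u8>> {
    let mut status = reg.head(path, token.as_deref());
    if status == 401 {
        // Unauthorized: generate a new access token and retry the existence check.
        *token = Some(reg.new_token());
        status = reg.head(path, token.as_deref());
    }
    if status == 200 {
        Some(reg.get(path, token.as_deref()))
    } else {
        None
    }
}

/// In-memory fake used only to exercise the sketch: it rejects any token other than "t1".
struct FakeRegistry {
    heads: u32,
}

impl Registry for FakeRegistry {
    fn head(&mut self, _path: &str, token: Option<&str>) -> u16 {
        self.heads += 1;
        if token == Some("t1") { 200 } else { 401 }
    }
    fn get(&mut self, path: &str, _token: Option<&str>) -> Vec<u8> {
        // Echo the path back as the "blob" so the result is easy to check.
        path.as_bytes().to_vec()
    }
    fn new_token(&mut self) -> String {
        "t1".to_string()
    }
}
```

With the fake above, a first `fetch` without a token performs two `HEAD` calls (one rejected, one accepted after the refresh) before the `GET`.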

## Container image analysis

The container scanner scans in two steps:
1. The base layer.
2. The rest of the layers, squashed.
2. The rest of the layers, squashed.

### Container JAR analysis

The container analyzer will try to find Java Archive (Jar) files inside each layer.
It will then report them to FOSSA which will try to match the Jar files to the project they are a build artifact from.

The container analyzer will also expand each Jar file that it encounters and report any Jar files that it finds in the expanded Jar file. This is done recursively.
Review comment from @spatten (Contributor Author), Oct 31, 2024:

This is the only non-automated change in this file


This process relies on there being a back-end that can perform that analysis.
SaaS customers should have this functionality available but on-prem customers may need to contact FOSSA support to have it enabled.

@@ -264,7 +266,7 @@ and if desired can inform [analysis target configuration](../../files/fossa-yml.

Example output:
```bash
; fossa container list-targets ghcr.io/tcort/markdown-link-check:stable
; fossa container list-targets ghcr.io/tcort/markdown-link-check:stable

[ INFO] Discovered image for: ghcr.io/tcort/markdown-link-check:stable (of 137610196 bytes) via docker engine api.
[ INFO] Exporting docker image to temp file: /private/var/folders/hb/pg5d0r196kq1qdswr6_79hzh0000gn/T/fossa-docker-engine-tmp-f7af2b5d1ec5173d/image.tar! This may take a while!
@@ -296,7 +298,7 @@ exclude:

### Debugging

`fossa-cli` supports the `--debug` flag and debug bundle generation with the container scanner.
`fossa-cli` supports the `--debug` flag and debug bundle generation with the container scanner.

```bash
fossa container analyze redis:alpine --debug
@@ -315,7 +317,7 @@ Images can be exported to archives using Docker:
docker pull <IMAGE>:<TAG> # or docker pull <IMAGE>@<DIGEST>
docker save <IMAGE>:<TAG> > image.tar

fossa container analyze image.tar --container scanner
fossa container analyze image.tar --container scanner

rm image.tar
```
@@ -328,7 +330,7 @@ By default when `fossa-cli` is analyzing multi-platform image it prefers using t
If a specific platform is desired, use the digest for that platform:

```bash
fossa container analyze ghcr.io/graalvm/graalvm-ce@sha256:bdcba07acb11053fea0026b807ecf94550ace7df27b10596ca4c765165243cef
fossa container analyze ghcr.io/graalvm/graalvm-ce@sha256:bdcba07acb11053fea0026b807ecf94550ace7df27b10596ca4c765165243cef
```

### How can I only scan for system dependencies (alpine, dpkg, rpm)?
@@ -342,7 +344,7 @@ fossa container analyze <IMAGE> --only-system-deps
### How do I exclude specific projects from container scanning?

Use a FOSSA configuration file to perform exclusion of projects or paths.
Refer to the [configuration file](./../../files/fossa-yml.md) documentation for more details.
Refer to the [configuration file](./../../files/fossa-yml.md) documentation for more details.

As an example, the following configuration file only analyzes `setuptools`, and `alpine` packages:

@@ -371,7 +373,7 @@ The recommended workaround is to export the image to an archive, then analyze th
docker pull quay.io/org/image:tag
docker save quay.io/org/image:tag > img.tar

fossa container analyze img.tar
fossa container analyze img.tar
rm img.tar
```

1 change: 1 addition & 0 deletions extlib/millhone/Cargo.toml
@@ -38,6 +38,7 @@ tracing-subscriber = { version = "0.3.17", features = ["json"] }
lazy-regex = { version = "3.0.2", features = ["std", "regex"] }
fingerprint = { git = "https://github.com/fossas/lib-fingerprint.git", tag = "v3.0.0", default-features = false, features = ["fp-content-serialize-base64"] }
tar = "0.4.41"
zip = "2.1.3"

[dev-dependencies]
maplit = "1.0.2"
156 changes: 153 additions & 3 deletions extlib/millhone/src/cmd/analyze_container.rs
@@ -2,7 +2,7 @@ use std::{
collections::{HashMap, HashSet},
fs::File,
io::{BufWriter, Read},
path::PathBuf,
path::{Path, PathBuf},
};

use clap::Parser;
@@ -125,10 +125,15 @@ fn jars_in_layer(entry: Entry<'_, impl Read>) -> Result<Vec<DiscoveredJar>> {
debug!("fingerprinting");
let entry = buffer(entry).context("read jar file")?;

match Combined::from_buffer(entry) {
Ok(fingerprints) => discoveries.push(DiscoveredJar::new(path, fingerprints)),
match Combined::from_buffer(entry.clone()) {
Ok(fingerprints) => {
discoveries.push(DiscoveredJar::new(path.clone(), fingerprints))
}
Err(e) => warn!("failed to fingerprint: {e:?}"),
}
let mut discovered_in_jars =
recursive_jars_in_jars(&entry, path, 0).context("recursively discover jars")?;
discoveries.append(&mut discovered_in_jars);

Ok(())
})?;
@@ -137,6 +142,56 @@ fn jars_in_layer(entry: Entry<'_, impl Read>) -> Result<Vec<DiscoveredJar>> {
Ok(discoveries)
}

const MAX_JAR_DEPTH: u32 = 100;

#[tracing::instrument(skip(jar_contents))]
fn recursive_jars_in_jars(
jar_contents: &[u8],
containing_jar_path: PathBuf,
depth: u32,
) -> Result<Vec<DiscoveredJar>> {
if depth > MAX_JAR_DEPTH {
return Ok(vec![]);
}
let mut discoveries = Vec::new();
let mut archive =
zip::ZipArchive::new(std::io::Cursor::new(jar_contents)).context("unzipping jar")?;
for path in archive.clone().file_names() {
debug!("file_name: {path}");
if !path.ends_with(".jar") {
continue;
}

debug!(?path, "jar file found");
let mut zip_file = archive
.by_name(path)
.context("getting zip file info by path")?;
if !zip_file.is_file() {
debug!(?path, "skipped: not a file");
continue;
}
let mut buffer: Vec<u8> = Vec::new();
zip_file
.read_to_end(&mut buffer)
.context("reading jar from zip into buffer")?;
let joined_path = Path::new(&containing_jar_path).join(path);

// fingerprint the jar
match Combined::from_buffer(buffer.clone()) {
Ok(fingerprints) => {
discoveries.push(DiscoveredJar::new(joined_path.clone(), fingerprints))
}
Err(e) => warn!("failed to fingerprint: {e:?}"),
}

// recursively find more jars
let mut discovered_in_jars = recursive_jars_in_jars(&buffer, joined_path, depth + 1)
Review comment (Contributor):

[nit, Optional] It may be more efficient to turn recursive_jars_in_jars into something that returns vectors of vectors and then collects them into the result vector at the end. I'm not sure if there's some better way to do this in Rust.

I think it's probably not a huge deal here since I don't expect the recursion to be very deep.

Reply (Contributor Author):

I think I'm going to leave it as is, as I don't think we'll ever see recursion beyond three or so levels deep. Good point, though. That probably would be more performant.
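
One way the alternative discussed in this thread could look is an accumulator passed down the recursion, so each level pushes into a shared vector instead of allocating and appending its own. The sketch below is an assumption-laden illustration, not the PR's code: `ToyJar`, `collect_nested`, and `discover` are hypothetical, and an in-memory tree stands in for real zip contents so the example needs only the standard library.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical in-memory stand-in for a jar: a file name plus the jars nested inside it.
struct ToyJar {
    name: &'static str,
    nested: Vec<ToyJar>,
}

/// Same depth cap idea as `MAX_JAR_DEPTH` in the diff.
const MAX_DEPTH: u32 = 100;

/// Pushes the path of every jar nested in `jar` onto `out`, recursing depth-first.
/// No intermediate vectors are created; all levels share the one accumulator.
fn collect_nested(jar: &ToyJar, containing_path: &Path, depth: u32, out: &mut Vec<PathBuf>) {
    if depth > MAX_DEPTH {
        return;
    }
    for inner in &jar.nested {
        let joined = containing_path.join(inner.name);
        out.push(joined.clone());
        collect_nested(inner, &joined, depth + 1, out);
    }
}

/// Reports the top-level jar itself followed by everything nested inside it.
fn discover(jar: &ToyJar, path: &Path) -> Vec<PathBuf> {
    let mut out = vec![path.to_path_buf()];
    collect_nested(jar, path, 0, &mut out);
    out
}
```

For a `top.jar` containing `middle.jar` containing `deepest.jar`, `discover` yields three paths, matching the three entries the new test expects for the layer that holds `top.jar`. As noted in the thread, the gain is likely negligible at the nesting depths expected in practice.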

.context("recursively discover jars")?;
discoveries.append(&mut discovered_in_jars);
}
Ok(discoveries)
}

#[tracing::instrument]
fn list_container_layers(layer_path: &PathBuf) -> Result<HashSet<PathBuf>> {
let mut layers = HashSet::new();
@@ -250,4 +305,99 @@ mod tests {
let expected: Value = serde_json::from_str(MILLHONE_OUT).expect("Parse expected json");
pretty_assertions::assert_eq!(expected, res);
}

// This container contains top.jar which contains middle.jar, which contains deepest.jar
// It also includes middle.jar and deepest.jar
// So we should find 6 total jars: three from top.jar and its nested jars, two from middle.jar and its nested jar and then deepest.jar
// We are also testing that the fingerprints from the nested jars are equal to the fingerprints when they are at top-level
// See test/App/Fossa/Container/testdata/nested-jar/README.md for info on how nested_jars.tar was made
#[test]
fn it_finds_nested_jars() {
let nested_jars_millhone_out: String = format!(
r#"
{{
"discovered_jars": {{
"blobs/sha256/3af1c7e331a4b6791c25101e0c862125a597d8d75d786aead62de19f78a5a992": [
{{
"kind": "v1.discover.binary.jar",
"path": "jars/deepest.jar",
"fingerprints": {{
"sha_256": "LsXfP24XYFIZnkS3Z7RaNim1o8/TtGnueThkZv9hCok=",
"v1.class.jar": "47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=",
"v1.mavencentral.jar": "1+4xPh5QS5IW0H6lfbxamjtVVdk=",
"v1.raw.jar": "UMQ1yS7xM6tF4YMvAWz8UP6+qAIRq3JauBoiTlVUNkM="
}}
}}
],
"blobs/sha256/5ee98bff2cf0e70d115677fc37f734d26848435eef5fe52e905229ff7a7d87fb": [
{{
"kind": "v1.discover.binary.jar",
"path": "jars/middle.jar",
"fingerprints": {{
"sha_256": "nKFXVngFtkHIv4FC/rr5o4k+v/KSKzWJ0B9uBuRb+4k=",
"v1.class.jar": "47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=",
"v1.mavencentral.jar": "2XA3GFJJkvvpEbAM9nLnAypojEo=",
"v1.raw.jar": "36i3JNvrLMWCMfjB2c9bjQt4Vhmvfq29cb+Hqrb6XeI="
}}
}},
{{
"kind": "v1.discover.binary.jar",
"path": "jars/middle.jar{separator}deepest.jar",
"fingerprints": {{
"v1.mavencentral.jar": "1+4xPh5QS5IW0H6lfbxamjtVVdk=",
"sha_256": "LsXfP24XYFIZnkS3Z7RaNim1o8/TtGnueThkZv9hCok=",
"v1.raw.jar": "UMQ1yS7xM6tF4YMvAWz8UP6+qAIRq3JauBoiTlVUNkM=",
"v1.class.jar": "47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU="
}}
}}
],
"blobs/sha256/6979b741102e5c5c787f94ad8bfdebeee561b1b89f21139d38489e1b3d6f9096": [],
"blobs/sha256/931c525b52485e01ab5e2926a4b3c884f1c7325782dca13bd11e345f46cc34c3": [],
"blobs/sha256/10bb0e91eb016af401369ecaadccfea9f4768776e54d46ad4e9a0309c82f1d7f": [
{{
"kind": "v1.discover.binary.jar",
"path": "jars/top.jar",
"fingerprints": {{
"v1.raw.jar": "TNW7ezd3fqw3MULVTrexg68Q1x2PTDGk2DkltAqUefk=",
"v1.mavencentral.jar": "TtwsgEXwLd/8UFTohsFhJqYMJ74=",
"v1.class.jar": "47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=",
"sha_256": "l9XTA5PwWJhnFlz9t0SWKvr2cHDmcytIVvPsr6vqFis="
}}
}},
{{
"kind": "v1.discover.binary.jar",
"path": "jars/top.jar{separator}middle.jar",
"fingerprints": {{
"v1.mavencentral.jar": "2XA3GFJJkvvpEbAM9nLnAypojEo=",
"v1.class.jar": "47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=",
"v1.raw.jar": "36i3JNvrLMWCMfjB2c9bjQt4Vhmvfq29cb+Hqrb6XeI=",
"sha_256": "nKFXVngFtkHIv4FC/rr5o4k+v/KSKzWJ0B9uBuRb+4k="
}}
}},
{{
"kind": "v1.discover.binary.jar",
"path": "jars/top.jar{separator}middle.jar{separator}deepest.jar",
"fingerprints": {{
"v1.raw.jar": "UMQ1yS7xM6tF4YMvAWz8UP6+qAIRq3JauBoiTlVUNkM=",
"sha_256": "LsXfP24XYFIZnkS3Z7RaNim1o8/TtGnueThkZv9hCok=",
"v1.mavencentral.jar": "1+4xPh5QS5IW0H6lfbxamjtVVdk=",
"v1.class.jar": "47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU="
}}
}}
]
}}
}}
"#,
separator = std::path::MAIN_SEPARATOR_STR.replace("\\", "\\\\")
);
let image_tar_file =
PathBuf::from("../../test/App/Fossa/Container/testdata/nested_jars.tar");
let res = jars_in_container(&image_tar_file)
.expect("Read jars out of container image.")
.pipe(serde_json::to_value)
.expect("encode as json");
let expected: Value =
serde_json::from_str(&nested_jars_millhone_out).expect("Parse expected json");
pretty_assertions::assert_eq!(expected, res);
}
}