Add Nix cataloger #1696
Signed-off-by: Julio Tain Sueiras <juliosueiras@gmail.com>
Add Basic Nix Cataloger
Signed-off-by: Alex Goodman <alex.goodman@anchore.com>
JSON schema diff:
$ diff schema/json/schema-7.0.1.json schema/json/schema-7.0.2.json
848a849,868
>     "NixStoreMetadata": {
>       "properties": {
>         "hash": {
>           "type": "string"
>         },
>         "output": {
>           "type": "string"
>         },
>         "files": {
>           "items": {
>             "type": "string"
>           },
>           "type": "array"
>         }
>       },
>       "type": "object",
>       "required": [
>         "hash"
>       ]
>     },
1010a1031,1033
>         },
>         {
>           "$ref": "#/$defs/NixStoreMetadata"
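For readers who want to see the shape this schema addition describes, here is a minimal Go sketch of a matching struct. Only the JSON keys (`hash`, `output`, `files`) and the fact that `hash` is required come from the diff above; the Go field names, the `omitempty` choices, and the `encode` helper are assumptions made for illustration and may not match what the PR actually defines.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// NixStoreMetadata mirrors the JSON keys in the schema diff.
// Assumption: field names and omitempty tags are illustrative only;
// "hash" has no omitempty because the schema lists it as required.
type NixStoreMetadata struct {
	Hash   string   `json:"hash"`
	Output string   `json:"output,omitempty"`
	Files  []string `json:"files,omitempty"`
}

// encode is a hypothetical helper that renders the metadata as JSON.
func encode(m NixStoreMetadata) string {
	b, err := json.Marshal(m)
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	m := NixStoreMetadata{Hash: "h0cnbmfcn93xm5dg2x27ixhag1cwndga", Output: "bin"}
	fmt.Println(encode(m))
}
```

With these tags, a default output (no output name) would serialize with only the `hash` key, which is consistent with `hash` being the only required property.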
input:   "/nix/store/h0cnbmfcn93xm5dg2x27ixhag1cwndga-glibc-2.34-210-bin",
wantIdx: 50,
wantEx:  "2.34-210-bin",
wantPr:  "210-bin",
bin is just the name of the output, it's not part of the version. There can be multiple outputs with the same "version number", but different output names. I'd probably just drop the optional -{bin,lib,dev,doc,devdoc} (these are the most common ones) output name from the version number - or is there a notion of multiple package outputs in syft?
Question: is glibc-2.34 really the "package" and the single package has multiple outputs? If that's the case, there is more work to be done -- the cataloger should be merging these into a single package, and the metadata for each needs to account for expressing multiple outputs (and surrounding metadata).
This is the bin output for the glibc 2.34 build. It includes binaries produced by the build, like bin/getent. The glibc .so file would be in a separate output (lib or the default output, where the output-name suffix is omitted).
It makes sense to distinguish between these outputs (so the info should not be merged), and it also makes sense to keep the output hash as an identifier.
We can have multiple glibc-2.34 in the same "closure" / filesystem (let's say one carrying a patch, the other not), and we can ship the bin outputs or not, depending on whether we want getent or not.
If there's a vulnerability, the output_path allows distinguishing between the patched version or unpatched version, and knowing which outputs you ship also helps determine whether you are affected when a vulnerability is limited to, say, the CLI parser and not the library.
I'm having a hard time calling this package, because it's nothing bundled up, and these are the output paths produced by a build. After evaluation, Nix uses "derivations" as build recipes. Each derivation is a single build, and it may produce multiple outputs (with output paths).
It sounds like the best way forward is to collect these definitions into a single package, where the metadata would be able to track the multiple outputs:
{
  "name": "glibc",
  "version": "2.34",
  ...
  "metadata": {
    "outputs": [
      {
        "output": "man",
        "outputhash": "cfmmhxh0c5nxnag1bixdg7d3g2c9wan2",
        "files": [...]
      },
      {
        "output": "bin",
        "outputhash": "h0cnbmfcn93xm5dg2x27ixhag1cwndga",
        "files": [...]
      }
    ]
  }
}
In this way there is still a single logical glibc@2.34 package and grype downstream would be able to look at the outputhash for each output on the metadata.
> If there's a vulnerability, the output_path allows distinguishing between the patched version or unpatched version
it sounds like the output_path would be really important to try and capture on the metadata directly then? Just to double check... is this output path the path to the store (e.g. /nix/store/hs0yi5n5nw6micqhy8l1igkbhqdkzqa1-foo)? Or is this referring to another value?
> I'm having a hard time calling this package, because it's nothing bundled up, and these are the output paths produced by a build.
I can see what you're saying here (I think). I started with the assumption of what you might find on search.nixos.org ... say for glibc https://search.nixos.org/packages?channel=22.11&show=glibc&from=0&size=1&sort=relevance&type=packages&query=glibc
I feel this orients the results in terms of the logical name + version and outputs available. I've been equating the name + version with the idea of a singular logical package, and the list of installed outputs with metadata attached to that (single) package. I see what you're saying about nothing being bundled up, but syft doesn't necessarily require that packaging ecosystems do any bundling of builds/files at all.
Is this a kosher way to think about "packages" in the Nix ecosystem? Or said another way, could representing singular packages with possibly multiple outputs be inaccurate or misleading in downstream analysis?
> it sounds like the output_path would be really important to try and capture on the metadata directly then? Just to double check... is this output path the path to the store (e.g. /nix/store/hs0yi5n5nw6micqhy8l1igkbhqdkzqa1-foo)? Or is this referring to another value?
Let me pick a better example:
If I take a current checkout of nixpkgs (4bb072f0a8b267613c127684e099a70e1f6ff106), there are multiple outputs of the openssl package, described in the "build recipe" (.drv):
$ nix repl .
nix-repl> openssl
«derivation /nix/store/3jbv3scpmj2kxhshfz544jq37dzywbss-openssl-3.0.8.drv»
nix-repl> openssl.outputs
[ "bin" "dev" "out" "man" "doc" "debug" ]
nix-repl> :b openssl
This derivation produced the following outputs:
bin -> /nix/store/05qna7girdhnzx38jkhr4yyh7gmnizv3-openssl-3.0.8-bin
debug -> /nix/store/spa8fwjwd6r4n6d4gqg4s1ylijmkc0x7-openssl-3.0.8-debug
dev -> /nix/store/ggsmy3rwsr6q6ri45ck3jdqnz9cf6dm6-openssl-3.0.8-dev
doc -> /nix/store/bsabnghysgdmfhjlj2pal3i25q9w2brf-openssl-3.0.8-doc
man -> /nix/store/4ac38nfgh4whllggv4zwp14va4mfirr7-openssl-3.0.8-man
out -> /nix/store/s8vg2h8xzqmjd72f3g3p1jqy2lbbapc6-openssl-3.0.8
So, all these output paths are produced by the same build recipe (/nix/store/3jbv3scpmj2kxhshfz544jq37dzywbss-openssl-3.0.8.drv).
- The bin output of that derivation has an output hash of 05qna7girdhnzx38jkhr4yyh7gmnizv3 (nixbase32 encoding).
- The debug output has an output hash of spa8fwjwd6r4n6d4gqg4s1ylijmkc0x7
- The dev output has an output hash of ggsmy3rwsr6q6ri45ck3jdqnz9cf6dm6
- ...
If I now go back in time to a previous nixpkgs checkout (2 weeks before, 1df7332c722cc8d235b979cfa8f1bbe949b722fb), I still have the same openssl version in that case, but the build recipe is a different one (and, because of this, so are all the hashes).
$ nix repl .
nix-repl> openssl
«derivation /nix/store/y43ka1zh6icrf515hc5qzb643wsscxmd-openssl-3.0.8.drv»
nix-repl> openssl.outputs
[ "bin" "dev" "out" "man" "doc" "debug" ]
nix-repl> :b openssl
This derivation produced the following outputs:
bin -> /nix/store/b1rd646j6yc22qnhiydb5a1hi5hc4dky-openssl-3.0.8-bin
debug -> /nix/store/8zw06pv5v5l7ggklyls77n5fw03vgdr7-openssl-3.0.8-debug
dev -> /nix/store/gs1841l5x53q3r5l196ld9kxq45pvf82-openssl-3.0.8-dev
doc -> /nix/store/42krirjplr0a4z3w2myhrawg5kbgmy40-openssl-3.0.8-doc
man -> /nix/store/rrz6hdzxsxg3yadk02gcypib36bq2zqm-openssl-3.0.8-man
out -> /nix/store/fgrj06y1x83fwh8hqgg02v9abc7a7b65-openssl-3.0.8
Now the bin output hash is b1rd646j6yc22qnhiydb5a1hi5hc4dky.
Both these packages can coexist in /nix/store.
In more complicated closures / "container contents", it's very well possible for, let's say, two slightly different flavors of libraries to be linked against two different binaries.
Key takeaways:
- just the "name" (openssl-3.0.8) doesn't uniquely identify a package, the outputhash is crucial
- The output hash uniquely identifies an output, as soon as the build recipe (or any of its dependencies) changes it'll differ
- Just by looking at the paths, without any additional database (that's what syft is doing) it's not possible to map / group certain output paths to the .drv / build recipe producing it.
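To make the first takeaway concrete, here is a small Go sketch that groups store paths by everything after the hash. It uses the two openssl bin paths from the examples above; the helper names `splitHash` and `groupByRest` are hypothetical and are not part of syft. The only assumption about the path layout is the one visible in the examples: a 32-character hash, a dash, then the rest.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// splitHash separates the leading 32-character output hash from the
// remainder of a store path's final element.
func splitHash(storePath string) (hash, rest string, ok bool) {
	hash, rest, ok = strings.Cut(path.Base(storePath), "-")
	return hash, rest, ok && len(hash) == 32
}

// groupByRest collects the output hashes seen for each "name-version[-output]"
// string, showing that the name alone can map to several distinct outputs.
func groupByRest(paths []string) map[string][]string {
	byName := map[string][]string{}
	for _, p := range paths {
		if hash, rest, ok := splitHash(p); ok {
			byName[rest] = append(byName[rest], hash)
		}
	}
	return byName
}

func main() {
	paths := []string{
		"/nix/store/05qna7girdhnzx38jkhr4yyh7gmnizv3-openssl-3.0.8-bin",
		"/nix/store/b1rd646j6yc22qnhiydb5a1hi5hc4dky-openssl-3.0.8-bin",
	}
	for name, hashes := range groupByRest(paths) {
		fmt.Println(name, len(hashes), hashes)
	}
}
```

Both paths land under the same openssl-3.0.8-bin key with two different hashes, so anything keyed on name and version alone would conflate the two builds.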
Thanks for the example / input here -- much appreciated! I think the takeaways make sense given the fact that syft is guessing from a directory structure instead of package metadata information. In the future I feel it would be ideal to improve this in some way to make it more obvious which outputs are related... but I don't see a safe way to do this given the information available and the non-guarantees on the sqlite cache that's there. Thanks for helping to navigate these traps @flokli !
Ok, back to your question (I took us on a little bit of a tangent):
> bin is just the name of the output, it's not part of the version. There can be multiple outputs with the same "version number", but different output names. I'd probably just drop the optional -{bin,lib,dev,doc,devdoc} (these are the most common ones) output name from the version number - or is there a notion of multiple package outputs in syft?
This function first extracts the version as if it were a semver, then later in processing (in parseNixStorePath) the last non-numeric, dash-delimited string is assumed to be the nix output. That is, the package ultimately reflects your suggestion (to not include output names in the version).
There isn't a notion of a "package output" in syft; however, what it will look like to most users is that there are multiple instances of the package name-version installed, and only when they look at the .metadata or .purl will they see the specific output this package represents.
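The two-step split described above can be sketched roughly as follows. This is a simplified illustration, not the PR's parseNixStorePath: the `storePath` type, the `parseStorePath` and `startsWithDigit` helpers, and the "version starts at the first dash-delimited segment beginning with a digit" rule are all assumptions, and the sketch does not handle the unstable version conventions or suffixes such as a hypothetical -rc1, which it would misread as an output name.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// storePath holds the pieces recovered from a /nix/store directory name.
type storePath struct {
	Hash, Name, Version, Output string
}

func startsWithDigit(s string) bool {
	return s != "" && s[0] >= '0' && s[0] <= '9'
}

// parseStorePath splits "<hash>-<name>-<version>[-<output>]".
// Assumption: the version begins at the first segment (after the name's
// first segment) that starts with a digit.
func parseStorePath(p string) (storePath, bool) {
	hash, rest, found := strings.Cut(path.Base(p), "-")
	if !found || len(hash) != 32 {
		return storePath{}, false
	}
	parts := strings.Split(rest, "-")
	vi := -1
	for i, part := range parts {
		if i > 0 && startsWithDigit(part) {
			vi = i
			break
		}
	}
	if vi < 0 {
		return storePath{}, false
	}
	sp := storePath{Hash: hash, Name: strings.Join(parts[:vi], "-")}
	ver := parts[vi:]
	// A trailing non-numeric segment is treated as the output name.
	if last := ver[len(ver)-1]; len(ver) > 1 && !startsWithDigit(last) {
		sp.Output = last
		ver = ver[:len(ver)-1]
	}
	sp.Version = strings.Join(ver, "-")
	return sp, true
}

func main() {
	sp, ok := parseStorePath("/nix/store/h0cnbmfcn93xm5dg2x27ixhag1cwndga-glibc-2.34-210-bin")
	fmt.Printf("%+v %v\n", sp, ok)
}
```

For the glibc path from the test case, this yields name glibc, version 2.34-210, and output bin, so the output name stays out of the version as suggested in the review.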
@flokli Thanks a lot for the review 🙌 . There are a couple questions I had, the biggest one being this one #1696 (comment) .
As a bit of a side note, this thread was interesting and helpful in understanding more about the future of Nix in terms of capturing the right details in an SBOM and how that would relate to vulnerability analysis https://discourse.nixos.org/t/the-future-of-the-vulnerability-roundups/22424 .
@flokli I think this is ready for merging, but I wanted to hold off and get your take first. Are there any other changes relative to the Nix ecosystem that should be considered before considering this done? (or any nuance that I might have missed in your previous comments that is not addressed?)
@wagoodman still looks about right. I'd probably make it clear somewhere that this only looks at the store paths to do all the cataloguing, in case we add other ways to convey more information.
Cool addition! I took the liberty of starting a Discourse thread about the use of purl for Nix packages at https://discourse.nixos.org/t/package-urls-purl-for-nix-packages
* Add Basic Nix Cataloger
  Signed-off-by: Julio Tain Sueiras <juliosueiras@gmail.com>
* Update nix def for the latest syft definition
  Signed-off-by: Julio Tain Sueiras <juliosueiras@gmail.com>
* capture nix package files on pkg.NixStoreMetadata
  Signed-off-by: Alex Goodman <alex.goodman@anchore.com>
* fix unit tests and linting
  Signed-off-by: Alex Goodman <alex.goodman@anchore.com>
* update JSON schema
  Signed-off-by: Alex Goodman <alex.goodman@anchore.com>
* address review comments
  Signed-off-by: Alex Goodman <alex.goodman@anchore.com>
* Update syft/pkg/cataloger/nix/parse_nix_store_path_test.go
  Co-authored-by: Florian Klink <flokli@flokli.de>
  Signed-off-by: Alex Goodman <alex.goodman@anchore.com>
* support unstable version conventions
  Signed-off-by: Alex Goodman <alex.goodman@anchore.com>
* update json schema relative to main branch
  Signed-off-by: Alex Goodman <alex.goodman@anchore.com>
* update syft json with v7.1.1 schema
  Signed-off-by: Alex Goodman <alex.goodman@anchore.com>
* fix CLI tests
  Signed-off-by: Alex Goodman <alex.goodman@anchore.com>
* remove extra continue statement
  Signed-off-by: Alex Goodman <alex.goodman@anchore.com>
* add Nix to list of supported ecosystems
  Signed-off-by: Alex Goodman <alex.goodman@anchore.com>
---------
Signed-off-by: Julio Tain Sueiras <juliosueiras@gmail.com>
Signed-off-by: Alex Goodman <alex.goodman@anchore.com>
Co-authored-by: Julio Tain Sueiras <juliosueiras@gmail.com>
Co-authored-by: Florian Klink <flokli@flokli.de>
This builds on the original work from @juliosueiras on #1107 . The changes include:
I'm not very familiar with the Nix ecosystem so review from someone more familiar with Nix would be very helpful! (cc: @flokli if you have the time 🙏 )
Details
I want to be clear on how this is working and what the consequences are. This implementation looks only at what directories are found in /nix/store and will construct packages for each output directory found (which means derivations are not included). This means that packages that have multiple outputs will seemingly be unrelated. This is because there is not enough metadata locally in the scan target to determine what outputs should be related.
For example, if you install python37 from nixpkgs, you will see two outputs, out (the default, so has no name) and debug:
When running syft, you will see two python3@3.7.16 packages, one for each output:
The pURLs are nicely distinct:
But the package names are not unique. That is technically correct, but it would be better to relate or merge these packages in the future.
Closes #462