feat: support specifying what parser to use in --lockfile
#94
Thank you for your PR. @another-rex is on vacation, so we will review it early next year and then release it as part of the next minor 1.0.x release.
yup all good - fwiw I might do another PR or two over the holiday break; btw I think you mean 1.x release, since 1.0.x is for fixes not minors :) |
Thanks a ton!
Note that while this helps #67, I wouldn't consider this to fully "resolve" the issue. The intention of that issue was to automatically detect such cases without user intervention.
README.md (outdated)

files with the `-parse-as` flag:

```bash
$ osv-scanner --parse-as 'requirements.txt' --lockfile=/path/to/your/extra-requirements.txt
```
The semantics of this (applies to every single file you scan) seems a bit limiting.
Some alternatives that make this more flexible:
1. Explicit argument for every lockfile type
osv-scanner --requirements-txt=/path/to/requirements.txt
osv-scanner --cargo-lock=/path/to/...
Downsides: This results in multiple flags that do the same thing (scanning a lockfile), which doesn't seem ideal. The format of the flag itself is also not obvious (i.e. "Cargo.lock" corresponds to "--cargo-lock").
2. A delimiter in the existing `lockfile` value.
osv-scanner --lockfile=PARSE-AS:requirements.txt:/path/to/....
Downsides: The string format is hard to read and easy to get wrong.
3. Allow `--parse-as` to be overridable
`--parse-as` would apply to all subsequent `--lockfile` arguments, and can be specified multiple times.
osv-scanner --parse-as=requirements.txt --lockfile=1/2.txt --lockfile=3/4.txt --parse-as=Cargo.lock --lockfile=5/Cargo.lock
The first two lockfiles (`1/2.txt`, `3/4.txt`) are parsed using `requirements.txt`. `5/Cargo.lock` is parsed using `Cargo.lock` because we passed a subsequent `--parse-as` that overrides this.
Option 3 seems like the best one to me, in a way that's consistent with our existing CLI format, without introducing redundancies. WDYT?
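To make the option 3 semantics concrete, here is a minimal Go sketch of "each `--parse-as` applies to the `--lockfile` arguments that follow it". The `lockfileTarget` type and `resolveTargets` function are hypothetical names used only for illustration, and the sketch handles only the `--flag=value` form; it is not how osv-scanner parses its flags.

```go
package main

import (
	"fmt"
	"strings"
)

// lockfileTarget pairs a lockfile path with the parser name that should handle it.
// An empty ParseAs means "infer the parser from the filename".
// (Hypothetical type, for illustration only.)
type lockfileTarget struct {
	Path    string
	ParseAs string
}

// resolveTargets walks the arguments in order, so that each --parse-as value
// applies to every --lockfile that follows it, until the next --parse-as.
func resolveTargets(args []string) []lockfileTarget {
	var targets []lockfileTarget
	current := "" // no --parse-as seen yet: fall back to inference

	for _, arg := range args {
		switch {
		case strings.HasPrefix(arg, "--parse-as="):
			current = strings.TrimPrefix(arg, "--parse-as=")
		case strings.HasPrefix(arg, "--lockfile="):
			targets = append(targets, lockfileTarget{
				Path:    strings.TrimPrefix(arg, "--lockfile="),
				ParseAs: current,
			})
		}
	}

	return targets
}

func main() {
	args := []string{
		"--parse-as=requirements.txt", "--lockfile=1/2.txt", "--lockfile=3/4.txt",
		"--parse-as=Cargo.lock", "--lockfile=5/Cargo.lock",
	}
	for _, target := range resolveTargets(args) {
		fmt.Printf("%s parsed as %s\n", target.Path, target.ParseAs)
	}
}
```

Note that this ordering-sensitive behaviour is exactly what differs from a "last value wins" flag, where only the final `--parse-as` would matter.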
I thought about that when implementing the In addition, currently the Finally, there are also cases where a user might not want a So I personally think it's better to mirror the behaviour of One improvement that could be made to the
Sounds ok, but note it's not actually consistent with the existing CLI flags - currently everything is based on "last value wins", i.e. doing the equivalent as your example comment but with Note that there is a tradeoff too: you're removing the ability for users to confidently override/enforce certain options which can be useful in automatic tooling where the command line arguments are being generated (i.e. ensuring that the Given that, I'd prefer to land this PR as-is and then follow it up with another PR changing the overall flag logic to work as you described (if that is still something you'd like). |
Thanks for explaining. Yeah, it seems like a difficult problem to solve perfectly. One compromise solution here is to account for any filename matching This particular issue came up when we tried scanning https://github.com/home-assistant/core, where they have a bunch of This is also the approach that syft takes: https://github.com/anchore/syft/blob/e3d6ffd30e44428b898675922a0474221a7f7dc7/syft/pkg/cataloger/python/cataloger.go#L16
Good call on the inconsistency with That said, with
Not sure I fully understand this in the case of
Unfortunately as above we need to ensure backwards compatibility for all of our releases, so we can't change the behaviour of a flag like so after we merge (and release this). |
I assume you mean that provided
Yes, but you increase the amount of work required to ensure that outcome because now users and tools have to examine the command more closely, figure out what rules are in place, and modify the whole command to ensure their outcome; whereas currently, if you just stick I think in the case of
Actually so long as the behaviour is not documented (which it currently isn't), you can make the argument that the change is a bugfix as it was always the desired behaviour - of course I agree care should be taken to minimize disruptions, but I personally think it would be pretty confusing to have some CLI flags behave so differently/inversely to others; it is of course your call though, so I can try and find some time to change this. |
Yep!
Thanks for the detailed rationale. I see your point on the inconsistency wrt flags, but I also see a global
Adding a Thinking more -- is |
I've done some initial looking and it seems that
I don't mind saying that this solves a separate issue, but I think there is value in having the flag in either form (and even if its hidden) because it makes the tool more flexible - consider that right now you have to name your file based on the lockfile for a particular ecosystem but people could use exotic systems which could use alternate names (for better or worse), and we are already doing that here with the fixtures for |
Another option similar to option 1 is to have parse-as argument where you specify the additional mapping in value rather than the flag itself, e.g.:
And you can then specify multiple of these similar to how it can be done with other flags. |
This seems like a reasonable compromise that allows the flexibility we want without being inconsistent with our CLI examples. I'd probably invert the order though: `--parse-as=requirements.txt:/path/to/requirements_extra.txt`, to make this more consistent/easier to parse in case there are ":" characters in the input paths.
that actually also enables the original form of this i.e. |
This could, but I think we should try to have a single way of doing something (i.e. the full |
Yeah I wasn't meaning supporting both of those ways (especially since the second way is not possible with the current CLI library being used), but that because |
@oliverchang @another-rex I've implemented this per our discussions - it's come out pretty nicely; a couple of thoughts:
|
Thanks! Probably for a follow up PR, but something we can add as well is instead of passing in the full path, you only have to pass in the file name, or even a glob to match against. This PR seems structured quite nicely to support those cases. |
README.md (outdated)

tell the scanner what parser to use for specific files using the `--parse-as` flag:

```bash
$ osv-scanner --parse-as 'requirements.txt:/path/to/your/extra-requirements.txt' --lockfile=/path/to/your/extra-requirements.txt
```
Should we instead make `--parse-as` just do the parsing itself as well? It seems a little clunky to have to specify the path twice in these cases.
It's also rather indirect -- it sets some global state that then needs to be utilised later by specifying the same path again. Not sure if there are benefits to this, and it seems much simpler/easier to use to just streamline this.
I'm not too fussed, but note what you're describing sounds based solely on the use of `-L`, when you can pass directories and files as args too.
While still less useful in this PR, as @another-rex pointed out this could be extended to support globs, which would make this make more sense, i.e. `osv-scanner --parse-as requirements.txt:requirements-*.txt my/project`.
I don't think there's a major loss so like I said I'm not too fussed as I agree it'd be nice to reduce the verbosity, but it might mean there's some edge-cases in future going down this path (I can play around with it tomorrow of course).
A middle ground could be to assume `-L` unless one's actually present.
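The glob idea mentioned above was only floated as a possible follow-up, so the following is just a rough Go sketch of how a `parser:pattern` value could be matched against file names found while walking a directory. `parseAsRule`, `newParseAsRule` and `parserFor` are hypothetical names; the matching itself uses `filepath.Match` from the Go standard library.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// parseAsRule maps a filename glob to the parser that should handle matches.
// (Hypothetical type, for illustration only.)
type parseAsRule struct {
	Parser  string
	Pattern string
}

// newParseAsRule splits a "parser:pattern" value on the first colon.
func newParseAsRule(value string) (parseAsRule, error) {
	parser, pattern, found := strings.Cut(value, ":")
	if !found || parser == "" || pattern == "" {
		return parseAsRule{}, fmt.Errorf("expected parser:pattern, got %q", value)
	}
	return parseAsRule{Parser: parser, Pattern: pattern}, nil
}

// parserFor returns the parser of the first rule whose pattern matches the
// base name of path, or "" when no rule matches (meaning: infer as usual).
func parserFor(rules []parseAsRule, path string) string {
	base := filepath.Base(path)
	for _, rule := range rules {
		// A malformed pattern yields an error; this sketch simply skips it.
		if ok, err := filepath.Match(rule.Pattern, base); err == nil && ok {
			return rule.Parser
		}
	}
	return ""
}

func main() {
	rule, err := newParseAsRule("requirements.txt:requirements-*.txt")
	if err != nil {
		fmt.Println("invalid --parse-as value:", err)
		return
	}
	rules := []parseAsRule{rule}
	for _, path := range []string{"my/project/requirements-dev.txt", "my/project/Cargo.lock"} {
		fmt.Printf("%s -> %q\n", path, parserFor(rules, path))
	}
}
```

Matching on the base name only is an assumption made here to keep the rule independent of where the directory walk starts; matching on full paths would be an equally plausible design.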
For the general globbing/directory case, I think that's something that would be better suited as a config file option, rather than trying to support this complex mapping as a CLI option.
I think we should make `--parse-as` less verbose and just have it do the scanning as well.
@another-rex thoughts?
The problem I see with making parse-as do scanning as well is that it won't be able to support directory scanning at all, which we currently advertise as the easiest way to use osv-scanner.
osv-scanner --parse-as requirements.txt:alt_requirement.txt -r my/project
will turn into
osv-scanner --parse-as requirements.txt:my/project/alt_requirement.txt --parse-as requirements.txt:my/project/inner/alt_requirement.txt ...etc.
which doesn't look that good.
That was one of the original proposals in #94 (comment) :)
My concern there was that it was hard to parse perfectly in all cases. e.g. what if we had ':' characters in the given pathnames? Or if we had a filename called "requirements.txt:foo" ? Unlikely, but it does feel like a gap.
🤦 I completely missed that somehow.
I think it would be fine, and I provided a way to handle edge cases: if we say that the split is always on the first `:` and that an empty "parse-as" means "use default/infer", then that should prevent any edge cases - even if your file started with a `:`, you'd just provide `-L=::my-file` (and repeat for as many `:`s as your filename starts with).
If that doesn't make sense, but you like the idea I can switch and then you can see it in actual action via the tests
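The rule described in this comment (always split on the first `:`, and treat an empty parser name as "infer") can be sketched in a few lines of Go. `splitLockfileValue` is a hypothetical helper name, and treating a value without any `:` as a plain path is an assumption based on the remark that the leading `:` can otherwise be omitted.

```go
package main

import (
	"fmt"
	"strings"
)

// splitLockfileValue splits a -L/--lockfile value into an optional parser
// name and a path, always cutting on the FIRST colon.
// An empty parseAs means "use the default / infer from the filename".
// (Hypothetical helper, for illustration only.)
func splitLockfileValue(value string) (parseAs, path string) {
	before, after, found := strings.Cut(value, ":")
	if !found {
		// No colon at all: the whole value is a path, parser is inferred.
		return "", value
	}
	return before, after
}

func main() {
	for _, value := range []string{
		"requirements.txt:/path/to/extra-requirements.txt",
		"/path/to/Cargo.lock",
		"::my-file",
		":requirements.txt:foo",
	} {
		parseAs, path := splitLockfileValue(value)
		fmt.Printf("%q -> parseAs=%q path=%q\n", value, parseAs, path)
	}
}
```

One consequence worth keeping in mind is that any path containing a colon (a Windows drive letter, for instance) would need the leading `:` escape under this rule.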
Hmm, I'm still not sure that handles the edge case where the full filename is "requirements.txt:foo".
In this case, perhaps we can expand this into multiple files if needed. i.e.
- if "requirements.txt:foo" exists, scan that through the default/infer rules.
- if "foo" also exists, scan "foo" as "requirements.txt" as well.
- or both, if both cases are satisfied.
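The "expand into multiple files" alternative listed above could look roughly like the following Go sketch. `scanTarget` and `expandLockfileValue` are hypothetical names, and the existence check is passed in as a function so the example does not depend on real files; a real implementation would presumably use something like `os.Stat`.

```go
package main

import (
	"fmt"
	"strings"
)

// scanTarget is a file to scan plus the parser to use ("" means infer).
// (Hypothetical type, for illustration only.)
type scanTarget struct {
	Path    string
	ParseAs string
}

// expandLockfileValue resolves an ambiguous value such as "requirements.txt:foo":
//   - if a file with that exact name exists, scan it with the inferred parser;
//   - if the part after the first colon exists, scan it with the named parser;
//   - if both exist, return both targets.
func expandLockfileValue(value string, exists func(string) bool) []scanTarget {
	var targets []scanTarget

	if exists(value) {
		targets = append(targets, scanTarget{Path: value})
	}

	if parseAs, path, found := strings.Cut(value, ":"); found && parseAs != "" && exists(path) {
		targets = append(targets, scanTarget{Path: path, ParseAs: parseAs})
	}

	return targets
}

func main() {
	// Stand-in for the filesystem: both candidate files "exist".
	onDisk := map[string]bool{"requirements.txt:foo": true, "foo": true}
	exists := func(path string) bool { return onDisk[path] }

	for _, target := range expandLockfileValue("requirements.txt:foo", exists) {
		fmt.Printf("scan %q as %q\n", target.Path, target.ParseAs)
	}
}
```

The trade-off against the "always split on the first colon" rule is that the result here depends on what is on disk, which makes the command harder to reason about from the arguments alone.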
It should, because you'd be providing `-L=:requirements.txt:foo` (and to be clear, that's only required in these cases - otherwise the starting `:` can be omitted).
Right, but I think that would be a slight behaviour change, because previously scanning "requirements.txt:foo" would scan that file directly.
That said, it's enough of an edge case that I don't think anyone would actually be affected by this, so what you suggested seems good to me.
(I realise I may not have communicated very clearly about what I had thought |
Thanks for the changes! Looks good to me, just minor nit about import order being moved.
@oliverchang is out today but let's get this merged in soon on Monday.
thanks!! just some comments.
Also resolves #124 |
@G-Rath I think we just have two minor comments here to resolve, and we're good to go? |
@oliverchang yup I've been a bit slammed between personal and work stuff, so not had a chance to do anything big but tomorrow's 20% time so should be able to pick it up then |
note the documentation is a little ham-y right now, but I think that'll be easier to improve once #168 is landed because then there'll be two cases like this. |
thanks a ton as always @G-Rath ! |
Hi, |
@agmond A new version is planned to be released this week, we will also publish a releasing schedule soon. |
Resolves #67
Resolves #124