fix: improvements to known CPE index construction #2801
Previously, when building the known CPE index, there was logic to de-duplicate processing based on the normalized CPE name. However, this meant that a significant number of known CPEs were never indexed, because the first instance of a name didn't have a supported collection URL but a later one did. This code doesn't execute at runtime in syft, so de-duplicating the processing for performance isn't really necessary here, and processing every entry doesn't add much to the total runtime anyway.
Signed-off-by: Weston Steimel <commits@weston.slmail.me>
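The effect described in this commit message can be sketched as follows. This is an illustrative reduction, not the actual syft generator code: the `entry` type, the `supportedPrefix` value, and the function names `packageFromURL`, `indexWithDedup`, and `indexAll` are all hypothetical, and a single URL prefix stands in for whatever "supported collection URL" means in the real generator.

```go
package main

import (
	"fmt"
	"strings"
)

// entry is a hypothetical, flattened view of one dictionary item:
// a normalized CPE name plus one reference URL.
type entry struct {
	name string
	url  string
}

// supportedPrefix is an assumption made for this sketch only.
const supportedPrefix = "https://www.npmjs.com/package/"

// packageFromURL returns the package name if the URL is a supported one.
func packageFromURL(u string) (string, bool) {
	if !strings.HasPrefix(u, supportedPrefix) {
		return "", false
	}
	pkg := strings.TrimPrefix(u, supportedPrefix)
	return pkg, pkg != ""
}

// indexWithDedup skips any name it has already seen, so only the FIRST
// entry for a name is ever inspected.
func indexWithDedup(entries []entry) map[string]string {
	seen := map[string]bool{}
	index := map[string]string{}
	for _, e := range entries {
		if seen[e.name] {
			continue
		}
		seen[e.name] = true
		if pkg, ok := packageFromURL(e.url); ok {
			index[pkg] = e.name
		}
	}
	return index
}

// indexAll inspects every entry, so a later entry with a supported URL
// is still indexed.
func indexAll(entries []entry) map[string]string {
	index := map[string]string{}
	for _, e := range entries {
		if pkg, ok := packageFromURL(e.url); ok {
			index[pkg] = e.name
		}
	}
	return index
}

func main() {
	entries := []entry{
		{name: "cpe:2.3:a:example:widget", url: "https://example.com/widget"},
		{name: "cpe:2.3:a:example:widget", url: "https://www.npmjs.com/package/widget"},
	}
	fmt.Println("with de-duplication:", len(indexWithDedup(entries)), "indexed")
	fmt.Println("without de-duplication:", len(indexAll(entries)), "indexed")
}
```

In this sketch the first entry for the name has an unsupported URL, so the de-duplicating version marks the name as seen and never looks at the second entry.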
Previously, the struct definition for CpeItem caused only the last URL reference in the list to be kept and processed for inclusion in the index.
Signed-off-by: Weston Steimel <commits@weston.slmail.me>
@@ -3,8 +3,8 @@ package main
 type CpeItem struct {
 	Name  string `xml:"name,attr"`
 	Title string `xml:"title"`
-	References []struct {
-		Reference struct {
+	References struct {
@westonsteimel can you help me understand this change?
Why is it more correct to model references as a struct holding a slice of structs than just as a slice of structs?
With the previous struct definition, we only got a single URL (the final one from the list) after unmarshalling. If there is a better way to do this, I'm happy to update it, but the Go XML unmarshalling examples I found all seemed to show that this was the way to make it work.
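A minimal sketch of the behavior described in this reply, using Go's standard `encoding/xml` package. The sample XML is a simplified, namespace-free stand-in for a dictionary entry, and the type names `oldItem`, `newItem`, and `flatItem` and their helper functions are hypothetical; only the field layout of the first two follows the diff above. The third shape uses an `a>b` path tag, which `encoding/xml` supports, as one possible alternative; it is not what this PR does.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// sample is an assumed, simplified entry with two reference URLs.
const sample = `<cpe-item name="cpe:/a:example:widget:1.0">
  <title>Example Widget 1.0</title>
  <references>
    <reference href="https://example.com/first">First</reference>
    <reference href="https://example.com/second">Second</reference>
  </references>
</cpe-item>`

// oldItem: a slice of <references> wrappers, each holding ONE reference.
// Every <reference> is decoded into the same struct, so the last one wins.
type oldItem struct {
	References []struct {
		Reference struct {
			Href string `xml:"href,attr"`
		} `xml:"reference"`
	} `xml:"references"`
}

// newItem: one <references> wrapper holding a slice of references.
type newItem struct {
	References struct {
		Reference []struct {
			Href string `xml:"href,attr"`
		} `xml:"reference"`
	} `xml:"references"`
}

// flatItem: alternative using a path tag to skip the wrapper struct.
type flatItem struct {
	References []struct {
		Href string `xml:"href,attr"`
	} `xml:"references>reference"`
}

func oldHrefs(data string) ([]string, error) {
	var item oldItem
	if err := xml.Unmarshal([]byte(data), &item); err != nil {
		return nil, err
	}
	var hrefs []string
	for _, r := range item.References {
		hrefs = append(hrefs, r.Reference.Href)
	}
	return hrefs, nil
}

func newHrefs(data string) ([]string, error) {
	var item newItem
	if err := xml.Unmarshal([]byte(data), &item); err != nil {
		return nil, err
	}
	var hrefs []string
	for _, r := range item.References.Reference {
		hrefs = append(hrefs, r.Href)
	}
	return hrefs, nil
}

func flatHrefs(data string) ([]string, error) {
	var item flatItem
	if err := xml.Unmarshal([]byte(data), &item); err != nil {
		return nil, err
	}
	var hrefs []string
	for _, r := range item.References {
		hrefs = append(hrefs, r.Href)
	}
	return hrefs, nil
}

func main() {
	for name, fn := range map[string]func(string) ([]string, error){
		"old": oldHrefs, "new": newHrefs, "flat": flatHrefs,
	} {
		hrefs, err := fn(sample)
		if err != nil {
			fmt.Println(name, "error:", err)
			continue
		}
		fmt.Println(name, len(hrefs), hrefs)
	}
}
```

With the old shape, the slice grows once per `<references>` element rather than once per `<reference>`, which is why only the final URL survives.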
There was also a bug in the struct definition that caused only the final reference URL in the list to be unmarshaled and considered when constructing the index.