-
Notifications
You must be signed in to change notification settings - Fork 59
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement Stringer interface on structs, add FromString method where relevant #137
Conversation
Signed-off-by: Ian Ling <ian@iancaling.com>
@swinslow this one's up next. The methods added here should also probably be used in the tag-value and RDF parsers/writers, for consistency. But that can get taken care of in another PR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @ianling, thanks for this and apologies again for my delay in reviewing it.
I have a few comments below, mostly around how this handles strings that might have extra whitespace and whether to leverage some of the tag-value parsing functions that I think already address some of these. Let me know your thoughts. Thanks!
// ExternalDocumentRef2_1 is a reference to an external SPDX document | ||
// as defined in section 2.6 for version 2.1 of the spec. | ||
type ExternalDocumentRef2_1 struct { | ||
// DocumentRefID is the ID string defined in the start of the | ||
// reference. It should _not_ contain the "DocumentRef-" part | ||
// of the mandatory ID string. | ||
DocumentRefID string `json:"externalDocumentId"` | ||
DocumentRefID DocElementID `json:"externalDocumentId"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this one, I believe the reason I had used string
rather than DocElementID
as the type is that DocElementID
is essentially a tuple of an SPDX Document ID and an SPDX Element ID within that Document.
By contrast, External Document Refs only consist of a Document ID. In essence, it is defining a Document ID corresponding to the namespace of the External Document, so that the Document ID can be used throughout the rest of this current Document to make reference to elements in the External Document.
Because of that, I'm not sure we should switch this to DocElementID
. It might be confusing to folks since the ElementRefID
part of the struct would have no meaning, and shouldn't be rendered here. I think it might be preferable to keep it as just a string. Does that make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason I changed it is because we lose all of the special handling for JSON/YAML exports if we keep this as a normal string and will need to re-implement those cases again.
In other words, if we export DocumentRefID string
to JSON, we'll just get myDoc
, but if we make it DocumentRefID DocElementID
and export it to JSON, we automatically get DocumentRef-myDoc
. All of the logic is already in place for the library to properly handle situations where only the DocumentRefID
field has a value and not ElementRefID
or SpecialID
, so only DocumentRef-myDoc
would be rendered.
I think it is simplest just to make it clear in comments (and future document validation functions) what the correct usage of DocElementID is inside of an external document ref.
We could also provide a constructor for the ExternalDocumentRef structs if we're worried about making one from scratch being confusing. Something like func MakeExternalDocumentRef(documentRef, uri, algorithm, checksumValue string) ExternalDocumentRef2_1 {...}
spdx/identifier.go
Outdated
case 2: | ||
// "SPDXRef-" prefix was present | ||
*e = ElementID(idFields[1]) | ||
case 1: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I follow this case: if the SPDXRef-
prefix is not present, I think we'd want to treat it as an invalid ID rather than ignore that the prefix isn't present? Unless there's a use case here I'm misunderstanding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it might be useful for this library to accept slightly malformed data and sorta "clean it up" if the user chooses to export it again. In this case, if the user parsed a file and an SPDXID field was missing the SPDXRef-
prefix, they could export the file again after parsing it and the file would then pass validation (assuming that was the only issue with the file).
If that happens though, we should probably at least issue a warning rather than just charging ahead, or force the user to opt out of validation in order to parse a document that contains problems like this.
I'll give this one some more thought. We frequently have users providing us slightly invalid SPDX documents like this, and we have to walk them through fixing their documents since the parser throws errors for small issues like this, so my preference is just that the library makes a decent guess as to what the user intended, but I also see the value in ensuring that the we uphold the integrity of the spec by at least telling the user they did something wrong.
func MakeDocElementID(docRef string, eltRef string) DocElementID { | ||
docRef = strings.Replace(docRef, "DocumentRef-", "", 1) | ||
eltRef = strings.Replace(eltRef, "SPDXRef-", "", 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit uncomfortable with having this function be able to take strings which might or might not have the DocumentRef-
or SPDXRef-
prefixes.
This is an edge case probably, but in an SPDX Document you could have two different SPDX elements with the following SPDX Identifiers:
SPDXRef-File1
SPDXRef-SPDXRef-File1
I wouldn't recommend anyone actually build an SPDX Document this way, but I think it would be syntactically valid.
So if we've got functions that are optionally stripping the first instance of SPDXRef-
that it sees, I feel like that will lead to unexpected behavior for edge cases like the one above. I'm having trouble putting my finger on a concrete example, but I guess I'm thinking that it'll be easier to avoid doing the wrong thing if a function like this always requires its inputs to either have the prefix or omit the prefix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think your example should just work, due to the , 1)
at the end of the strings.Replace(...)
call.
strings.Replace("SPDXRef-File1", "SPDXRef-", "", 1) -> "File1"
strings.Replace("SPDXRef-SPDXRef-File1", "SPDXRef-", "", 1) -> "SPDXRef-File1"
strings.Replace("SPDXRef-DocumentRef-File1", "SPDXRef-", "", 1) -> "DocumentRef-File1"
strings.Replace("File1-SPDXRef", "SPDXRef-", "", 1) -> "File1-SPDXRef"
If we can come up with edge cases we're worried about, I can add test cases for them to ensure that this function is handling them the way we'd want.
But as I mentioned in another comment, this is a situation we have run into repeatedly with real SPDX documents from users, where sometimes they just omit the SPDXRef-
prefix on their identifiers, and it's not sustainable for us to fix that manually every time it happens. I think this is a case where the user clearly meant to add the prefix but just forgot/missed it and it's safe for us to make an assumption.
Thank you again for your time @swinslow. I moved some code up from the |
|
||
// following tests are copy pasted from tvloader/parser2v2/util_test.go | ||
|
||
func TestCanExtractDocumentAndElementRefsFromID(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made sure to hang on to non-duplicate tests and move them to the proper place after my re-arranging, so we did not lose any coverage.
Signed-off-by: Ian Ling <ian@iancaling.com>
Hi yet again @ianling -- are you interested in updating this PR based on the latest refactoring? |
Hiya @ianling -- thank you for this (and the other) contributions. I'm going to close this for now, since it's been a few months open without movement. We will gladly reopen this if you decide to update/continue it! |
Allows structs to be reliably and consistently converted to strings (e.g. for tag-value and spreadsheet export/writing)
Allows structs to be filled in by parsing them from strings (e.g. for tag-value and spreadsheet import/parsing)
Note: this PR does not completely refactor the tag-value/RDF parsers/writers to use these methods. They are used by the spreadsheet parser in #138.
Depends on and contains commits from #133 and #134