Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[feature] Fetch the purl types dynamically #1071

Open
antgamdia opened this issue Jul 19, 2023 · 9 comments
Open

[feature] Fetch the purl types dynamically #1071

antgamdia opened this issue Jul 19, 2023 · 9 comments
Labels
enhancement New feature or request
Milestone

Comments

@antgamdia
Copy link

Is your feature request related to a problem? Please describe.

When importing files, like an SPDX, including unsupported purl types, the ingestion fails with a message like:

unable to ingest doc tree: unable to parse purl pkg:bitnami/apache@2.4.57

This is happening because of:

case "alpine", "alpm", "apk", "huggingface", "githubactions", "mlflow", "qpkg", "pub", "swid", PurlTypeGuac:
fallthrough
// PURL types defined in purl library handled generically
case purl.TypeBitbucket, purl.TypeCocoapods, purl.TypeCargo,
purl.TypeComposer, purl.TypeConan, purl.TypeConda, purl.TypeCran,
purl.TypeDebian, purl.TypeGem, purl.TypeGithub,
purl.TypeGolang, purl.TypeHackage, purl.TypeHex, purl.TypeMaven,
purl.TypeNPM, purl.TypeNuget, purl.TypePyPi, purl.TypeRPM, purl.TypeSwift,
purl.TypeGeneric:
// some code
r := pkg(p.Type, p.Namespace, p.Name, p.Version, p.Subpath, p.Qualifiers.Map())
return r, nil

Since the purl type is not hardcoded there, then it will yield an error.

It shouldn't be a problem except for the fact that the purl type list is dynamic (for example, we are adding the bitnami type at package-url/purl-spec#231), so hardcoding the possible values in the GUAC code does not seem to scale.

Describe the solution you'd like

Ideally, if there is any way to query all the possible values of a purl type, then it would be great. I don't know how feasible it could be, though.
Alternatively, a simple workaround could be an opt-in config flag telling the parser to accept "unknown" package types, maybe. However, I don't know the side effects of this.

Describe alternatives you've considered
I have locally added my custom purl type and I have successfully been able to ingest my SPDX, but I'm not sure about the side effects yet.

Additional context
N/A

@antgamdia antgamdia added the enhancement New feature or request label Jul 19, 2023
@lumjjb
Copy link
Contributor

lumjjb commented Jul 19, 2023

Hi @antgamdia ! Thanks for opening this! I think you've described the situation around PURLs pretty well. Our decision at the time to handle each case was deliberate due to PURLs not necessarily being consistent in the meaning they convey across ecosystems (example: OCI). Thus we wanted to be intentional when a new ecosystem comes up that we handle it.

Would you be able to open that PR to handle that bitnami PURL type and we'll get it merged in?

Your opt-in "handle unknown PURL" flag sounds very tempting but I think we'd want to be careful that it doesn't become the default. I think this would be something i'd like to discuss, since i've had my fair share of PURLs which aren't actually PURLs but the information is useful. Perhaps there's a middleground to handle this to say that this is a heuristic or unhandled purl type.

@pxp928
Copy link
Collaborator

pxp928 commented Jul 19, 2023

I would be against adding a flag for "unknown PURLs" as the helper function converts to a model.PkgInputSpec and this is used to determine the mapping into the backend databases. If we allow "unknown" purls to be mapped, we could end up with a bunch of useless data in the database that does not query properly. Rather we should make updates to the helper function as needed and open issues as we encounter new additional types to ensure proper parsing of the new purl types.

@antgamdia
Copy link
Author

antgamdia commented Jul 20, 2023

Our decision at the time to handle each case was deliberate due to PURLs not necessarily being consistent in the meaning they convey across ecosystems (example: OCI). Thus we wanted to be intentional when a new ecosystem comes up that we handle it.

Fair point. Manually handling each new type that may pop up in the ecosystem seems to be the safest way to ensure GUAC is actually compatible.

Your opt-in "handle unknown PURL" flag sounds very tempting but I think we'd want to be careful that it doesn't become the default.

(...) I think this would be something i'd like to discuss, since i've had my fair share of PURLs which aren't actually PURLs but the information is useful.
I would be against adding a flag for "unknown PURLs" (...) Rather we should make updates to the helper function as needed and open issues as we encounter new additional types to ensure proper parsing of the new purl types.

Yep, that's my point. Rather than rejecting the information, we could still ingest it and use it somehow.... but this could clearly lead to what you mention: a bunch of "un-queryable" information stored in the DB.

However, I'm not familiar yet with the ontology used in GUAC, but maybe handling unknown package types as another legit source of information could potentially be useful for some organizations. What if the org is generating some internal packages but they are not published nor do they want to create a new purl type?
Perhaps it is too specific and it is not really worth it, I kind of understand it.

Another idea, rather than accepting anything unknown, could be an opt-in option to ignore it instead. It is safer than ingesting the unknown, yet useful for people using unsupported types.
For instance, in the test I was performing, I had a quite large SPDX with plenty of packages, but, just ~10 were "bitnami" purl type ones: if I had had the ignore option, I would have been able to ingest the SPDX and use it. However, instead, I had to manually look up those offending purls and delete them manually in a text editor in order to get my file ingested by GUAC.

As GUAC continues to be an active project, with people vigilant on any unsupported format, then it's ok... but I'm not that sure that blocking the ingest if something unsupported comes is the best/more scalable way to go :S

A quick win, perhaps, could be just sth like:

	default:
		if _, ok := purl.SupportedTypes[p.Type]; !ok {
			// If the type is supported by the purl library, but not handled above, then
			// we should add a case for it above.
			return nil, fmt.Errorf("unhandled existing PURL type: %q. Please, report it at https://github.com/guacsec/guac/issues/new", p.Type)
		}

		// unhandled types should throw an error so we can make sure to review the
		// implementation of newly introduced PURL types.
		return nil, fmt.Errorf("unhandled unknown PURL type: %q", p.Type)
	}

I could check with purl folks to see if the purl.SupportedTypes map could actually be a thing - currently, it is not (edit: I've filed an issue here: package-url/packageurl-go#59)

Would you be able to open that PR to handle that bitnami PURL type and we'll get it merged in?

Yes, absolutely. As soon as the new PURL type gets officially merged, I'll raise a new PR here.

Thanks for your input and quick response!

@pxp928
Copy link
Collaborator

pxp928 commented Jul 20, 2023

Great write up @antgamdia. I like the idea you proposed of returning fmt.Errorf("unhandled existing PURL type: %q. Please, report it at https://github.com/guacsec/guac/issues/new", p.Type). That would make it easy for a user to know which type is unsupported and can be quickly addressed. For now, we could have a guac "supportedTypes" until the upstream purl folks support SupportedTypes.

However, I'm not familiar yet with the ontology used in GUAC, but maybe handling unknown package types as another legit source of information could potentially be useful for some organizations. What if the org is generating some internal packages but they are not published nor do they want to create a new purl type?

So we currently handle that where we use heuristics to create a GUAC-specific purl based on the name and version of the unknown package (purl is not provided in the SBOM). That might be a safer option rather than creating a purl that is not accepted by the community.

func GuacPkgPurl(pkgName string, pkgVersion *string) string {
escapedName := SanitizeString(pkgName)
if pkgVersion == nil {
return fmt.Sprintf(PurlPkgGuac+"%s", escapedName)
}
return fmt.Sprintf(PurlPkgGuac+"%s@%s", escapedName, *pkgVersion)
}
func GuacFilePurl(alg string, digest string, filename *string) string {
s := fmt.Sprintf(PurlFilesGuac+"%s:%s", strings.ToLower(alg), digest)
if filename != nil {
s += fmt.Sprintf("#%s", SanitizeString(*filename))
}
return s
}
func GuacGenericPurl(s string) string {
return fmt.Sprintf("pkg:guac/generic/%s", SanitizeString(s))
}

Another idea, rather than accepting anything unknown, could be an opt-in option to ignore it instead. It is safer than ingesting the unknown, yet useful for people using unsupported types.

This can also be an option that still outputs error messages of unknown purl types (similar to the error message you provided above) without blocking. We just have to ensure that the user understands that using this option could result in missed information.

@lumjjb
Copy link
Contributor

lumjjb commented Jul 20, 2023 via email

@lumjjb lumjjb added this to the GUAC v0.2 milestone Jul 20, 2023
@antgamdia
Copy link
Author

Thanks again for your input on this!

Just for the record, the bitnami purl was recently accepted and merged in the spec. Besides, I have created two PRs in the packageurl-go package they provide to add the type and the SupportedType we talked about - don't know when they will get merged, though-.
They are package-url/packageurl-go#60 and package-url/packageurl-go#61.

@pxp928
Copy link
Collaborator

pxp928 commented Jul 31, 2023

Great! Thanks, @antgamdia. Looking forward to getting this added to guac once it merges upstream.

@antgamdia
Copy link
Author

Both PRs were just merged! The upcoming packageurl-go should include both the bitnami purl as well as two exported maps KnownTypes and CandidateTypes holding each possible accepted value in the spec :)

@rvanider
Copy link

The purl specification refers to known types versus stating allowed types. It articulates the rules for the allowed values for the type attribute with encoding rules. Only expressly supporting those types expressed makes the tools unable to handle non-public package types or future types.

I would ask that you accept any type, based on the parsing rules. Falling back to the generic type or any limited capability but ingest the purl so its relationships can be utilized.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

4 participants