-
Notifications
You must be signed in to change notification settings - Fork 48
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: use url.URL to encode and decode PURLs #52
Conversation
Hi @tommyknows 👋 First all thank you for your contribution. I totally agree with you here. If purls are considered URLs we should try using the go url package. There are a few test cases failing, can you have a look at this? |
Hi @shibumi, thanks, should be fixed! Missed to lowercase the type before giving it to the normalisation-functions :) |
This commit refactors the `ToString` and `FromString` functions / methods to use the `url.URL` type, instead of trying to parse URLs directly. The [PURL Spec](https://github.com/package-url/purl-spec/blob/master/PURL-SPECIFICATION.rst#a-purl-is-a-url) explicitly says: > A purl is a URL: > A purl is a valid URL and URI that conforms to the URL definitions or > specifications ... So why not actually use Go's URL type to do operations on it? This commit does exactly that, and removes a lot of the previously dense parsing-logic with simpler checks based on the URL's fields. Especially `ToString()` has gotten a lot simpler through that. Additionaly, by switching to the URL package, this commit fixes a couple of outstanding bugs: - When a qualifier contained `+` signs, which are valid in URL paths, but not in URL queries, the sign did not get escaped. The previous code relied on `url.PathUnescape` to unescape keys and values, which should have used `url.QueryUnescape` instead. Further, encoding Qualifiers used `url.PathEscape` instead of `url.QueryEscape`. This could have led to pURLs losing `+` signs if they were properly encoded. - Fixes package-url#51, where spaces in names have not gotten encoded correctly. - Fixes most test-cases from package-url#22 that are round-trip-save (e.g. all cases where input == output), except the "pre-encoded qualifier value is unchanged" testcase that is wrong - a qualifier shouldn't be encoded with `%20` for a space, but with a plus-sign (query encoding). - Fixes most cases from package-url#41 as well, except: - where the query encoding in the test-cases are wrong (" " -> "+", "+" -> "%20") (test-cases `pre-encoded_qualifier_value_is_unchanged` and `unencoded_qualifier_value_is_encoded`), - `characters_are_unencoded_where_allowed`: `<` should be encoded, as far as I can tell. The Go stdlib also encodes it, so this should be fine. - `explicit_characters_are_encoded`: `@` does not need to be encoded. When reading through that list and all the nuances, it makes it clear that we shouldn't do this ourselves! And this commit doesn't, it simply relies on the Go standard library to handle all of these cases correctly. All of the aforementioned issues should have test-cases, presumably added to the test-suite-data.
Added a few comments, mostly nit picks |
Thank you for contribution @tommyknows , great work!! 💯 |
[![Mend Renovate](https://app.renovatebot.com/images/banner.svg)](https://renovatebot.com) This PR contains the following updates: | Package | Type | Update | Change | |---|---|---|---| | [github.com/google/go-cmp](https://github.com/google/go-cmp) | require | minor | `v0.5.9` -> `v0.6.0` | | [github.com/jedib0t/go-pretty/v6](https://github.com/jedib0t/go-pretty) | require | patch | `v6.4.7` -> `v6.4.8` | | [github.com/package-url/packageurl-go](https://github.com/package-url/packageurl-go) | require | patch | `v0.1.1` -> `v0.1.2` | | golang.org/x/exp | require | digest | `9212866` -> `7918f67` | | golang.org/x/mod | require | minor | `v0.12.0` -> `v0.13.0` | | golang.org/x/sync | require | minor | `v0.3.0` -> `v0.4.0` | | golang.org/x/term | require | minor | `v0.12.0` -> `v0.13.0` | --- ### Release Notes <details> <summary>google/go-cmp (github.com/google/go-cmp)</summary> ### [`v0.6.0`](https://github.com/google/go-cmp/releases/tag/v0.6.0) [Compare Source](https://github.com/google/go-cmp/compare/v0.5.9...v0.6.0) New API: - ([#​340](https://github.com/google/go-cmp/issues/340)) Add `cmpopts.EquateComparable` Documentation changes: - ([#​337](https://github.com/google/go-cmp/issues/337)) Use of hotlinking of Go identifiers Build changes: - ([#​325](https://github.com/google/go-cmp/issues/325)) Remove purego fallbacks Testing changes: - ([#​322](https://github.com/google/go-cmp/issues/322)) Run tests for Go 1.20 version - ([#​332](https://github.com/google/go-cmp/issues/332)) Pin GitHub action versions - ([#​327](https://github.com/google/go-cmp/issues/327)) set workflow permission to read-only </details> <details> <summary>jedib0t/go-pretty (github.com/jedib0t/go-pretty/v6)</summary> ### [`v6.4.8`](https://github.com/jedib0t/go-pretty/releases/tag/v6.4.8) [Compare Source](https://github.com/jedib0t/go-pretty/compare/v6.4.7...v6.4.8) ### Features - **table** - `RenderTSV()` to render table in TSV format ([#​277](https://github.com/jedib0t/go-pretty/issues/277)) // thanks [@​rafiramadhana](https://github.com/rafiramadhana) </details> <details> <summary>package-url/packageurl-go (github.com/package-url/packageurl-go)</summary> ### [`v0.1.2`](https://github.com/package-url/packageurl-go/releases/tag/v0.1.2) [Compare Source](https://github.com/package-url/packageurl-go/compare/v0.1.1...v0.1.2) #### What's Changed - Add Julia by [@​Octogonapus](https://github.com/Octogonapus) in [https://github.com/package-url/packageurl-go/pull/44](https://github.com/package-url/packageurl-go/pull/44) - feat: add missing purl types by [@​mcombuechen](https://github.com/mcombuechen) in [https://github.com/package-url/packageurl-go/pull/43](https://github.com/package-url/packageurl-go/pull/43) - Pull test data from upstream instead of maintaining a local copy by [@​Octogonapus](https://github.com/Octogonapus) in [https://github.com/package-url/packageurl-go/pull/49](https://github.com/package-url/packageurl-go/pull/49) - Add simple fuzz test by [@​imjasonh](https://github.com/imjasonh) in [https://github.com/package-url/packageurl-go/pull/34](https://github.com/package-url/packageurl-go/pull/34) - Test using supported Go versions by [@​imjasonh](https://github.com/imjasonh) in [https://github.com/package-url/packageurl-go/pull/50](https://github.com/package-url/packageurl-go/pull/50) - Remove deprecated usage of ioutil by [@​noqcks](https://github.com/noqcks) in [https://github.com/package-url/packageurl-go/pull/40](https://github.com/package-url/packageurl-go/pull/40) - fix: use url.URL to encode and decode PURLs by [@​tommyknows](https://github.com/tommyknows) in [https://github.com/package-url/packageurl-go/pull/52](https://github.com/package-url/packageurl-go/pull/52) - fix: escape and unescape name by [@​tommyknows](https://github.com/tommyknows) in [https://github.com/package-url/packageurl-go/pull/55](https://github.com/package-url/packageurl-go/pull/55) - fix: escape everything with modified QueryEscape by [@​tommyknows](https://github.com/tommyknows) in [https://github.com/package-url/packageurl-go/pull/58](https://github.com/package-url/packageurl-go/pull/58) - Add `pub` and `bitnami` types by [@​antgamdia](https://github.com/antgamdia) in [https://github.com/package-url/packageurl-go/pull/60](https://github.com/package-url/packageurl-go/pull/60) - Add known types and candidate types by [@​antgamdia](https://github.com/antgamdia) in [https://github.com/package-url/packageurl-go/pull/61](https://github.com/package-url/packageurl-go/pull/61) - Add PackageURL.Normalize by [@​wetterjames4](https://github.com/wetterjames4) in [https://github.com/package-url/packageurl-go/pull/65](https://github.com/package-url/packageurl-go/pull/65) #### New Contributors - [@​mcombuechen](https://github.com/mcombuechen) made their first contribution in [https://github.com/package-url/packageurl-go/pull/43](https://github.com/package-url/packageurl-go/pull/43) - [@​imjasonh](https://github.com/imjasonh) made their first contribution in [https://github.com/package-url/packageurl-go/pull/34](https://github.com/package-url/packageurl-go/pull/34) - [@​noqcks](https://github.com/noqcks) made their first contribution in [https://github.com/package-url/packageurl-go/pull/40](https://github.com/package-url/packageurl-go/pull/40) - [@​tommyknows](https://github.com/tommyknows) made their first contribution in [https://github.com/package-url/packageurl-go/pull/52](https://github.com/package-url/packageurl-go/pull/52) - [@​antgamdia](https://github.com/antgamdia) made their first contribution in [https://github.com/package-url/packageurl-go/pull/60](https://github.com/package-url/packageurl-go/pull/60) - [@​wetterjames4](https://github.com/wetterjames4) made their first contribution in [https://github.com/package-url/packageurl-go/pull/65](https://github.com/package-url/packageurl-go/pull/65) **Full Changelog**: package-url/packageurl-go@v0.1.1...v0.1.2 </details> --- ### Configuration 📅 **Schedule**: Branch creation - "before 6am on monday" in timezone Australia/Sydney, Automerge - At any time (no schedule defined). 🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied. ♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox. 👻 **Immortal**: This PR will be recreated if closed unmerged. Get [config help](https://github.com/renovatebot/renovate/discussions) if that's undesired. --- - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box --- This PR has been generated by [Mend Renovate](https://www.mend.io/free-developer-tools/renovate/). View repository job log [here](https://developer.mend.io/github/google/osv-scanner). <!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzNy4wLjMiLCJ1cGRhdGVkSW5WZXIiOiIzNy44LjEiLCJ0YXJnZXRCcmFuY2giOiJtYWluIn0=-->
This commit refactors the
ToString
andFromString
functions / methods to use theurl.URL
type, instead of trying to parse URLs directly.The PURL Spec explicitly says:
So why not actually use Go's URL type to do operations on it?
This commit does exactly that, and removes a lot of the previously dense parsing-logic with simpler checks based on the URL's fields. Especially
ToString()
has gotten a lot simpler through that.Additionaly, by switching to the URL package, this commit fixes a couple of outstanding bugs:
+
signs, which are valid in URL paths, but not in URL queries, the sign did not get escaped. The previous code relied onurl.PathUnescape
to unescape keys and values, which should have usedurl.QueryUnescape
instead. Further, encoding Qualifiers usedurl.PathEscape
instead ofurl.QueryEscape
. This could have led to pURLs losing+
signs if they were properly encoded.%20
for a space, but with a plus-sign (query encoding).pre-encoded_qualifier_value_is_unchanged
andunencoded_qualifier_value_is_encoded
),characters_are_unencoded_where_allowed
:<
should be encoded, as far as I can tell. The Go stdlib also encodes it, so this should be fine.explicit_characters_are_encoded
:@
does not need to be encoded.When reading through that list and all the nuances, it makes it clear that we shouldn't do this ourselves! And this commit doesn't, it simply relies on the Go standard library to handle all of these cases correctly.
All of the aforementioned issues should have test-cases, presumably added to the test-suite-data.
You may notice that this code barely touches the tests. I'm not sure whether to add explicit test-cases to the Go implementation only, or if these should rather be added to the general test-suite. I'm happy to do either way - I've ensured that these changes work with all the tests from the issues & PRs mentioned above.
The only thing that does change to the test-suite is to use a multi-line matching for the JSON-data, so that not only
{"key": "value"}
qualifiers are valid, but also:(not that it's currently needed, but when I added tests my editor automatically made multi-line strings out of these, so I needed to fix it 🙂)