fix: use url.URL to encode and decode PURLs
This commit refactors the `ToString` and `FromString` functions / methods to use the `url.URL` type, instead of trying to parse URLs directly. The [PURL Spec](https://github.com/package-url/purl-spec/blob/master/PURL-SPECIFICATION.rst#a-purl-is-a-url) explicitly says:

> A purl is a URL:
> A purl is a valid URL and URI that conforms to the URL definitions or
> specifications ...

So why not actually use Go's URL type to do operations on it? This commit does exactly that, and replaces much of the previously dense parsing logic with simpler checks based on the URL's fields. `ToString()` in particular has become a lot simpler as a result.

Additionally, by switching to the URL package, this commit fixes a couple of outstanding bugs:

- When a qualifier contained `+` signs, which are valid in URL paths but not in URL queries, the sign did not get escaped. The previous code relied on `url.PathUnescape` to unescape keys and values, where it should have used `url.QueryUnescape`. Further, encoding qualifiers used `url.PathEscape` instead of `url.QueryEscape`. This could have led to pURLs losing `+` signs if they were properly encoded.
- Fixes #51, where spaces in names were not encoded correctly.
- Fixes most test cases from #22 that are round-trip-safe (i.e. all cases where input == output), except the "pre-encoded qualifier value is unchanged" test case, which is wrong: a space in a qualifier shouldn't be encoded as `%20`, but as a plus sign (query encoding).
- Fixes most cases from #41 as well, except:
  - where the query encoding in the test cases is wrong (query encoding maps " " to "+" and "+" to "%2B") (test cases `pre-encoded_qualifier_value_is_unchanged` and `unencoded_qualifier_value_is_encoded`),
  - `characters_are_unencoded_where_allowed`: `<` should be encoded, as far as I can tell. The Go stdlib also encodes it, so this should be fine.
  - `explicit_characters_are_encoded`: `@` does not need to be encoded.
Reading through that list and all its nuances makes it clear that we shouldn't do this ourselves! And this commit doesn't: it simply relies on the Go standard library to handle all of these cases correctly. All of the aforementioned issues should have test cases, presumably added to the test-suite-data.
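As a rough illustration of leaning on `url.URL` rather than hand-written parsing, the sketch below splits a purl into the pieces that `url.Parse` already provides. It is not the code from this commit: the `parts` type and the `splitPURL` function are hypothetical names, and the sketch assumes the common `pkg:type/...` form without a leading slash, which `net/url` treats as an opaque URL.

```go
package main

import (
	"fmt"
	"net/url"
)

// parts is a hypothetical container for the raw pieces of a purl.
type parts struct {
	Scheme     string
	Opaque     string     // "type/namespace/name@version", still percent-encoded
	Qualifiers url.Values // decoded with query rules ("+" becomes a space)
	Subpath    string     // decoded URL fragment
}

// splitPURL is an illustrative sketch, not the library's API. It assumes the
// rootless "pkg:type/..." form, for which url.Parse fills in Opaque.
func splitPURL(s string) (parts, error) {
	u, err := url.Parse(s)
	if err != nil {
		return parts{}, err
	}
	if u.Scheme != "pkg" {
		return parts{}, fmt.Errorf("not a purl: scheme %q", u.Scheme)
	}
	q, err := url.ParseQuery(u.RawQuery)
	if err != nil {
		return parts{}, err
	}
	return parts{
		Scheme:     u.Scheme,
		Opaque:     u.Opaque,
		Qualifiers: q,
		Subpath:    u.Fragment,
	}, nil
}

func main() {
	p, err := splitPURL("pkg:deb/debian/curl@7.50.3-1?arch=i386&distro=jessie")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(p.Opaque, p.Qualifiers.Get("arch"), p.Qualifiers.Get("distro"))
}
```

Splitting the type, namespace, name and version out of the opaque part, and handling purls written as `pkg:/...` or `pkg://...`, is deliberately left out of this sketch.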
1 parent d729287, commit 564b6fc
Showing 2 changed files with 121 additions and 147 deletions.