Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: use url.URL to encode and decode PURLs #52

Merged
merged 1 commit into from
Jul 1, 2023

Conversation

tommyknows
Copy link
Contributor

@tommyknows tommyknows commented Jul 1, 2023

This commit refactors the ToString and FromString functions / methods to use the url.URL type, instead of trying to parse URLs directly.

The PURL Spec explicitly says:

A purl is a URL:
A purl is a valid URL and URI that conforms to the URL definitions or
specifications ...

So why not actually use Go's URL type to do operations on it?

This commit does exactly that, and removes a lot of the previously dense parsing-logic with simpler checks based on the URL's fields. Especially ToString() has gotten a lot simpler through that.

Additionaly, by switching to the URL package, this commit fixes a couple of outstanding bugs:

  • When a qualifier contained + signs, which are valid in URL paths, but not in URL queries, the sign did not get escaped. The previous code relied on url.PathUnescape to unescape keys and values, which should have used url.QueryUnescape instead. Further, encoding Qualifiers used url.PathEscape instead of url.QueryEscape. This could have led to pURLs losing + signs if they were properly encoded.
  • Fixes purl with name contain whitespace can not be built from purl string #51, where spaces in names have not gotten encoded correctly.
  • Fixes most test-cases from Fix inconsistent URL encoding #22 / Inconsistent URL encoding #23 that are round-trip-save (e.g. all cases where input == output), except the "pre-encoded qualifier value is unchanged" testcase that is wrong - a qualifier shouldn't be encoded with %20 for a space, but with a plus-sign (query encoding).
  • Fixes most cases from Cumulative fixes from anchore/packageurl-go #41 as well, except:
    • where the query encoding in the test-cases are wrong (" " -> "+", "+" -> "%20") (test-cases pre-encoded_qualifier_value_is_unchanged and unencoded_qualifier_value_is_encoded),
    • characters_are_unencoded_where_allowed: < should be encoded, as far as I can tell. The Go stdlib also encodes it, so this should be fine.
    • explicit_characters_are_encoded: @ does not need to be encoded.

When reading through that list and all the nuances, it makes it clear that we shouldn't do this ourselves! And this commit doesn't, it simply relies on the Go standard library to handle all of these cases correctly.

All of the aforementioned issues should have test-cases, presumably added to the test-suite-data.


You may notice that this code barely touches the tests. I'm not sure whether to add explicit test-cases to the Go implementation only, or if these should rather be added to the general test-suite. I'm happy to do either way - I've ensured that these changes work with all the tests from the issues & PRs mentioned above.

The only thing that does change to the test-suite is to use a multi-line matching for the JSON-data, so that not only {"key": "value"} qualifiers are valid, but also:

{
  "key": "value"
}

(not that it's currently needed, but when I added tests my editor automatically made multi-line strings out of these, so I needed to fix it 🙂)

@shibumi
Copy link
Collaborator

shibumi commented Jul 1, 2023

Hi @tommyknows 👋

First all thank you for your contribution. I totally agree with you here. If purls are considered URLs we should try using the go url package.

There are a few test cases failing, can you have a look at this?

@tommyknows
Copy link
Contributor Author

Hi @shibumi, thanks, should be fixed! Missed to lowercase the type before giving it to the normalisation-functions :)

This commit refactors the `ToString` and `FromString` functions /
methods to use the `url.URL` type, instead of trying to parse URLs
directly.

The [PURL Spec](https://github.com/package-url/purl-spec/blob/master/PURL-SPECIFICATION.rst#a-purl-is-a-url)
explicitly says:

> A purl is a URL:
> A purl is a valid URL and URI that conforms to the URL definitions or
> specifications ...

So why not actually use Go's URL type to do operations on it?

This commit does exactly that, and removes a lot of the previously dense
parsing-logic with simpler checks based on the URL's fields. Especially
`ToString()` has gotten a lot simpler through that.

Additionaly, by switching to the URL package, this commit fixes a couple
of outstanding bugs:
- When a qualifier contained `+` signs, which are valid in URL paths,
  but not in URL queries, the sign did not get escaped. The previous
  code relied on `url.PathUnescape` to unescape keys and values, which
  should have used `url.QueryUnescape` instead. Further, encoding
  Qualifiers used `url.PathEscape` instead of `url.QueryEscape`. This
  could have led to pURLs losing `+` signs if they were properly
  encoded.
- Fixes package-url#51, where spaces in names have not gotten encoded correctly.
- Fixes most test-cases from package-url#22 that are round-trip-save (e.g. all
  cases where input == output), except the "pre-encoded qualifier value
  is unchanged" testcase that is wrong - a qualifier shouldn't be
  encoded with `%20` for a space, but with a plus-sign (query encoding).
- Fixes most cases from package-url#41 as well, except:
  -  where the query encoding in the test-cases are wrong (" " -> "+",
     "+" -> "%20") (test-cases `pre-encoded_qualifier_value_is_unchanged`
     and `unencoded_qualifier_value_is_encoded`),
  - `characters_are_unencoded_where_allowed`: `<` should be encoded, as
    far as I can tell. The Go stdlib also encodes it, so this should be
    fine.
  - `explicit_characters_are_encoded`: `@` does not need to be encoded.

When reading through that list and all the nuances, it makes it clear
that we shouldn't do this ourselves! And this commit doesn't, it simply
relies on the Go standard library to handle all of these cases
correctly.

All of the aforementioned issues should have test-cases, presumably
added to the test-suite-data.
packageurl.go Show resolved Hide resolved
packageurl.go Show resolved Hide resolved
packageurl_test.go Show resolved Hide resolved
@shibumi
Copy link
Collaborator

shibumi commented Jul 1, 2023

Added a few comments, mostly nit picks

@shibumi shibumi merged commit 564b6fc into package-url:master Jul 1, 2023
@shibumi
Copy link
Collaborator

shibumi commented Jul 1, 2023

Thank you for contribution @tommyknows , great work!! 💯

@tommyknows tommyknows mentioned this pull request Jul 11, 2023
another-rex referenced this pull request in google/osv-scanner Oct 11, 2023
[![Mend
Renovate](https://app.renovatebot.com/images/banner.svg)](https://renovatebot.com)

This PR contains the following updates:

| Package | Type | Update | Change |
|---|---|---|---|
| [github.com/google/go-cmp](https://github.com/google/go-cmp) |
require | minor | `v0.5.9` -> `v0.6.0` |
|
[github.com/jedib0t/go-pretty/v6](https://github.com/jedib0t/go-pretty)
| require | patch | `v6.4.7` -> `v6.4.8` |
|
[github.com/package-url/packageurl-go](https://github.com/package-url/packageurl-go)
| require | patch | `v0.1.1` -> `v0.1.2` |
| golang.org/x/exp | require | digest | `9212866` -> `7918f67` |
| golang.org/x/mod | require | minor | `v0.12.0` -> `v0.13.0` |
| golang.org/x/sync | require | minor | `v0.3.0` -> `v0.4.0` |
| golang.org/x/term | require | minor | `v0.12.0` -> `v0.13.0` |

---

### Release Notes

<details>
<summary>google/go-cmp (github.com/google/go-cmp)</summary>

### [`v0.6.0`](https://github.com/google/go-cmp/releases/tag/v0.6.0)

[Compare
Source](https://github.com/google/go-cmp/compare/v0.5.9...v0.6.0)

New API:

- ([#&#8203;340](https://github.com/google/go-cmp/issues/340)) Add
`cmpopts.EquateComparable`

Documentation changes:

- ([#&#8203;337](https://github.com/google/go-cmp/issues/337)) Use of
hotlinking of Go identifiers

Build changes:

- ([#&#8203;325](https://github.com/google/go-cmp/issues/325)) Remove
purego fallbacks

Testing changes:

- ([#&#8203;322](https://github.com/google/go-cmp/issues/322)) Run
tests for Go 1.20 version
- ([#&#8203;332](https://github.com/google/go-cmp/issues/332)) Pin
GitHub action versions
- ([#&#8203;327](https://github.com/google/go-cmp/issues/327)) set
workflow permission to read-only

</details>

<details>
<summary>jedib0t/go-pretty (github.com/jedib0t/go-pretty/v6)</summary>

###
[`v6.4.8`](https://github.com/jedib0t/go-pretty/releases/tag/v6.4.8)

[Compare
Source](https://github.com/jedib0t/go-pretty/compare/v6.4.7...v6.4.8)

### Features

-   **table**
- `RenderTSV()` to render table in TSV format
([#&#8203;277](https://github.com/jedib0t/go-pretty/issues/277)) //
thanks [@&#8203;rafiramadhana](https://github.com/rafiramadhana)

</details>

<details>
<summary>package-url/packageurl-go
(github.com/package-url/packageurl-go)</summary>

###
[`v0.1.2`](https://github.com/package-url/packageurl-go/releases/tag/v0.1.2)

[Compare
Source](https://github.com/package-url/packageurl-go/compare/v0.1.1...v0.1.2)

#### What's Changed

- Add Julia by [@&#8203;Octogonapus](https://github.com/Octogonapus)
in
[https://github.com/package-url/packageurl-go/pull/44](https://github.com/package-url/packageurl-go/pull/44)
- feat: add missing purl types by
[@&#8203;mcombuechen](https://github.com/mcombuechen) in
[https://github.com/package-url/packageurl-go/pull/43](https://github.com/package-url/packageurl-go/pull/43)
- Pull test data from upstream instead of maintaining a local copy by
[@&#8203;Octogonapus](https://github.com/Octogonapus) in
[https://github.com/package-url/packageurl-go/pull/49](https://github.com/package-url/packageurl-go/pull/49)
- Add simple fuzz test by
[@&#8203;imjasonh](https://github.com/imjasonh) in
[https://github.com/package-url/packageurl-go/pull/34](https://github.com/package-url/packageurl-go/pull/34)
- Test using supported Go versions by
[@&#8203;imjasonh](https://github.com/imjasonh) in
[https://github.com/package-url/packageurl-go/pull/50](https://github.com/package-url/packageurl-go/pull/50)
- Remove deprecated usage of ioutil by
[@&#8203;noqcks](https://github.com/noqcks) in
[https://github.com/package-url/packageurl-go/pull/40](https://github.com/package-url/packageurl-go/pull/40)
- fix: use url.URL to encode and decode PURLs by
[@&#8203;tommyknows](https://github.com/tommyknows) in
[https://github.com/package-url/packageurl-go/pull/52](https://github.com/package-url/packageurl-go/pull/52)
- fix: escape and unescape name by
[@&#8203;tommyknows](https://github.com/tommyknows) in
[https://github.com/package-url/packageurl-go/pull/55](https://github.com/package-url/packageurl-go/pull/55)
- fix: escape everything with modified QueryEscape by
[@&#8203;tommyknows](https://github.com/tommyknows) in
[https://github.com/package-url/packageurl-go/pull/58](https://github.com/package-url/packageurl-go/pull/58)
- Add `pub` and `bitnami` types by
[@&#8203;antgamdia](https://github.com/antgamdia) in
[https://github.com/package-url/packageurl-go/pull/60](https://github.com/package-url/packageurl-go/pull/60)
- Add known types and candidate types by
[@&#8203;antgamdia](https://github.com/antgamdia) in
[https://github.com/package-url/packageurl-go/pull/61](https://github.com/package-url/packageurl-go/pull/61)
- Add PackageURL.Normalize by
[@&#8203;wetterjames4](https://github.com/wetterjames4) in
[https://github.com/package-url/packageurl-go/pull/65](https://github.com/package-url/packageurl-go/pull/65)

#### New Contributors

- [@&#8203;mcombuechen](https://github.com/mcombuechen) made their
first contribution in
[https://github.com/package-url/packageurl-go/pull/43](https://github.com/package-url/packageurl-go/pull/43)
- [@&#8203;imjasonh](https://github.com/imjasonh) made their first
contribution in
[https://github.com/package-url/packageurl-go/pull/34](https://github.com/package-url/packageurl-go/pull/34)
- [@&#8203;noqcks](https://github.com/noqcks) made their first
contribution in
[https://github.com/package-url/packageurl-go/pull/40](https://github.com/package-url/packageurl-go/pull/40)
- [@&#8203;tommyknows](https://github.com/tommyknows) made their first
contribution in
[https://github.com/package-url/packageurl-go/pull/52](https://github.com/package-url/packageurl-go/pull/52)
- [@&#8203;antgamdia](https://github.com/antgamdia) made their first
contribution in
[https://github.com/package-url/packageurl-go/pull/60](https://github.com/package-url/packageurl-go/pull/60)
- [@&#8203;wetterjames4](https://github.com/wetterjames4) made their
first contribution in
[https://github.com/package-url/packageurl-go/pull/65](https://github.com/package-url/packageurl-go/pull/65)

**Full Changelog**:
package-url/packageurl-go@v0.1.1...v0.1.2

</details>

---

### Configuration

📅 **Schedule**: Branch creation - "before 6am on monday" in timezone
Australia/Sydney, Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you
are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the
rebase/retry checkbox.

👻 **Immortal**: This PR will be recreated if closed unmerged. Get
[config help](https://github.com/renovatebot/renovate/discussions) if
that's undesired.

---

- [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check
this box

---

This PR has been generated by [Mend
Renovate](https://www.mend.io/free-developer-tools/renovate/). View
repository job log
[here](https://developer.mend.io/github/google/osv-scanner).

<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzNy4wLjMiLCJ1cGRhdGVkSW5WZXIiOiIzNy44LjEiLCJ0YXJnZXRCcmFuY2giOiJtYWluIn0=-->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

purl with name contain whitespace can not be built from purl string
2 participants