Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

purl with name contain whitespace can not be built from purl string #51

Closed
agfn opened this issue Jun 28, 2023 · 0 comments · Fixed by #52
Closed

purl with name contain whitespace can not be built from purl string #51

agfn opened this issue Jun 28, 2023 · 0 comments · Fixed by #52
Assignees
Labels

Comments

@agfn
Copy link

agfn commented Jun 28, 2023

import (
	"testing"

	"github.com/package-url/packageurl-go"
)

func TestPURL(t *testing.T) {
	p := packageurl.NewPackageURL("generic", "", "a b", "", nil, "")
	u := p.ToString()
	got, err := packageurl.FromString(u)
	t.Log(err)              // <nil>
	t.Log(p.Name, got.Name) // a b a+b
}
@shibumi shibumi added the bug label Jun 28, 2023
@shibumi shibumi self-assigned this Jun 28, 2023
tommyknows added a commit to tommyknows/packageurl-go that referenced this issue Jul 1, 2023
This commit refactors the `ToString` and `FromString` functions /
methods to use the `url.URL` type, instead of trying to parse URLs
directly.

The [PURL Spec](https://github.com/package-url/purl-spec/blob/master/PURL-SPECIFICATION.rst#a-purl-is-a-url)
explicitly says:

> A purl is a URL:
> A purl is a valid URL and URI that conforms to the URL definitions or
> specifications ...

So why not actually use Go's URL type to do operations on it?

This commit does exactly that, and removes a lot of the previously dense
parsing-logic with simpler checks based on the URL's fields. Especially
`ToString()` has gotten a lot simpler through that.

Additionaly, by switching to the URL package, this commit fixes a couple
of outstanding bugs:
- When a qualifier contained `+` signs, which are valid in URL paths,
  but not in URL queries, the sign did not get escaped. The previous
  code relied on `url.PathUnescape` to unescape keys and values, which
  should have used `url.QueryUnescape` instead. Further, encoding
  Qualifiers used `url.PathEscape` instead of `url.QueryEscape`. This
  could have led to pURLs losing `+` signs if they were properly
  encoded.
- Fixes package-url#51, where spaces in names have not gotten encoded correctly.
- Fixes most test-cases from package-url#22 that are round-trip-save (e.g. all
  cases where input == output), except the "pre-encoded qualifier value
  is unchanged" testcase that is wrong - a qualifier shouldn't be
  encoded with `%20` for a space, but with a plus-sign (query encoding).
- Fixes most cases from package-url#41 as well, except:
  -  where the query encoding in the test-cases are wrong (" " -> "+",
     "+" -> "%20") (test-cases `pre-encoded_qualifier_value_is_unchanged`
     and `unencoded_qualifier_value_is_encoded`),
  - `characters_are_unencoded_where_allowed`: `<` should be encoded, as
    far as I can tell. The Go stdlib also encodes it, so this should be
    fine.
  - `explicit_characters_are_encoded`: `@` does not need to be encoded.

When reading through that list and all the nuances, it makes it clear
that we shouldn't do this ourselves! And this commit doesn't, it simply
relies on the Go standard library to handle all of these cases
correctly.

All of the aforementioned issues should have test-cases, presumably
added to the test-suite-data.
shibumi pushed a commit that referenced this issue Jul 1, 2023
This commit refactors the `ToString` and `FromString` functions /
methods to use the `url.URL` type, instead of trying to parse URLs
directly.

The [PURL Spec](https://github.com/package-url/purl-spec/blob/master/PURL-SPECIFICATION.rst#a-purl-is-a-url)
explicitly says:

> A purl is a URL:
> A purl is a valid URL and URI that conforms to the URL definitions or
> specifications ...

So why not actually use Go's URL type to do operations on it?

This commit does exactly that, and removes a lot of the previously dense
parsing-logic with simpler checks based on the URL's fields. Especially
`ToString()` has gotten a lot simpler through that.

Additionaly, by switching to the URL package, this commit fixes a couple
of outstanding bugs:
- When a qualifier contained `+` signs, which are valid in URL paths,
  but not in URL queries, the sign did not get escaped. The previous
  code relied on `url.PathUnescape` to unescape keys and values, which
  should have used `url.QueryUnescape` instead. Further, encoding
  Qualifiers used `url.PathEscape` instead of `url.QueryEscape`. This
  could have led to pURLs losing `+` signs if they were properly
  encoded.
- Fixes #51, where spaces in names have not gotten encoded correctly.
- Fixes most test-cases from #22 that are round-trip-save (e.g. all
  cases where input == output), except the "pre-encoded qualifier value
  is unchanged" testcase that is wrong - a qualifier shouldn't be
  encoded with `%20` for a space, but with a plus-sign (query encoding).
- Fixes most cases from #41 as well, except:
  -  where the query encoding in the test-cases are wrong (" " -> "+",
     "+" -> "%20") (test-cases `pre-encoded_qualifier_value_is_unchanged`
     and `unencoded_qualifier_value_is_encoded`),
  - `characters_are_unencoded_where_allowed`: `<` should be encoded, as
    far as I can tell. The Go stdlib also encodes it, so this should be
    fine.
  - `explicit_characters_are_encoded`: `@` does not need to be encoded.

When reading through that list and all the nuances, it makes it clear
that we shouldn't do this ourselves! And this commit doesn't, it simply
relies on the Go standard library to handle all of these cases
correctly.

All of the aforementioned issues should have test-cases, presumably
added to the test-suite-data.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants