Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cumulative fixes from anchore/packageurl-go #41

Closed

Conversation

kzantow
Copy link

@kzantow kzantow commented Jan 4, 2023

This PR is a cumulative set of changes Anchore has made to this library.

To review, I would suggest viewing the files changed rather than commits individually -- there are not a large number of changes. The primary things here are:

  • multiple types added based on updates to the PURL spec
  • a custom pathEscape function which adheres to the rules in the PURL spec
  • a number of added tests for PURL encoding
  • fixes so test run and pass without downloading a moving test file target

We would be very happy to avoid maintaining a fork -- if there is anything we can help with, reviews or otherwise here, we would be happy to do so!

halinds and others added 26 commits August 12, 2021 14:52
Fixes three issues with the encoding/decoding of strings:
1. Namespace segments are escaped as paths instead of queries to match
   how they are unescaped.
2. Subpath segments are escaped to match how they are unescaped.
3. Names are unescaped to match how they are escaped.

A test is added to ensure this is enforced for all components requiring
encoding. Five tests cases failed previous to the change:
1. unencoded_namespace_segment_is_encoded:
   expected    pkg:type/name/spac e/name@version?key=value#sub/path
   to parse as pkg:type/name/spac%20e/name@version?key=value#sub/path
   but got     pkg:type/name/spac+e/name@version?key=value#sub/path
2. pre-encoded_namespace_segment_is_unchanged:
   expected    pkg:type/name/spac%20e/name@version?key=value#sub/path
   to parse as pkg:type/name/spac%20e/name@version?key=value#sub/path
   but got     pkg:type/name/spac+e/name@version?key=value#sub/path
3. pre-encoded_name_is_unchanged:
   expected    pkg:type/name/space/nam%20e@version?key=value#sub/path
   to parse as pkg:type/name/space/nam%20e@version?key=value#sub/path
   but got     pkg:type/name/space/nam%2520e@version?key=value#sub/path
4. unencoded_subpath_segment_is_encoded:
   expected    pkg:type/name/space/name@version?key=value#sub/pat h
   to parse as pkg:type/name/space/name@version?key=value#sub/pat%20h
   but got     pkg:type/name/space/name@version?key=value#sub/pat h
5. pre-encoded_subpath_segment_is_unchanged:
   expected    pkg:type/name/space/name@version?key=value#sub/pat%20h
   to parse as pkg:type/name/space/name@version?key=value#sub/pat%20h
   but got     pkg:type/name/space/name@version?key=value#sub/pat h

Signed-off-by: Keith Zantow <kzantow@gmail.com>
Signed-off-by: Dan Luhring <dan.luhring@anchore.com>
Signed-off-by: houdini91 <mdstrauss91@gmail.com>
Signed-off-by: houdini91 <mdstrauss91@gmail.com>
Signed-off-by: houdini91 <mdstrauss91@gmail.com>
Signed-off-by: houdini91 <mdstrauss91@gmail.com>
Signed-off-by: houdini91 <mdstrauss91@gmail.com>
Signed-off-by: houdini91 <mdstrauss91@gmail.com>
Signed-off-by: houdini91 <mdstrauss91@gmail.com>
Signed-off-by: Keith Zantow <kzantow@gmail.com>
Signed-off-by: Keith Zantow <kzantow@gmail.com>
Signed-off-by: Eric Larssen <eric.larssen@workiva.com>
Signed-off-by: houdini91 <mdstrauss91@gmail.com>
Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>
Signed-off-by: Benji Visser <benji@093b.org>
Signed-off-by: Benji Visser <benji@093b.org>
Signed-off-by: Benji Visser <benji@093b.org>
Signed-off-by: Keith Zantow <kzantow@gmail.com>
…upstream-update

Signed-off-by: Keith Zantow <kzantow@gmail.com>
@@ -14,8 +14,6 @@ jobs:
go-version: ${{ matrix.go-version }}
- name: Checkout code
uses: actions/checkout@v2
- name: Download test data
run: curl -L https://raw.githubusercontent.com/package-url/purl-spec/master/test-suite-data.json -o testdata/test-suite-data.json
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC, this test data download is required, because of a previous issue.

Looks like your PR is not rebased. Any chance you can rebase that on the current main branch? The differences shouldn't be so big.

@shibumi
Copy link
Collaborator

shibumi commented Jun 28, 2023

@kzantow sorry for the delay, the project had no active maintainer for a while. I took over.

Can you rebase your PR? I see a few changes that are definitely on main now.

tommyknows added a commit to tommyknows/packageurl-go that referenced this pull request Jul 1, 2023
This commit refactors the `ToString` and `FromString` functions /
methods to use the `url.URL` type, instead of trying to parse URLs
directly.

The [PURL Spec](https://github.com/package-url/purl-spec/blob/master/PURL-SPECIFICATION.rst#a-purl-is-a-url)
explicitly says:

> A purl is a URL:
> A purl is a valid URL and URI that conforms to the URL definitions or
> specifications ...

So why not actually use Go's URL type to do operations on it?

This commit does exactly that, and removes a lot of the previously dense
parsing-logic with simpler checks based on the URL's fields. Especially
`ToString()` has gotten a lot simpler through that.

Additionaly, by switching to the URL package, this commit fixes a couple
of outstanding bugs:
- When a qualifier contained `+` signs, which are valid in URL paths,
  but not in URL queries, the sign did not get escaped. The previous
  code relied on `url.PathUnescape` to unescape keys and values, which
  should have used `url.QueryUnescape` instead. Further, encoding
  Qualifiers used `url.PathEscape` instead of `url.QueryEscape`. This
  could have led to pURLs losing `+` signs if they were properly
  encoded.
- Fixes package-url#51, where spaces in names have not gotten encoded correctly.
- Fixes most test-cases from package-url#22 that are round-trip-save (e.g. all
  cases where input == output), except the "pre-encoded qualifier value
  is unchanged" testcase that is wrong - a qualifier shouldn't be
  encoded with `%20` for a space, but with a plus-sign (query encoding).
- Fixes most cases from package-url#41 as well, except:
  -  where the query encoding in the test-cases are wrong (" " -> "+",
     "+" -> "%20") (test-cases `pre-encoded_qualifier_value_is_unchanged`
     and `unencoded_qualifier_value_is_encoded`),
  - `characters_are_unencoded_where_allowed`: `<` should be encoded, as
    far as I can tell. The Go stdlib also encodes it, so this should be
    fine.
  - `explicit_characters_are_encoded`: `@` does not need to be encoded.

When reading through that list and all the nuances, it makes it clear
that we shouldn't do this ourselves! And this commit doesn't, it simply
relies on the Go standard library to handle all of these cases
correctly.

All of the aforementioned issues should have test-cases, presumably
added to the test-suite-data.
shibumi pushed a commit that referenced this pull request Jul 1, 2023
This commit refactors the `ToString` and `FromString` functions /
methods to use the `url.URL` type, instead of trying to parse URLs
directly.

The [PURL Spec](https://github.com/package-url/purl-spec/blob/master/PURL-SPECIFICATION.rst#a-purl-is-a-url)
explicitly says:

> A purl is a URL:
> A purl is a valid URL and URI that conforms to the URL definitions or
> specifications ...

So why not actually use Go's URL type to do operations on it?

This commit does exactly that, and removes a lot of the previously dense
parsing-logic with simpler checks based on the URL's fields. Especially
`ToString()` has gotten a lot simpler through that.

Additionaly, by switching to the URL package, this commit fixes a couple
of outstanding bugs:
- When a qualifier contained `+` signs, which are valid in URL paths,
  but not in URL queries, the sign did not get escaped. The previous
  code relied on `url.PathUnescape` to unescape keys and values, which
  should have used `url.QueryUnescape` instead. Further, encoding
  Qualifiers used `url.PathEscape` instead of `url.QueryEscape`. This
  could have led to pURLs losing `+` signs if they were properly
  encoded.
- Fixes #51, where spaces in names have not gotten encoded correctly.
- Fixes most test-cases from #22 that are round-trip-save (e.g. all
  cases where input == output), except the "pre-encoded qualifier value
  is unchanged" testcase that is wrong - a qualifier shouldn't be
  encoded with `%20` for a space, but with a plus-sign (query encoding).
- Fixes most cases from #41 as well, except:
  -  where the query encoding in the test-cases are wrong (" " -> "+",
     "+" -> "%20") (test-cases `pre-encoded_qualifier_value_is_unchanged`
     and `unencoded_qualifier_value_is_encoded`),
  - `characters_are_unencoded_where_allowed`: `<` should be encoded, as
    far as I can tell. The Go stdlib also encodes it, so this should be
    fine.
  - `explicit_characters_are_encoded`: `@` does not need to be encoded.

When reading through that list and all the nuances, it makes it clear
that we shouldn't do this ourselves! And this commit doesn't, it simply
relies on the Go standard library to handle all of these cases
correctly.

All of the aforementioned issues should have test-cases, presumably
added to the test-suite-data.
@tommyknows
Copy link
Contributor

I‘ve made some refactors to the encoder and decoder in #52, so I hope most of the fixes in this PR shouldn‘t be necessary anymore. For reference, I‘ve tested your testcases in my PR as well, and wrote the following in my PR about it:

Fixes most cases from #41 as well, except:

  • where the query encoding in the test-cases are wrong (" " -> "+", "+" -> "%20") (test-cases pre-encoded_qualifier_value_is_unchanged and unencoded_qualifier_value_is_encoded),
  • characters_are_unencoded_where_allowed: < should be encoded, as far as I can tell. The Go stdlib also encodes it, so this should be fine.
  • explicit_characters_are_encoded: @ does not need to be encoded.

Let me know if you have any questions around these! :) (I‘m not a maintainer of this project, but happy to help!)

Comment on lines +49 to +50
// TypeAlpine is a pkg:apk purl.
TypeAlpine = "apk"
Copy link

@luhring luhring Aug 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// TypeAlpine is a pkg:apk purl.
TypeAlpine = "apk"

Upstream has TypeApk now. (Also note that "apk" doesn't specifically imply "Alpine", since there are several distros based on APK.)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very good point! We should probably rename TypeAlpine to TypeAPK to make this clearer.

@@ -19,18 +19,16 @@ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
*/
package packageurl_test
package packageurl
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it a good practice to favor implementing tests in a separate "_test" package by default. The constraint encourages writing tests against the public interface resulting in tests more resilient to underlying changes in implementation and focused on assertions of exported/exposed observable behavior.
https://pkg.go.dev/testing

If the file is in a separate "_test" package, the package being tested must be imported explicitly and only its exported identifiers may be used. This is known as "black box" testing.

@shibumi
Copy link
Collaborator

shibumi commented Sep 8, 2023

Let's close this PR, because parts of it have been implemented in separate PRs. Thank you for the great ideas, though!

@shibumi shibumi closed this Sep 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.