-
Notifications
You must be signed in to change notification settings - Fork 48
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cumulative fixes from anchore/packageurl-go #41
Cumulative fixes from anchore/packageurl-go #41
Conversation
Fixes three issues with the encoding/decoding of strings: 1. Namespace segments are escaped as paths instead of queries to match how they are unescaped. 2. Subpath segments are escaped to match how they are unescaped. 3. Names are unescaped to match how they are escaped. A test is added to ensure this is enforced for all components requiring encoding. Five tests cases failed previous to the change: 1. unencoded_namespace_segment_is_encoded: expected pkg:type/name/spac e/name@version?key=value#sub/path to parse as pkg:type/name/spac%20e/name@version?key=value#sub/path but got pkg:type/name/spac+e/name@version?key=value#sub/path 2. pre-encoded_namespace_segment_is_unchanged: expected pkg:type/name/spac%20e/name@version?key=value#sub/path to parse as pkg:type/name/spac%20e/name@version?key=value#sub/path but got pkg:type/name/spac+e/name@version?key=value#sub/path 3. pre-encoded_name_is_unchanged: expected pkg:type/name/space/nam%20e@version?key=value#sub/path to parse as pkg:type/name/space/nam%20e@version?key=value#sub/path but got pkg:type/name/space/nam%2520e@version?key=value#sub/path 4. unencoded_subpath_segment_is_encoded: expected pkg:type/name/space/name@version?key=value#sub/pat h to parse as pkg:type/name/space/name@version?key=value#sub/pat%20h but got pkg:type/name/space/name@version?key=value#sub/pat h 5. pre-encoded_subpath_segment_is_unchanged: expected pkg:type/name/space/name@version?key=value#sub/pat%20h to parse as pkg:type/name/space/name@version?key=value#sub/pat%20h but got pkg:type/name/space/name@version?key=value#sub/pat h Signed-off-by: Keith Zantow <kzantow@gmail.com>
Fix inconsistent URL encoding
Signed-off-by: Dan Luhring <dan.luhring@anchore.com>
Signed-off-by: houdini91 <mdstrauss91@gmail.com>
Signed-off-by: houdini91 <mdstrauss91@gmail.com>
Signed-off-by: houdini91 <mdstrauss91@gmail.com>
Signed-off-by: houdini91 <mdstrauss91@gmail.com>
Signed-off-by: houdini91 <mdstrauss91@gmail.com>
Signed-off-by: houdini91 <mdstrauss91@gmail.com>
Signed-off-by: houdini91 <mdstrauss91@gmail.com>
Signed-off-by: Keith Zantow <kzantow@gmail.com>
Signed-off-by: Keith Zantow <kzantow@gmail.com>
Signed-off-by: Eric Larssen <eric.larssen@workiva.com>
Signed-off-by: houdini91 <mdstrauss91@gmail.com>
Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>
Signed-off-by: Benji Visser <benji@093b.org>
Signed-off-by: Benji Visser <benji@093b.org>
Signed-off-by: Benji Visser <benji@093b.org>
Signed-off-by: Keith Zantow <kzantow@gmail.com>
…upstream-update Signed-off-by: Keith Zantow <kzantow@gmail.com>
@@ -14,8 +14,6 @@ jobs: | |||
go-version: ${{ matrix.go-version }} | |||
- name: Checkout code | |||
uses: actions/checkout@v2 | |||
- name: Download test data | |||
run: curl -L https://raw.githubusercontent.com/package-url/purl-spec/master/test-suite-data.json -o testdata/test-suite-data.json |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC, this test data download is required, because of a previous issue.
Looks like your PR is not rebased. Any chance you can rebase that on the current main branch? The differences shouldn't be so big.
@kzantow sorry for the delay, the project had no active maintainer for a while. I took over. Can you rebase your PR? I see a few changes that are definitely on main now. |
This commit refactors the `ToString` and `FromString` functions / methods to use the `url.URL` type, instead of trying to parse URLs directly. The [PURL Spec](https://github.com/package-url/purl-spec/blob/master/PURL-SPECIFICATION.rst#a-purl-is-a-url) explicitly says: > A purl is a URL: > A purl is a valid URL and URI that conforms to the URL definitions or > specifications ... So why not actually use Go's URL type to do operations on it? This commit does exactly that, and removes a lot of the previously dense parsing-logic with simpler checks based on the URL's fields. Especially `ToString()` has gotten a lot simpler through that. Additionaly, by switching to the URL package, this commit fixes a couple of outstanding bugs: - When a qualifier contained `+` signs, which are valid in URL paths, but not in URL queries, the sign did not get escaped. The previous code relied on `url.PathUnescape` to unescape keys and values, which should have used `url.QueryUnescape` instead. Further, encoding Qualifiers used `url.PathEscape` instead of `url.QueryEscape`. This could have led to pURLs losing `+` signs if they were properly encoded. - Fixes package-url#51, where spaces in names have not gotten encoded correctly. - Fixes most test-cases from package-url#22 that are round-trip-save (e.g. all cases where input == output), except the "pre-encoded qualifier value is unchanged" testcase that is wrong - a qualifier shouldn't be encoded with `%20` for a space, but with a plus-sign (query encoding). - Fixes most cases from package-url#41 as well, except: - where the query encoding in the test-cases are wrong (" " -> "+", "+" -> "%20") (test-cases `pre-encoded_qualifier_value_is_unchanged` and `unencoded_qualifier_value_is_encoded`), - `characters_are_unencoded_where_allowed`: `<` should be encoded, as far as I can tell. The Go stdlib also encodes it, so this should be fine. - `explicit_characters_are_encoded`: `@` does not need to be encoded. When reading through that list and all the nuances, it makes it clear that we shouldn't do this ourselves! And this commit doesn't, it simply relies on the Go standard library to handle all of these cases correctly. All of the aforementioned issues should have test-cases, presumably added to the test-suite-data.
This commit refactors the `ToString` and `FromString` functions / methods to use the `url.URL` type, instead of trying to parse URLs directly. The [PURL Spec](https://github.com/package-url/purl-spec/blob/master/PURL-SPECIFICATION.rst#a-purl-is-a-url) explicitly says: > A purl is a URL: > A purl is a valid URL and URI that conforms to the URL definitions or > specifications ... So why not actually use Go's URL type to do operations on it? This commit does exactly that, and removes a lot of the previously dense parsing-logic with simpler checks based on the URL's fields. Especially `ToString()` has gotten a lot simpler through that. Additionaly, by switching to the URL package, this commit fixes a couple of outstanding bugs: - When a qualifier contained `+` signs, which are valid in URL paths, but not in URL queries, the sign did not get escaped. The previous code relied on `url.PathUnescape` to unescape keys and values, which should have used `url.QueryUnescape` instead. Further, encoding Qualifiers used `url.PathEscape` instead of `url.QueryEscape`. This could have led to pURLs losing `+` signs if they were properly encoded. - Fixes #51, where spaces in names have not gotten encoded correctly. - Fixes most test-cases from #22 that are round-trip-save (e.g. all cases where input == output), except the "pre-encoded qualifier value is unchanged" testcase that is wrong - a qualifier shouldn't be encoded with `%20` for a space, but with a plus-sign (query encoding). - Fixes most cases from #41 as well, except: - where the query encoding in the test-cases are wrong (" " -> "+", "+" -> "%20") (test-cases `pre-encoded_qualifier_value_is_unchanged` and `unencoded_qualifier_value_is_encoded`), - `characters_are_unencoded_where_allowed`: `<` should be encoded, as far as I can tell. The Go stdlib also encodes it, so this should be fine. - `explicit_characters_are_encoded`: `@` does not need to be encoded. When reading through that list and all the nuances, it makes it clear that we shouldn't do this ourselves! And this commit doesn't, it simply relies on the Go standard library to handle all of these cases correctly. All of the aforementioned issues should have test-cases, presumably added to the test-suite-data.
I‘ve made some refactors to the encoder and decoder in #52, so I hope most of the fixes in this PR shouldn‘t be necessary anymore. For reference, I‘ve tested your testcases in my PR as well, and wrote the following in my PR about it:
Let me know if you have any questions around these! :) (I‘m not a maintainer of this project, but happy to help!) |
// TypeAlpine is a pkg:apk purl. | ||
TypeAlpine = "apk" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// TypeAlpine is a pkg:apk purl. | |
TypeAlpine = "apk" |
Upstream has TypeApk
now. (Also note that "apk" doesn't specifically imply "Alpine", since there are several distros based on APK.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very good point! We should probably rename TypeAlpine to TypeAPK to make this clearer.
@@ -19,18 +19,16 @@ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, | |||
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE | |||
SOFTWARE. | |||
*/ | |||
package packageurl_test | |||
package packageurl |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe it a good practice to favor implementing tests in a separate "_test" package by default. The constraint encourages writing tests against the public interface resulting in tests more resilient to underlying changes in implementation and focused on assertions of exported/exposed observable behavior.
https://pkg.go.dev/testing
If the file is in a separate "_test" package, the package being tested must be imported explicitly and only its exported identifiers may be used. This is known as "black box" testing.
Let's close this PR, because parts of it have been implemented in separate PRs. Thank you for the great ideas, though! |
This PR is a cumulative set of changes Anchore has made to this library.
To review, I would suggest viewing the files changed rather than commits individually -- there are not a large number of changes. The primary things here are:
pathEscape
function which adheres to the rules in the PURL specWe would be very happy to avoid maintaining a fork -- if there is anything we can help with, reviews or otherwise here, we would be happy to do so!