Percent encoding does not round-trip #523

karwa · 2020-06-02T12:37:34Z

Hi!

I'm trying to implement the current version of the spec in Swift (for non-browser applications. The idea is that it's useful to have a URL type that behaves as your browser behaves and accepts/rejects the same things).

One issue that I've noticed is that the current definition of the "UTF8 percent encode" algorithm doesn't round-trip for strings which contain the percent character. For example, following the algorithm, the string "%100" (UTF8: [37, 49, 48, 48]) is not changed at all when encoding (regardless or the percent-encoding set; none of them contain the "%" character itself). However, decoding that same string using the decoding algorithm in the spec results in the UTF8 sequence [16, 48], or "0" (ASCII 0x10 is the unprintable "data link escape" character).

RFC-3986 warns about this:

Because the percent ("%") character serves as the indicator for
percent-encoded octets, it must be percent-encoded as "%25" for that
octet to be used as data within a URI.

Indeed, the encodeURIComponent JS function encodes the percent character:

> encodeURIComponent("%100")
"%25100"

I was ready to submit a PR to have the spec's algorithm also do this, but it appears there is an explicit test that percent characters are not escaped (in this case, in the URL's username component): https://github.com/web-platform-tests/wpt/blob/master/url/resources/urltestdata.json#L2372

    "input": "http://%25DOMAIN:foobar@foodomain.com/",
    "base": "about:blank",
    "href": "http://%25DOMAIN:foobar@foodomain.com/",
    "origin": "http://foodomain.com",
    "protocol": "http:",
    "username": "%25DOMAIN",   // <--- I would expect this to be "%2525DOMAIN"

I'm not sure if this is correct. Usernames are required to be escaped (meaning they must be unescaped to recover their original value), as in the following example:

> new URL("http://;DOMAIN:foobar@foodomain.com/")
...
username: "%3BDOMAIN" // <--- Semicolon is escaped as %3B

However, unescaping the string "%25DOMAIN" as the test expects would not recover the original and result in a different username - "%DOMAIN".

Can anybody confirm the correct behaviour? I think we should be escaping the % character, and that the test is incorrect. If that's not the case (and the current behaviour is correct), this could use a note or warning in the spec.

The text was updated successfully, but these errors were encountered:

annevk · 2020-06-02T13:00:45Z

See #513 and #518 for some upcoming changes in this area. They don't really change the status quo, but do add percent-encode sets that include U+0025, including one that is an exact match for the one in JavaScript.

And I guess we could add a note that it really depends on the caller which of the various percent-encode sets is safe to use.

domenic · 2020-06-02T13:15:06Z

I can confirm that the spec and tests are correct, i.e. they match browsers. See https://jsdom.github.io/whatwg-url/#url=aHR0cDovLyUyNURPTUFJTjpmb29iYXJAZm9vZG9tYWluLmNvbS8=&base=YWJvdXQ6Ymxhbms=

Also add some <div algorithm> wrappers to this section. Closes #523.

annevk · 2022-12-19T15:59:19Z

For clarity, if your username is %25DOMAIN you would have to percent-encode it before stuffing it in the URL: %2525DOMAIN. The URL parser will not percent-encode % for you. #727 attempts to clarify this. (My example doesn't use username, but the same principle applies.)

Also add some <div algorithm> wrappers to this section. Closes #523.

annevk added the clarification Standard could be clearer label Oct 20, 2021

annevk added a commit that referenced this issue Dec 16, 2022

Clarify that percent-encoding often does not roundtrip

3efcb38

Also add some <div algorithm> wrappers to this section. Closes #523.

annevk mentioned this issue Dec 16, 2022

Clarify that percent-encoding often does not roundtrip #727

Merged

annevk closed this as completed in #727 Dec 20, 2022

annevk added a commit that referenced this issue Dec 20, 2022

Clarify that percent-encoding does not always roundtrip

a05ee27

Also add some <div algorithm> wrappers to this section. Closes #523.

lemire mentioned this issue Mar 2, 2023

Additional tests related to pathToFileURL ada-url/ada#248

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Percent encoding does not round-trip #523

Percent encoding does not round-trip #523

karwa commented Jun 2, 2020 •

edited

Loading

annevk commented Jun 2, 2020

domenic commented Jun 2, 2020

annevk commented Dec 19, 2022

Percent encoding does not round-trip #523

Percent encoding does not round-trip #523

Comments

karwa commented Jun 2, 2020 • edited Loading

annevk commented Jun 2, 2020

domenic commented Jun 2, 2020

annevk commented Dec 19, 2022

karwa commented Jun 2, 2020 •

edited

Loading