Percent-encoding of query wrong for ISO-2022-JP #557

annevk · 2020-10-19T09:16:14Z

If you look at the number of state transitions (hint: count %1B instances) in the example below you can see that invoking the encoder code-point-by-code-point does not match implementations.

 function encode(input, output, desc) {
   test(function() {
     var a = document.createElement("a") // <a> uses document encoding for URL's query
     a.href = "https://example.com/?" + input
     assert_equals(a.search.substr(1), output) // remove leading "?"
   }, "iso-2022-jp encoder: " + desc)
 }

 encode("\uFF90\u4F69", "%1B$B%_PP%1B(B");
 encode("\uFF90a\u4F69", "%1B$B%_%1B(Ba%1B$BPP%1B(B");

There might be a related thing here, discovered in web-platform-tests/wpt#26158 by @JKingweb, that implementations skip invoking encoders if the input is entirely in the ASCII range. (Again, something that uniquely applies to ISO-2022-JP. Form submission should be checked here as well before deciding on something.)

The text was updated successfully, but these errors were encountered:

achristensen07 · 2020-10-19T15:05:51Z

We effectively put the code points into a buffer when parsing them then call the encoder once on the whole buffer when transitioning from query state to fragment state or hitting the "EOF code point". This makes more sense to me than the current spec when using ISO-2022-JP.

achristensen07 · 2020-10-19T17:31:07Z

Alternatively, encoding as a stream should fix this, right?

annevk · 2020-10-20T09:44:01Z

I think we should probably do both. I think we want to use a buffer so the URL parser doesn't itself have to keep an encoder alive, but then we do need to process things code-point-for-code-point to do error handling correctly. (I could see an argument for letting the URL parser hold onto an encoder though and passing that into percent-encoding and friends rather than an encoding.)

I filed whatwg/encoding#235 for Encoding standard cleanup as whatever we do here we can no longer use the encode hook it provides.

(Edit: I'm not longer sure about this either as it seems it wouldn't reset ISO-2022-JP encoder state correctly upon encountering an error. As mentioned in the Encoding issue we probably need to adjust the encoder.)

Since the ISO-2022-JP encoder is stateful, percent-encoding needs to hold onto an instance of the encoder and manually perform error handling. This also requires the input to be the full string rather than individual code points as otherwise the callers of percent-encoding would need to be aware of this too. (As UTF-8 encoding cannot fail this problem does not affect those endpoints.) Depends on this Encoding PR: whatwg/encoding#238. Tests: web-platform-tests/wpt#26158. Fixes #557.

Since the ISO-2022-JP encoder is stateful, percent-encoding needs to hold onto an instance of the encoder and manually perform error handling. This also requires the input to be the full string rather than individual code points as otherwise the callers of percent-encoding would need to be aware of this too. (As UTF-8 encoding cannot fail this problem does not affect those endpoints.) Builds on this Encoding PR: whatwg/encoding#238. Tests: web-platform-tests/wpt#26158. Fixes #557.

Since the ISO-2022-JP encoder is stateful, percent-encoding needs to hold onto an instance of the encoder and manually perform error handling. This also requires the input to be the full string rather than individual code points as otherwise the callers of percent-encoding would need to be aware of this too. (As UTF-8 encoding cannot fail this problem does not affect those endpoints.) Builds on this Encoding PR: whatwg/encoding#238. Tests: web-platform-tests/wpt#26158 and web-platform-tests/wpt#26317. Fixes #557.

annevk added the topic: parser label Oct 19, 2020

annevk mentioned this issue Oct 19, 2020

Encoding: ISO-2022-JP encoder "SO/SI ESC" test web-platform-tests/wpt#26158

Merged

annevk mentioned this issue Oct 20, 2020

Encode hook not adequate whatwg/encoding#235

Closed

annevk closed this as completed in 2ce4938 Nov 2, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Percent-encoding of query wrong for ISO-2022-JP #557

Percent-encoding of query wrong for ISO-2022-JP #557

annevk commented Oct 19, 2020

achristensen07 commented Oct 19, 2020

achristensen07 commented Oct 19, 2020

annevk commented Oct 20, 2020 •

edited

Loading

Percent-encoding of query wrong for ISO-2022-JP #557

Percent-encoding of query wrong for ISO-2022-JP #557

Comments

annevk commented Oct 19, 2020

achristensen07 commented Oct 19, 2020

achristensen07 commented Oct 19, 2020

annevk commented Oct 20, 2020 • edited Loading

annevk commented Oct 20, 2020 •

edited

Loading