Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Percent-encoding of query wrong for ISO-2022-JP #557

Closed
annevk opened this issue Oct 19, 2020 · 3 comments
Closed

Percent-encoding of query wrong for ISO-2022-JP #557

annevk opened this issue Oct 19, 2020 · 3 comments

Comments

@annevk
Copy link
Member

annevk commented Oct 19, 2020

If you look at the number of state transitions (hint: count %1B instances) in the example below you can see that invoking the encoder code-point-by-code-point does not match implementations.

 function encode(input, output, desc) {
   test(function() {
     var a = document.createElement("a") // <a> uses document encoding for URL's query
     a.href = "https://example.com/?" + input
     assert_equals(a.search.substr(1), output) // remove leading "?"
   }, "iso-2022-jp encoder: " + desc)
 }

 encode("\uFF90\u4F69", "%1B$B%_PP%1B(B");
 encode("\uFF90a\u4F69", "%1B$B%_%1B(Ba%1B$BPP%1B(B");

There might be a related thing here, discovered in web-platform-tests/wpt#26158 by @JKingweb, that implementations skip invoking encoders if the input is entirely in the ASCII range. (Again, something that uniquely applies to ISO-2022-JP. Form submission should be checked here as well before deciding on something.)

@achristensen07
Copy link
Collaborator

We effectively put the code points into a buffer when parsing them then call the encoder once on the whole buffer when transitioning from query state to fragment state or hitting the "EOF code point". This makes more sense to me than the current spec when using ISO-2022-JP.

@achristensen07
Copy link
Collaborator

Alternatively, encoding as a stream should fix this, right?

@annevk
Copy link
Member Author

annevk commented Oct 20, 2020

I think we should probably do both. I think we want to use a buffer so the URL parser doesn't itself have to keep an encoder alive, but then we do need to process things code-point-for-code-point to do error handling correctly. (I could see an argument for letting the URL parser hold onto an encoder though and passing that into percent-encoding and friends rather than an encoding.)

I filed whatwg/encoding#235 for Encoding standard cleanup as whatever we do here we can no longer use the encode hook it provides.

(Edit: I'm not longer sure about this either as it seems it wouldn't reset ISO-2022-JP encoder state correctly upon encountering an error. As mentioned in the Encoding issue we probably need to adjust the encoder.)

annevk added a commit that referenced this issue Oct 21, 2020
Since the ISO-2022-JP encoder is stateful, percent-encoding needs to hold onto an instance of the encoder and manually perform error handling. This also requires the input to be the full string rather than individual code points as otherwise the callers of percent-encoding would need to be aware of this too. (As UTF-8 encoding cannot fail this problem does not affect those endpoints.)

Depends on this Encoding PR: whatwg/encoding#238.

Tests: web-platform-tests/wpt#26158.

Fixes #557.
annevk added a commit that referenced this issue Oct 28, 2020
Since the ISO-2022-JP encoder is stateful, percent-encoding needs to hold onto an instance of the encoder and manually perform error handling. This also requires the input to be the full string rather than individual code points as otherwise the callers of percent-encoding would need to be aware of this too. (As UTF-8 encoding cannot fail this problem does not affect those endpoints.)

Builds on this Encoding PR: whatwg/encoding#238.

Tests: web-platform-tests/wpt#26158.

Fixes #557.
annevk added a commit that referenced this issue Oct 28, 2020
Since the ISO-2022-JP encoder is stateful, percent-encoding needs to hold onto an instance of the encoder and manually perform error handling. This also requires the input to be the full string rather than individual code points as otherwise the callers of percent-encoding would need to be aware of this too. (As UTF-8 encoding cannot fail this problem does not affect those endpoints.)

Builds on this Encoding PR: whatwg/encoding#238.

Tests: web-platform-tests/wpt#26158.

Fixes #557.
annevk added a commit that referenced this issue Nov 2, 2020
Since the ISO-2022-JP encoder is stateful, percent-encoding needs to hold onto an instance of the encoder and manually perform error handling. This also requires the input to be the full string rather than individual code points as otherwise the callers of percent-encoding would need to be aware of this too. (As UTF-8 encoding cannot fail this problem does not affect those endpoints.)

Builds on this Encoding PR: whatwg/encoding#238.

Tests: web-platform-tests/wpt#26158 and web-platform-tests/wpt#26317.

Fixes #557.
@annevk annevk closed this as completed in 2ce4938 Nov 2, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

2 participants