From 22e8bd3b5f81b9c10fcc8f0a5f3526007b7f9ead Mon Sep 17 00:00:00 2001
From: Anne van Kesteren
Date: Wed, 21 Oct 2020 14:37:50 +0200
Subject: [PATCH] Fix percent-encoding for ISO-2022-JP

Since the ISO-2022-JP encoder is stateful, percent-encoding needs to hold onto
an instance of the encoder and manually perform error handling. This also
requires the input to be the full string rather than individual code points as
otherwise the callers of percent-encoding would need to be aware of this too.
(As UTF-8 encoding cannot fail this problem does not affect those endpoints.)

Builds on this Encoding PR: https://github.com/whatwg/encoding/pull/238.

Tests: https://github.com/web-platform-tests/wpt/pull/26158.

Fixes #557.
---
 url.bs | 95 ++++++++++++++++++++++++++--------------------------------
 1 file changed, 42 insertions(+), 53 deletions(-)

diff --git a/url.bs b/url.bs
index b66bb1e1..065ef6ff 100644
--- a/url.bs
+++ b/url.bs
@@ -217,67 +217,56 @@ inclusive, and U+007E (~).
 all code points, except the ASCII alphanumeric, U+002A (*), U+002D (-), U+002E (.), and U+005F (_).
-

To percent-encode after encoding, given an encoding -encoding, code point codePoint, and a -percentEncodeSet, run these steps: +

To percent-encode after encoding, given an encoding +encoding, string input, a percentEncodeSet, and an +optional boolean spaceAsPlus (default false), run these steps:

    -
  1. Let bytes be the result of encoding codePoint using - encoding. +

  2. Let encoder be the result of getting an encoder from encoding. -

  3. -

    If bytes starts with 0x26 (&) 0x23 (#) and ends with 0x3B (;), then: - -

      -
    1. Let output be bytes, isomorphic decoded. +

    2. Let inputQueue be input converted to an I/O queue. -

    3. Replace the first two code points of output with "%26%23". - -

    4. Replace the last code point of output with "%3B". - -

    5. Return output. -

    +
  4. Let output be the empty string. -

    This can happen when encoding is not UTF-8. +

  5. +

    Let potentialError be 0. -

  6. Let output be the empty string.

  7. +

    This needs to be a non-null value to initiate the subsequent while loop.

  8. -

    For each byte of bytes: +

    While potentialError is non-null:

      -
    1. Let isomorph be a code point whose value - is byte's value. +

    2. Let encodeOutput be an empty I/O queue. -

    3. Assert: percentEncodeSet includes all non-ASCII code points. +

    4. Set potentialError to the result of running encode or fail with + inputQueue, encoder, and encodeOutput. -

    5. If isomorph is not in percentEncodeSet, then append - isomorph to output. +

    6. +

      For each byte of encodeOutput converted to a byte sequence: -

    7. Otherwise, percent-encode byte and append the result to - output. -

    +
      +
    1. If spaceAsPlus is true and byte is 0x20 (SP), then append + U+002B (+) to output. -

    2. Return output. -

    +
  9. Let isomorph be a code point whose value + is byte's value. -

    To percent-encode after encoding, given an encoding -encoding, string input, a percentEncodeSet, and a -boolean spaceAsPlus, run these steps: +

  10. Assert: percentEncodeSet includes all non-ASCII code points. -

      -
    1. Let output be the empty string.

    2. +
    3. If isomorph is not in percentEncodeSet, then append + isomorph to output. -

    4. -

      For each codePoint of input: +

    5. Otherwise, percent-encode byte and append the result to + output. +

    -
      -
    1. If spaceAsPlus is true and codePoint is U+0020, then append - U+002B (+) to output. +

    2. +

      If potentialError is non-null, then append "%26%23", followed by the + shortest sequence of ASCII digits representing potentialError in base + ten, followed by "%3B", to output. -

    3. Otherwise, run percent-encode after encoding with - encoding, codePoint, and percentEncodeSet, and append the result - to output. +

      This can happen when encoding is not UTF-8.

  11. Return output.
@@ -285,13 +274,13 @@ boolean spaceAsPlus, run these steps:

 To UTF-8 percent-encode a code point codePoint using a percentEncodeSet, return the result
-of running percent-encode after encoding with UTF-8,
-codePoint, and percentEncodeSet.
+of running percent-encode after encoding with UTF-8,
+codePoint as a string, and percentEncodeSet.

 To UTF-8 percent-encode a string input using
 a percentEncodeSet, return the result of running
-percent-encode after encoding with UTF-8, input,
-percentEncodeSet, and false.
+percent-encode after encoding with UTF-8, input, and
+percentEncodeSet.
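
Reviewer's sketch of the revised algorithm, in Python, for illustration only. It keeps one encoder instance alive for the whole string and handles encoding failures by hand, which is the point of this change. Several things here are assumptions rather than anything the patch defines: `USERINFO_SET` is a hypothetical approximation of the userinfo percent-encode set expressed as byte values, feeding one character at a time and catching `UnicodeEncodeError` stands in for "encode or fail", and Python's codec tables are not guaranteed to match the Encoding Standard's indexes for every code point.

```python
import codecs

# Hypothetical approximation of the userinfo percent-encode set, as byte values:
# C0 controls, everything above U+007E, and the listed ASCII punctuation.
USERINFO_SET = (
    set(range(0x00, 0x20))
    | set(range(0x7F, 0x100))
    | set(b' "#<>?`{}/:;=@[\\]^|')
)


def percent_encode_after_encoding(encoding, input_str, percent_encode_set,
                                  space_as_plus=False):
    # One encoder instance for the whole string, so a stateful encoder
    # keeps its state between code points.
    encoder = codecs.getincrementalencoder(encoding)("strict")
    output = []

    def emit(chunk):
        # Percent-encode the bytes produced by the encoder.
        for byte in chunk:
            if space_as_plus and byte == 0x20:
                output.append("+")
            elif byte in percent_encode_set:
                output.append(f"%{byte:02X}")
            else:
                output.append(chr(byte))

    for ch in input_str:
        try:
            emit(encoder.encode(ch))
        except UnicodeEncodeError:
            # Manual error handling: emit "&#N;" already percent-encoded.
            output.append(f"%26%23{ord(ch)}%3B")

    # Flush any trailing bytes the encoder still owes (e.g. a reset sequence).
    emit(encoder.encode("", final=True))
    return "".join(output)
```

With Python's `shift_jis` codec, calling this on `" "`, `"≡"`, and `"‽"` with `USERINFO_SET` gives `"%20"`, `"%81%DF"`, and `"%26%238253%3B"`, matching the Shift_JIS rows in the examples table changed by this patch.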


@@ -319,20 +308,20 @@ a percentEncodeSet, return the result of running
 "‽%25%2E"
 0xE2 0x80 0xBD 0x25 0x2E
-Percent-encode after encoding with Shift_JIS,
+Percent-encode after encoding with Shift_JIS,
 input, and the userinfo percent-encode set
-U+0020
+" "
 "%20"
-U+2261 (≡)
+"≡"
 "%81%DF"
-U+203D (‽)
+"‽"
 "%26%238253%3B"
-Percent-encode after encoding with ISO-2022-JP,
-input, and the userinfo percent-encode set
-U+00A5 (¥)
+Percent-encode after encoding with ISO-2022-JP, input,
+and the userinfo percent-encode set
+"¥"
 "%1B(J\%1B(B"
 Percent-encode after encoding with Shift_JIS, input, the