diff --git a/encoding.bs b/encoding.bs
index 969476a..7afd377 100644
--- a/encoding.bs
+++ b/encoding.bs
@@ -1045,12 +1045,17 @@ optional I/O queue of bytes output (default « »), return the result

Legacy hooks for standards

-Standards are strongly discouraged from using decode, encode, and
-BOM sniff, except as needed for compatibility. Standards needing these legacy hooks will most
-likely also need to use get an encoding (to turn a label into an
-encoding) and get an output encoding (to turn an encoding into
-another encoding that is suitable to pass into encode). Other algorithms are not
-to be used directly.
+

+ Standards are strongly discouraged from using decode, BOM sniff, and
+ encode, except as needed for compatibility. Standards needing these legacy hooks will
+ most likely also need to use get an encoding (to turn a label into an
+ encoding) and get an output encoding (to turn an encoding into
+ another encoding that is suitable to pass into encode).
+
+ For the extremely niche case of URL percent-encoding, custom encoder error handling is needed.
+ The get an encoder and encode or fail algorithms are to be used for that. Other
+ algorithms are not to be used directly.
+

To decode an I/O queue of bytes ioQueue given a fallback encoding encoding and an optional I/O queue of scalar values output (default « »), run
@@ -1111,19 +1116,63 @@ corresponding to the byte order mark found, or null otherwise.
 steps:

-  1. Assert: encoding is not replacement or UTF-16BE/LE.
+  1. Let encoder be the result of getting an encoder from encoding.

-  2. Run encoding's encoder with ioQueue,
-     output, and "html".
+  2. Run encoder with ioQueue, output, and
+     "html".

   3. Return output.

-This is mostly a legacy hook for URLs and HTML forms. Layering
-UTF-8 encode on top is safe as it never triggers
-errors.
-[[URL]]
-[[HTML]]
+This is a legacy hook for HTML forms. Layering UTF-8 encode on top
+is safe as it never triggers errors. [[HTML]]
+
+


+To get an encoder from an
+encoding encoding:
+
+

+  1. Assert: encoding is not replacement or UTF-16BE/LE.
+
+  2. Return encoding's encoder.
+

+To encode or fail an I/O queue of scalar values ioQueue given an
+encoder encoder and an I/O queue of bytes output, run these
+steps:
+
+

+  1. Let potentialError be the result of running encoder with
+     ioQueue, output, and "fatal".
+
+  2. Push end-of-queue to output.
+
+  3. If potentialError is an error, then return error's
+     code point's value.
+
+  4. Return null.
+

+
+ This is a legacy hook for URL percent-encoding. The caller will have to keep an
+ encoder alive as the ISO-2022-JP encoder can be in two different states when
+ returning an error. That also means that if the caller emits bytes to encode the error in
+ some way, these have to be in the range 0x00 to 0x7F, inclusive, excluding 0x0E, 0x0F, 0x1B, 0x5C,
+ and 0x7E. [[URL]]
+
+ In particular, if upon returning an error the ISO-2022-JP encoder is in the
+ Roman state, the caller cannot output 0x5C (\) as it will not
+ decode as U+005C (\). For this reason, applications using encode or fail for unintended
+ purposes ought to take care to prevent the use of the ISO-2022-JP encoder in combination
+ with replacement schemes, such as those of JavaScript and CSS, that use U+005C (\) as part of the
+ replacement syntax (e.g., \u2603) or make sure to pass the replacement syntax through
+ the encoder (in contrast to URL percent-encoding).
+
+ The return value is either the number representing the code point that could not be
+ encoded or null, if there was no error. When it returns non-null the caller will have to
+ invoke it again, supplying the same encoder and a new output I/O queue.
+
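
Review note: the calling pattern described for encode or fail (keep one encoder alive, pass a fresh output queue after each error, and emit only safe ASCII bytes for the replacement) can be sketched roughly as below. This is an illustrative model, not the standard's algorithm. It assumes Python's `codecs` incremental encoders as stand-ins for the standard's encoders (they differ in detail), and `percent_encode_after_encoding`, its alphanumeric-only "safe" set, and the `%26%23…%3B` replacement form are simplifications chosen for the example.

```python
import codecs
from typing import List, Optional


def encode_or_fail(io_queue: List[str],
                   encoder: codecs.IncrementalEncoder,
                   output: bytearray) -> Optional[int]:
    """Model of "encode or fail": drain io_queue into output.

    Returns the value of the first code point that cannot be encoded,
    or None once the queue is empty. Python's incremental encoder is
    an assumed stand-in for the standard's encoder.
    """
    while io_queue:
        ch = io_queue.pop(0)
        try:
            output += encoder.encode(ch)
        except UnicodeEncodeError:
            return ord(ch)
    # Flush any pending encoder state once the input is exhausted.
    output += encoder.encode("", final=True)
    return None


def percent_encode_after_encoding(text: str, python_codec: str) -> str:
    """Hypothetical caller: percent-encode the encoded bytes of text."""
    # One encoder is kept alive across calls, as the note requires.
    encoder = codecs.getincrementalencoder(python_codec)("strict")
    io_queue = list(text)
    pieces = []
    while True:
        output = bytearray()  # a new output queue for every invocation
        failed = encode_or_fail(io_queue, encoder, output)
        for byte in output:
            # Simplified safe set: ASCII letters and digits only.
            if byte < 0x80 and chr(byte).isalnum():
                pieces.append(chr(byte))
            else:
                pieces.append(f"%{byte:02X}")
        if failed is None:
            break
        # The replacement uses only '%', ASCII digits and ASCII letters,
        # so it avoids 0x0E, 0x0F, 0x1B, 0x5C, and 0x7E.
        pieces.append("%26%23" + str(failed) + "%3B")
    return "".join(pieces)


print(percent_encode_after_encoding("a\u2603b", "cp1252"))  # a%26%239731%3Bb
```

Because the replacement is written directly as ASCII rather than passed back through the encoder, it has to respect the byte restrictions in the note; a scheme that needs U+005C (\) would have to route the replacement through the encoder instead.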

@@ -3399,6 +3448,7 @@ Glenn Maynard,
 Gordon P. Hemsley,
 Henri Sivonen,
 Ian Hickson,
+J. King,
 James Graham,
 Jeffrey Yasskin,
 John Tamplin,