Add get an encoder and encode or fail for URLs #238

Merged 2 commits on Oct 23, 2020
78 changes: 64 additions & 14 deletions encoding.bs
@@ -1045,12 +1045,17 @@ optional I/O queue of bytes <var>output</var> (default « »), return the result

<h3 id=legacy-hooks>Legacy hooks for standards</h3>

<p class=note>Standards are strongly discouraged from using <a>decode</a>, <a for=/>encode</a>, and
<a>BOM sniff</a>, except as needed for compatibility. Standards needing these legacy hooks will most
likely also need to use <a>get an encoding</a> (to turn a <a>label</a> into an
<a for=/>encoding</a>) and <a>get an output encoding</a> (to turn an <a for=/>encoding</a> into
another <a for=/>encoding</a> that is suitable to pass into <a>encode</a>). Other algorithms are not
to be used directly.
<div class=note>
<p>Standards are strongly discouraged from using <a>decode</a>, <a>BOM sniff</a>, and
<a for=/>encode</a>, except as needed for compatibility. Standards needing these legacy hooks will
most likely also need to use <a>get an encoding</a> (to turn a <a>label</a> into an
<a for=/>encoding</a>) and <a>get an output encoding</a> (to turn an <a for=/>encoding</a> into
another <a for=/>encoding</a> that is suitable to pass into <a>encode</a>).

<p>For the extremely niche case of URL percent-encoding, custom encoder error handling is needed.
The <a>get an encoder</a> and <a>encode or fail</a> algorithms are to be used for that. Other
algorithms are not to be used directly.
</div>

<p>To <dfn export>decode</dfn> an I/O queue of bytes <var>ioQueue</var> given a fallback encoding
<var>encoding</var> and an optional I/O queue of scalar values <var>output</var> (default « »), run
@@ -1111,19 +1116,63 @@ corresponding to the byte order mark found, or null otherwise.
steps:

<ol>
<li><p>Assert: <var>encoding</var> is not <a>replacement</a> or <a>UTF-16BE/LE</a>.
<li><p>Let <var>encoder</var> be the result of <a>getting an encoder</a> from <var>encoding</var>.

<li><p><a>Run</a> <var>encoding</var>'s <a for=/>encoder</a> with <var>ioQueue</var>,
<var>output</var>, and "<code>html</code>".
<li><p><a>Run</a> <var>encoder</var> with <var>ioQueue</var>, <var>output</var>, and
"<code>html</code>".

<li><p>Return <var>output</var>.
</ol>

<p class="note no-backref">This is mostly a legacy hook for URLs and HTML forms. Layering
<a>UTF-8 encode</a> on top is safe as it never triggers
<a>errors</a>.
[[URL]]
[[HTML]]
<p class="note no-backref">This is a legacy hook for HTML forms. Layering <a>UTF-8 encode</a> on top
is safe as it never triggers <a>errors</a>. [[HTML]]
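
A rough illustration of what the "<code>html</code>" error mode means for a caller of <a>encode</a>, assuming (as we read the Encoding Standard) that an unencodable code point is replaced by a decimal numeric character reference. Python's built-in `xmlcharrefreplace` error handler is used here as a stand-in; it is not the standard's encoder, and Python's `latin-1` codec is not one of the standard's encodings. The function name `encode_html_mode` is hypothetical.

```python
def encode_html_mode(text: str, encoding: str = "latin-1") -> bytes:
    """Encode text, replacing unencodable code points with &#NNNN; references.

    This mirrors the effect of the "html" error mode using Python's own codec
    machinery; it is a sketch, not the algorithm in the diff above.
    """
    return text.encode(encoding, errors="xmlcharrefreplace")
```

Because the replacement is produced inside the encoding step, the caller never sees an error, which is why this hook suits HTML forms but not URL percent-encoding.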

<hr>

<p>To <dfn export lt="get an encoder|getting an encoder">get an encoder</dfn> from an
<a for=/>encoding</a> <var>encoding</var>:

<ol>
<li><p>Assert: <var>encoding</var> is not <a>replacement</a> or <a>UTF-16BE/LE</a>.

<li><p>Return <var>encoding</var>'s <a for=/>encoder</a>.
Member

Suggested change
<li><p>Return <var>encoding</var>'s <a for=/>encoder</a>.
<li><p>Return a new instance of <var>encoding</var>'s <a for=/>encoder</a>.

There's a difference between an encoder (an "encoder class", so to speak) and an encoder instance, which has state. This hook should also be renamed to "get an encoder instance".

See also #237 (comment).

Member Author

See above.

</ol>

<p>To <dfn export>encode or fail</dfn> an I/O queue of scalar values <var>ioQueue</var> given an
<a for=/>encoder</a> <var>encoder</var> and an I/O queue of bytes <var>output</var>, run these
Member

Suggested change
<a for=/>encoder</a> <var>encoder</var> and an I/O queue of bytes <var>output</var>, run these
<a for=/>encoder</a> instance <var>encoderInstance</var> and an I/O queue of bytes <var>output</var>, run these

Member Author

See above.

steps:

<ol>
<li><p>Let <var>potentialError</var> be the result of <a>running</a> <var>encoder</var> with
<var>ioQueue</var>, <var>output</var>, and "<code>fatal</code>".
Member

Suggested change
<var>ioQueue</var>, <var>output</var>, and "<code>fatal</code>".
<var>ioQueue</var>, <var>output</var>, and "<code>fatal</code>".
<li><p><a for="I/O queue">Push</a> <a>end-of-queue</a> into <var>encoder</var>.

Needed so the conversion to a byte sequence in whatwg/url#558 doesn't hang.

Member Author

I did this, but adjusted the wording slightly and pushed into output instead.

Member

Whoops, nice catch.


<li><p><a for="I/O queue">Push</a> <a>end-of-queue</a> to <var>output</var>.

<li><p>If <var>potentialError</var> is an <a>error</a>, then return <a>error</a>'s
<a>code point</a>'s <a for="code point">value</a>.

<li><p>Return null.
</ol>
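
The steps of <a>encode or fail</a> above can be sketched as follows. This is a simplified model, not the standard's machinery: `END_OF_QUEUE`, `ToyAsciiEncoder`, and its `handle` method are hypothetical names, the toy encoder only accepts ASCII, and the input queue is assumed to end with the `END_OF_QUEUE` sentinel.

```python
from collections import deque
from typing import Deque, List, Optional

END_OF_QUEUE = object()  # hypothetical sentinel standing in for end-of-queue


class ToyAsciiEncoder:
    """Hypothetical encoder instance: ASCII maps to one byte, the rest is an error."""

    def handle(self, code_point: int) -> Optional[bytes]:
        return bytes([code_point]) if code_point <= 0x7F else None


def encode_or_fail(io_queue: Deque, encoder: ToyAsciiEncoder, output: List) -> Optional[int]:
    potential_error = None
    # Run the encoder in "fatal" mode: stop at the first unencodable code point.
    # Assumes io_queue always ends with END_OF_QUEUE.
    while io_queue[0] is not END_OF_QUEUE:
        code_point = io_queue.popleft()
        encoded = encoder.handle(code_point)
        if encoded is None:
            potential_error = code_point
            break
        output.extend(encoded)
    # Push end-of-queue to output so a consumer of output can terminate.
    output.append(END_OF_QUEUE)
    # Return the failing code point's value, or None (the spec's null).
    return potential_error
```

Note that the failing code point has already been consumed from the input queue when the function returns, so a second call with the same queue resumes after it.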

<div class=note id=pit-of-iso-2022-jp>
<p>This is a legacy hook for URL percent-encoding. The caller will have to keep an
<a for=/>encoder</a> alive as the <a>ISO-2022-JP encoder</a> can be in two different states when
returning an <a>error</a>. That also means that if the caller emits bytes to encode the error in
some way, these have to be in the range 0x00 to 0x7F, inclusive, excluding 0x0E, 0x0F, 0x1B, 0x5C,
and 0x7E. [[URL]]

<p>In particular, if upon returning an <a>error</a> the <a>ISO-2022-JP encoder</a> is in the
<a lt="ISO-2022-JP decoder Roman">Roman</a> state, the caller cannot output 0x5C (\) as it will not
decode as U+005C (\). For this reason, applications using <a>encode or fail</a> for unintended
purposes ought to take care to prevent the use of the <a>ISO-2022-JP encoder</a> in combination
with replacement schemes, such as those of JavaScript and CSS, that use U+005C (\) as part of the
replacement syntax (e.g., <code>\u2603</code>) or make sure to pass the replacement syntax through
the encoder (in contrast to URL percent-encoding).

<p>The return value is either the number representing the <a>code point</a> that could not be
encoded or null, if there was no <a>error</a>. When it returns non-null the caller will have to
invoke it again, supplying the same <a for=/>encoder</a> and a new output I/O queue.
</div>
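
A sketch of the calling pattern the note describes: keep one encoder alive, invoke the hook repeatedly with a fresh output queue, and emit only safe ASCII bytes for each error. The names `END_OF_QUEUE`, `ToyAsciiEncoder`, and `encode_with_replacements` are hypothetical, and the `%26%23…%3B` replacement form is our understanding of what the URL Standard emits; it should be checked against that document. The toy encoder is stateless, so it does not reproduce the ISO-2022-JP state issue itself, only the shape of the loop.

```python
from collections import deque

END_OF_QUEUE = object()  # hypothetical sentinel standing in for end-of-queue

# Bytes the note says must not be emitted when encoding an error.
FORBIDDEN = {0x0E, 0x0F, 0x1B, 0x5C, 0x7E}


class ToyAsciiEncoder:
    """Hypothetical encoder instance: ASCII maps to one byte, the rest is an error."""

    def handle(self, code_point):
        return bytes([code_point]) if code_point <= 0x7F else None


def encode_or_fail(io_queue, encoder, output):
    """Simplified encode or fail: returns the failing code point's value or None."""
    potential_error = None
    while io_queue[0] is not END_OF_QUEUE:
        code_point = io_queue.popleft()
        encoded = encoder.handle(code_point)
        if encoded is None:
            potential_error = code_point
            break
        output.extend(encoded)
    output.append(END_OF_QUEUE)
    return potential_error


def encode_with_replacements(text):
    io_queue = deque([ord(c) for c in text] + [END_OF_QUEUE])
    encoder = ToyAsciiEncoder()  # the same instance is reused for every call
    result = bytearray()
    while True:
        output = []  # a new output queue for each invocation
        error = encode_or_fail(io_queue, encoder, output)
        result.extend(b for b in output if b is not END_OF_QUEUE)
        if error is None:
            return bytes(result)
        # Assumed replacement form: percent-encoded "&#", decimal value, ";".
        replacement = b"%26%23" + str(error).encode("ascii") + b"%3B"
        assert all(b <= 0x7F and b not in FORBIDDEN for b in replacement)
        result.extend(replacement)
```

The replacement bytes are drawn from `%`, ASCII digits, and `B`, all of which fall inside the allowed range and avoid the excluded values listed in the note.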



@@ -3399,6 +3448,7 @@ Glenn Maynard,
Gordon P. Hemsley,
Henri Sivonen,
Ian Hickson,
J. King,
James Graham,
Jeffrey Yasskin,
John Tamplin,