Refactor query state to operate on a buffer #558

Merged 2 commits on Nov 2, 2020
132 changes: 70 additions & 62 deletions url.bs
@@ -217,81 +217,70 @@ inclusive, and U+007E (~).
all code points, except the <a>ASCII alphanumeric</a>, U+002A (*), U+002D (-), U+002E (.), and
U+005F (_).

<p>To <dfn for="code point">percent-encode after encoding</dfn>, given an <a for=/>encoding</a>
<var>encoding</var>, <a for=/>code point</a> <var>codePoint</var>, and a
<var>percentEncodeSet</var>, run these steps:
<p>To <dfn for=string>percent-encode after encoding</dfn>, given an <a for=/>encoding</a>
<var>encoding</var>, <a for=/>string</a> <var>input</var>, a <var>percentEncodeSet</var>, and an
optional boolean <var>spaceAsPlus</var> (default false), run these steps:

<ol>
<li><p>Let <var>bytes</var> be the result of <a lt=encode>encoding</a> <var>codePoint</var> using
<var>encoding</var>.
<li><p>Let <var>encoder</var> be the result of <a>getting an encoder</a> from <var>encoding</var>.

<li>
<p>If <var>bytes</var> starts with 0x26 (&amp;) 0x23 (#) and ends with 0x3B (;), then:

<ol>
<li><p>Let <var>output</var> be <var>bytes</var>, <a>isomorphic decoded</a>.
<li><p>Let <var>inputQueue</var> be <var>input</var> converted to an <a for=/>I/O queue</a>.

<li><p>Replace the first two code points of <var>output</var> with "<code>%26%23</code>".

<li><p>Replace the last code point of <var>output</var> with "<code>%3B</code>".

<li><p>Return <var>output</var>.
</ol>
<li><p>Let <var>output</var> be the empty string.

<p class="note no-backref">This can happen when <var>encoding</var> is not <a>UTF-8</a>.
<li>
<p>Let <var>potentialError</var> be 0.

<li><p>Let <var>output</var> be the empty string.</p></li>
<p class=note>This needs to be a non-null value to initiate the subsequent while loop.

<li>
<p>For each <var>byte</var> of <var>bytes</var>:
<p>While <var>potentialError</var> is non-null:

<ol>
<li><p>Let <var>isomorph</var> be a <a for=/>code point</a> whose <a for="code point">value</a>
is <var>byte</var>'s <a for=byte>value</a>.
<li><p>Let <var>encodeOutput</var> be an empty <a for=/>I/O queue</a>.

<li><p>Assert: <var>percentEncodeSet</var> includes all non-<a>ASCII code points</a>.
<li><p>Set <var>potentialError</var> to the result of running <a>encode or fail</a> with
<var>inputQueue</var>, <var>encoder</var>, and <var>encodeOutput</var>.

<li><p>If <var>isomorph</var> is not in <var>percentEncodeSet</var>, then append
<var>isomorph</var> to <var>output</var>.
<li>
<p>For each <var>byte</var> of <var>encodeOutput</var> converted to a byte sequence:

<li><p>Otherwise, <a for=byte>percent-encode</a> <var>byte</var> and append the result to
<var>output</var>.
</ol>
<ol>
<li><p>If <var>spaceAsPlus</var> is true and <var>byte</var> is 0x20 (SP), then append
U+002B (+) to <var>output</var>.

<li><p>Return <var>output</var>.
</ol>
<li><p>Let <var>isomorph</var> be a <a for=/>code point</a> whose <a for="code point">value</a>
is <var>byte</var>'s <a for=byte>value</a>.

<p>To <dfn for="string">percent-encode after encoding</dfn>, given an <a for=/>encoding</a>
<var>encoding</var>, <a for=/>string</a> <var>input</var>, a <var>percentEncodeSet</var>, and a
boolean <var>spaceAsPlus</var>, run these steps:
<li><p>Assert: <var>percentEncodeSet</var> includes all non-<a>ASCII code points</a>.

<ol>
<li><p>Let <var>output</var> be the empty string.</p></li>
<li><p>If <var>isomorph</var> is not in <var>percentEncodeSet</var>, then append
<var>isomorph</var> to <var>output</var>.

<li>
<p>For each <var>codePoint</var> of <var>input</var>:
<li><p>Otherwise, <a for=byte>percent-encode</a> <var>byte</var> and append the result to
<var>output</var>.
</ol>

<ol>
<li><p>If <var>spaceAsPlus</var> is true and <var>codePoint</var> is U+0020, then append
U+002B (+) to <var>output</var>.
<li>
<p>If <var>potentialError</var> is non-null, then append "<code>%26%23</code>", followed by the
shortest sequence of <a for=/>ASCII digits</a> representing <var>potentialError</var> in base
ten, followed by "<code>%3B</code>", to <var>output</var>.

<li><p>Otherwise, run <a for="code point">percent-encode after encoding</a> with
<var>encoding</var>, <var>codePoint</var>, and <var>percentEncodeSet</var>, and append the result
to <var>output</var>.
<p class="note no-backref">This can happen when <var>encoding</var> is not <a>UTF-8</a>.
</ol>

<li><p>Return <var>output</var>.
</ol>

<p>To <dfn for="code point" id=utf-8-percent-encode>UTF-8 percent-encode</dfn> a
<a for=/>code point</a> <var>codePoint</var> using a <var>percentEncodeSet</var>, return the result
of running <a for="code point">percent-encode after encoding</a> with <a for=/>UTF-8</a>,
<var>codePoint</var>, and <var>percentEncodeSet</var>.
of running <a for=string>percent-encode after encoding</a> with <a for=/>UTF-8</a>,
<var>codePoint</var> as a <a for=/>string</a>, and <var>percentEncodeSet</var>.

<p>To <dfn export for=string>UTF-8 percent-encode</dfn> a <a for=/>string</a> <var>input</var> using
a <var>percentEncodeSet</var>, return the result of running
<a for=string>percent-encode after encoding</a> with <a for=/>UTF-8</a>, <var>input</var>,
<var>percentEncodeSet</var>, and false.
<a for=string>percent-encode after encoding</a> with <a for=/>UTF-8</a>, <var>input</var>, and
<var>percentEncodeSet</var>.

<hr>
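
The string-based algorithm in this hunk can be sketched in Python as below. This is an illustration under stated assumptions, not the specification's reference implementation: Python's codecs stand in for the Encoding Standard's encoders (their tables are not guaranteed to match), feeding an incremental encoder one code point at a time approximates "encode or fail", and the loop is assumed to move on to the next byte after appending "+" for a space. The names `make_percent_encode_set`, `USERINFO_SET`, and `percent_encode_after_encoding` are hypothetical, and the membership of `USERINFO_SET` is an approximation of the userinfo percent-encode set.

```python
import codecs


def make_percent_encode_set(extra_ascii):
    """Build a predicate: C0 controls, everything above 0x7E, plus extras."""
    extra = frozenset(extra_ascii.encode("ascii"))
    return lambda byte: byte <= 0x1F or byte > 0x7E or byte in extra


# Assumption: approximates the userinfo percent-encode set.
USERINFO_SET = make_percent_encode_set(' "#<>?`{}/:;=@[\\]^|')


def percent_encode_after_encoding(encoding, text, in_set, space_as_plus=False):
    """Encode the whole string with one encoder, then percent-encode bytes."""
    # One encoder for the whole input, so stateful encoders keep their state.
    encoder = codecs.getincrementalencoder(encoding)(errors="strict")
    output = []

    def emit(chunk):
        for byte in chunk:
            if space_as_plus and byte == 0x20:
                output.append("+")
            elif not in_set(byte):
                output.append(chr(byte))
            else:
                output.append(f"%{byte:02X}")

    for code_point in text:
        try:
            emit(encoder.encode(code_point))
        except UnicodeEncodeError:
            # Unencodable code point: emit a percent-encoded "&#NNNN;".
            output.append(f"%26%23{ord(code_point)}%3B")
    # Flush any pending bytes, such as a final escape sequence.
    emit(encoder.encode("", final=True))
    return "".join(output)
```

For example, `percent_encode_after_encoding("shift_jis", "≡", USERINFO_SET)` corresponds to the Shift_JIS rows in the examples table of this diff. Because the error case is handled where the encoder fails, a literal "&#...;" typed by the user is percent-encoded byte by byte and is not mistaken for an encoder error.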

@@ -319,20 +308,20 @@ a <var>percentEncodeSet</var>, return the result of running
<td>"<code>‽%25%2E</code>"
<td>0xE2 0x80 0xBD 0x25 0x2E
<tr>
<td rowspan=3><a for="code point">Percent-encode after encoding</a> with <a>Shift_JIS</a>,
<td rowspan=3><a for=string>Percent-encode after encoding</a> with <a>Shift_JIS</a>,
<var>input</var>, and the <a>userinfo percent-encode set</a>
<td>U+0020
<td>"<code> </code>"
<td>"<code>%20</code>"
<tr>
<td>U+2261 (≡)
<td>"<code>≡</code>"
<td>"<code>%81%DF</code>"
<tr>
<td>U+203D (‽)
<td>"<code>‽</code>"
<td>"<code>%26%238253%3B</code>"
<tr>
<td><a for="code point">Percent-encode after encoding</a> with <a>ISO-2022-JP</a>,
<var>input</var>, and the <a>userinfo percent-encode set</a>
<td>U+00A5 (¥)
<td><a for=string>Percent-encode after encoding</a> with <a>ISO-2022-JP</a>, <var>input</var>,
and the <a>userinfo percent-encode set</a>
<td>"<code>¥</code>"
<td>"<code>%1B(J\%1B(B</code>"
<tr>
<td><a for=string>Percent-encode after encoding</a> with <a>Shift_JIS</a>, <var>input</var>, the
@@ -2432,9 +2421,33 @@ string <var>input</var>, optionally with a <a>base URL</a> <var>base</var>, opti
<p>then set <var>encoding</var> to <a>UTF-8</a>.
<!-- https://simon.html5.org/test/url/url-encoding.html -->

<li><p>If <var>state override</var> is not given and <a>c</a> is U+0023 (#), then set
<var>url</var>'s <a for=url>fragment</a> to the empty string and state to
<a>fragment state</a>.
<li>
<p>If one of the following is true:

<ul class=brief>
<li><p><var>state override</var> is not given and <a>c</a> is U+0023 (#)
<li><p><a>c</a> is the <a>EOF code point</a>
</ul>

<p>then:

<ol>
<li><p>Let <var>queryPercentEncodeSet</var> be the <a>special-query percent-encode set</a> if
<var>url</var> <a>is special</a>; otherwise the <a>query percent-encode set</a>.

<li>
<p><a for=string>Percent-encode after encoding</a>, with <var>encoding</var>,
<var>buffer</var>, and <var>queryPercentEncodeSet</var>, and append the result to
<var>url</var>'s <a for=url>query</a>.

<p class=note>This operation cannot be invoked code-point-for-code-point due to the stateful
<a>ISO-2022-JP encoder</a>.

<li><p>Set <var>buffer</var> to the empty string.

<li><p>If <a>c</a> is U+0023 (#), then set <var>url</var>'s <a for=url>fragment</a> to
the empty string and state to <a>fragment state</a>.
</ol>

<li>
<p>Otherwise, if <a>c</a> is not the <a>EOF code point</a>:
@@ -2446,12 +2459,7 @@ string <var>input</var>, optionally with a <a>base URL</a> <var>base</var>, opti
<li><p>If <a>c</a> is U+0025 (%) and <a>remaining</a> does not start with two
<a>ASCII hex digits</a>, <a>validation error</a>.

<li><p>Let <var>queryPercentEncodeSet</var> be the <a>special-query percent-encode set</a> if
<var>url</var> <a>is special</a>; otherwise the <a>query percent-encode set</a>.

<li><p><a for="code point">Percent-encode after encoding</a>, with <var>encoding</var>,
<a>c</a>, and <var>queryPercentEncodeSet</var>, and append the result to <var>url</var>'s
<a for=url>query</a>.
<li><p>Append <a>c</a> to <var>buffer</var>.
</ol>
</ol>
