Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add get an encoder and encode or fail for URLs #238

Merged
merged 2 commits into from
Oct 23, 2020

Conversation

annevk
Copy link
Member

@annevk annevk commented Oct 21, 2020

Please excuse the branch name. The first commit is #237 and I will rebase this once @andreubotella has looked at that.

I suspect some changes might be needed here around my IO queue usage. In particular, in this scenario we probably do want to push end-of-queue upon encountering an error, to make conversion to bytes easier for the caller.


Preview | Diff

@annevk annevk requested a review from hsivonen October 21, 2020 11:01
encoding.bs Outdated Show resolved Hide resolved
annevk added a commit to whatwg/url that referenced this pull request Oct 21, 2020
Since the ISO-2022-JP encoder is stateful, percent-encoding needs to hold onto an instance of the encoder and manually perform error handling. This also requires the input to be the full string rather than individual code points as otherwise the callers of percent-encoding would need to be aware of this too. (As UTF-8 encoding cannot fail this problem does not affect those endpoints.)

Depends on this Encoding PR: whatwg/encoding#238.

Tests: web-platform-tests/wpt#26158.

Fixes #557.
Copy link
Member

@andreubotella andreubotella left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(posted by mistake, the actual review is below)

Copy link
Member

@andreubotella andreubotella left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of these changes depend on #237 (comment).

encoding.bs Outdated
<div class=note>
<p>In addition to the <a>decode</a>, <a>BOM sniff</a>, and <a>encode</a> algorithms below,
standards needing these legacy hooks will most likely also need to use <a>get an encoding</a> (to
turn a <a>label</a> into an <a for=/>encoding</a>) and <a>get an output encoding</a> (to turn an
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
turn a <a>label</a> into an <a for=/>encoding</a>) and <a>get an output encoding</a> (to turn an
turn a <a>label</a> into an <a for=/>encoding</a> instance) and <a>get an output encoding</a> (to turn an

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I filed #240 on this. I'd like to go for a more minimal approach and can post a PR after this lands if that's okay.

encoding.bs Outdated Show resolved Hide resolved
<ol>
<li><p>Assert: <var>encoding</var> is not <a>replacement</a> or <a>UTF-16BE/LE</a>.

<li><p>Return <var>encoding</var>'s <a for=/>encoder</a>.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
<li><p>Return <var>encoding</var>'s <a for=/>encoder</a>.
<li><p>Return a new instance of <var>encoding</var>'s <a for=/>encoder</a>.

There's a difference between an encoder (an "encoder class", so to speak) and an encoder instance, which has state. This hook should also be renamed to "get an encoder instance".

See also #237 (comment).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above.

</ol>

<p>To <dfn export>encode or fail</dfn> an I/O queue of scalar values <var>ioQueue</var> given an
<a for=/>encoder</a> <var>encoder</var> and an I/O queue of bytes <var>output</var>, run these
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
<a for=/>encoder</a> <var>encoder</var> and an I/O queue of bytes <var>output</var>, run these
<a for=/>encoder</a> instance <var>encoderInstance</var> and an I/O queue of bytes <var>output</var>, run these

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above.


<ol>
<li><p>Let <var>potentialError</var> be the result of <a>running</a> <var>encoder</var> with
<var>ioQueue</var>, <var>output</var>, and "<code>fatal</code>".
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
<var>ioQueue</var>, <var>output</var>, and "<code>fatal</code>".
<var>ioQueue</var>, <var>output</var>, and "<code>fatal</code>".
<li><p><a for="I/O queue">Push</a> <a>end-of-queue</a> into <var>encoder</var>.

Needed so the conversion to a byte sequence in whatwg/url#558 doesn't hang.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did this, but adjusted the wording slightly and pushed into output instead.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops, nice catch.

encoding.bs Outdated
</ol>

<div class=note>
<p>This is a legacy hook for URLs. The caller will have to keep an <a for=/>encoder</a> alive as
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
<p>This is a legacy hook for URLs. The caller will have to keep an <a for=/>encoder</a> alive as
<p>This is a legacy hook for URLs. The caller will have to keep an <a for=/>encoder</a> instace alive as

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above.

@annevk
Copy link
Member Author

annevk commented Oct 22, 2020

I thought of an alternative that seems slightly nicer, which is that we push the error itself into the output I/O queue. That still gives the caller freedom to deal with errors in whatever way they want, but preserves the existing contract of encode (other than having to pass a different error mode) and avoids having to hand out encoders. And if the remaining caller of encode really is text/plain (I need to double check that) it might well make sense to do away with "html" entirely.

@andreubotella
Copy link
Member

I seemed to remember that HTML also specified some encoding for multipart/form-data, and indeed, although it doesn't use the "encode" algorithm, what it describes is equivalent to encoding with replacement.

I'm not very keen on the idea of returning an I/O queue of bytes or errors – I'd prefer encode to take an error-handling callback that can push into the output I/O queue, especially since multipart/form-data and text/plain could use the same callback which would be different from the one in the URL standard. But this isn't a blocker for me.

@annevk
Copy link
Member Author

annevk commented Oct 22, 2020

Pushing into the output queue is no good (unless you get to push errors in, but at that point...). URL needs to process non-errors differently from errors. I suppose it could use the callback to do something else, but that seems worse than the current alternatives to me.

@andreubotella
Copy link
Member

Whoops, excuse my brain fart there. In that case, I'm still not too keen on it, but it seems to be the best way to solve this, so that's fine by me.

@hsivonen
Copy link
Member

I much prefer the concept of an encoder instance over the concept of error objects intermingling in a queue with bytes, because the former is closer to actual implementation concepts.

@annevk annevk requested a review from hsivonen October 23, 2020 11:54
@annevk annevk merged commit c55584b into master Oct 23, 2020
@annevk annevk deleted the annevk/hooks-for-standards branch October 23, 2020 14:44
annevk added a commit to whatwg/url that referenced this pull request Oct 28, 2020
Since the ISO-2022-JP encoder is stateful, percent-encoding needs to hold onto an instance of the encoder and manually perform error handling. This also requires the input to be the full string rather than individual code points as otherwise the callers of percent-encoding would need to be aware of this too. (As UTF-8 encoding cannot fail this problem does not affect those endpoints.)

Builds on this Encoding PR: whatwg/encoding#238.

Tests: web-platform-tests/wpt#26158.

Fixes #557.
annevk added a commit to whatwg/url that referenced this pull request Oct 28, 2020
Since the ISO-2022-JP encoder is stateful, percent-encoding needs to hold onto an instance of the encoder and manually perform error handling. This also requires the input to be the full string rather than individual code points as otherwise the callers of percent-encoding would need to be aware of this too. (As UTF-8 encoding cannot fail this problem does not affect those endpoints.)

Builds on this Encoding PR: whatwg/encoding#238.

Tests: web-platform-tests/wpt#26158.

Fixes #557.
annevk added a commit to whatwg/url that referenced this pull request Nov 2, 2020
Since the ISO-2022-JP encoder is stateful, percent-encoding needs to hold onto an instance of the encoder and manually perform error handling. This also requires the input to be the full string rather than individual code points as otherwise the callers of percent-encoding would need to be aware of this too. (As UTF-8 encoding cannot fail this problem does not affect those endpoints.)

Builds on this Encoding PR: whatwg/encoding#238.

Tests: web-platform-tests/wpt#26158 and web-platform-tests/wpt#26317.

Fixes #557.
annevk added a commit to whatwg/url that referenced this pull request Nov 2, 2020
Since the ISO-2022-JP encoder is stateful, percent-encoding needs to hold onto an instance of the encoder and manually perform error handling. This also requires the input to be the full string rather than individual code points as otherwise the callers of percent-encoding would need to be aware of this too. (As UTF-8 encoding cannot fail this problem does not affect those endpoints.)

Builds on this Encoding PR: whatwg/encoding#238.

Tests: web-platform-tests/wpt#26158 and web-platform-tests/wpt#26317.

Fixes #557.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants