Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add opaque hosts #185

Merged
merged 3 commits into from
Jan 24, 2017
Merged

Add opaque hosts #185

merged 3 commits into from
Jan 24, 2017

Conversation

annevk
Copy link
Member

@annevk annevk commented Dec 28, 2016

For URLs without a special scheme we cannot use the host parser
directly due to compatibility issues and we need to serialize slightly
differently too.

Fixes #148.

@annevk
Copy link
Member Author

annevk commented Dec 28, 2016

@achristensen07 what do you think of this approach?

@annevk annevk added the needs tests Moving the issue forward requires someone to write tests label Dec 28, 2016
@achristensen07
Copy link
Collaborator

Seems good. Make sure to add a test verifying something like "new URL('path', 'asdf://host')" serializes to 'asdf://host/path' adding the slash. I broke something by forgetting that in my initial implementation. I don't add a slash for query-only or fragment-only relative URLs, but that might be worth more discussion.

@annevk
Copy link
Member Author

annevk commented Dec 28, 2016

@achristensen07 running https://quuz.org/url/liveview.html#notspecial://test?x in Safari TP does yield a slash for me.

@achristensen07
Copy link
Collaborator

Yes, but "new URL('?query', 'asdf://host')" does not add a slash right now in STP. Maybe this is inconsistent.

I think this is a good approach, but we might need another issue for these query-only and fragment-only URLs with non-special schemes.

for further processing.

<p class="note no-backref">A <a>non-special-URL host</a> is emphatically not a <a for=/>host</a> and
only used by <a lt="is special">non-special URLs</a>.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is only used by...?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"X is A and B" seems fine?

Copy link
Member

@domenic domenic Dec 28, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you add "not"s it's confusing. "X is not A and B" could be either "X is not A and X is not B" or "X is not A and X is B".

<a for=url>path</a> (including empty strings), separated from each other by
"<code>/</code>", to <var>output</var>.
<li>
<p>Otherwise, then:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove ", then" here

<li><p>Return the result of <a lt="host parser">host parsing</a> <var>input</var>.
</ol>

<li><p>Return <var>input</var>.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I either need to percent-encode input here or in the state machine.

@annevk
Copy link
Member Author

annevk commented Dec 29, 2016

The pathname getter should probably also be adjusted in a way similar to the serializer.

annevk added a commit to web-platform-tests/wpt that referenced this pull request Dec 29, 2016
@annevk
Copy link
Member Author

annevk commented Dec 29, 2016

Yes, but "new URL('?query', 'asdf://host')" does not add a slash right now in STP. Maybe this is inconsistent.

Yeah I think that would be quite odd and required a slash in the tests (and standard) for now. I guess I should also test asdf://host?query. (Edit: tested and it seems that Safari TP does add a slash there.)

@annevk annevk removed the needs tests Moving the issue forward requires someone to write tests label Dec 29, 2016
@annevk annevk mentioned this pull request Dec 29, 2016
annevk added a commit that referenced this pull request Dec 29, 2016
This is a theoretical issue once #185 lands, but as discussed it seems
good to address this anyway. Fixes #79.
<a lt="host serializer">serialized</a>, to <var>output</var>.

<li><p>Otherwise, append <var>url</var>'s <a for=url>host</a>, to <var>output</var>.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/, to output/ to output/

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's currently somewhat consistently "wrong"...

@domenic
Copy link
Member

domenic commented Dec 29, 2016

This doesn't seem to work either, testing in whatwg-url. In particular sc://ñ does not parse as having an empty path, but instead a path containing a single empty string item. Again could be a bug in my spec translation. I'll post a whatwg-url PR for you to review.

@annevk
Copy link
Member Author

annevk commented Dec 30, 2016

Yeah, you are right. Even with EOF you get a path that is non-empty.

Another problem here is not forbidding certain code points in the host setter.

I guess I should rethink this.

The other question of course is whether all this additional complexity is worth it, versus treating non-special schemes as just having a path and not being able to be base URLs for each other. (Which is what most browsers implement at the moment.)

@annevk
Copy link
Member Author

annevk commented Dec 30, 2016

One thought I had is that it's perhaps worth splitting the non-special-URL host change and the change to not serialize a path if none was in the input.

But it might be easy to do the no path in input change. It only really matters if the URL is not special and you hit EOF during the host/hostname state, right? It should be easy enough to account for that.

@annevk annevk added the needs implementer interest Moving the issue forward requires implementers to express interest label Jan 4, 2017
@annevk
Copy link
Member Author

annevk commented Jan 5, 2017

I'm going to pursue making this work, since it seems worthwhile to support relative URLs for non-special schemes, even if it comes at a cost somewhat greater than initially anticipated. I don't think this is the reason to move to the Chrome/Firefox model of effectively setting the cannot-be-base-URL flag for all non-special URLs.

I landed a commit that addresses the issue @domenic found, but I also need to look through the tests again since I think I missed updating a couple tests (in particular setter tests).

annevk added a commit to web-platform-tests/wpt that referenced this pull request Jan 5, 2017
@annevk
Copy link
Member Author

annevk commented Jan 5, 2017

I also updated the tests. I didn't add any new host/hostname setter tests however.

@annevk annevk removed the needs implementer interest Moving the issue forward requires implementers to express interest label Jan 5, 2017
@annevk
Copy link
Member Author

annevk commented Jan 5, 2017

I forgot to address "Another problem here is not forbidding certain code points in the host setter" so this is still not ready.

@annevk
Copy link
Member Author

annevk commented Jan 6, 2017

The other thing I noticed that while testing is that currently we allow userinfo and a port to exist while host is empty. That shouldn't be allowed (and isn't in Safari).

@annevk
Copy link
Member Author

annevk commented Jan 21, 2017

Also, come to think of it, it's basically the same code points that are restricted. I just removed the code points that are not problematic for non-special URLs, such as [ and ] (restricted for IPv6).

@annevk
Copy link
Member Author

annevk commented Jan 21, 2017

We could keep the list of restricted code points the same though if you think that is better. I don't care strongly.

@achristensen07
Copy link
Collaborator

"https://" is now invalid in WebKit. I don't think it's worth allowing it in the spec. It's a relatively small compatibility issue.

@annevk annevk changed the title Add non-special-URL hosts Add opague hosts Jan 24, 2017
@annevk annevk changed the title Add opague hosts Add opaque hosts Jan 24, 2017
@annevk
Copy link
Member Author

annevk commented Jan 24, 2017

@domenic this PR and the corresponding tests at web-platform-tests/wpt#4406 can be reviewed again. I have verified it with whatwg-url, but haven't updated my patch for it. Since it's a rather trivial change it seems good to use that as a test to ensure you get the same results.

@annevk annevk requested a review from domenic January 24, 2017 11:38
idempotent:
https://@/example.org/ -> https:///example.org/ -> https://example.org/ -->

<li><p>Decrease <var>pointer</var> by the number of code points in <var>buffer</var> plus
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These could be separate substeps

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left this as-is since the parser generally groups steps that can reasonably be grouped.

domenic added a commit to jsdom/whatwg-url that referenced this pull request Jan 24, 2017
domenic added a commit to jsdom/whatwg-url that referenced this pull request Jan 24, 2017
annevk added a commit to web-platform-tests/wpt that referenced this pull request Jan 24, 2017
@annevk annevk merged commit 3036255 into master Jan 24, 2017
@annevk annevk deleted the annevk/non-special-url-hosts branch January 24, 2017 19:53
domenic added a commit to jsdom/whatwg-url that referenced this pull request Jan 24, 2017
annevk added a commit that referenced this pull request Jan 24, 2017
This is a theoretical issue once #185 lands, but as discussed it seems
good to address this anyway. Fixes #79.
TimothyGu added a commit to TimothyGu/node that referenced this pull request Apr 23, 2017
- Update to spec
  - Add opaque hosts
  - File state did not correctly deal with lack of base URL
  - Cleanup API for file and non-special URLs
  - Allow % and IPv6 addresses in non-special URL hosts
  - Use specific names for percent-encode sets
  - Add empty host concept for file and non-special URLs
  - Clarify IPv6 serializer
- Fix existing mistakes
  - Add missing ':' to forbidden host code point list.
  - Correct IPv4 parser empty label behavior
- Maintain type equivalence in URLContext with spec
  - scheme, username, and password should always be strings
  - host, port, query, and fragment may be strings or null
  - Align scheme state more closely with the spec
  - Make sure the `special` variable is always synced with
    URL_FLAG_SPECIAL.

PR-URL: nodejs#12523
Fixes: nodejs#10608
Fixes: nodejs#10634
Refs: whatwg/url#185
Refs: whatwg/url#225
Refs: whatwg/url#224
Refs: whatwg/url#218
Refs: whatwg/url#243
Refs: whatwg/url#260
Refs: whatwg/url#268
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Daijiro Wachi <daijiro.wachi@gmail.com>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
TimothyGu added a commit to nodejs/node that referenced this pull request Apr 24, 2017
- Update to spec
  - Add opaque hosts
  - File state did not correctly deal with lack of base URL
  - Cleanup API for file and non-special URLs
  - Allow % and IPv6 addresses in non-special URL hosts
  - Use specific names for percent-encode sets
  - Add empty host concept for file and non-special URLs
  - Clarify IPv6 serializer
- Fix existing mistakes
  - Add missing ':' to forbidden host code point list.
  - Correct IPv4 parser empty label behavior
- Maintain type equivalence in URLContext with spec
  - scheme, username, and password should always be strings
  - host, port, query, and fragment may be strings or null
  - Align scheme state more closely with the spec
  - Make sure the `special` variable is always synced with
    URL_FLAG_SPECIAL.

PR-URL: #12523
Fixes: #10608
Fixes: #10634
Refs: whatwg/url#185
Refs: whatwg/url#225
Refs: whatwg/url#224
Refs: whatwg/url#218
Refs: whatwg/url#243
Refs: whatwg/url#260
Refs: whatwg/url#268
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Daijiro Wachi <daijiro.wachi@gmail.com>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
evanlucas pushed a commit to nodejs/node that referenced this pull request Apr 25, 2017
- Update to spec
  - Add opaque hosts
  - File state did not correctly deal with lack of base URL
  - Cleanup API for file and non-special URLs
  - Allow % and IPv6 addresses in non-special URL hosts
  - Use specific names for percent-encode sets
  - Add empty host concept for file and non-special URLs
  - Clarify IPv6 serializer
- Fix existing mistakes
  - Add missing ':' to forbidden host code point list.
  - Correct IPv4 parser empty label behavior
- Maintain type equivalence in URLContext with spec
  - scheme, username, and password should always be strings
  - host, port, query, and fragment may be strings or null
  - Align scheme state more closely with the spec
  - Make sure the `special` variable is always synced with
    URL_FLAG_SPECIAL.

PR-URL: #12523
Fixes: #10608
Fixes: #10634
Refs: whatwg/url#185
Refs: whatwg/url#225
Refs: whatwg/url#224
Refs: whatwg/url#218
Refs: whatwg/url#243
Refs: whatwg/url#260
Refs: whatwg/url#268
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Daijiro Wachi <daijiro.wachi@gmail.com>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
evanlucas pushed a commit to nodejs/node that referenced this pull request May 1, 2017
- Update to spec
  - Add opaque hosts
  - File state did not correctly deal with lack of base URL
  - Cleanup API for file and non-special URLs
  - Allow % and IPv6 addresses in non-special URL hosts
  - Use specific names for percent-encode sets
  - Add empty host concept for file and non-special URLs
  - Clarify IPv6 serializer
- Fix existing mistakes
  - Add missing ':' to forbidden host code point list.
  - Correct IPv4 parser empty label behavior
- Maintain type equivalence in URLContext with spec
  - scheme, username, and password should always be strings
  - host, port, query, and fragment may be strings or null
  - Align scheme state more closely with the spec
  - Make sure the `special` variable is always synced with
    URL_FLAG_SPECIAL.

PR-URL: #12523
Fixes: #10608
Fixes: #10634
Refs: whatwg/url#185
Refs: whatwg/url#225
Refs: whatwg/url#224
Refs: whatwg/url#218
Refs: whatwg/url#243
Refs: whatwg/url#260
Refs: whatwg/url#268
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Daijiro Wachi <daijiro.wachi@gmail.com>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
evanlucas pushed a commit to nodejs/node that referenced this pull request May 2, 2017
- Update to spec
  - Add opaque hosts
  - File state did not correctly deal with lack of base URL
  - Cleanup API for file and non-special URLs
  - Allow % and IPv6 addresses in non-special URL hosts
  - Use specific names for percent-encode sets
  - Add empty host concept for file and non-special URLs
  - Clarify IPv6 serializer
- Fix existing mistakes
  - Add missing ':' to forbidden host code point list.
  - Correct IPv4 parser empty label behavior
- Maintain type equivalence in URLContext with spec
  - scheme, username, and password should always be strings
  - host, port, query, and fragment may be strings or null
  - Align scheme state more closely with the spec
  - Make sure the `special` variable is always synced with
    URL_FLAG_SPECIAL.

PR-URL: #12523
Fixes: #10608
Fixes: #10634
Refs: whatwg/url#185
Refs: whatwg/url#225
Refs: whatwg/url#224
Refs: whatwg/url#218
Refs: whatwg/url#243
Refs: whatwg/url#260
Refs: whatwg/url#268
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Daijiro Wachi <daijiro.wachi@gmail.com>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

5 participants