Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

url: update WHATWG URL parser to latest spec #12523

Closed
wants to merge 1 commit into from

Conversation

TimothyGu
Copy link
Member

@TimothyGu TimothyGu commented Apr 19, 2017

The following (non-editorial) spec changes are implemented in this PR:

I had to squash the four spec updates into one commit, since some of them overlap with one another, making it difficult to separate atomic changes.

This also enables all tests in url-tests.js, which means that our implementation will be fully spec-compliant after this PR. 🎉

Fixes: #10608
Fixes: #10634

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • documentation is changed or added
  • commit message follows commit guidelines
Affected core subsystem(s)

url

@nodejs-github-bot nodejs-github-bot added c++ Issues and PRs that require attention from people who are familiar with C++. dont-land-on-v4.x whatwg-url Issues and PRs related to the WHATWG URL implementation. labels Apr 19, 2017
@TimothyGu
Copy link
Member Author

@jasnell
Copy link
Member

jasnell commented Apr 19, 2017

FYI: Once this lands, and we can verify that we're fully spec compliant, I want to open a PR that moves the WHATWG URL out of Experimental status.

Copy link
Member

@jasnell jasnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with a nit

@@ -8,6 +8,8 @@ const {
const binding = process.binding('url');
const context = Symbol('context');
const cannotBeBase = Symbol('cannot-be-base');
const cannotHaveUsernamePasswordPort =
Symbol('cannot have a username/password/port');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps Symbol('cannot-have-username-password-port') to remain consistent with the other symbols?

// https://url.spec.whatwg.org/#cannot-have-a-username-password-port
get [cannotHaveUsernamePasswordPort]() {
const { host, scheme } = this[context];
return ((host == null || host === '') ||
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps host.length === 0 might be better/faster?

Copy link
Member Author

@TimothyGu TimothyGu Apr 20, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The spec says "if its host is null or the empty string" so it'll be clearer this way, and no, it's not faster in my tests.

@TimothyGu
Copy link
Member Author

I want to open a PR that moves the WHATWG URL out of Experimental status.

Sounds good to me. Will this for v8.x?

@jasnell
Copy link
Member

jasnell commented Apr 19, 2017

Hopefully we can do it in time for 8.0.0

@TimothyGu
Copy link
Member Author

@jasnell nit fixed.

@watilde
Copy link
Member

watilde commented Apr 20, 2017

Great work! Can you also update test/fixture/url-setter-tests.js as well?

@TimothyGu
Copy link
Member Author

TimothyGu commented Apr 20, 2017

@watilde, done. That actually made me notice a few bugs that were in the initial implementation, thanks! 👏

@jasnell I'd like you to take a look at this again, especially 3f53f8038fa676618db6f8169f93ba53aeaebc0c which changes how the JS-C++ layer works right now.

New CI: https://ci.nodejs.org/job/node-test-pull-request/7559/

@jasnell
Copy link
Member

jasnell commented Apr 20, 2017

Ok, I'll take a look in detail shortly

- Update to spec
  - Add opaque hosts
  - File state did not correctly deal with lack of base URL
  - Cleanup API for file and non-special URLs
  - Allow % and IPv6 addresses in non-special URL hosts
  - Use specific names for percent-encode sets
  - Add empty host concept for file and non-special URLs
  - Clarify IPv6 serializer
- Fix existing mistakes
  - Add missing ':' to forbidden host code point list.
  - Correct IPv4 parser empty label behavior
- Maintain type equivalence in URLContext with spec
  - scheme, username, and password should always be strings
  - host, port, query, and fragment may be strings or null
  - Align scheme state more closely with the spec
  - Make sure the `special` variable is always synced with
    URL_FLAG_SPECIAL.

PR-URL: nodejs#12523
Fixes: nodejs#10608
Fixes: nodejs#10634
Refs: whatwg/url#185
Refs: whatwg/url#225
Refs: whatwg/url#224
Refs: whatwg/url#218
Refs: whatwg/url#243
Refs: whatwg/url#260
Refs: whatwg/url#268
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Daijiro Wachi <daijiro.wachi@gmail.com>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
@TimothyGu
Copy link
Member Author

Also implemented whatwg/url#260.

One more CI: https://ci.nodejs.org/job/node-test-pull-request/7622/

@TimothyGu
Copy link
Member Author

Landed in b2870a4.

@TimothyGu TimothyGu closed this Apr 24, 2017
@TimothyGu TimothyGu deleted the url-perc-nonspecial branch April 24, 2017 23:36
TimothyGu added a commit that referenced this pull request Apr 24, 2017
- Update to spec
  - Add opaque hosts
  - File state did not correctly deal with lack of base URL
  - Cleanup API for file and non-special URLs
  - Allow % and IPv6 addresses in non-special URL hosts
  - Use specific names for percent-encode sets
  - Add empty host concept for file and non-special URLs
  - Clarify IPv6 serializer
- Fix existing mistakes
  - Add missing ':' to forbidden host code point list.
  - Correct IPv4 parser empty label behavior
- Maintain type equivalence in URLContext with spec
  - scheme, username, and password should always be strings
  - host, port, query, and fragment may be strings or null
  - Align scheme state more closely with the spec
  - Make sure the `special` variable is always synced with
    URL_FLAG_SPECIAL.

PR-URL: #12523
Fixes: #10608
Fixes: #10634
Refs: whatwg/url#185
Refs: whatwg/url#225
Refs: whatwg/url#224
Refs: whatwg/url#218
Refs: whatwg/url#243
Refs: whatwg/url#260
Refs: whatwg/url#268
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Daijiro Wachi <daijiro.wachi@gmail.com>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
evanlucas pushed a commit that referenced this pull request Apr 25, 2017
- Update to spec
  - Add opaque hosts
  - File state did not correctly deal with lack of base URL
  - Cleanup API for file and non-special URLs
  - Allow % and IPv6 addresses in non-special URL hosts
  - Use specific names for percent-encode sets
  - Add empty host concept for file and non-special URLs
  - Clarify IPv6 serializer
- Fix existing mistakes
  - Add missing ':' to forbidden host code point list.
  - Correct IPv4 parser empty label behavior
- Maintain type equivalence in URLContext with spec
  - scheme, username, and password should always be strings
  - host, port, query, and fragment may be strings or null
  - Align scheme state more closely with the spec
  - Make sure the `special` variable is always synced with
    URL_FLAG_SPECIAL.

PR-URL: #12523
Fixes: #10608
Fixes: #10634
Refs: whatwg/url#185
Refs: whatwg/url#225
Refs: whatwg/url#224
Refs: whatwg/url#218
Refs: whatwg/url#243
Refs: whatwg/url#260
Refs: whatwg/url#268
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Daijiro Wachi <daijiro.wachi@gmail.com>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
@evanlucas evanlucas mentioned this pull request May 1, 2017
evanlucas pushed a commit that referenced this pull request May 1, 2017
- Update to spec
  - Add opaque hosts
  - File state did not correctly deal with lack of base URL
  - Cleanup API for file and non-special URLs
  - Allow % and IPv6 addresses in non-special URL hosts
  - Use specific names for percent-encode sets
  - Add empty host concept for file and non-special URLs
  - Clarify IPv6 serializer
- Fix existing mistakes
  - Add missing ':' to forbidden host code point list.
  - Correct IPv4 parser empty label behavior
- Maintain type equivalence in URLContext with spec
  - scheme, username, and password should always be strings
  - host, port, query, and fragment may be strings or null
  - Align scheme state more closely with the spec
  - Make sure the `special` variable is always synced with
    URL_FLAG_SPECIAL.

PR-URL: #12523
Fixes: #10608
Fixes: #10634
Refs: whatwg/url#185
Refs: whatwg/url#225
Refs: whatwg/url#224
Refs: whatwg/url#218
Refs: whatwg/url#243
Refs: whatwg/url#260
Refs: whatwg/url#268
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Daijiro Wachi <daijiro.wachi@gmail.com>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
evanlucas pushed a commit that referenced this pull request May 2, 2017
- Update to spec
  - Add opaque hosts
  - File state did not correctly deal with lack of base URL
  - Cleanup API for file and non-special URLs
  - Allow % and IPv6 addresses in non-special URL hosts
  - Use specific names for percent-encode sets
  - Add empty host concept for file and non-special URLs
  - Clarify IPv6 serializer
- Fix existing mistakes
  - Add missing ':' to forbidden host code point list.
  - Correct IPv4 parser empty label behavior
- Maintain type equivalence in URLContext with spec
  - scheme, username, and password should always be strings
  - host, port, query, and fragment may be strings or null
  - Align scheme state more closely with the spec
  - Make sure the `special` variable is always synced with
    URL_FLAG_SPECIAL.

PR-URL: #12523
Fixes: #10608
Fixes: #10634
Refs: whatwg/url#185
Refs: whatwg/url#225
Refs: whatwg/url#224
Refs: whatwg/url#218
Refs: whatwg/url#243
Refs: whatwg/url#260
Refs: whatwg/url#268
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Daijiro Wachi <daijiro.wachi@gmail.com>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c++ Issues and PRs that require attention from people who are familiar with C++. whatwg-url Issues and PRs related to the WHATWG URL implementation.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants