Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

url: adding WHATWG URL support #7448

Closed
wants to merge 1 commit into from
Closed

Conversation

jasnell
Copy link
Member

@jasnell jasnell commented Jun 27, 2016

Checklist
  • make -j4 test (UNIX), or vcbuild test nosign (Windows) passes
  • tests and/or benchmarks are included
  • documentation is changed or added
  • commit message follows commit guidelines
Affected core subsystem(s)

url

Description of change

This is a work in progress.

See: nodejs/node-eps#28
See: https://url.spec.whatwg.org

Implements WHATWG URL support. Example:

var u = new url.URL('http://example.org');

This is quite rough to start. Most of the WHATWG URL basic parser is supported. Currently only about 2726 of the standard tests fail. There's still quite a bit of additional work to be done on this.

@jasnell jasnell added the url Issues and PRs related to the legacy built-in url module. label Jun 27, 2016
@nodejs-github-bot nodejs-github-bot added the c++ Issues and PRs that require attention from people who are familiar with C++. label Jun 27, 2016
@jasnell jasnell added wip Issues and PRs that are still a work in progress. semver-minor PRs that contain new features and should be released in the next minor version. labels Jun 27, 2016
@jasnell
Copy link
Member Author

jasnell commented Jun 27, 2016

Related: #7355

@jasnell
Copy link
Member Author

jasnell commented Jun 28, 2016

@mscdex @bnoordhuis @indutny ... would appreciate you all taking a look. The emphasis here currently is on spec compliance but the performance definitely needs to improve. It's functionally operational but obviously can use some performance improvements. The one bit currently missing is the host parsing which is why this currently fails 26 of the standard tests (all the failures deal with host edge cases).

The WHATWG URL parsing is intentionally liberal and lenient so there are some bits we can't easily work around -- such as ignoring whitespace.

@@ -0,0 +1,328 @@
'use strict';

function getPunycode() {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: this is in anticipation of my other PR that switches to ICU's punycode impl landing. If/when that moves forward, we'll be able to take advantage of some other performance improvements from that.

@mscdex
Copy link
Contributor

mscdex commented Jun 28, 2016

I'm curious, why implement part in javascript and part in c++?

Also, I probably wouldn't worry much about performance until it passes all tests first.

@jasnell
Copy link
Member Author

jasnell commented Jun 28, 2016

I implemented it in pure js first and the performance was fairly abysmal.
I'm planning to keep toying with the js impl to see if I can get the
performance better but this seems to do pretty well.
On Jun 27, 2016 11:03 PM, "Brian White" notifications@github.com wrote:

I'm curious, why implement part in javascript and part in c++?

Also, I probably wouldn't worry much about performance until it passes all
tests first.


You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
#7448 (comment), or mute
the thread
https://github.com/notifications/unsubscribe/AAa2ebfB63hQpDB0KxT1jE-ncZiuCQ4jks5qQLlDgaJpZM4I_c8x
.

@Fishrock123
Copy link
Contributor

Not sure how widespread breakage would be, but I'm generally for this, so long as it doesn't introduce a new module or global.

@jasnell
Copy link
Member Author

jasnell commented Jun 28, 2016

Breakage would be minimal at this point because there is absolutely zero change to the existing URL implementation. Existing code should be entirely unaffected.

StorageObject.prototype = Object.create(null);

class URL {
constructor(input, base) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a good implementation of the IDL. input and base do not need to be strings (or URLs).

Instead, the first few lines of this should be something like

if (input === undefined) {
  throw new TypeError("first argument is required");
}
input = String(input);

if (base !== undefined) {
  base = String(base);
}

(Actually those String(input)s should actually be toUSVString(input)... but that's a separate discussion.)

The same applies to all the setters, which should just do scheme = String(scheme), not throw a TypeError.

These aspects of the API will not be tested in the web platform tests since they are, in typical implementations, auto-generated from Web IDL.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the js side of this is somewhat temporary at the moment while I focus on the parsing bits. Currently this is a close-enough-to-be-somewhat-functional impl of the IDL bits ;-). This is definitely helpful tho and I've added it to my list of todo's.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah OK cool :). Sorry for the premature review then; the JS part is easier for me to review so I just dived in.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries at all! I definitely appreciate the eyes on it and I'm hoping t sort out all of the js/IDL bits this week so this is helpful.

@Fishrock123
Copy link
Contributor

Breakage would be minimal at this point because there is absolutely zero change to the existing URL implementation.

ok well, I definitely don't think we should add a new, competitive implementation.
👎

@jasnell
Copy link
Member Author

jasnell commented Jun 29, 2016

As I've said previously, the idea is to introduce the standard API in
parallel specifically to avoid breaking things. Once the impl is done and
we're sure it works, we can begin transitioning users off the old impl and
deprecate the current url.parse method or modify it so that it returns a
URL object.
On Jun 29, 2016 2:55 AM, "Jeremiah Senkpiel" notifications@github.com
wrote:

Breakage would be minimal at this point because there is absolutely zero
change to the existing URL implementation.

ok well, I definitely don't think we should add a new, competitive
implementation.
👎


You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
#7448 (comment), or mute
the thread
https://github.com/notifications/unsubscribe/AAa2eb-DLnuTfMDY-QjSGmoS8AqS2ofxks5qQkEOgaJpZM4I_c8x
.

@btd
Copy link

btd commented Jun 29, 2016

Will it contain also URLSearchParams? It will be quite usefull, and future possible replacement for standart querystring module.

@jasnell
Copy link
Member Author

jasnell commented Jun 29, 2016

Yes, though it likely would not replace the querystring module entirely.
The querystring module would likely be used to help impl URLSearchParams.
On Jun 29, 2016 7:14 AM, "Denis Bardadym" notifications@github.com wrote:

Will it contain also URLSearchParams
https://url.spec.whatwg.org/#interface-urlsearchparams? It will be
quite usefull, and future possible replacement for standart querystring
module.


You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
#7448 (comment), or mute
the thread
https://github.com/notifications/unsubscribe/AAa2eWBvagA0DQQYPKYXYb2MWGRA7ZuWks5qQn2wgaJpZM4I_c8x
.

@jasnell
Copy link
Member Author

jasnell commented Jun 29, 2016

Ok, updated with a rough implementation of the host parsing. Currently only failing 11 of the standard parsing tests... most of which deal with punycode handling. Once #7355 lands, that'll be easier to handle.

@jasnell jasnell force-pushed the new-url branch 2 times, most recently from 8e480ee to 6b0e8a0 Compare June 30, 2016 19:06
@jasnell
Copy link
Member Author

jasnell commented Jun 30, 2016

@nodejs/collaborators @domenic ... this is now passing 100% of the parsing tests and all but two of the setter tests (those two are currently skipped, chrome fails those two also it appears). I have added the URLSearchParams impl and the URL statics domainToASCII() and domainToUnicode().

At this point, this has the appropriate test coverage but the performance is still lacking. The next step after this is to begin working on improving performance without sacrificing correctness. It's likely to not perform as well as require('url').parse() for a while but it's reasonable.

@jasnell
Copy link
Member Author

jasnell commented Jun 30, 2016

First CI: https://ci.nodejs.org/job/node-test-pull-request/3145/
Some failures on various platforms. Will be investigating...

@STRML
Copy link
Contributor

STRML commented Oct 26, 2016

@jasnell Was the old work fundamentally incompatible (as in, could it have been refactored into the URL constructor)? If you don't mind me asking, what drove the development into a C module? Do we expect performance will be better in the long run despite the interop penalty?

@jasnell
Copy link
Member Author

jasnell commented Oct 26, 2016

I went with the native code implementation purely for performance reasons and with a greenfield implementation to ensure as close a match to the current url standard spec as possible. A pure js implementation (which I wrote first) performed significantly slower.

@STRML
Copy link
Contributor

STRML commented Oct 26, 2016

Understood, thanks.

@SimenB SimenB mentioned this pull request Nov 19, 2016
@gibfahn gibfahn mentioned this pull request Jun 15, 2017
3 tasks
TimothyGu pushed a commit to TimothyGu/node that referenced this pull request Nov 28, 2017
Implements WHATWG URL support. Example:

```
var u = new url.URL('http://example.org');
```

Many, many other commits improving the implementation have been squashed
into this backport PR. They are not listed separately here for brevity.

PR-URL: nodejs#7448
Reviewed-By: Ilkka Myller <ilkka.myller@nodefield.com>
MylesBorins pushed a commit that referenced this pull request Jan 18, 2018
Implements WHATWG URL support. Example:

```
var u = new url.URL('http://example.org');
```

Many, many other commits improving the implementation have been squashed
into this backport PR. They are not listed separately here for brevity.

Backport-PR-URL: #17365
PR-URL: #7448
Reviewed-By: Ilkka Myller <ilkka.myller@nodefield.com>
@MylesBorins MylesBorins mentioned this pull request Jan 24, 2018
MylesBorins pushed a commit that referenced this pull request Feb 11, 2018
Implements WHATWG URL support. Example:

```
var u = new url.URL('http://example.org');
```

Many, many other commits improving the implementation have been squashed
into this backport PR. They are not listed separately here for brevity.

Backport-PR-URL: #17365
PR-URL: #7448
Reviewed-By: Ilkka Myller <ilkka.myller@nodefield.com>
MylesBorins added a commit that referenced this pull request Feb 11, 2018
This LTS release comes with 109 commits, 17 of which are considered
Semver-Minor. This includes 32 which are doc related, 29 which are test
related, 8 which are build / tool related and 1 commit which updates
a dependency.

Notable Changes:

* console:
  - added console.count() and console.clear() (James M Snell)
    #12678
* crypto:
  - expose ECDH class (Bryan English)
    #8188
  - added cypto.randomFill() and crypto.randomFillSync() (Evan Lucas)
    #10209
  - warn on invalid authentication tag length (Tobias Nießen)
    #17566
* deps:
  - upgrade libuv to 1.16.1 (cjihrig)
    #16835
* dgram:
  - added socket.setMulticastInterface() (Will Young)
    #7855
* http:
  - add agent.keepSocketAlive and agent.reuseSocket as to allow
    overridable keep-alive behavior of `Agent` (Fedor Indutny)
    #13005
* lib:
  - return this from net.Socket.end() (Sam Roberts)
    #13481
* module:
  - add builtinModules api that provides list of all builtin modules in
    Node (Jon Moss)
    #16386
* net:
  - return this from getConnections() (Sam Roberts)
    #13553
* promises:
  - more robust stringification for unhandled rejections (Timothy Gu)
    #13784
* repl:
  - improve require() autocompletion (Alexey Orlenko)
    #14409
* src:
  - add openssl-system-ca-path configure option (Daniel Bevenius)
    #16790
  - add --use-bundled-ca --use-openssl-ca check (Daniel Bevenius)
    #12087
  - add process.ppid (cjihrig)
    #16839
* tls:
  - accept `lookup` option for `tls.connect()` (Fedor Indutny)
    #12839
* tools, build:
  - a new macOS installer! (JP Wesselink)
    #15179
* url:
  - WHATWG URL api support (James M Snell)
    #7448
* util:
  - add %i and %f formatting specifiers (Roman Reiss)
    #10308

PR-URL: #18342
MylesBorins pushed a commit that referenced this pull request Feb 12, 2018
Implements WHATWG URL support. Example:

```
var u = new url.URL('http://example.org');
```

Many, many other commits improving the implementation have been squashed
into this backport PR. They are not listed separately here for brevity.

Backport-PR-URL: #17365
PR-URL: #7448
Reviewed-By: Ilkka Myller <ilkka.myller@nodefield.com>
MylesBorins added a commit that referenced this pull request Feb 12, 2018
This LTS release comes with 109 commits, 17 of which are considered
Semver-Minor. This includes 32 which are doc related, 29 which are test
related, 8 which are build / tool related and 1 commit which updates
a dependency.

Notable Changes:

* console:
  - added console.count() and console.clear() (James M Snell)
    #12678
* crypto:
  - expose ECDH class (Bryan English)
    #8188
  - added cypto.randomFill() and crypto.randomFillSync() (Evan Lucas)
    #10209
  - warn on invalid authentication tag length (Tobias Nießen)
    #17566
* deps:
  - upgrade libuv to 1.16.1 (cjihrig)
    #16835
* dgram:
  - added socket.setMulticastInterface() (Will Young)
    #7855
* http:
  - add agent.keepSocketAlive and agent.reuseSocket as to allow
    overridable keep-alive behavior of `Agent` (Fedor Indutny)
    #13005
* lib:
  - return this from net.Socket.end() (Sam Roberts)
    #13481
* module:
  - add builtinModules api that provides list of all builtin modules in
    Node (Jon Moss)
    #16386
* net:
  - return this from getConnections() (Sam Roberts)
    #13553
* promises:
  - more robust stringification for unhandled rejections (Timothy Gu)
    #13784
* repl:
  - improve require() autocompletion (Alexey Orlenko)
    #14409
* src:
  - add openssl-system-ca-path configure option (Daniel Bevenius)
    #16790
  - add --use-bundled-ca --use-openssl-ca check (Daniel Bevenius)
    #12087
  - add process.ppid (cjihrig)
    #16839
* tls:
  - accept `lookup` option for `tls.connect()` (Fedor Indutny)
    #12839
* tools, build:
  - a new macOS installer! (JP Wesselink)
    #15179
* url:
  - WHATWG URL api support (James M Snell)
    #7448
* util:
  - add %i and %f formatting specifiers (Roman Reiss)
    #10308

PR-URL: #18342
MylesBorins pushed a commit that referenced this pull request Feb 13, 2018
Implements WHATWG URL support. Example:

```
var u = new url.URL('http://example.org');
```

Many, many other commits improving the implementation have been squashed
into this backport PR. They are not listed separately here for brevity.

Backport-PR-URL: #17365
PR-URL: #7448
Reviewed-By: Ilkka Myller <ilkka.myller@nodefield.com>
MylesBorins added a commit that referenced this pull request Feb 13, 2018
This LTS release comes with 109 commits, 17 of which are considered
Semver-Minor. This includes 32 which are doc related, 29 which are test
related, 8 which are build / tool related and 1 commit which updates
a dependency.

Notable Changes:

* console:
  - added console.count() and console.clear() (James M Snell)
    #12678
* crypto:
  - expose ECDH class (Bryan English)
    #8188
  - added cypto.randomFill() and crypto.randomFillSync() (Evan Lucas)
    #10209
  - warn on invalid authentication tag length (Tobias Nießen)
    #17566
* deps:
  - upgrade libuv to 1.16.1 (cjihrig)
    #16835
* dgram:
  - added socket.setMulticastInterface() (Will Young)
    #7855
* http:
  - add agent.keepSocketAlive and agent.reuseSocket as to allow
    overridable keep-alive behavior of `Agent` (Fedor Indutny)
    #13005
* lib:
  - return this from net.Socket.end() (Sam Roberts)
    #13481
* module:
  - add builtinModules api that provides list of all builtin modules in
    Node (Jon Moss)
    #16386
* net:
  - return this from getConnections() (Sam Roberts)
    #13553
* promises:
  - more robust stringification for unhandled rejections (Timothy Gu)
    #13784
* repl:
  - improve require() autocompletion (Alexey Orlenko)
    #14409
* src:
  - add openssl-system-ca-path configure option (Daniel Bevenius)
    #16790
  - add --use-bundled-ca --use-openssl-ca check (Daniel Bevenius)
    #12087
  - add process.ppid (cjihrig)
    #16839
* tls:
  - accept `lookup` option for `tls.connect()` (Fedor Indutny)
    #12839
* tools, build:
  - a new macOS installer! (JP Wesselink)
    #15179
* url:
  - WHATWG URL api support (James M Snell)
    #7448
* util:
  - add %i and %f formatting specifiers (Roman Reiss)
    #10308

PR-URL: #18342
MylesBorins added a commit that referenced this pull request Feb 13, 2018
This LTS release comes with 112 commits, 17 of which are considered
Semver-Minor. This includes 32 which are doc related, 30 which are test
related, 8 which are build / tool related and 1 commit which updates
a dependency.

Notable Changes:

* console:
  - added console.count() and console.clear() (James M Snell)
    #12678
* crypto:
  - expose ECDH class (Bryan English)
    #8188
  - added cypto.randomFill() and crypto.randomFillSync() (Evan Lucas)
    #10209
  - warn on invalid authentication tag length (Tobias Nießen)
    #17566
* deps:
  - upgrade libuv to 1.16.1 (cjihrig)
    #16835
* dgram:
  - added socket.setMulticastInterface() (Will Young)
    #7855
* http:
  - add agent.keepSocketAlive and agent.reuseSocket as to allow
    overridable keep-alive behavior of `Agent` (Fedor Indutny)
    #13005
* lib:
  - return this from net.Socket.end() (Sam Roberts)
    #13481
* module:
  - add builtinModules api that provides list of all builtin modules in
    Node (Jon Moss)
    #16386
* net:
  - return this from getConnections() (Sam Roberts)
    #13553
* promises:
  - more robust stringification for unhandled rejections (Timothy Gu)
    #13784
* repl:
  - improve require() autocompletion (Alexey Orlenko)
    #14409
* src:
  - add openssl-system-ca-path configure option (Daniel Bevenius)
    #16790
  - add --use-bundled-ca --use-openssl-ca check (Daniel Bevenius)
    #12087
  - add process.ppid (cjihrig)
    #16839
* tls:
  - accept `lookup` option for `tls.connect()` (Fedor Indutny)
    #12839
* tools, build:
  - a new macOS installer! (JP Wesselink)
    #15179
* url:
  - WHATWG URL api support (James M Snell)
    #7448
* util:
  - add %i and %f formatting specifiers (Roman Reiss)
    #10308

PR-URL: #18342
MylesBorins added a commit that referenced this pull request Feb 13, 2018
This LTS release comes with 112 commits, 17 of which are considered
Semver-Minor. This includes 32 which are doc related, 30 which are test
related, 8 which are build / tool related and 1 commit which updates
a dependency.

Notable Changes:

* console:
  - added console.count() and console.clear() (James M Snell)
    #12678
* crypto:
  - expose ECDH class (Bryan English)
    #8188
  - added cypto.randomFill() and crypto.randomFillSync() (Evan Lucas)
    #10209
  - warn on invalid authentication tag length (Tobias Nießen)
    #17566
* deps:
  - upgrade libuv to 1.16.1 (cjihrig)
    #16835
* dgram:
  - added socket.setMulticastInterface() (Will Young)
    #7855
* http:
  - add agent.keepSocketAlive and agent.reuseSocket as to allow
    overridable keep-alive behavior of `Agent` (Fedor Indutny)
    #13005
* lib:
  - return this from net.Socket.end() (Sam Roberts)
    #13481
* module:
  - add builtinModules api that provides list of all builtin modules in
    Node (Jon Moss)
    #16386
* net:
  - return this from getConnections() (Sam Roberts)
    #13553
* promises:
  - more robust stringification for unhandled rejections (Timothy Gu)
    #13784
* repl:
  - improve require() autocompletion (Alexey Orlenko)
    #14409
* src:
  - add openssl-system-ca-path configure option (Daniel Bevenius)
    #16790
  - add --use-bundled-ca --use-openssl-ca check (Daniel Bevenius)
    #12087
  - add process.ppid (cjihrig)
    #16839
* tls:
  - accept `lookup` option for `tls.connect()` (Fedor Indutny)
    #12839
* tools, build:
  - a new macOS installer! (JP Wesselink)
    #15179
* url:
  - WHATWG URL api support (James M Snell)
    #7448
* util:
  - add %i and %f formatting specifiers (Roman Reiss)
    #10308

PR-URL: #18342
MayaLekova pushed a commit to MayaLekova/node that referenced this pull request May 8, 2018
This LTS release comes with 112 commits, 17 of which are considered
Semver-Minor. This includes 32 which are doc related, 30 which are test
related, 8 which are build / tool related and 1 commit which updates
a dependency.

Notable Changes:

* console:
  - added console.count() and console.clear() (James M Snell)
    nodejs#12678
* crypto:
  - expose ECDH class (Bryan English)
    nodejs#8188
  - added cypto.randomFill() and crypto.randomFillSync() (Evan Lucas)
    nodejs#10209
  - warn on invalid authentication tag length (Tobias Nießen)
    nodejs#17566
* deps:
  - upgrade libuv to 1.16.1 (cjihrig)
    nodejs#16835
* dgram:
  - added socket.setMulticastInterface() (Will Young)
    nodejs#7855
* http:
  - add agent.keepSocketAlive and agent.reuseSocket as to allow
    overridable keep-alive behavior of `Agent` (Fedor Indutny)
    nodejs#13005
* lib:
  - return this from net.Socket.end() (Sam Roberts)
    nodejs#13481
* module:
  - add builtinModules api that provides list of all builtin modules in
    Node (Jon Moss)
    nodejs#16386
* net:
  - return this from getConnections() (Sam Roberts)
    nodejs#13553
* promises:
  - more robust stringification for unhandled rejections (Timothy Gu)
    nodejs#13784
* repl:
  - improve require() autocompletion (Alexey Orlenko)
    nodejs#14409
* src:
  - add openssl-system-ca-path configure option (Daniel Bevenius)
    nodejs#16790
  - add --use-bundled-ca --use-openssl-ca check (Daniel Bevenius)
    nodejs#12087
  - add process.ppid (cjihrig)
    nodejs#16839
* tls:
  - accept `lookup` option for `tls.connect()` (Fedor Indutny)
    nodejs#12839
* tools, build:
  - a new macOS installer! (JP Wesselink)
    nodejs#15179
* url:
  - WHATWG URL api support (James M Snell)
    nodejs#7448
* util:
  - add %i and %f formatting specifiers (Roman Reiss)
    nodejs#10308

PR-URL: nodejs#18342
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c++ Issues and PRs that require attention from people who are familiar with C++. semver-minor PRs that contain new features and should be released in the next minor version. url Issues and PRs related to the legacy built-in url module.
Projects
None yet
Development

Successfully merging this pull request may close these issues.