Skip to content
This repository has been archived by the owner on May 5, 2023. It is now read-only.

HTTPS.Request() does not accept WHATWG URL #26

Closed
JonathanWilbur opened this issue Feb 19, 2019 · 12 comments
Closed

HTTPS.Request() does not accept WHATWG URL #26

JonathanWilbur opened this issue Feb 19, 2019 · 12 comments
Assignees

Comments

@JonathanWilbur
Copy link

NodeJS has a new URL object: the WHATWG URL. This library was clearly designed for what is now known as the "Legacy URL" API. See: https://nodejs.org/docs/latest-v8.x/api/url.html.

When https.get in patch-core.js is called with a new WHATWG URL object, the call to Object.assign() does not copy over things like hostname and protocol and port, because, unlike with the Legacy URL API, these are getters.

The result of the call to Object.assign() using the Legacy URL API looks like this:

{ protocol: 'https:',
  slashes: true,
  auth: null,
  host: 'www.sec.gov',
  port: null,
  hostname: 'www.sec.gov',
  hash: null,
  search: null,
  query: null,
  pathname:
   '/Archives/edgar/data/21344/000002134418000008/a2017123110-k.htm',
  path:
   '/Archives/edgar/data/21344/000002134418000008/a2017123110-k.htm',
  href:
   'https://www.sec.gov/Archives/edgar/data/21344/000002134418000008/a2017123110-k.htm' }

The result of the call to Object.assign() when using the newer WHATWG URL API looks like this:

{ [Symbol(context)]:
   URLContext {
     flags: 400,
     scheme: 'https:',
     username: '',
     password: '',
     host: 'www.sec.gov',
     port: null,
     path:
      [ 'Archives',
        'edgar',
        'data',
        '21344',
        '000002134418000008',
        'a2017123110-k.htm' ],
     query: null,
     fragment: null },
  [Symbol(query)]: URLSearchParams {} }

This means that, when https.get() is called with the newer WHATWG URL, an object that does not contain all of the mandatory parameters is passed into the standard library's https.get(). As it turns out, this incomplete object manages to slip through, because it has the one mandatory property that the HTTPS module checks for to determine if it is a WHATWG URL:

  } else if (args[0] && args[0][searchParamsSymbol] &&
             args[0][searchParamsSymbol][searchParamsSymbol]) {
    // url.URL instance
    options = urlToOptions(args.shift());
  }

Then, when urlToOptions() in the HTTPS module is called, it crashes, because hostname is undefined:

function urlToOptions(url) {
  var options = {
    protocol: url.protocol,
    hostname: url.hostname.startsWith('[') ?
      url.hostname.slice(1, -1) :
      url.hostname,
    hash: url.hash,
    search: url.search,
    pathname: url.pathname,
    path: `${url.pathname}${url.search}`,
    href: url.href
  };
  if (url.port !== '') {
    options.port = Number(url.port);
  }
  if (url.username || url.password) {
    options.auth = `${url.username}:${url.password}`;
  }
  return options;
}

The Solution

Make https.request in patch-core.js check for the properties of the WHATWG URL and use a different methodology to clone it if it is.

@BYK
Copy link
Collaborator

BYK commented Jun 5, 2019

@JonathanWilbur thanks a lot for the accurate and detailed diagnosis! Would you be willing to submit a PR that fixes this? (It's okay if you cannot, just asking before I try to take a stab at this myself :))

@ljw1004
Copy link

ljw1004 commented Jun 12, 2019

@JonathanWilbur Thank you for filing this!

I'm trying to use Google's cloud-storage API. My code had been working fine, making its https requests. But once I added require(@google-cloud/storage) then suddenly the https requests started getting "connect ECONNREFUSED 127.0.0.1:443".

It boils down to the fact that @google-cloud/storage ultimately depends upon require("agent-base"), and I was passing URL objects as in the code below.

The workaround (until this bug is fixed) was simple: I just switch from https.get(new URL("https://alterslash.org/"), ...) to https.get(new URL("https://alterslash.org/").href, ...)

var https = require("https");
require("agent-base");

var req = https.get(new URL("https://alterslash.org/"), response => {
  console.log(response.statusCode);
  if (response.statusCode !== 200) return response.resume();
  response.on("data", () => console.log("data"));
  response.on("end", () => console.log("end"));
});
req.on("error", e => console.error(e.message));

@JonathanWilbur
Copy link
Author

@BYK I do not plan on fixing this bug. I am just too busy with other projects I consider to be of higher priority. Just letting you know so you can respond.

@BYK
Copy link
Collaborator

BYK commented Jun 13, 2019

I'll try to look into this over the weekend then. Thanks again @JonathanWilbur!

@ljw1004
Copy link

ljw1004 commented Jun 15, 2019

Just to note, I believe there's an addition separate bug beyond the "Object.assign failing to copy all fields" issue that @JonathanWilbur mentioned. Look at this code from patch-core.js:

https.get = function (_url, _options, cb) {
    let options;
    if (typeof _url === 'string' && _options && typeof _options !== 'function') {
      options = Object.assign({}, url.parse(_url), _options);
    } else if (!_options && !cb) {
      options = _url;
    } else if (!cb) {
      options = _url;
      cb = _options;
    }

Consider a call to https.get(new URL(s), options, cb). Then it won't go down any of the three branches, won't call Object.assign, and local variable options will be left unassigned.

@ljw1004
Copy link

ljw1004 commented Jun 18, 2019

And there's a third separate bug. Look at this code from patch-code.js:

https.request = (function(request) {
    return function(_options, cb) {
      let options;
      if (typeof _options === 'string') {
        options = url.parse(_options);
      } else {
        options = Object.assign({}, _options);
      }
      if (null == options.port) {
        options.port = 443;
      }
      options.secureEndpoint = true;
      return request.call(https, options, cb);

Consider a call to https.request(new URL(s).href, options, cb). This patch code will invoke an underlying https.request(href, options) and completely ignore the callback.

Consider a call to https.request(new URL(s), options, cb). This patch code will think that the URL is an options field, will assign it to the temporary options, and will fail to supply the callback.

Once again, the impact is that my app used to be able to use https.request just fine, but as soon as I added require(@google-cloud/storage) then suddenly the https requests started getting "connect ECONNREFUSED 127.0.0.1:443".

@BYK
Copy link
Collaborator

BYK commented Jun 18, 2019

@ljw1004 - do you think you can provide me some test cases to work off of for a fix?

@sadasant
Copy link
Collaborator

sadasant commented Jul 6, 2019

Hi! I'm new here. What's the solution? Instead of Object.assign, what should we use? With some help I can get this out asap ❤️

sadasant added a commit that referenced this issue Jul 6, 2019
This is an attempt to support WHATWG URLs.
Feedback is appreciated!

Fixes #26
@sadasant
Copy link
Collaborator

sadasant commented Jul 6, 2019

@BYK, @ljw1004, @JonathanWilbur Hi! I did this PR to attempt to fix this issue: #28

Feedback is appreciated! ✨

@BYK
Copy link
Collaborator

BYK commented Jul 8, 2019

Thanks a lot @sadasant! I added a few comments to the PR and I hope we can merge the PR soon and help all others! :)

@sadasant
Copy link
Collaborator

sadasant commented Aug 2, 2019

I haven't had time to continue with this! I'll do my best to have an update in the near future.

@TooTallNate
Copy link
Owner

Please try out v5.0.0, which removes the https.request() patching.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants