Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

default resolve should not throw ERR_UNSUPPORTED_ESM_URL_SCHEME #138

Closed
giltayar opened this issue Apr 26, 2023 · 24 comments · Fixed by nodejs/node#47824
Closed

default resolve should not throw ERR_UNSUPPORTED_ESM_URL_SCHEME #138

giltayar opened this issue Apr 26, 2023 · 24 comments · Fixed by nodejs/node#47824

Comments

@giltayar
Copy link
Contributor

While working on my ESM loaders talk for JSNation and NodeTLV conferences (repo here and draft slides here), I wrote a toy loader that supports HTTP[S]. This is the (obviously) simplistic code:

export async function load(url, context, nextLoad) {
  if (url.startsWith('http://') || url.startsWith('https://')) {
    const response = await fetch(url, {redirect: 'follow'})
    const source = await response.text()

    return {shortCircuit: true, format: 'module',source}
  }
  return await nextLoad(url, context)
}

Unfortunately, this won't work. If you use it, you will get an ERR_UNSUPPORTED_ESM_URL_SCHEME because the default Node.js resolver doesn't support HTTPS. So, unfortunately, I had to write (and demonstrate in my talk) that this loader needs a resolver, even though the only thing it does is support loading HTTPS:

export async function resolve(specifier, context, nextResolve) {
  if (isBareSpecifier(specifier))
    return await nextResolve(specifier, context)

  const url = new URL(specifier, context.parentURL).href

  if (url.startsWith('http://') || url.startsWith('https://'))
    return {url, shortCircuit: true}

  return await nextResolve(specifier, context)
}

Thinking about it more, I realized that the only reason I need to write this resolver is that Node.js doesn't recognize the HTTP scheme in its resolver. But if Node.js would have not recognized the scheme in the loader, then I wouldn't have needed to write this resolver at all.

The same logic goes for ERR_UNKNOWN_FILE_EXTENSION and writing, for example, a TypeScript loader.

So my point is this: both checks for unknown scheme and unknown file extension should move from the default Node.js resolver to the default Node.js loader. This will obviate the need for some loaders to implement a resolve which mostly replicates the Node.js logic, just for the sake of avoiding these exceptions.

I would gladly implement the change myself, but I wanted to check with the loaders team before tackling this task.

@arcanis
Copy link
Contributor

arcanis commented Apr 26, 2023

I'm not sure that's correct - semantically, if the URL scheme is unsupported, then the resolution can't happen. For instance, let's say you have import 'my-protocol://hello', can you guarantee me that the untransformed my-protocol://hello is a valid resolution, if you don't know what my-protocol actually means?

For instance, let's say I write a tool validating that all the import statements in my project are resolvable, and this tool checks it by calling import.meta.resolve. With the change you propose, I wouldn't know that https:// urls are a problem - even though they'd cause the application to crash.

@aduh95
Copy link
Contributor

aduh95 commented Apr 26, 2023

both checks for unknown scheme and unknown file extension should move from the default Node.js resolver to the default Node.js loader

That would have a consequence on import.meta.resolve. Note that when writing a TS loader, I'm using to my advantage the fact that the default resolver throws:

export async function resolve(specifier, context, next) {
  try {
    return await next(specifier, context);
  } catch (err) {
    if (err?.code === "ERR_MODULE_NOT_FOUND" && specifier.endsWith(".js")) {
      try {
        return await next(specifier.replace(/\.js$/, ".ts"), context);
      } catch (err2) {
        if (err2) {
          err2.cause = err;
          throw err2;
        }
        throw err;
      }
    }
    throw err;
  }
}

The same logic goes for ERR_UNKNOWN_FILE_EXTENSION and writing, for example, a TypeScript loader.

To clarify, defaultResolve doesn't throw ERR_UNKNOWN_FILE_EXTENSION on unknown file extension. Would "whenver defaultResolve is given a non file: URL, it returns it immediately without any verification" be a good summary of your proposal?

@giltayar
Copy link
Contributor Author

I'm not sure that's correct - semantically, if the URL scheme is unsupported, then the resolution can't happen. For instance, let's say you have import 'my-protocol://hello', can you guarantee me that the untransformed my-protocol://hello is a valid resolution, if you don't know what my-protocol actually means?

Why not accept all valid URL-s as valid during resolve? They will fail in the load phase anyway.

For instance, let's say I write a tool validating that all the import statements in my project are resolvable, and this tool checks it by calling import.meta.resolve. With the change you propose, I wouldn't know that https:// urls are a problem - even though they'd cause the application to crash.

That's a good point. Also @aduh95's point above, which points out that resolution checks for the existence of a file, and so could not be able to do that for an unknown extension. But @aduh95 proposes a solution to that, which makes sense: throw the exception in default resolve only for non file: extensions.

@aduh95
Copy link
Contributor

aduh95 commented Apr 26, 2023

I feel this discussion is quite similar to the one we had some time ago regarding import attributes, we might need another hoook for loaders to say what import attribute key they support, maybe we could add what protocols the support as well.

@giltayar
Copy link
Contributor Author

To clarify, defaultResolve doesn't throw ERR_UNKNOWN_FILE_EXTENSION on unknown file extension.

You're right! I just modified my toy TS loader and removed the resolve and it works. Huh... I am absolutely sure that this was a problem! 😊 Oh, well. Thanks for pointing this out!

Would "whenver defaultResolve is given a non file: URL, it returns it immediately without any verification" be a good summary of your proposal?

Hmm.... yes! But @arcanis raises a valid concern: semantically, should import.meta.resolve always fail if the file does not exist? In this case, the HTTP loader would be forced to include a resolve that checks for the existence of the file (and maybe caches it knowing that it will be requested immediately after that).

That's a good question. In a way, it's a question for the JS standards group that is working on import.meta.resolve too.

If the answer is "yes", then import.meta.resolve MUST check for the existence, and @arcanis is right. But if the answer is "no", then @arcanis is not "allowed" to use import.meta.resolve to check for the existence of all modules.

@giltayar
Copy link
Contributor Author

@arcanis just checked MDN on this: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/import.meta/resolve#description

Therefore, its return value is the same regardless of whether the returned path corresponds to a file that exists, and regardless of whether that file contains valid code for a module.

So you're actually not "allowed" to use import.meta.resolve to verify the existence of a module. This makes sense for the browser because import.meta.resolve is synchronous which means that they can't do any HTTP fetching to verify this.

Having seen this, I would submit that even Node.js shouldn't throw ERR_MODULE_NOT_FOUND in the resolution phase, but only in the loading phase (arguable, but would make sense).

This also means that @aduh95 code that checks for ERR_MODULE_NOT_FOUND in the resolution phase is incorrect when loaders are used, because they may not throw anything. This MUST be checked only in the loading phase.

@aduh95
Copy link
Contributor

aduh95 commented Apr 26, 2023

According to the HTML spec, browsers don't perform any check on the passed specifier string. I just tried to pass import('data:text/javascript,console.log(import.meta.resolve("my-protocol://hello"))') in Firefox, Safari, Chromium, and Deno, none of them threw an error.

@giltayar
Copy link
Contributor Author

According to the HTML spec, browsers don't perform any check on the passed specifier string. I just tried to pass import('data:text/javascript,console.log(import.meta.resolve("my-protocol://hello"))') in Firefox, Safari, Chromium, and Deno, none of them threw an error.

Yup. The only thing they do is check the import map of the current HTML to see if the URL needs to be mapped.

@giltayar
Copy link
Contributor Author

So given the above definition of import.meta.resolve, I believe we should amend the issue to be even stronger:

For any non-bare specifier, the default Node.js resolver should do only absolutization of the URL, and should not verify anything else about the final URL (e.g. it should not verify existence of the file, nor support of the URL scheme).

@GeoffreyBooth
Copy link
Member

I feel like there’s a reason it currently happens in resolve, beyond the performance benefit of erroring before attempting a filesystem lookup (which isn’t all that important, since we don’t expect this error to happen often). Maybe we need the extension to find a valid translator, and that happens before load?

@giltayar
Copy link
Contributor Author

@GeoffreyBooth as @aduh95 showed in his loader code above (and I can attest also happens in my babel-register-esm loader), people rely on the fact that the Node.js resolver checks for the existence of files! And, as @arcanis said, some devs assume that import.meta.resolve should check for the existence of files.

But this doesn't mean that it should be like that. In the browser world, it is explicitly not like that. And my guess is that most HTTP loaders won't check for the existence of the file in the resolve phase.

ESM in Node.js was always about aligning ourselves with the standardand not doing a separate thing. And the spec says not to check for existence of files in import.meta.resolve.

I suggest we align with the standard now, before it's too late and ESM loaders move out from experimental.

(and at least don't check the scheme is supported in the load phase and not the resolve phase 😊)

@giltayar
Copy link
Contributor Author

BTW, not checking for existence of files in the resolve phase, will mean that for relative paths, loading will be faster, because one existSync will be avoided.

Moreover, this will mean that less loaders will have a resolve function, which will, again, improve performance when those loaders are used.

@GeoffreyBooth
Copy link
Member

Okay, want to make a PR?

@giltayar
Copy link
Contributor Author

Yes! I'll go all the way and include not checking for existence of files in the resolve phase, along with not checking if the scheme is supported in the resolve phase.

@giltayar
Copy link
Contributor Author

For fairness' sake, I'd like to note that this will break loaders that use the existing node resolver to check for the existence of files. E.g. my babel-register-esm and @aduh95's loader.

@arcanis
Copy link
Contributor

arcanis commented Apr 28, 2023

For the record, I feel a little uneasy making import.meta.resolve more complicated/ less useful for the sake of browser parity, when browser authors explicitly reject Node use cases in their own designs (see also: the function being sync).

@giltayar
Copy link
Contributor Author

I can do two PR-s. One that removes the check for schema from the default resolver. I think this is the less controversial one, and will benefit loader writers and will benefit loader chaining (see below on why). This PR also won't break anything (I think).

The next PR, which we can all discuss on, is moving the file existence checking from the resolve phase to the loading phase. For the record, I am for moving it, both for browser compatibility, and for performance reasons.

For the record, I do want the second PR to pass, if only for performance reasons.

@giltayar
Copy link
Contributor Author

Why removing the check for file: schema from the default resolver will benefit users:

It's all about loader chaining. Let's say I have two loaders: import-map (which maps specifiers to urls) and http (which loads modules from http). The order of loaders, from a user's perspective, is...

  • import-map (first override the specifier to an http url then...),
  • http (....load the http url, and finally...)

i.e. --loader=import-map,http

Except this doesn't work! Because Node.js (rightly) resolves using the http, which does nothing to a bare specifier, and then invokes the import-map, which overrides it to an HTTP url and passes it on to the default resolver which... fails!

The order that the user needs to use it --loader=http,import-map, unfortunately. The user doesn't understand that the http has its own resolver and so doesn't understand why this should be so.

Now if Node.js in it's default resolver wouldn't check the schema in the resolver, then the order could have been the natural order --loader=import-map,http, which is, from the user's perspective, the natural order.

@arcanis
Copy link
Contributor

arcanis commented Apr 28, 2023

Now if Node.js in it's default resolver wouldn't check the schema in the resolver, then the order could have been the natural order --loader=import-map,http, which is, from the user's perspective, the natural order.

I'm not sure it's a good reason to make the change you propose - it works in the example you provide, but make it slightly more complex and it falls apart. Take the following:

--loader=zip,tgz,node-specifier-resolution

Assuming a setup where each resolve hook call nextResolve, how would it work with the following, where you have a zip inside a tar inside a zip? Would changing the loader order change anything?

/path/to/level0.zip/level1.tgz/level2.zip/src

Regardless of the loader order, something won't work:

  • zip needs to be first, so that node-specifier-resolution can know how to interact with the level0.zip fs
  • tgz needs to be second, for the same reason but with level1.tgz
  • But then you have another zip archive; you also need it to go through the zip layer, but you already passed through it

It perhaps can be tweaked so that the first zip tweaks the fs so that all zip archives can be supported, not only the level0.zip one, but it gets complicated, and isn't always possible.

The proper way to handle this case is to not call the followup loader. In the majority of cases, the resolution should actually be recursive and restart from the start, not from nextResolve. Calling nextResolve should in theory only be called when the current resolver doesn't change the resolution in any way.

As a bonus, doing that also clears most of the issues around resolution order: if resolvers call nextResolve when they don't do anything, and startResolve when they do, then it doesn't matter whether the user passes --loader http,import-map or --loader import-map,http (order only matters when multiple resolvers would want to affect the resolution of the same input string).

@giltayar
Copy link
Contributor Author

@arcanis I'm sorry, but I couldn't follow your example, because I'm not quite sure what zip and tgz loaders are supposed to do.

But regarding this:

The proper way to handle this case is to not call the followup loader

I'm assuming you meant not calling nextResolve. I really hope that loaders will never do what you advise because the downstream loaders may want to do a transformation of the URL, and so need to be always called after the url is really resolved. The example I have in mind is the "mocking modules" loader (e.g. quibble) which adds a query parameter to all the resolved URLs in order to simulate cache clearing.

@arcanis
Copy link
Contributor

arcanis commented Apr 28, 2023

Recommending loaders to call startResolve rather than nextResolve doesn't prevent you from writing a special loader that doesn't, if you have a good reason for that (operating on the final resolution seems a pretty good one).

There will always be exceptions, but the general point remains: significant changes on intermediate resolutions should re-enter the resolution pipeline, not continue from where they are, otherwise they'll put strong constraints on the hook order, as you noticed in your http,import-map example.

@giltayar
Copy link
Contributor Author

What's startResolve?

@arcanis
Copy link
Contributor

arcanis commented Apr 28, 2023

The first resolve function in the chain, just like nextResolve is the next one. It isn't exposed as cleanly yet, I think - the closest match I know is import.meta.resolve, but since import.meta.resolve doesn't take additional parentModuleURL, it's not exactly that.

@giltayar
Copy link
Contributor Author

Why is the first resolve in the chain "privileged"? It seems like you're adding complexity (startResolve) and assymetry to a very simple and symmetrical API.

I have to admit that I couldn't figure out what you were saying about making the loader chain impervious to order. Is this an API change to the loaders? If so, how does it make it not important whether the http loader comes before or after the typescript one?

fasenderos pushed a commit to fasenderos/node that referenced this issue May 22, 2023
Fixes nodejs/loaders#138

PR-URL: nodejs#47824
Reviewed-By: Moshe Atlow <moshe@atlow.co.il>
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: Guy Bedford <guybedford@gmail.com>
Reviewed-By: Geoffrey Booth <webadmin@geoffreybooth.com>
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
targos pushed a commit to nodejs/node that referenced this issue May 30, 2023
Fixes nodejs/loaders#138

PR-URL: #47824
Reviewed-By: Moshe Atlow <moshe@atlow.co.il>
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: Guy Bedford <guybedford@gmail.com>
Reviewed-By: Geoffrey Booth <webadmin@geoffreybooth.com>
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
danielleadams pushed a commit to nodejs/node that referenced this issue Jul 6, 2023
Fixes nodejs/loaders#138

PR-URL: #47824
Reviewed-By: Moshe Atlow <moshe@atlow.co.il>
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: Guy Bedford <guybedford@gmail.com>
Reviewed-By: Geoffrey Booth <webadmin@geoffreybooth.com>
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
MoLow pushed a commit to MoLow/node that referenced this issue Jul 6, 2023
Fixes nodejs/loaders#138

PR-URL: nodejs#47824
Reviewed-By: Moshe Atlow <moshe@atlow.co.il>
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: Guy Bedford <guybedford@gmail.com>
Reviewed-By: Geoffrey Booth <webadmin@geoffreybooth.com>
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants