Discussion: In an ESM-first mode, how should extensionless entry points be handled? #49431
Comments
Perhaps it can go like this:
I don't really understand what "failure to parse as ESM JavaScript" means here (maybe "no Default JS means whatever is defined by package scope, or fallback to internal default. Fallback to ESM adds convenience at the cost of a breaking change if we add anything else later. I feel -0 about the idea of using magic overall, at least because this method can give false positives on arbitrary input data, or we may run into potential collisions if we add more detectable-by-file-contents formats in the future. |
So I did a little experiment in the node
Welcome to Node.js v20.5.1.
Type ".help" for more information.
> fs.readFileSync('./test/fixtures/simple.wasm', 'utf8')
'\x00asm\x01\x00\x00\x00\x01\x07\x01`\x02\x7F\x7F\x01\x7F\x03\x02\x01\x00\x07\x07\x01\x03add\x00\x00\n\t\x01\x07\x00 \x00 \x01j\x0B'
Those leading \x00asm bytes are the Wasm magic number. Since all Wasm files start with these bytes, and these bytes always fail to parse as JavaScript, I think it’s safe to say that it’s impossible to have an ambiguous file that might be runnable as either Wasm or JavaScript. Put another way, trying to parse any Wasm file as JavaScript will always fail. Therefore I think the last part of your algorithm, where we’re down to the “no file extension present” section, can go like this:
Because we’re trying JavaScript first, we’re free to expand the list of supported binary file types in the future, presuming they each have magic bytes or defined headers. We’ll never be able to support a non-JavaScript string source type as an extensionless file, but I think that’s okay. |
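For illustration, here is a minimal sketch of the magic-byte check itself. isWasmBinary is a hypothetical helper name, not an existing Node.js API, and it assumes the file contents are available as a Buffer.

```javascript
// Sketch only: isWasmBinary is a hypothetical helper, not a Node.js API.
// Every Wasm binary starts with the four bytes 0x00 0x61 0x73 0x6D ("\0asm").
const WASM_MAGIC = Buffer.from([0x00, 0x61, 0x73, 0x6d]);

function isWasmBinary(bytes) {
  // `bytes` is assumed to be a Buffer holding (at least the start of) the file.
  return bytes.length >= WASM_MAGIC.length &&
    bytes.subarray(0, WASM_MAGIC.length).equals(WASM_MAGIC);
}

// The first eight bytes from the REPL session: magic number, then version 1.
const header = Buffer.from([0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00]);
console.log(isWasmBinary(header)); // true
console.log(isWasmBinary(Buffer.from('#!/usr/bin/env node\n'))); // false
```

Only the first four bytes are inspected, so any future binary format with its own distinct header could be added as another comparison of the same shape.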
@GeoffreyBooth do you have a |
Can you explain in particular why my proposed flow wouldn’t work? Not saying I have high confidence in it, but I think it should work, and it would be deterministic. Here’s how I would phrase it for the docs:
It’s basically like try/catch with opening files. |
Even if we assume that we can implement a check for whether the file contents are an ES module, right now we're discussing only two formats, and they are mutually exclusive. Because of that exclusivity, it doesn't matter in which order we check, and it should be implemented in "Wasm -> ESM" order in code anyway, because the complexity and performance costs are incomparable. I think the order of actions should be like this:
Between 1 and 2 there is no breaking change, because trying to launch extensionless Wasm after 1 will throw, since it can never be valid JavaScript code. Hence, I think 1 has a good chance of landing first. 2 adds guessing by magic number, so this change might be controversial and meet some opposition. 3, if there is a less simple check (for example, checking whether code is not ESM but CJS or TS), will be even more controversial. Two questions I have at this point:
|
What would a "wasm entry point" mean? Wasm can't do anything unless provided with an environment, which is normally done from JS. I guess the idea would be to provide the WASI interface? But that still doesn't provide a way to import any JS things, so it would basically just be |
This is true; however since the vast majority of cases will be JavaScript, shouldn’t we try that first before attempting to read the magic bytes? As in, optimize for the common case?
Agreed. I don’t know what other formats there might even be;
We have a few cases within the ESM loader where inside the error path, we do some additional checks to try to give a more informative error, like for example suggesting changing What I was thinking was that when we try to evaluate the entry point, if it errors we would catch that error, and then check for magic bytes and if found, run as Wasm; but if the magic bytes aren’t found, we just re-throw the original JavaScript error. The user doesn’t need to know about the Wasm check. If the magic bytes are found, the original error is discarded and we try to evaluate as Wasm and potentially throw any errors generated from that.
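A minimal sketch of that catch-and-recheck flow, assuming the loader can be modelled as two callbacks. runEntryPoint, evaluateAsJs and evaluateAsWasm are hypothetical names, not Node.js internals, and new Function is only a stand-in for real ESM evaluation.

```javascript
// Sketch only: these names are hypothetical and do not exist in Node.js.
function startsWithWasmMagic(bytes) {
  // "\0asm" header shared by all Wasm binaries.
  return bytes.length >= 4 &&
    bytes[0] === 0x00 && bytes[1] === 0x61 &&
    bytes[2] === 0x73 && bytes[3] === 0x6d;
}

function runEntryPoint(bytes, evaluateAsJs, evaluateAsWasm) {
  try {
    // Common case first: treat the entry point as JavaScript.
    return evaluateAsJs(bytes.toString('utf8'));
  } catch (jsError) {
    // No magic bytes: surface the original JavaScript error unchanged.
    if (!startsWithWasmMagic(bytes)) throw jsError;
    // Magic bytes found: discard the JavaScript error and run as Wasm;
    // any error thrown from here on comes from the Wasm path.
    return evaluateAsWasm(bytes);
  }
}

// Stand-in evaluators for the demonstration (assumptions, not loader code).
const evaluateAsJs = (source) => new Function(source)();
const evaluateAsWasm = (bytes) =>
  new WebAssembly.Instance(new WebAssembly.Module(bytes)).exports;

// An empty but valid Wasm module: magic number plus version 1.
const emptyWasm = Buffer.from([0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00]);

console.log(runEntryPoint(Buffer.from('return 1 + 1'), evaluateAsJs, evaluateAsWasm)); // 2
console.log(Object.keys(runEntryPoint(emptyWasm, evaluateAsJs, evaluateAsWasm)));
```

The magic-byte check runs only on the error path, so valid JavaScript entry points never pay for it.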
Within a |
It depends on whether we can try it without any overhead for JavaScript. I would assume that it wouldn't be easy to achieve, if possible at all. Even if it's feasible, this can be discussed in a PR that adds the guessing by magic number and/or syntax error, so we can measure complexity and do benchmarks. It's purely an internal implementation detail; the outcome will be the same anyway. Documenting this as "switching on [magically guessing the type]" without explicit order can also work.
I don't see a problem here. After first change (I'll go ahead and make a PR), it will be symmetrical: extensionless inside Running Wasm-alike file as CJS should throw SyntaxError. If we'll want to have extensionless Wasm work in |
Building off of #49295 (comment) and #31415, we’re considering a new mode, probably enabled by flag, where all of the current places where Node defaults to CommonJS would instead default to ESM. One of the trickiest questions to answer for defining such a new mode is how extensionless entry points should be handled.
In an early version of the ESM implementation, extensionless entry points followed the same interpretation of .js files; if they were in a "type": "module" package.json scope, they were run as ESM, otherwise they were run as CommonJS. They couldn’t be referenced via import; extensionless support was limited to entry points, to handle the CLI tool use case without complicating the resolution algorithm. #31415 removed this ESM extensionless entry point support because there was desire to someday permit extensionless entry points to be not just ESM JavaScript but also Wasm. We haven’t followed up since.
There is a strong desire from users for the ability to write CLI apps or “shell scripts” using ESM, without needing things like symlinks or CommonJS wrappers in order to enable extensionless entry points being interpreted as ESM. I can think of two potential ways to enable such, which we could limit to being enabled by our new flag:
- Extensionless is just always ESM in a "type": "module" scope (or in this new ESM-first mode, outside of a "type": "commonjs" scope). There’s simply no ability for extensionless Wasm entry points; Node.js is a JavaScript runtime, after all. This tracks how there’s also currently no support for extensionless native/.node-file entry points.
- Use the Wasm magic byte to disambiguate between Wasm and ESM entry. Either always check for magic byte before trying to parse, or check after failure to parse as ESM JavaScript.
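To make the first option concrete, here is a rough sketch of a scope-based decision. nearestPackageType and extensionlessFormat are hypothetical helpers, and the directory walk is a simplification of how Node.js actually determines package scope; the esmFirst option stands in for the proposed flag, whose name has not been decided.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Sketch only: both functions are hypothetical, not Node.js APIs.
// Returns the "type" field of the nearest package.json at or above startDir,
// or undefined if that file has no "type" or no package.json is found.
function nearestPackageType(startDir) {
  let dir = path.resolve(startDir);
  while (true) {
    const candidate = path.join(dir, 'package.json');
    if (fs.existsSync(candidate)) {
      return JSON.parse(fs.readFileSync(candidate, 'utf8')).type;
    }
    const parent = path.dirname(dir);
    if (parent === dir) return undefined; // reached the filesystem root
    dir = parent;
  }
}

// Decides how an extensionless entry point would be interpreted.
function extensionlessFormat(entryPath, { esmFirst = false } = {}) {
  const type = nearestPackageType(path.dirname(entryPath));
  if (esmFirst) {
    // ESM-first mode: ESM everywhere except inside a "type": "commonjs" scope.
    return type === 'commonjs' ? 'commonjs' : 'module';
  }
  // Otherwise: ESM only inside a "type": "module" scope.
  return type === 'module' ? 'module' : 'commonjs';
}
```

For example, extensionlessFormat('/path/to/pkg/bin/cli') would return 'module' when /path/to/pkg/package.json (a made-up path) contains "type": "module" and no closer package.json exists.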
Are there other approaches?
I think we need to find some way to enable this use case, either behind this new “ESM by default” flag or in Node generally. It keeps coming up and the pressure from users will only increase as ESM adoption continues. @LiviaMedeiros @nodejs/loaders @nodejs/wasi @nodejs/tsc