Skip to content
This repository has been archived by the owner on Sep 2, 2023. It is now read-only.

Meeting: Disambiguation by Syntax Features #263

Closed
1 task done
SMotaal opened this issue Feb 8, 2019 · 16 comments
Closed
1 task done

Meeting: Disambiguation by Syntax Features #263

SMotaal opened this issue Feb 8, 2019 · 16 comments
Labels
events Topic, time, venue, and actual humans…

Comments

@SMotaal
Copy link

SMotaal commented Feb 8, 2019

  • Done — Thanks everyone

This thread is intended to helps us schedule a meeting to address the following:

Over the past week, there has been a number of threads relating to the bottomline scenario of ending up with a source text (string or file) for which there is no explicit parsing goal (mode or type) due to limited out-of-band context from which the execution mode can be inferred.

This topic can be tricky to work through, because how much or how little out-of-band context depends on a lot of moving parts across the design of the module system, and deeply ties to historic and intuitive UX expectations as perceived by the wide range of Node.js users.

I'd like to think that these discussions emerge directly as a result of adding Phase 2-6 to deal with --eval and stdin use cases.

It may be useful to try to close the gaps by setting up a time slot for those interested to align our efforts and address any concerns that might lead to avoidable confusion down the road (imho).

When: Tuesday 4-5 pm (ET) — based on feedback
Where: link will be added


Tentative Agenda:

  • Identify cases related to --eval, stdin and current implementations

  • Consider possible workarounds to avoid parsing-based disambiguation

    • @ljharb Cases where an explicit consumer-provided indicator differs from an author-provided indicator
  • Narrow down (if any) when parsing will be needed

  • Compare and contrast parsing methods

Related Drafts:

Note: The following documents are working drafts.


Meeting

Minutes

@SMotaal SMotaal added the events Topic, time, venue, and actual humans… label Feb 8, 2019
@SMotaal
Copy link
Author

SMotaal commented Feb 8, 2019

It seems that the overlap with 100% attendance for everyone so far will be Tuesday 4-5 PM (ET).

Should we lock the doodle on that time?

@ljharb
Copy link
Member

ljharb commented Feb 8, 2019

I’m a bit confused; the title suggests you want to meet about disambiguating parsing goal based on syntax - which we’ve established is not possible for 100% of cases - but the OP suggests that you want to meet about handling input from places without a filename, which is an orthogonal concern (because even if node tried to guess the parse goal, we’d need an explicit mechanism for the edge cases, at which point the guessing is just a UX enhancement on top of that)

@ljharb
Copy link
Member

ljharb commented Feb 8, 2019

Either way, locking a doodle after a mere number of hours doesn’t seem to accommodate time zones; I’d suggest waiting at least 24 hours.

@SMotaal
Copy link
Author

SMotaal commented Feb 8, 2019

Makes sense... Can you please join to bring context to the insights you outline and make it possible for those interested to ask questions or reframe the concerns.

@ljharb
Copy link
Member

ljharb commented Feb 8, 2019

I’ll try, if time permits; can you clarify the intention of the meeting?

@SMotaal
Copy link
Author

SMotaal commented Feb 8, 2019

I will work on a tentative agenda as soon as I shift back to modules time 👍

@GeoffreyBooth
Copy link
Member

@ljharb it’s related to #262. That proposal includes an opt-in for having Node disambiguate for cases like --eval where the module format is unknown. It acknowledges that there are edge cases where the format cannot be determined, and proposes what to do for such ambiguous code.

@SMotaal
Copy link
Author

SMotaal commented Feb 10, 2019

I closed the doodle for this Tuesday 4-5 pm (ET) and will be posting a link and tentative discussion points soon.

Please feel free to reply with your suggestions, and if possible links to relevant forks, examples… etc.

@SMotaal
Copy link
Author

SMotaal commented Feb 10, 2019

Zoom meeting id: 932-268-356

Link will be added right before.

@SMotaal
Copy link
Author

SMotaal commented Feb 12, 2019

I updated the issue with the links

@GeoffreyBooth
Copy link
Member

GeoffreyBooth commented Feb 12, 2019

This was a great meeting, thanks @SMotaal for organizing this.

So the summary seems to be (and others please edit this comment as you see fit):

  • We’re all interested in somehow making detection happen in general, at least as a userland solution but potentially as part of Node. Gus and I agreed to try to aim for designing and building a solution that could possibly ship in core, since that would have a higher bar to clear and we can always punt it to userland later if necessary. So let’s assume we’re trying to design a detection solution for core.

  • Detection needs to assume that the code to be detected is an initial entry point. Lots of ESM apps might have non-entry point files with neither import or export statements, so we’d get lots of false positives/negatives if we didn’t assume entry point only. This corresponds to --type=auto (or -a for short) in the entry points proposal.

  • We’re interested in making a function that can be called by Node when it encounters --type=auto/-a, but also in making this function available from a core module. Such a function would take a string as input and would return an object with findings, like confidence level that a file is ESM, etc. A separate utility function would take a file path as input and determine if it’s ESM per our other algorithms (look at extension, then find nearest parent package.json, etc.). The -a version would need to decide what to do based on the findings; if a file is definitively ESM (e.g., contains import or export statements) then it can run as ESM, but we’re debating what to do otherwise.

  • The -a algorithm needs to be based around parsing, not evaluation, since evaluation could cause unintended side effects if code is run twice. Ideally we would use the V8 parser, assuming it’s the fastest for this purpose; and we would submit a PR to it so that it can return formatted info about detected parse goal rather than us simply reading string error messages.

  • The most minimal version of -a would be as follows:

    • Parse input source code.
    • If it is definitively ESM, run as ESM.
    • Else error.

    This isn’t terribly useful, as CommonJS code never executes. This is only very slightly better than just running with --type=module/-m.

    We could decide to replace the “else error” with “execute as CommonJS” but that has its own issues; ambiguously ESM-or-CJS code would then execute as CommonJS.

  • A more useful version of -a, in my opinion, would be as follows:

    • Parse input source code.
    • If it is definitively ESM, run as ESM.
    • If it is definitively CommonJS, run as CommonJS.
    • Else error.

    This is what I think users would expect -a to do, but it has the problem of how to achieve “if it is definitively CommonJS.” As far as I know all we can do is parse to find references to the CommonJS globals (require, module, exports, __filename, or __dirname) but we would have to close off any possibility of the user defining globals before running this code—e.g. node --require 'some-package-that-defines-global-require' --type=auto file.js would be a problem. There are other issues to work out, as @devsnek mentioned, and they may not all be surmountable.

    One “definitely CommonJS” detection is to look for the with keyword, as that’s prohibited in ESM. That would be a way for build tools like Babel to signal that a file is CommonJS, but wouldn’t be very useful otherwise as few users probably use with.

    Also, we could decide to replace the “else error” with either “execute as CommonJS” or “execute as ESM,” if we want to define that behavior in the docs, but the safest approach (at least for the first version of this) is probably to error. Entry points that are ambiguous, neither importing nor requireing anything, should be very rare, and so erroring on them probably won’t bother too many users.

  • Next step is to try to work out a -a algorithm at least in psuedocode, that is robust enough that people think it could ship in core.

@bmeck
Copy link
Member

bmeck commented Feb 13, 2019

I'd note that using the this keyword in any non-trivial way also probably shows something to be CJS since in ESM it is undefined, and sloppy IIFEs using this also do as well.

We need to be careful about what we consider to be denoting ESM as well as whatever we use as a heuristic may change over time (since it is all guesswork). My main concern here is a personal desire to propose import.meta coming to Script sometime in the future so that we can unify some workflows, but more may arise and more ambiguity may increase as cases where modules that do not use static imports or exports potentially become more useful.

@GeoffreyBooth
Copy link
Member

@devsnek Do you want to post your list of concerns somewhere (maybe a Google doc or something like that, rather than on this thread) of detecting CommonJS when it’s an entry point? I agree detecting it when it’s not an entry point is basically impossible (ditto for ESM, for that matter) but I feel like we should be able to figure out something for entry points.

@devsnek
Copy link
Member

devsnek commented Feb 14, 2019

if you are 100% certain its the entry, you can try building up a fake lexical environment and tracking require being assigned and variables being assigned as require (and the same for module and exports, this, etc). you'll also need to do sloppy mode checks (checking for with, this in functions, iifes, etc)

@GeoffreyBooth
Copy link
Member

For the purposes of -a, yes, we're certain it's the entry. We can choose to make -a unavailable in the event that a loader or -r is used, if we want to be paranoid, but I'm not sure if we should go that far (versus trusting the user). We could even scan all files in a loader or required library to check for assignment of globals like require.

I think let's worry about -a before we deal with the function version. Since the function could return a confidence rather than yes/no, it can be a lot less bulletproof.

Aside from writing an algorithm to parse the AST and definitively detect usage of CommonJS globals (as opposed to user-supplied ones) were there any other concerns? Because I'm sure we could write such a detection function.

@SMotaal
Copy link
Author

SMotaal commented Feb 22, 2019

I think we can close this, right?

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
events Topic, time, venue, and actual humans…
Projects
None yet
Development

No branches or pull requests

5 participants