Tooling: Brainstorming ideas that can lead to efficient loader-oriented designs #203
Comments
ECMAScript modules syntax can arguably be detected using a RegExp which bails on the first match. Does anyone have ideas for CJS vs ESM syntax detection? |
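For concreteness, a minimal sketch of the kind of bail-on-first-match check being described; the pattern and its weaknesses are illustrative only, not a proposal:

```js
// Rough sketch: match a line-leading `import`/`export` keyword and stop at the
// first hit. It does not understand strings or comments, which is exactly the
// "hijacking" weakness raised below.
const ESM_HINT = /^\s*(?:import[\s('"]|export\s)/m;

function looksLikeESM(source) {
  return ESM_HINT.test(source); // bails on the first match
}

looksLikeESM('export default 42;');                 // true
looksLikeESM('module.exports = 42;');               // false
looksLikeESM('const s = `\nimport x from "y"`;');   // true: fooled by a template literal
```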
@SMotaal you can't use regexp to parse js grammar (you can always make a pattern of string literals or whatever to confuse the regexp) and the differences between valid cjs and valid esm are ambiguous and can't be reliably detected by just looking at the code. |
So, can we constructively say that so long as you guard against string hijacking (maybe there is a better term for this), you can safely use a RegExp? |
@SMotaal I would just use acorn |
@devsnek humor me in this effort, consider this both an idea-gathering as well as a team-building exercise. Acorn is obviously a great solution, but I am trying to create opportunities for people to talk about the aspects that make this and other such great tools. The notion here is that people might just have some evolving ideas that they might want to bounce around. How we connect the dots, like you pointing out the hijacking limitation, can potentially inspire untapped solutions to existing problems. Sounds fair? |
@SMotaal You could say that a file without `import` or `export` can be valid under both parse goals yet behave differently. For example: `test = 42;` In Script mode, this creates the property `test` on the global object; in Module mode (which is always strict), it throws a ReferenceError. |
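To make that ambiguity concrete, a minimal illustration of the point above:

```js
// The same source text parses under both goals but behaves differently.
test = 42;
// Script (sloppy) goal: creates a `test` property on the global object.
// Module goal (always strict): throws ReferenceError: test is not defined.
```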
@targos does the issue get any better if we say that such a loader always imports CJS in strict mode regardless of an explicit "use strict"? |
are we trying to come up with use cases for loaders or something else? if you're using a resolve loader hook you'll always be able to read the contents of whatever you're resolving, at which point you can run a regex or acorn or whatever over it as you see fit. |
I'm having trouble seeing the relation between "CJS vs ESM syntax detection" and the OP. Maybe I don't really understand what this thread is about, sorry. |
@targos Actually, I think you are hitting the nail on the head by pointing out that the same source can be valid under both parse goals yet behave differently.
So would it be possible to say that when dealing with ambiguous code, syntax-based detection is possible for ECMAScript Modules (i.e. those having the explicit syntaxes like `import` and `export`)? Sounds right? |
@SMotaal you could always fall back to your own opinions of what the file should be but it's impossible to know the author's intent. i agree with targos that i have no idea what this thread is for. |
@devsnek The ideas you are all expressing here are extremely valuable; they allow others to actually learn, or at least consider a different perspective. It also makes it easier for people to better appreciate and understand intent in future discussions. I think the biggest problem is not that people disagree (that is actually not bad), but that sometimes we largely agree yet end up arguing in two separate directions due to miscommunication and misunderstanding. |
@benjamingr I might be mistaken, but I believe that it is possible to evaluate non-strict code, though I am not certain how the current loader handles this. |
@targos Until we actually figure out how the Modules WG will handle source ambiguity, it can be helpful to explore (maybe even POC) the various ways to achieve it. Thinking of any of those ideas as either core vs extensions is premature, but that should not discourage efforts of reasoning about it and trying to find ways to refine them irrespective of where those aspects end up. |
Try to keep in mind performance when exploring options for handling ambiguity, as well as the fact that this space has been explored extensively during the prior EPS process. Dual parsing a module was deemed inefficient.
|
@SMotaal so we're discussing how the default loader should handle source ambiguity? |
@MylesBorins I think it may be important to know more about the dual parsing approach. I state this not to suggest that dual parsing is or is not a solution, but rather to see if a different parsing approach may be worth exploring. From my own research (which I know is relatively limited compared to other folks in this space), I often find the common pattern of tokenizing into ASTs, which in many cases seems to be an eagerly contiguous process; this makes sense for many things, especially for transforms. In contrast, loader-first tokenization (AST or not) may be more efficient if it bails out on the first conclusively deterministic feature, and more so if it is possible to have a non-binary intent which would allow a single scan to be used. Can you shed some light on the methodology? (maybe a link to follow up) |
@devsnek I think of this as a parallel discussion altogether, not intended to directly affect other discussions that deal with the specifics of the default loader... etc. That said, there is no harm if we end up drawing some conclusions that positively influence our process in general. |
@MylesBorins the inefficiency is tolerable, as @jdalton shows with a top-level parse, which would be much faster if V8 directly supported such mechanisms. However, as the language grows, there are a few concerns. Some heuristics may fail or be unreliable as features get added to the different modes:
We could probably think of more as we desire, but the question of what to do in ambiguous cases seems a bit beyond the scope of tooling itself; these would need definitive answers that we can point to as they come up. @SMotaal Per the question about how the current loader loads non-strict code: it uses multiple Source Texts; it does not create a single string that has both Module and CJS code. You cannot inline a sloppy source text into ESM without using something like `eval`. |
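A toy illustration of why inlining cannot work (my own example, not how the actual CJS translator is written): ESM source text is always strict, so sloppy-only constructs that are legal in a real CJS file break once pasted into a module.

```js
// Valid in a sloppy CJS file:
//
//   with (Math) { module.exports = PI; }
//
// Inlined into an ESM wrapper, the same body fails at parse time, because
// `with` is a SyntaxError in strict mode; the facade has to be rewritten
// (or the CJS kept as its own source text) rather than inlined.
export default (function () {
  // with (Math) { return PI; }   // SyntaxError if uncommented
  return Math.PI;
})();
```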
Things that are possible:
Agreed with above - there's things that are really hard to figure out and "run CJS with implicit strict" might work for app code but not for dependencies. We tried. It breaks with things like:

```js
if (cond) {
  /* [...] */
  function myHelper(el) { /* [...] */ }
  someArr.forEach(myHelper);
}
```

The above will throw in strict mode IIRC and this pattern does appear in real (popular) npm modules. |
@bmeck Absolutely… I was inspired by this approach in the early days of @benjamingr does that align with the concerns you raised? |
@SMotaal if you search this repo for “unambiguous grammar” or “unambiguous syntax” you'll find lots of discussion on this topic. The webpackage idea from @jkrems does give me one idea though: what if an |
Side note: I also dislike that a single |
The load order is still required even with phases, if one phase loader always guesses the type to be
I don't understand how this relates, like I said, any supported format that Node can link into a graph works. This is unrelated and doesn't need a separate phase. Even with an
I don't understand this. Webpackage could support |
But why would it be limited to |
They do fix ordering for unrelated concerns, like fetching a resource and actually interpreting it.
So the disagreement is if |
I'm saying |
It isn't? It accepts the full specifier and id of the module loading some dependency, I would expect there to be no constraints except that the id should be unique, and the specifier is a string. |
Ah, I misread your example code. My bad.
Yes, but turning it into a supported Module type doesn't necessarily mean turning it into source text of a supported module type. E.g. for WASM (or for the JVM bytecode example, actually), you would realistically analyze/compile the resource content first to determine the interface, then generate a facade, and then expose the compilation result inside of the module. Trying to inline the original bytes in the source text and then recompiling on execution would be fairly inefficient and in some cases not practical. The only alternative I can think of is globals and unique ids, but that's not really a proper solution. For me CoffeeScript isn't the target I'd want to optimize for. If what you're loading can easily be converted into self-contained JS code on the fly, it might just as well have been compiled ahead of time. The same isn't true for things that do not compile to JS and have different execution semantics. One example would be importing a DLL. |
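For the WASM case, the "analyze first, then generate a facade" flow might look roughly like the sketch below; the function name and overall shape are assumptions for illustration, not any agreed-on loader API.

```js
// Compile the bytes once, read the export interface, and emit a tiny ESM
// facade per discovered export, instead of inlining the bytes as JS source.
import { readFileSync } from 'fs';

function describeWasm(path) {
  const bytes = readFileSync(path);
  const compiled = new WebAssembly.Module(bytes);                  // up-front compile
  const names = WebAssembly.Module.exports(compiled).map((e) => e.name);

  // The facade only declares the bindings; a loader would wire them up to the
  // instantiated exports when the module record is evaluated.
  const facadeSource = names.map((name) => `export let ${name};`).join('\n');

  return { compiled, facadeSource };
}
```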
We already have an example that doesn't do inline-based transformation: currently our CJS translator creates a separate module record and just loads in the CJS without inlining it. It doesn't recompile on execution at all currently.
Modules will need unique ids anyway in order to ensure the
It isn't just CoffeeScript that does JS compilation; historically code coverage has done this (no longer!!!), eslint certainly could be useful to enforce at boot time, development runs without having 2 commands for build vs run, etc. It certainly isn't the only thing we should optimize for, but it is part of it. If the concern is mostly around avoiding duplicate parse/eval phases that is something we can design around, but I don't see how |
|
Correct. It currently doesn't, but if we wanted to we could rewrite it to do so. I'm not sure if that information is for or against anything given that.
If you follow the Realms proposal there are fewer JS APIs being considered and most interactions for things are being moved to be purely string based. I'm not sure what APIs are being talked about here. |
As I catch up on this thread, I am appreciating how everyone tries to follow a more brainstorming approach to allow everyone to pose ideas to see how they materialize (or not) later on. I think this type of discussion helps people with very diverse backgrounds, experiences, and extents of familiarity with the intricacies of ESM and CJS to mutually share and gain insights that are sometimes missed during goal-oriented debates. |
On the idea of top-level parsing to disambiguate JS sources: I took some time to put together an experiment to roughly demonstrate the relative costs associated with different parsing strategies. The gist of it is that a parser would bail out at the first occurrence of a particular syntax, and otherwise parse through the entire file, using as few grammars as possible for a safe parse. The current experiment does not bail out; it simply identifies escapable entities that can be used for hijacking, contextualizing symbols, and the set of keywords that would satisfy the condition. I added new parsing modes to the experimental parser: "esm", "cjs" and "esx". In "esm", the parser operates strictly at the top level and only looks for the `import` and `export` keywords. The demo page is served from the links below.
Demo: acorn.mjs
Demo: acorn.js
Obviously this does not address disambiguation of ambiguous source texts. If relative performance gains can be further improved or optimized, then disambiguation by source text (loader or not) will likely be favoured by some down the road. |
@SMotaal That’s a great start for something that I can see as a loader. For your CommonJS detection I would add a check to look for globally-referenced CommonJS variables.
Perhaps it would be good to start compiling a list somewhere of things that people might want to see as loaders. Besides this case, off the top of my head there are transpilers, automatic completion of file extensions/folder root files, configuration of module loading behavior based on file extension, and general backward compatibility to bridge the gap between what will be possible in ESM in Node and what is/was possible/allowed in Babel and other transpiled/built versions of ESM. |
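A rough complement to the ESM check above, assuming the globals in question are the CJS wrapper names (`require`, `module`, `exports`); it shares the same string/comment caveats as any RegExp approach:

```js
// Flag references to the CommonJS wrapper "globals"; word boundaries only,
// so `export default` does not trip the `exports.` case.
const CJS_HINT = /\brequire\s*\(|\bmodule\.exports\b|\bexports\.[A-Za-z_$]/;

function looksLikeCJS(source) {
  return CJS_HINT.test(source);
}

looksLikeCJS('module.exports = {};');   // true
looksLikeCJS('export default 42;');     // false
```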
@GeoffreyBooth obviously my efforts are geared towards complementing any potential loader implementations once the design process matures, but for this particular experiment I decided to isolate it from any such efforts and instead focus on some proxy problems. In my effort, thinking of syntax highlighting was a great way to visually solve parsing challenges, parsing in the main thread without dependencies was a great way to address performance issues, and adhering to generators was a great way to force a stream-like approach... the list goes on. Regardless, it was just perfectly timed to use it to demo relative performance gains compared to the more common "AST all the things, then do one small thing" approach, which would be rather expensive for esm vs cjs (or my proposed esx) detection in my best estimate (but still needs real-world benchmarks). |
ESX parsing currently scans the full length of the source text, but the intent is to keep references to enclosure ranges and not analyze them unless there are no signals of ESM syntax at the top level, then finally scan the enclosures for a first CJS hint (or none). This makes it possible to report ESM, CJS, or still-ambiguous, and in the last case fall back to a default based on out-of-band settings... etc. |
If we have out of band settings, and that info conflicts with a parsed result, I’d expect it to throw - the two shouldn’t be in disagreement. |
That’s actually a very important aspect: in my rushed vision of eliminating parsing errors (which are normally handled by the runtime), I had not given thought to certain errors that belong specifically to the intent at hand. More insights of this kind can go a long way down the road when making decisions. Awesome 🙂 |
This comment has been minimized.
This comment has been minimized.
When considering the case of parsing, I was having trouble mentally placing the metadata communicated between two loaders for instance. In this case, it is in-band (imo) but it is not "directly" from source, it is inferred and attributed to the source text, and is triggered (or bypassed) and responds to out-of-band (one-to-many) and out-of-source (one-to-one) aspects or settings. Can I propose the following complementary pairs: (examples in brackets)
Can anyone find a more practical breakdown of such information regarding a source text's journey? These are all crude thoughts; they need magic from the group. I feel that distinguishing between what maps to sources and what is specific to a source but not baked right into its body is essential. |
I finally updated the README and pushed the revisions made last week. Timing is more accurate now. I also converted the rendering pipeline to async APIs. Tokenization APIs remain sync but use generators so they yield and return as needed. I also improved the parsing modes. I am really interested to hear some feedback on the three modes ("esm", "cjs", and "esx"). |
@SMotaal its cool i guess? i don't really understand why we have an issue open for it though. |
@devsnek This thread is about ideas in general, separate from implementation. As we move closer to loaders and defaults, those discussions and demos can be helpful; at the very least, they can serve as a reference for those who need to find out more about them. |
@jdalton Can you pitch in on the idea of syntax detection relating to top-level parsing? I tried to find a way to model this to the benefit of everyone in the group and was able to show a 200% (theoretical) performance gain relative to full ES grammar parsing of the kind ASTs require. This was done while avoiding the conventional all-or-nothing AST approach, using half-way optimized RegExps that address the usual concerns like hijacking. Ideas like dual parsing (@MylesBorins) and your top-level parse (@bmeck) made me think of a single parse limited to the minimal subset of both grammars; it was roughly capped at 175% depending on nesting complexity but on average better than 150%. Since we're trying to find the first clue that determines the syntax, the expectation is that such clues will often materialize early in a text, making it reasonable to bail or delay the rest of the parsing (if needed at all). Can we hash out pseudo-code for syntax determination based on your initial thoughts on top-level parsing? About this thread… I'm trying to brainstorm ideas in parallel with our implementation efforts that make it possible for our broadly diverse members to appreciate the various technical challenges associated with the decisions we are making. Based on an early digest of this discussion (which I took the liberty of summarizing at the top), I tried to pick ideas which seemed to create rifts in discussions elsewhere, mainly on the topics of syntax detection and interoperability. |
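As a starting point for that pseudo-code, a rough sketch of a single-pass, top-level-only scan (an illustration, not @jdalton's actual approach): it skips strings, comments, and nested braces so that only top-level keywords count, and bails at the first conclusive signal. A real implementation needs more care around regex literals, template interpolation, and identifiers after `.`.

```js
function detectGoal(source) {
  let i = 0;
  let depth = 0;
  const n = source.length;
  while (i < n) {
    const ch = source[i];
    if (ch === '"' || ch === "'" || ch === '`') {            // skip strings/templates
      const quote = ch;
      for (i += 1; i < n && source[i] !== quote; i += 1) {
        if (source[i] === '\\') i += 1;                       // skip escaped characters
      }
      i += 1;
      continue;
    }
    if (ch === '/' && source[i + 1] === '/') {                // line comment
      while (i < n && source[i] !== '\n') i += 1;
      continue;
    }
    if (ch === '/' && source[i + 1] === '*') {                // block comment
      i = source.indexOf('*/', i + 2);
      if (i === -1) return 'ambiguous';
      i += 2;
      continue;
    }
    if (ch === '{') { depth += 1; i += 1; continue; }
    if (ch === '}') { depth -= 1; i += 1; continue; }
    if (depth === 0 && /[A-Za-z_$]/.test(ch)) {               // read a top-level word
      let j = i;
      while (j < n && /[A-Za-z0-9_$]/.test(source[j])) j += 1;
      const word = source.slice(i, j);
      if (word === 'import' || word === 'export') return 'esm'; // conclusive: bail
      if (word === 'require') return 'cjs';                      // heuristic only
      i = j;
      continue;
    }
    i += 1;
  }
  return 'ambiguous';                                          // defer to out-of-band settings
}
```

The non-binary result ('esm', 'cjs', or 'ambiguous') is what would let a single scan defer ambiguous sources to out-of-band settings instead of forcing a second parse.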
@SMotaal This is impressive . . . just to understand what you’ve done here, is your goal to determine parse goal by analyzing the syntax? A.k.a. a real implementation of the “unambiguous syntax”/grammar that we’ve been discussing? If so, and assuming that you find an algorithm that works, have you thought about how to address the related concerns listed in #150 (comment)? |
to be clear, it's just a lightweight way of parsing js. this doesn't make the ambiguity go away. |
Confirmed; there does not exist any approach based on parsing that is unambiguous in all cases, absent a language spec change. |
Yeah, while I would love to be the one to solve ambiguity of source text and other sources, this is really nothing more than a very modest effort to model different parsing methods apart from the usual tools. My gut feeling tells me that while implementing solutions is best served by employing tried and tested tools, coming up with optimal solutions may not always share in those benefits. In other words, ASTs have a way of forcing you to look at problems in certain ways, so modeling the problem without them is a way to avoid restricting ourselves to their givens. So this is far from a solution, just an attempt to provide a way to explore solutions, and the bottom line holds: ambiguity is ultimately a source problem, and if it is, then the only way to resolve it is out of band. |
@devsnek the underlying motivation behind my markup experiment in general is not restricted to JS; in fact, I was interested in finding different ways of doing efficient and responsive multi-syntax parsing without the pitfalls of conventional methods. And on that, I think I am ready to dare make the claim that it can be done with virtually no switching overhead, using less popular features like generators and RegExps: html (and script tags)
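A stripped-down sketch of the generator-plus-sticky-RegExp tokenizing style being described; the token names and grammar subset are illustrative assumptions, not the experiment's actual code:

```js
// A sticky RegExp drives a generator; consumers pull tokens lazily and can
// bail as soon as they have what they need, so there is no up-front AST cost.
function* tokenize(source) {
  const matcher = /(\s+)|(\/\/[^\n]*|\/\*[\s\S]*?\*\/)|("(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*'|`(?:\\.|[^`\\])*`)|([A-Za-z_$][\w$]*)|(.)/gy;
  let match;
  while ((match = matcher.exec(source))) {
    const [text, space, comment, string, word] = match;
    if (space) continue;
    yield {
      type: comment ? 'comment' : string ? 'string' : word ? 'word' : 'punctuator',
      text,
    };
  }
}

// Example: stop at the first ESM keyword without scanning the rest.
for (const token of tokenize('export const answer = 42; /* … */')) {
  if (token.type === 'word' && (token.text === 'import' || token.text === 'export')) break;
}
```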
Having both ecmascript-modules and @jkrems' hackable loader has opened up tremendous scope for experimentation.
Note: This thread does not make claims for or against existing tooling, some of which have stood the test of time, evolved, and are fixtures of the ecosystem. The intent is simply to consider different perspectives being explored in experimental efforts.
As far as things go, the broad range of tooling that applies to loaders basically iterates over productions in each source, irrespective of the specifics of implementation or operations.
Most tools are designed to be used for much more complex applications than merely loading. To that effect, they often avoid the use of new language features that would prevent them from working on older platforms. They can also avoid new features which may have been prematurely associated with inefficiencies in early stages. Some are also built with infrastructures or features that are not ideal or not optimized specifically for loading, like using workers, verbose error checking (i.e. as a language service), etc.
I would like to dedicate this thread to brainstorming experimental or just different ideas to implement related patterns for loader-first designs.
How to contribute
Please avoid emoting that can be confusing (especially if it can be construed as passive-aggressive)
Read the Digest
The following is a set of ideas or conclusions curated from the discussions:
Syntax Detection (CJS vs ESM)
- Safely using RegExp — @SMotaal
- Fallback for ESM without `import` and `export` — @targos
- `import(…)` to resolve ambiguity — @bmeck
- `import.meta` — @bmeck
- Dual parsing a module was deemed inefficient — @MylesBorins
Syntax Identification (CJS vs ESM)
- Mime type metadata via something like webpackage — @jkrems
- `package.json` — @GeoffreyBooth
- Magic bytes — @jkrems
Wrapping CJS in an ESM module system