-
Notifications
You must be signed in to change notification settings - Fork 27
AST, IR, bytecode and friends - standards and inovation #104
Comments
Feel free to edit my title/desc... |
So I was asking, should we consider embedding |
Just to be clear: V8's Ignition bytecode is not a serialized AST. Its format is not designed to be public, but rather an implementation detail. The bytecode encodes data specific to V8's design choices (value representation, tagging scheme, call convention, object layout, metadata slot indices, etc.). Currently it is evolving quickly. We cannot afford to freeze the bytecode format or even specify it formally. I agree that it is desirable to have an intermediate format to avoid some parsing, but Ignition bytecode is not that format. When you think of bytecode, you probably think of Java bytecode. Ignition bytecode is very unlike Java bytecode in that it is very much intertwined with V8's implementation details, whereas Java bytecode is a public format shared between many Java implementations. WebAssembly is much closer, though with caveats. |
One thing that I'm really interested in is how we are going to be representing the DAG of an ESModule based import before handing it over to V8 to In the past I helped organize an AST summit involving stakeholders from Esprima, Acorn, and Babel... perhaps some of those folks could chime in. I'm not 100% sure if that a standard is something that should live in core, but perhaps it is something we could collaborate on. |
Not entirely sure I understand the DAG needs for the format besides mapping of specifier to resolved IR subgraph. I think it might be possible to create bundled DAG of ESM, but it might not always be possible (like if a dep is CommonJS) which we have to do some odd things outside of ESM source code and pass refs into ESM. |
The V8 team has been looking into parsing speed a lot during the last 12 months, since this is highly important for page load these days. The V8 parser improved a lot during that time, and might improve even further. Since the engine cannot just blindly trust the bytecode, some verification has to happen as well, plus the external bytecode must be compiled to our internal bytecode. All of this takes time, and at this point we don't have reasons to believe that it'll be beneficial, based on the data we have. |
While I'm working on ways to better support AST manipulation in core right now, I don't believe it's currently the right time to bundle the AST manipulation part itself into core. V8 is a moving target in that regard, and it would be a substantial issue for the VM-agnostic efforts to deal with. I also don't think we should be pulling larger codebases like esprima into core. In my opinion, the best way forward is minimal changes to the require system now and evaluating whether a standard interface across VMs for AST manipulation is worth pursuing. (My gut feeling is no, but who knows?) |
I'm not sure I understand the value in having Node standardize around an AST. But if I was to suggest are starting place if would be the Shift AST: https://github.com/shapesecurity/shift-spec |
I believe this is an interesting topic, but:
Who? Why? |
Application Performance Monitoring vendors want it for safer and better performing instrumentation than monkey-patching everything async in the environment with a bunch of closures. It's also useful for language transpiling, applying code coverage tracking instrumentation, stubbing files for mocks in a test suite and numerous other possibilities. |
@Qard is this a lack of hooks problem, or are people actually modifying how non-exported code runs? |
If the Node Foundation wants to get more involved in the JS-to-JS compiler pipeline for JavaScript I can introduce the Babel team, I know that they are looking for more support, and Babel is effectively the de-facto tool for this in the community. |
@Qard thanks! Do you mind showing me an example of how one would use such hypothetical tools? |
@benjamingr not sure what you mean by hypothetical, both @Qard and I as well as others in the APM vendor space do this. Here is actual code:
Our tx.transform adds function entry/exit timing information using a source code transform. @Qard's probably does something similar, though what everyone does is specific to the type of information their APM wants to extract. |
@Qard @sam-github would https://github.com/v8/v8/blob/master/include/v8.h#L6288 or similar being exposed fix that? |
@bmeck In our APM work we do need to patch non-exported module internals fairly frequently. |
@matthewloring is that to modify behavior or get timing etc.? |
It is usually to either collect timing information or to fix userspace queuing issues (see 5 under desired features) that break async context propagation. The |
@bmeck I don't know enough about the V8 to know if that would help, but it looks to me like as @matthewloring says, it runs for every single function, almost certainly cannot be used to reenter into V8 and run js code, probably is a singleton (it can't be used by an APM module and by the profiler), and it can't be used to pass closure info to solve the user-spacce queuing, aka "connection pooling", problem (though the async-hooks APIs may eventually help with that). |
@matthewloring for the queueing issue I defer to The problem sounds real, but I am not convinced AST is the appropriate way to fix this even if it is the current approach. |
I'm not sure If we should add official APIs for rewriting. patch Module._compile actually works OK for now. To make me feel good about blessing the practice in perpetuity with an official API, I'd want to see that its something that would help across a range of use-cases, and APMs have hacked around this sucessfully for a while. Unless the API was backported to LTS, we couldn't even use it until 8.x is no longer in LTS, which is years from now. Which is maybe a reason to get it into 9.x:-). The babel/typescript/etc runtime transpiler use-case, for example, is something that we have traditionally told users to do as a pre-processing stage before passing the output javascript to Node. Will making it be easier to do with an ENV var or one of these mechanisms noticably improve the user-experience? How so? |
@sam-github thanks, but I meant in terms of how the new potential API would help you? |
We do mutation by file-name matching, you can see that in the example code. We further use either heuristics or else hard-coded pre-knowledge of how a npm package works attempt to only patch code that is a top level API exported by the module, or that needs async context passing. Exactly what gets patched depends on why we are patching, I've worked on 4 different modules which do this by now which have all ended up at IBM (strong-agent, a method tracing tool, appmetrics, and BAM). |
@benjamingr I didn't say it would help me: #104 (comment) Having to do this over and over by patching an undocumented Module method is a bit ugly. But just encoding ugliness in an API maybe is still ugly, too. Hopefully we'll see more examples reaised, maybe there is something that core could expose as an API that would make this better, maybe not. I'm undecided. |
@bmeck and I don't think absence of an AST standard is a problem. If the hook exposes javascript, how people rewrite the js is up to them. As you list, there are many options, I don't think there is any request here that node expose an AST and support rewrite of it. It might be cool if it did, but given esprima, its not really necessary. |
@sam-github if this isn't about AST/IR/bytecode as in the title, we should make a diff issue. |
@bmeck sorry! I totally confused this with nodejs/node#12349, I'm awash in github notifications. |
Okay, so in the case of TraceView, the product I used to work on, there is extensive need for modifying behaviour for the purpose of capturing meta data to describe the state of the async thing being measured. Here's an example of collecting some metadata related to a mysql query by patching the https://github.com/tracelytics/node-traceview/blob/master/lib/probes/mysql.js#L120-L126 You'll also see that, further up in that file, I had to patch There's also potential dangers to monkey-patching the world. Here, I'm patching the postgres driver: https://github.com/tracelytics/node-traceview/blob/master/lib/probes/pg.js#L17-L30 It conditionally patches Here's the hapi instrumentation: https://github.com/tracelytics/node-traceview/blob/master/lib/probes/hapi.js That code jumps through a long list of hoops to detect the module version both at initial patch time and in various points at patch time, the hapi module changed dramatically over the years, so there's piles of checks required all over the place to see what things look like to determine how it should apply the next instrumentation step. It's also patching the All that benefits greatly from AST transforms, but it's not too overly difficult to hijack https://github.com/nodejs/node/blob/master/lib/internal/bootstrap_node.js#L466-L470 This means AST transforms just plain can not be applied to built-in modules. This means the http instrumentation for https://github.com/tracelytics/node-traceview/blob/master/lib/probes/http.js#L246 That right there uses this terrifying pile of hacks to produce a FILO queue for exit events to encapsulate the layers of the request and end the transaction trace when the request stream ends. The code that handles that is here: https://github.com/tracelytics/node-traceview/blob/master/lib/index.js#L397-L436 Basically, if you want to report more meaningful data beyond simple timing, you're in for a rough time and a ton of potential performance pitfalls. 😱 |
So, as was referenced above already by @refack, I've been working on nodejs/node#12349. I believe trying to expose AST manipulation in Node.js itself is likely a very significant challenge, especially considering efforts to be VM-agnostic. It'd basically need to be done in a way that is purely disconnected from the VM itself, in which case, why not just let userland use the already well-established esprima or shift AST parsers/generators? What I'm aiming for is a simple way to intercept a module between loading the file and handing it to the VM to parse, allowing whatever text-to-text transformation you like to be applied to it. It does mean a larger performance hit to |
@benjamingr Here's what I've been building to do AST parse/generate based instrumentation: https://github.com/Qard/node-module-compile-transform-ast/blob/master/test.js For each require, it'll do a regex test against the filename and if a match is found it'll run the corresponding patch handler function on the AST. Sorry for the simplicity of the sample there. I haven't had time to continue working on it for the past week as I've been travelling, but the intent is to simply define patch handler functions for loaded file paths that match a given regex pattern--exact string matching also works. The regex matching is simply to enable spotting nested dependencies. I'm hoping to carve out some time to put together a proof-of-concept AST-based instrumentation for multiple versions of express sometime in the next few weeks. That'd prove the value on a userland module. Again though, there's currently no way to intercept the a file contents pre-parse, thus my PR. |
@Qard thanks, that's great as a motivating example. I think we should summarize this and put it at the top of the issue so the benefits can be obvious. |
If I'm not mistaken, node-module-compile-transform sounds like what babel-register does? |
Basically. It's a bit lower-level though. It runs any arbitrary function that receives and returns a string rather than running babel on it. The |
@benjamingr I think also nodejs/node#12471 fits in this conversation, as is nodejs/node#11842 and IMHO embedding a "front-end" transpiler will help up drive language innovation (even your #12 and nodejs/node#12442 fits in here, as an AOP crosscutting-concern) |
@sam-github having a |
@refack |
I should state here, I am very concerned if we ship Signed Packages in any way we should/need to make all transforms opt-in by the Signed Package via per package config or an opt-in API. The more I think about this, the less I think AST transforms are right to land in core, and might cause problems if a Signed Package exists since it would not be transformed automagically. |
Agreed, but adding a simple rule to promisify everything will...
I'm in line with minimising magic, everything should be opt-in (+ user rules with exclusions). |
@refack as shown in the PR it is not a single/simple rule : https://github.com/nodejs/node/pull/12442/files#diff-0a5d4868b2b9b17cf9e2c11f1bd1311eR390 |
That's the implementation, I agree "good" problems, have "elaborate" solutions, and also as a core solution is should be complete. |
Added a few summery point to first comment. |
That's a very good point, signed packages need to opt out of this - at which point I'm wondering if the AoP value this brings is less strong. I'm wondering if this is a deal breaker for the instrumentation use cases. |
Not arguing, just think of
Just quoting "Signed Package are not the holy grail" they are one of a few competing concerns. |
@bmeck @benjamingr @sam-github A real question, what happens when a client wants "gold standard" i.e. LTS engine + only Signed Packages, but also "most have" APM (as client always do ;) ) ? |
@refack Signed Packages don't exist, not even in proposal form, hard to speculate on how they will interact with other features. |
@sam-github https://github.com/dimich-g/webpackage exists and I have been working to try and make sure Node is able to adopt it when all parties are agreed on it. Electron, Web Browsers, and document publishers are also looking at it
My guess is more that a hook mechanism could be exposed that allows this, |
@bmeck If I understand you right, you say that AST/IR manipulation might not be the right solution for APM, or similar structured instrumentation (that could be mapped to If I understand you right I totally agree; well defined hooks hands-down beat generic instrumentation fiddling with code/AST/IR "in the dark". Personally with this discussion I would want to focus on enabling more adventures innovation, not replacing other better solutions... |
This issue has been inactive for a while and this repository is now obsolete. I'm going to close this, but feel free to open another issue in a relevant active repository (TSC perhaps?) and include a link back to this issue if this is a subject that should receive continued attention. |
Spin off from #100 (comment)
The premise: people in
nodejs
have ideas that require manipulation of the AST or of IR. The people ofV8
at present have no plan to open the engine's AST as API, but are close to achieving useable bytecode modularization (nodejs/node#11842).Discussion points:
.pyc
?/cc @addaleax @hashseed @MylesBorins @indutny
[update 2017-04-20]
Some motivations:
express
util.awaitable/promisify
(Support awaiting values throughutil.awaitable
#12 or util: add util.promisify() node#12442), could come with an easy way to for userland to apply to everything.The text was updated successfully, but these errors were encountered: