LLVM Backend #2264
base: main
Conversation
test/llvm/StdInOut.js (outdated)
(function() {
  let getchar = __abstract(":integral", "getchar");
  let putchar = __abstract(":integral", "putchar");
Here's how you use it to call standard library functions. Didn't need any changes to the API.
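To make the pattern concrete, here is a small, purely illustrative sketch of calling a C standard library function through the `__abstract` intrinsic. Note the stub: `__abstract` only exists inside Prepack, so for running this outside Prepack we substitute a plain function (the stub's behavior is an assumption for illustration, not Prepack's semantics).

```javascript
// Hypothetical sketch: binding a C stdlib function via Prepack's
// __abstract intrinsic. Outside Prepack, __abstract doesn't exist,
// so we stub it with a function that echoes its argument as an int32.
if (typeof __abstract === "undefined") {
  globalThis.__abstract = (type, name) => (arg) => (arg | 0);
}

let putchar = __abstract(":integral", "putchar");

// In the real backend this lowers to a call to the C putchar symbol;
// with the stub it just returns its argument coerced to an int32.
console.log(putchar("y".charCodeAt(0)));
```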
let y = ucs2toutf8("y".charCodeAt(0))[0];

print("hello world? ");
Note that this only prints constant strings atm. Making these dynamic surfaces some limitations in Prepack or the LLVM backend.
Basically the idea is that the main function and any residual function are “actors” that process some data. They’ll use arena allocation and are expected to be short-lived. A runtime outside of these can control the memory management of long-lived objects. That allows for parallelism and more efficient memory management outside of these “worklets”.
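The arena-allocation idea for short-lived actors can be sketched in plain JS as a bump-pointer allocator over a preallocated buffer: allocation is a pointer increment, and everything the actor produced is reclaimed with a single reset. This is an illustrative sketch of the concept, not the PR's implementation.

```javascript
// Illustrative bump-pointer arena for a short-lived "actor"/"worklet".
class Arena {
  constructor(size) {
    this.buffer = new ArrayBuffer(size);
    this.offset = 0;
  }
  // Allocate `bytes` bytes, 8-byte aligned; returns a DataView window.
  alloc(bytes) {
    const aligned = (this.offset + 7) & ~7;
    if (aligned + bytes > this.buffer.byteLength) {
      throw new Error("arena exhausted");
    }
    this.offset = aligned + bytes;
    return new DataView(this.buffer, aligned, bytes);
  }
  // No per-object frees: the whole arena is reclaimed at once when
  // the actor finishes.
  reset() {
    this.offset = 0;
  }
}

const arena = new Arena(1024);
const slot = arena.alloc(16);
slot.setFloat64(0, 3.14);
console.log(slot.getFloat64(0)); // value lives in arena memory
arena.reset(); // everything the actor allocated is gone at once
```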
Fantastic work! You bypass the existing serializer, but then still find the existing generators useful. Also, I wonder if you'd soon need something like the ResidualHeapVisitor.
Yea, this exercise really shows where our abstractions leak and where they don't. The generators are fairly flexible but still a bit leaky. There are some things in there that aren't really necessary from the interpreter's point of view, but are a hard dependency of the interpreter. E.g. creating intermediate variables happens in the interpreter right now. The generator also inserts temporary variables itself. The rest of the system is essentially SSA, so it gets a little awkward to manage both. In this PR I just undo this by storing my own variable map and undoing the temporary assignment. We should try to move that concept out so it is completely isolated in the serialization pass instead of interleaved.

Regarding the visitor: I originally expected LLVM to help me with much of what the visitor does, but it is lacking in some areas, so yea, I might need a pre-processing pass like the ResidualHeapVisitor.
And as a plus: ideally, I'd like to move all build nodes out to a Babel-specific place, and instead place specific named instructions in the generator. That would allow printing the generator tree in some nice assembly format, and then different backends could be plugged in more easily.
Summary: Since I'm adding a new experiment, I figured I'd delete an equivalent-sized one. Last year I added an option that runs the Prepack program by invoking the Node.js runtime, which lets us prepack the whole module system and initialization. It's essentially a packager with perfect Node.js module resolution semantics. It did this by modeling Node's native environment as Prepack bindings. This PR removes that whole option. There are a few reasons why I don't think that worked out as a good idea.

- It's not solving a real need. It is hard to keep different module systems intact; there is always something in the ecosystem that breaks down, and using the canonical one solves that. However, in practice, if there is a need for bundling, the ecosystem itself adapts to the toolchain. So it's not actually that hard to bundle up a CLI even with Webpack, even if it's strictly not 100% compatible, by tweaking a few downstream dependencies.
- Running the resulting bundle is tricky. The resulting bundle includes the JS parts of Node. This overlaps with what Node.js adds at runtime, so that work runs twice. The ideal is actually to build a custom distribution of Node.js, but this is generally overkill for what people want.
- Bindings change a lot. While Node.js's API notoriously doesn't change much, the internals do change a lot. By picking the API boundary in the middle of the internals of Node.js, the option risks breaking with any version. These are technically observable changes, but nobody else relies on these details.

If this option were worth its weight, someone could probably maintain it, but so far that has not been the case, so we had to disable this option in CI to upgrade Node. However, going forward I think there are alternative approaches we can explore.

- A first-class module system. This is something we really need at some point. A first-class module system would be able to load Node.js module files from disk and package them up while excluding others. It doesn't have to be literally Node.js's module system; close enough is OK, especially as standards-compliant ECMAScript modules get more popular. This lets us target compiling output that runs after Node's initialization.
- By introducing havocing and membranes at the boundaries, it becomes possible to initialize Node.js modules without actually knowing the internals of the boundaries.
- We've started optimizing residual functions, which is much more interesting. However, this requires that code puts some constraints on how it works with its environment. It's not designed to be fully backwards compatible. That's probably a good thing, but it also means that we can put constraints on the modules being Prepacked.

This removes the ability to prepack Prepack itself, which is unfortunate but already wasn't being tested. To speed up Prepack itself, the [LLVM backend](#2264) seems much more useful, if it can ever work on Prepack itself.

Pull Request resolved: #2267
Differential Revision: D8863788
Pulled By: sebmarkbage
fbshipit-source-id: d777ec9a95c8523b3386cfad553d9f691ec59074
This adds an option to emit an LLVM module instead of JavaScript source text. The CLI can print this as either LLVM bitcode or assembly language.
Generators build statements, which build expressions from values.
Execute the resulting IR through the `lli` command which has to exist on the path.
We need the Value at the time we're evaluating binary expressions since it contains more information than the LLVMValue.
This means that all computations required by either branch are eagerly computed. This is unfortunate, but without a visitor we don't know whether one of them is shared and needs to be hoisted outside the branch.
This adds basic string support. We'll use UTF-8 as the standard format, since it is the most common format coming in and going out, and the most common format for interop through C, C++, Rust, etc. interfaces. For indexing, we can convert to UCS-2 in cases where we can't prove that it is safe to index through UTF-8. No ropes will be used, to keep the implementation lightweight and simple. Allocation happens in an arena on the stack. Appending two strings is inlined as a memcpy. Other operations will need a standard library that can be called.
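The `ucs2toutf8` helper exercised in the test above isn't shown in this excerpt; a minimal sketch of the conversion it implies, encoding a single UCS-2 code unit as UTF-8 bytes, might look like this (the function name matches the test, but the body is an assumption):

```javascript
// Encode one UCS-2 code unit (0..0xFFFF) as an array of UTF-8 bytes.
function ucs2toutf8(codeUnit) {
  if (codeUnit < 0x80) {
    return [codeUnit]; // 1 byte: ASCII range
  } else if (codeUnit < 0x800) {
    return [0xc0 | (codeUnit >> 6), 0x80 | (codeUnit & 0x3f)]; // 2 bytes
  }
  // 3 bytes (surrogate pairs are out of scope for this sketch)
  return [
    0xe0 | (codeUnit >> 12),
    0x80 | ((codeUnit >> 6) & 0x3f),
    0x80 | (codeUnit & 0x3f),
  ];
}

console.log(ucs2toutf8("y".charCodeAt(0))[0]); // 121, the ASCII byte for "y"
```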
Since function arguments are not typed right now, we don't need to further specify the null pointer's type.
If a conditional or logical expression results in two different number types, coerce them to double.
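That coercion rule can be sketched as a tiny type-join helper (hypothetical names; the real backend operates on LLVM types, not strings):

```javascript
// Join the number types of the two branches of a conditional/logical
// expression: if they differ (e.g. i32 vs f64), widen to double.
function joinNumberTypes(consequentType, alternateType) {
  if (consequentType === alternateType) return consequentType;
  return "f64"; // mixed integral/double results are coerced to double
}

console.log(joinNumberTypes("i32", "i32")); // "i32"
console.log(joinNumberTypes("i32", "f64")); // "f64"
```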
Thank you for your pull request. We require contributors to sign our Contributor License Agreement, and yours has expired. Before we can review or merge your code, we need you to email cla@fb.com with your details so we can update your status.
@NTillmann Are the two serialisers going to merge sometime in future? Coming from a contributor's perspective, what changes can we expect in the current serialiser?
Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Facebook open source project. Thanks!
Some quick questions before I read more.
import { prepackFileSync } from "../lib/prepack-node.js";
import invariant from "../lib/invariant.js";

let chalk = require("chalk");
Why do we require chalk and not import it?
I don't know. This is copy-paste from the other test-serializer.
Ultimately, I'd like to run all the normal serializer tests through the LLVM path. At that point it might make sense to unify the runners.
let fs = require("fs");
let child_process = require("child_process");

function search(dir, relative) {
This will be copy no 6 of this function. It looks like something we should factor out.
I think I ultimately want to use test-serializer to run the same tests. For now I just needed a quick way to get started until I have enough features to actually run those tests.
/* @flow */

import {
  Module,
Please sort these.
import { llvmContext } from "./llvm-context.js";

export class Intrinsics {
  +_module: Module;
Intrinsics is not an interface or a type, so what is the meaning of the invariant annotation in this context?
I'm not sure what you mean. It just means that we always have to initialize it with a Module in the constructor.
You seem to be using a feature of Flow that is not documented in https://flow.org/en/docs/. My reading of that document is that + is a way to mark an interface property as covariant, which constrains code that accesses the underlying object via the interface not to write to that property, lest it inadvertently violate the type annotation of the property of the underlying object.
There is no documentation that I can find that informs me that + is a way to mark a class property as "must be initialized inside the constructor". I'm also much bemused by this interpretation, since that should already be the case given that _module MUST always have the type Module.
Where do I find more information on this use of +?
I don't think a blog post from 2016 trumps the current documentation. Even so, I don't see anything there that suggests that + means that we always have to initialize the property.
Sorry, I just meant that this is the only place where I know + was documented. Dunno if that changed, and haven't looked into how to use it myself.
Ohhh, you're referring to the +. Yea, that just means that this field is read-only, which is helpful to make it covariant. In Flow, classes have the ability to define their own interface, so the documentation for interfaces applies here.
This is just a way to say that it is read-only, to enforce that we don't mutate it later on. Mutating it would break the invariant that the lazily initialized functions and types all belong to the same module. If we're generating multiple modules, we need multiple Intrinsics objects.
Covariant (read-only) properties are allowed to be initialized in class constructors in a recent-ish change to the type system. This preserves the semantics of covariant properties since you can’t assign back some larger type.
type Y = {+p: number | string};
class X {
+p: number;
constructor() {
this.p = 42; // Ok
setTimeout(() => {
this.p = 0; // Error
(this: Y).p = 'foo'; // Error
}, 0);
}
m() {
this.p = 0; // Error
(this: Y).p = 'foo'; // Error
}
}
}

isStringType(type: LLVMType): boolean {
  if (!this._stringType) return false;
Is it a good idea to return false if _stringType is not defined? Why not return this.stringType.equals(type)?
If it's not defined, it'll throw and Flow won't let me call equals on it.
If it is not yet defined, that means that there are no strings defined in this context yet. So whatever this type is, it can't be equal to a string since there are no strings.
Hmm, this is extremely subtle. You are essentially saying that the only way there can be a type T such that this.stringType.equals(T) holds is if T === this.stringType. I'm none too sure that this is desirable, but if it is, there should be a big comment about it, and my follow-up would be: why call equals instead of using ===?
The reason for this approach is that I don't want to generate unnecessary code in the LLVM output for a bunch of types if this type is never referenced in the output. I still have to test whether something is this type without knowing whether it might be, though. I suspect there will be a lot more of these cases for other types that I'll need to encode. I wouldn't want to document it for every type, but I can make a global comment about it in this file.
The equals call is just how the LLVM node bindings work, since the wrapper objects don't guarantee JS object equality: the underlying C++ APIs can return instances that the JS bindings don't know about. That pattern will be all over the code base. It's pretty common since JS doesn't have operator overloading (yet).
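The pattern under discussion, lazily creating a type and treating "not yet created" as "cannot match", can be sketched as follows. The `TypeRef` wrapper is a hypothetical stand-in for llvm-node's wrapper objects, which compare via `equals()` rather than `===`:

```javascript
// Hypothetical wrapper mirroring llvm-node's behavior: two wrapper
// objects may represent the same underlying C++ type, so equality
// goes through equals() rather than ===.
class TypeRef {
  constructor(id) {
    this.id = id; // stands in for the underlying C++ type pointer
  }
  equals(other) {
    return other instanceof TypeRef && other.id === this.id;
  }
}

class Intrinsics {
  // The string type is created lazily so that modules which never
  // use strings don't emit its definition.
  get stringType() {
    if (!this._stringType) this._stringType = new TypeRef("string");
    return this._stringType;
  }
  isStringType(type) {
    // If the type was never created, nothing can be the string type.
    if (!this._stringType) return false;
    return this._stringType.equals(type);
  }
}

const intrinsics = new Intrinsics();
console.log(intrinsics.isStringType(new TypeRef("string"))); // false: not created yet
intrinsics.stringType; // force lazy creation
console.log(intrinsics.isStringType(new TypeRef("string"))); // true: equals() matches a distinct wrapper
```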
I'm still befuddled. If the C++ API can return a value that wasn't obtained via the getter of stringType, what guarantee is there that isStringType won't be called before stringType is called and therefore have inconsistent behavior?
The C++ can't make up a type that I haven't created yet. However, if I have created it, the C++ API can return the C++ version of that object from reflection APIs. The JS binding around the C++ bindings doesn't know that there already is a JS instance associated with this C++ object so it creates a new wrapper JS object around the C++ object.
OK, be that as it may then. A source code comment to this effect would be very helpful to a future reader. Ideally some invariant to enforce it would also be desirable.
return builder.createSelect(value, ConstantInt.get(llvmContext, 1), ConstantInt.get(llvmContext, 0));
} else if (value.type.isDoubleTy()) {
  // Number
  // TODO: I think we can make this fewer instructions/faster by some clever bit manipulation.
Why the special handling of infinity/NaN? Reading the spec, FPToSI would result in something undefined otherwise. And why mod 2^32? Is this all to emulate x | 0 in JS?
This is the internal operation called by x | 0, yes. <<, >> and >>> use ToUint32, which is slightly different. https://tc39.github.io/ecma262/#sec-toint32

I'm not sure what you're asking. The spec requires infinity/NaN to have a certain behavior. The FPToSI behavior is somewhat undefined, but in practice it will simply yield the wrong result, so I can't rely on it alone.

The mod 2^32 is specified because it basically means taking the first 32 bits of the fraction after the rounding. This is different from the overflow behavior you would get from using any of the other LLVM operations.

The reason things are defined this way is so that multiple operations are idempotent, which helps when you combine several of these operations in sequence, and in asm.js. I don't fully understand it, but it seems clever.

None of this maps to any single or even a few CPU instructions. Interestingly, no implementation does the same thing here; everyone tries their own clever trick. I can't rely on platform-specific quirks, so not all solutions would work cross-platform. I have some ideas around 64-bit casting (not currently exposed by the llvm-node bindings). In the end I opted for just the simplest code.
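The ToInt32 behavior being emulated can be written out directly in JS as a reference sketch of the spec steps (this is the semantics the backend has to reproduce, not the emitted LLVM):

```javascript
// Spec-style ToInt32 (https://tc39.github.io/ecma262/#sec-toint32):
// NaN/±Infinity map to 0; otherwise truncate toward zero, take the
// result mod 2^32, and reinterpret the top bit as a sign bit.
function toInt32(x) {
  x = Number(x);
  if (!Number.isFinite(x) || x === 0) return 0;
  const int = Math.trunc(x);
  const int32bit = ((int % 2 ** 32) + 2 ** 32) % 2 ** 32;
  return int32bit >= 2 ** 31 ? int32bit - 2 ** 32 : int32bit;
}

console.log(toInt32(NaN)); // 0, unlike a raw FPToSI
console.log(toInt32(Infinity)); // 0
console.log(toInt32(2 ** 32 + 5)); // 5: mod 2^32 keeps the low 32 bits
console.log(toInt32(2 ** 31)); // -2147483648: top bit becomes the sign
console.log(toInt32(-1.9)); // -1: truncation toward zero, not floor
```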
// in a struct to keep track that this is an unsigned int so that we know
// which operations to apply on this.
let unsignedValue = UndefValue.get(state.intrinsics.uint32Type);
return builder.createInsertValue(unsignedValue, value, [0]);
Note how, because of the way the spec is written, these things become safe for subsequent operations.
This lets us use Prepack to compile to native machine code or WebAssembly, without a JS runtime.
Prepack knows a lot about a program that it can evaluate. It is also highly specialized at getting rid of intermediate objects.
Most of the complexity of the serializer has to do with residual objects and closures that might leak to other JS.
Most of the complexity of a JS runtime comes from supporting the object model.
If we forbid leaking objects, and Prepack has full knowledge of the program, then we know a lot about the types. This won't work with existing programs, but new programs written for these constraints could benefit from this.
I wrote a new backend in parallel to the normal serializer. The two problem spaces don't have a lot in common, so I decided to add a new serializer rather than build on the existing one.
Type System
In this first PR, only booleans and numbers are supported at the interop layer but I expect to support closures, symbols, and array buffers. Longer term we can support strings and Typed Objects.
The type system is currently strongly typed so it will reject a program where abstract values yield more than one type.
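A sketch of what "strongly typed" means here: a checker that maps the single possible JS type of an abstract value to an LLVM type, and rejects values that could take more than one type. The helper name and the string-based type tags are hypothetical; the real backend works on Prepack's Value objects and LLVM types:

```javascript
// Infer a single LLVM type for an abstract value from the set of JS
// types it may take; reject programs where more than one is possible.
function inferLLVMType(possibleTypes) {
  if (possibleTypes.size !== 1) {
    throw new Error("abstract value yields more than one type");
  }
  const t = [...possibleTypes][0];
  switch (t) {
    case "boolean": return "i1";
    case "integral": return "i32";
    case "number": return "f64";
    default: throw new Error(`unsupported type: ${t}`);
  }
}

console.log(inferLLVMType(new Set(["integral"]))); // "i32"
// inferLLVMType(new Set(["integral", "number"])) would throw
```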
Functions are modeled by a normal function that returns __abstract(':void', 'linkMethodName'). The argument types are inferred from the arguments. I model booleans as i1, integrals as i32, and other numbers as f64.
Limitations
The limitations are mainly in the same set of problems we're currently investigating. Loops and recursive functions are not allowed.
The generated code must inline everything to completely get rid of all objects. This can yield bloated and suboptimal code.
In the future I hope that we can use arena allocation of custom object structures to temporarily store values created in recursive functions and loops.
Is This Useful?
I could see this being helpful for simpler functions such as animation functions that need to run on a different thread, audio processing functions, and simple but highly parallelizable functions like shaders.
It could potentially be useful for some React components that need to execute at extreme performance.
Installation
This PR adds an optional dependency on the llvm-node project, which contains Node bindings to LLVM. Downstream users of prepack don't automatically install these dependencies; instead, they have to be manually installed in the parent project. For this reason, the Prepack CLI lazily requires these modules and prints an error message if they're not installed. It requires both cmake and LLVM to be installed. llvm-node depends on the nan project, which should install automatically, but I had to manually install nan first for some reason.
MacOS installation instructions:
Additionally, running the yarn test-llvm command requires the lli tool (the LLVM interpreter) to be available on the PATH.
Building a Native Program
Compile to LLVM bitcode:
Compile to native assembly:
Link the program to a native executable:
Run it:
Debug by printing the LLVM IR assembly language code:
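The exact command lines were not captured in this excerpt. As a hedged sketch, the LLVM-toolchain half of the pipeline typically looks like the following; the file names are assumptions, and the Prepack invocation that produces the bitcode is omitted because its flag isn't shown here:

```shell
# Hypothetical file names; the Prepack step producing out.bc is this
# PR's CLI option, whose exact flag is not shown in this excerpt.

# Compile LLVM bitcode to native assembly with llc:
llc out.bc -o out.s

# Link the assembly into a native executable with the C compiler driver:
clang out.s -o out

# Run it:
./out

# Or run the IR directly in the LLVM interpreter:
lli out.bc

# Debug by printing the LLVM IR assembly language:
llvm-dis out.bc -o out.ll && cat out.ll
```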
Future Work