LLVM Backend #2264
base: main
Conversation
test/llvm/StdInOut.js (outdated)
(function() {
  let getchar = __abstract(":integral", "getchar");
  let putchar = __abstract(":integral", "putchar");
Here's how you use it to call standard library functions. Didn't need any changes to the API.
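To make the pattern concrete, here is a small, purely illustrative sketch of calling a C standard library function through the `__abstract` intrinsic. Note the stub: `__abstract` only exists inside Prepack, so for running this outside Prepack we substitute a plain function (the stub's behavior is an assumption for illustration, not Prepack's semantics).

```javascript
// Hypothetical sketch: binding a C stdlib function via Prepack's
// __abstract intrinsic. Outside Prepack, __abstract doesn't exist,
// so we stub it with a function that echoes its argument as an int32.
if (typeof __abstract === "undefined") {
  globalThis.__abstract = (type, name) => (arg) => (arg | 0);
}

let putchar = __abstract(":integral", "putchar");

// In the real backend this lowers to a call to the C putchar symbol;
// with the stub it just returns its argument coerced to an int32.
console.log(putchar("y".charCodeAt(0)));
```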
let y = ucs2toutf8("y".charCodeAt(0))[0];

print("hello world? ");
Note that this only prints constant strings atm. Making these dynamic surfaces some limitations in Prepack or the LLVM backend.
Basically the idea is that the main function and any residual function are “actors” that process some data. They’ll use arena allocation and are expected to be short-lived. A runtime outside of these can control the memory management of long-lived objects. That allows for parallelism and more efficient memory management outside of these “worklets”.
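The arena-allocation idea for short-lived actors can be sketched in plain JS as a bump-pointer allocator over a preallocated buffer: allocation is a pointer increment, and everything the actor produced is reclaimed with a single reset. This is an illustrative sketch of the concept, not the PR's implementation.

```javascript
// Illustrative bump-pointer arena for a short-lived "actor"/"worklet".
class Arena {
  constructor(size) {
    this.buffer = new ArrayBuffer(size);
    this.offset = 0;
  }
  // Allocate `bytes` bytes, 8-byte aligned; returns a DataView window.
  alloc(bytes) {
    const aligned = (this.offset + 7) & ~7;
    if (aligned + bytes > this.buffer.byteLength) {
      throw new Error("arena exhausted");
    }
    this.offset = aligned + bytes;
    return new DataView(this.buffer, aligned, bytes);
  }
  // No per-object frees: the whole arena is reclaimed at once when
  // the actor finishes.
  reset() {
    this.offset = 0;
  }
}

const arena = new Arena(1024);
const slot = arena.alloc(16);
slot.setFloat64(0, 3.14);
console.log(slot.getFloat64(0)); // value lives in arena memory
arena.reset(); // everything the actor allocated is gone at once
```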
Fantastic work! You bypass the existing serializer, but then still find the existing generators useful. Also, I wonder if you'd soon need something like the ResidualHeapVisitor.
Yea, this exercise really shows where our abstractions leak and where they don't. The generators are fairly flexible but still a bit leaky. There are some things in there that aren't really necessary from the interpreter's point of view, but are a hard dependency of the interpreter. E.g. creating intermediate variables happens in the interpreter right now. The generator also inserts temporary variables itself. The rest of the system is essentially SSA, so it gets a little awkward to manage both. In this PR I just undo this by storing my own variable map and undoing the temporary assignment. We should try to move that concept out so it is completely isolated in the serialization pass instead of interleaved.

Regarding the visitor: I originally expected LLVM to help me with much of what the visitor does, but it is lacking in some areas, so yea, I might need a pre-processing pass like the ResidualHeapVisitor.
And as a plus: ideally, I'd like to move all build nodes out to a Babel-specific place, and instead place specific named instructions in the generator. That would allow printing the generator tree in some nice assembly format, and then different backends could be plugged in more easily.
Summary: Since I'm adding a new experiment, I figured I'd delete an equivalent-sized one. Last year I added an option that runs the Prepack program by invoking the Node.js runtime, which lets us prepack the whole module system and initialization. It's essentially a packager with perfect Node.js module resolution semantics. It did this by modeling Node's native environment as Prepack bindings. This PR removes that whole option. There are a few reasons why I don't think that worked out as a good idea.

- It's not solving a real need. It is hard to keep different module systems intact; there is always something in the ecosystem that breaks down, and using the canonical one solves that. However, in practice, if there is a need for bundling, the ecosystem itself adapts to the toolchain. So it's not actually that hard to bundle up a CLI even with Webpack, even if it's strictly not 100% compatible, by tweaking a few downstream dependencies.
- Running the resulting bundle is tricky. The resulting bundle includes the JS parts of Node. This overlaps with what Node.js adds at runtime, so that work runs twice. The ideal is actually to build a custom distribution of Node.js, but this is generally overkill for what people want.
- Bindings change a lot. While Node.js's API notoriously doesn't change much, the internals do change a lot. By picking the API boundary in the middle of the internals of Node.js, the option risks breaking with any version. These are technically observable changes, but nobody else relies on these details.

If this option were worth its weight, someone could probably maintain it, but so far that has not been the case, so we had to disable this option in CI to upgrade Node. However, going forward I think there are alternative approaches we can explore.

- A first-class module system. This is something we really need at some point. A first-class module system would be able to load Node.js module files from disk and package them up while excluding others. It doesn't have to be literally Node.js's module system; close enough is OK, especially as standards-compliant ECMAScript modules get more popular. This lets us target compiling output that runs after Node's initialization.
- By introducing havocing and membranes at the boundaries, it becomes possible to initialize Node.js modules without actually knowing the internals of the boundaries.
- We've started optimizing residual functions, which is much more interesting. However, this requires that code puts some constraints on how it works with its environment. It's not designed to be fully backwards compatible. That's probably a good thing, but it also means that we can put constraints on the modules being Prepacked.

This removes the ability to prepack Prepack itself, which is unfortunate but already wasn't being tested. To speed up Prepack itself, the [LLVM backend](#2264) seems much more useful, if it can ever work on Prepack itself.

Pull Request resolved: #2267
Differential Revision: D8863788
Pulled By: sebmarkbage
fbshipit-source-id: d777ec9a95c8523b3386cfad553d9f691ec59074
This adds an option to emit an LLVM module instead of JavaScript source text. The CLI can print this as either LLVM bitcode or assembly language.
Generators build statements, which build expressions from values.
Execute the resulting IR through the `lli` command which has to exist on the path.
We need the Value at the time we're evaluating binary expressions since it contains more information than the LLVMValue.
This means that all computations required by either branch are eagerly computed. This is unfortunate, but without a visitor we don't know whether one of them is shared and needs to be hoisted outside the branch.
This adds basic string support. We'll use UTF-8 as the standard format, since it is the most common format coming in and going out, and the most common format for interop through C, C++, Rust, etc. interfaces. For indexing, we can convert to UCS-2 in cases where we can't prove that it is safe to index through UTF-8. No ropes will be used, to keep the implementation lightweight and simple. Allocation happens in an arena on the stack. Appending two strings is inlined as a memcpy. Other operations will need a standard library that can be called.
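The `ucs2toutf8` helper exercised in the test above isn't shown in this excerpt; a minimal sketch of the conversion it implies, encoding a single UCS-2 code unit as UTF-8 bytes, might look like this (the function name matches the test, but the body is an assumption):

```javascript
// Encode one UCS-2 code unit (0..0xFFFF) as an array of UTF-8 bytes.
function ucs2toutf8(codeUnit) {
  if (codeUnit < 0x80) {
    return [codeUnit]; // 1 byte: ASCII range
  } else if (codeUnit < 0x800) {
    return [0xc0 | (codeUnit >> 6), 0x80 | (codeUnit & 0x3f)]; // 2 bytes
  }
  // 3 bytes (surrogate pairs are out of scope for this sketch)
  return [
    0xe0 | (codeUnit >> 12),
    0x80 | ((codeUnit >> 6) & 0x3f),
    0x80 | (codeUnit & 0x3f),
  ];
}

console.log(ucs2toutf8("y".charCodeAt(0))[0]); // 121, the ASCII byte for "y"
```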
Since function arguments are not typed right now, we don't need to further specify the null pointer's type.
If a conditional or logical expression results in two different number types, coerce them to double.
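That coercion rule can be sketched as a tiny type-join helper (hypothetical names; the real backend operates on LLVM types, not strings):

```javascript
// Join the number types of the two branches of a conditional/logical
// expression: if they differ (e.g. i32 vs f64), widen to double.
function joinNumberTypes(consequentType, alternateType) {
  if (consequentType === alternateType) return consequentType;
  return "f64"; // mixed integral/double results are coerced to double
}

console.log(joinNumberTypes("i32", "i32")); // "i32"
console.log(joinNumberTypes("i32", "f64")); // "f64"
```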
Thank you for your pull request. We require contributors to sign our Contributor License Agreement, and yours has expired. Before we can review or merge your code, we need you to email cla@fb.com with your details so we can update your status.
@NTillmann Are the two serialisers going to merge sometime in future? Coming from a contributor's perspective, what changes can we expect in the current serialiser?
Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Facebook open source project. Thanks!
Some quick questions before I read more.
import { prepackFileSync } from "../lib/prepack-node.js";
import invariant from "../lib/invariant.js";

let chalk = require("chalk");
Why do we require chalk and not import it?
I don't know. This is copy-paste from the other test-serializer.
Ultimately, I'd like to run all the normal serializer tests through the LLVM path. At that point it might make sense to unify the runners.
let fs = require("fs");
let child_process = require("child_process");

function search(dir, relative) {
This will be copy no 6 of this function. It looks like something we should factor out.
I think I ultimately want to use test-serializer to run the same tests. For now I just needed a quick way to get started until I have enough features to actually run those tests.
/* @flow */

import {
  Module,
Please sort these.
import { llvmContext } from "./llvm-context.js";

export class Intrinsics {
  +_module: Module;
Intrinsics is not an interface or a type, so what is the meaning of the invariant annotation in this context?
I'm not sure what you mean. It just means that we always have to initialize it with a Module in the constructor.
You seem to be using a feature of Flow that is not documented in https://flow.org/en/docs/. My reading of that document is that + is a way to mark an interface property as covariant, which constrains code that accesses the underlying object via the interface not to write to that property, lest it inadvertently violate the type annotation of the property of the underlying object.
There is no documentation that I can find that informs me that + is a way to mark a class property as "must be initialized inside the constructor". I'm also much bemused by this interpretation, since that should already be the case given that _module MUST always have the type Module.
Where do I find more information on this use of +?
I don't think a blog post from 2016 trumps the current documentation. Even so, I don't see anything there that suggests that + means that we always have to initialize the property.
Sorry, I just meant that this is the only place where I know + was documented. Dunno if that changed, and haven't looked into how to use it myself.
Ohhh, you're referring to the +. Yea, that just means that this field is read-only, which is helpful to make it covariant. In Flow, classes have the ability to define their own interface, so the documentation for interfaces applies here.
This is just a way to say that it is read-only, to enforce that we don't mutate it later on. Mutating it would break the invariant that the lazily initialized functions and types all belong to the same module. If we're generating multiple modules, we need multiple Intrinsics objects.
Covariant (read-only) properties are allowed to be initialized in class constructors in a recent-ish change to the type system. This preserves the semantics of covariant properties since you can’t assign back some larger type.
type Y = {+p: number | string};
class X {
+p: number;
constructor() {
this.p = 42; // Ok
setTimeout(() => {
this.p = 0; // Error
(this: Y).p = 'foo'; // Error
}, 0);
}
m() {
this.p = 0; // Error
(this: Y).p = 'foo'; // Error
}
}
}

isStringType(type: LLVMType): boolean {
  if (!this._stringType) return false;
Is it a good idea to return false if _stringType is not defined? Why not return this.stringType.equals(type)?
If it's not defined, it'll throw and Flow won't let me call equals on it.
If it is not yet defined, that means that there are no strings defined in this context yet. So whatever this type is, it can't be equal to a string since there are no strings.
Hmm, this is extremely subtle. You are essentially saying that the only way there can be a type T such that this.stringType.equals(T) holds is if T === this.stringType. I'm none too sure that this is desirable, but if it is, there should be a big comment about it, and my follow-up would be: why call equals instead of using ===?
The reason for this approach is that I don't want to generate unnecessary code in the LLVM output for a bunch of types if this type is never referenced in the output. I still have to test whether something is this type without knowing whether it might be, though. I suspect there will be a lot more of these cases for other types that I'll need to encode. I wouldn't want to document it for every type, but I can make a global comment about it in this file.
The equals call is just how the LLVM node bindings work, since the wrapper objects don't guarantee JS object equality: the underlying C++ APIs can return instances that the JS bindings don't know about. That pattern will be all over the code base. It's pretty common since JS doesn't have operator overloading (yet).
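The pattern under discussion, lazily creating a type and treating "not yet created" as "cannot match", can be sketched as follows. The `TypeRef` wrapper is a hypothetical stand-in for llvm-node's wrapper objects, which compare via `equals()` rather than `===`:

```javascript
// Hypothetical wrapper mirroring llvm-node's behavior: two wrapper
// objects may represent the same underlying C++ type, so equality
// goes through equals() rather than ===.
class TypeRef {
  constructor(id) {
    this.id = id; // stands in for the underlying C++ type pointer
  }
  equals(other) {
    return other instanceof TypeRef && other.id === this.id;
  }
}

class Intrinsics {
  // The string type is created lazily so that modules which never
  // use strings don't emit its definition.
  get stringType() {
    if (!this._stringType) this._stringType = new TypeRef("string");
    return this._stringType;
  }
  isStringType(type) {
    // If the type was never created, nothing can be the string type.
    if (!this._stringType) return false;
    return this._stringType.equals(type);
  }
}

const intrinsics = new Intrinsics();
console.log(intrinsics.isStringType(new TypeRef("string"))); // false: not created yet
intrinsics.stringType; // force lazy creation
console.log(intrinsics.isStringType(new TypeRef("string"))); // true: equals() matches a distinct wrapper
```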
I'm still befuddled. If the C++ API can return a value that wasn't obtained via the getter of stringType, what guarantee is there that isStringType won't be called before stringType is called and therefore have inconsistent behavior?
The C++ can't make up a type that I haven't created yet. However, if I have created it, the C++ API can return the C++ version of that object from reflection APIs. The JS binding around the C++ bindings doesn't know that there already is a JS instance associated with this C++ object so it creates a new wrapper JS object around the C++ object.
OK, be that as it may then. A source code comment to this effect would be very helpful to a future reader. Ideally some invariant to enforce it would also be desirable.
return builder.createSelect(value, ConstantInt.get(llvmContext, 1), ConstantInt.get(llvmContext, 0));
} else if (value.type.isDoubleTy()) {
  // Number
  // TODO: I think we can make this fewer instructions/faster by some clever bit manipulation.
Why the special handling of infinity/NaN? Reading the spec, FPToSI would result in something undefined otherwise. And why mod 2^32? Is this all to emulate x | 0 in JS?
This is the internal operation called by x | 0, yes. <<, >> and >>> use ToUint32, which is slightly different. https://tc39.github.io/ecma262/#sec-toint32

I'm not sure what you're asking. The spec requires infinity/NaN to have a certain behavior. The FPToSI behavior is somewhat undefined, but in practice it will simply yield the wrong result, so I can't rely on it alone.

The mod 2^32 is specified because it basically means taking the first 32 bits of the fraction after the rounding. This is different from the overflow behavior you would get from using any of the other LLVM operations.

The reason things are defined this way is so that multiple operations are idempotent, which helps when you combine several of these operations in sequence, and in asm.js. I don't fully understand it, but it seems clever.

None of this maps to any single or even a few CPU instructions. Interestingly, no implementation does the same thing here; everyone tries their own clever trick. I can't rely on platform-specific quirks, so not all solutions would work cross-platform. I have some ideas around 64-bit casting (not currently exposed by the llvm-node bindings). In the end I opted for just the simplest code.
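The ToInt32 behavior being emulated can be written out directly in JS as a reference sketch of the spec steps (this is the semantics the backend has to reproduce, not the emitted LLVM):

```javascript
// Spec-style ToInt32 (https://tc39.github.io/ecma262/#sec-toint32):
// NaN/±Infinity map to 0; otherwise truncate toward zero, take the
// result mod 2^32, and reinterpret the top bit as a sign bit.
function toInt32(x) {
  x = Number(x);
  if (!Number.isFinite(x) || x === 0) return 0;
  const int = Math.trunc(x);
  const int32bit = ((int % 2 ** 32) + 2 ** 32) % 2 ** 32;
  return int32bit >= 2 ** 31 ? int32bit - 2 ** 32 : int32bit;
}

console.log(toInt32(NaN)); // 0, unlike a raw FPToSI
console.log(toInt32(Infinity)); // 0
console.log(toInt32(2 ** 32 + 5)); // 5: mod 2^32 keeps the low 32 bits
console.log(toInt32(2 ** 31)); // -2147483648: top bit becomes the sign
console.log(toInt32(-1.9)); // -1: truncation toward zero, not floor
```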
// in a struct to keep track that this is an unsigned int so that we know
// which operations to apply on this.
let unsignedValue = UndefValue.get(state.intrinsics.uint32Type);
return builder.createInsertValue(unsignedValue, value, [0]);
Note how, because of the way the spec is written, these things become safe for subsequent operations.
This lets us use Prepack to compile to native machine code or WebAssembly, without a JS runtime.
Prepack knows a lot about a program that it can evaluate. It is also highly specialized at getting rid of intermediate objects.
Most of the complexity of the serializer has to do with residual objects and closures that might leak to other JS.
Most of the complexity of a JS runtime comes from supporting the object model.
If we forbid leaking objects, and Prepack has full knowledge of the program, then we know a lot about the types. This won't work with existing programs, but new programs written for these constraints could benefit from this.
I wrote a new backend in parallel to the normal serializer. The two problem spaces don't have a lot in common, so I decided to add a new serializer rather than build on the existing one.
Type System
In this first PR, only booleans and numbers are supported at the interop layer but I expect to support closures, symbols, and array buffers. Longer term we can support strings and Typed Objects.
The type system is currently strongly typed so it will reject a program where abstract values yield more than one type.
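A sketch of what "strongly typed" means here: a checker that maps the single possible JS type of an abstract value to an LLVM type, and rejects values that could take more than one type. The helper name and the string-based type tags are hypothetical; the real backend works on Prepack's Value objects and LLVM types:

```javascript
// Infer a single LLVM type for an abstract value from the set of JS
// types it may take; reject programs where more than one is possible.
function inferLLVMType(possibleTypes) {
  if (possibleTypes.size !== 1) {
    throw new Error("abstract value yields more than one type");
  }
  const t = [...possibleTypes][0];
  switch (t) {
    case "boolean": return "i1";
    case "integral": return "i32";
    case "number": return "f64";
    default: throw new Error(`unsupported type: ${t}`);
  }
}

console.log(inferLLVMType(new Set(["integral"]))); // "i32"
// inferLLVMType(new Set(["integral", "number"])) would throw
```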
Functions are modeled by a normal function that returns __abstract(':void', 'linkMethodName'). The argument types are inferred from the arguments. I model booleans as i1, integrals as i32, and other numbers as f64.
Limitations
The limitations are mainly in the same set of problems we're currently investigating. Loops and recursive functions are not allowed.
The generated code must inline everything to completely get rid of all objects. This can yield bloated and suboptimal code.
In the future I hope that we can use arena allocation of custom object structures to temporarily store values created in recursive functions and loops.
Is This Useful?
I could see this being helpful for simpler functions such as animation functions that need to run on a different thread, audio processing functions, and simple but highly parallelizable functions like shaders.
It could potentially be useful for some React components that need to execute at extreme performance.
Installation
This PR adds an optional dependency on the llvm-node project, which contains Node bindings to LLVM. Downstream users of prepack don't automatically install these dependencies; instead, they have to be manually installed in the parent project. For this reason, the Prepack CLI lazily requires these modules and prints an error message if they're not installed. It requires both cmake and LLVM to be installed. llvm-node depends on the nan project, which should install automatically, but I had to manually install nan first for some reason.
MacOS installation instructions:
Additionally, running the yarn test-llvm command requires the lli tool (the LLVM interpreter) to be available on the PATH.
Building a Native Program
Compile to LLVM bitcode:
Compile to native assembly:
Link the program to a native executable:
Run it:
Debug by printing the LLVM IR assembly language code:
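The exact command lines were not captured in this excerpt. As a hedged sketch, the LLVM-toolchain half of the pipeline typically looks like the following; the file names are assumptions, and the Prepack invocation that produces the bitcode is omitted because its flag isn't shown here:

```shell
# Hypothetical file names; the Prepack step producing out.bc is this
# PR's CLI option, whose exact flag is not shown in this excerpt.

# Compile LLVM bitcode to native assembly with llc:
llc out.bc -o out.s

# Link the assembly into a native executable with the C compiler driver:
clang out.s -o out

# Run it:
./out

# Or run the IR directly in the LLVM interpreter:
lli out.bc

# Debug by printing the LLVM IR assembly language:
llvm-dis out.bc -o out.ll && cat out.ll
```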
Future Work