use whole-program code-generation for all backends #712
The `ModuleList` used by the collector pass is no longer re-used in the VM backend, but is instead translated to a custom representation (see `produceModules`). In addition, the list with the top-level AST is now discarded after it's passed to code generation, which should noticeably reduce memory pressure and thus peak memory usage.
Extend the set of routines for interacting with `Store` and `SeqMap`.
Store the modules in a `SeqMap` with their position as the key. This conveys intention better and leaves it to the consumer to use a more fitting structure/representation. For the internal representation, `vmbackend` uses a `Store` for the list of modules, which makes the intention more clear and querying the list a bit more ergonomic.
Much like how it works for the VM backend, the C backend now also uses an orchestrator (the new `cbackend` module) that invokes the code generator. The `passes` integration (`myOpen`, `myProcess`, and `myClose`) is removed from `cgen`. Setting up the extra header backend module (used for C header generation) is now the responsibility of the orchestrator. Similar to `vmbackend`, the new `cbackend` also operates on the semantically analyzed AST of the whole program.
The upcoming C code-generation orchestrator needs access to the `IdGenerator` for each module. They're now stored with `FullModule`, allowing the orchestrator to retrieve them later. `vmbackend` is adjusted to make use of the module's `IdGenerator`, which is more precise than using the `ModuleGraph`'s one. Ideally, the backend should not introduce any new symbol and type instances (which is what the `IdGenerator`s are required for), but `transf` currently necessitates that.
Since code generation now only takes place *after* the whole program has been semantically analyzed, forwarded procedures and unresolved borrows no longer reach the code generators.
Disabling stack-traces for the system module and all modules it imports differs from what the C code-generator does (stack-traces are only disabled for the `system` module there, but not for the ones it imports). The required access to `PGlobals` via `g.backend` is also a small problem for upcoming `jsgen` refactoring.
Very similar to the introduction of an orchestrator for the C code-generator, but for the JS code-generator. The `passes` integration is removed from `jsgen`, and writing the module to disk and generating the source map are moved to the orchestrator.
The pass should only collect the statements into a list, and not introduce its own decision making.
Instead of extracting the `ModuleList` as part of `generateCode`, it is now passed in as a `sink` parameter. This:
* makes it easier to move away from storing the `ModuleList` as part of the `ModuleGraph`
* moves the mutation to the callsite
* allows for implementing the memory consumption optimization employed by `vmbackend` (which is eventually going to be used for the other backends too) in a much cleaner way
With code generation now always happening after all semantic analysis is done, option changes applied via the `push` and `pop` pragmas no longer apply to top-level statements. As a temporary solution, the feature could be made to work by processing `nkPragma` nodes in the orchestrators, but the planned upcoming changes would render this approach unusable again.
Looks good.
I left a few minor suggestions mostly to do with a typo here or there in comments.
I saw a few in the PR desc, I'll tackle those after I take a walk.
I was going to suggest it but things are too different right now, but we can probably get away with a single
Yep, that's what I initially tried as part of #550, but the differences in processing were too large for it to make sense at this time. With that said, having a single entry point into the backend is roughly the direction I'm going in.
Thank you for the review, @saem .
The comment was misleading, as a backend module is not closed for writing after the call to `finalCodegenActions`.
/merge
Merge requested by: @saem
Contents after the first section break of the PR description have been removed and preserved below:
Summary
Change the C and JavaScript backends to use whole-program code-generation
in the same way that the VM and IC backends do. In short, this means that
code generation is now only run after all modules that are part of the
program have been semantically analyzed.
This brings the architecture of the C and JS backends closer to that of
the VM backend, and is the first step towards unifying the backend
processing.
The final goal is to have all backends process code in the same way,
with as much as possible of the pre-processing currently performed by
the code generators being moved to a shared, backend-agnostic layer.
Details
Generalize the `ModuleGraph` pass that the VM backend used for gathering the AST of each alive module, and move it into its own module. Compared to the original implementation, the generalized pass:
* […] them, but the C and JS backends do)
* stores the `IdGenerator` associated with each module
* uses a `SeqMap` instead of a raw `seq`

`vmbackend` is adjusted to work with the generalized collection pass: the module list produced by the pass is translated into the structure that the rest of the VM backend still expects, but a `Store` with a dedicated ID type is now used instead of a raw `seq` (preparing for future improvements).
Using the same naming scheme as `vmbackend`, the modules `cbackend` and `jsbackend` are introduced: they implement the code-generation orchestrators for the C and JavaScript backends, respectively. The `passes` integration is removed from `cgen` and `jsgen`, as invoking the code generators is now the responsibility of the orchestrators.
The new orchestrators take the module list produced by the collector pass and generate the code for it (with `jsbackend` also writing the output to disk already). They're very basic at the moment, but in the future will take on similar responsibilities as `vmbackend` does for the VM backend (e.g., dead-code elimination, running `transf`, etc.).
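The division of labour between collector, orchestrator, and code generator can be sketched as follows. This is a simplified Python model, not the compiler's code: `Module`, `generate_code`, and the `codegen`/`write` callbacks are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Module:
    """A collected module: its name plus its analyzed top-level statements."""
    name: str
    stmts: List[str]


def generate_code(modules: List[Module],
                  codegen: Callable[[Module], str],
                  write: Callable[[str, str], None]) -> None:
    """Hypothetical orchestrator: run the code generator over every
    collected module and hand each result to the writer."""
    # The list is consumed while iterating, so the caller gives up
    # ownership and already-processed modules can be released early.
    while modules:
        m = modules.pop(0)
        write(m.name, codegen(m))
```

The code generator itself no longer decides when it runs; it is only invoked by the orchestrator once the complete module list is available.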
Since forwarded procedures don't reach the code generators anymore, the
special handling for them is removed.
Known Issues
Option changes made via `.push` affected top-level code only because semantic analysis and code generation happened in a pipelined manner.
Since this is no longer the case, disabling or enabling checks for top-level code stops working for now, but the plan is to bring this feature back in the future.
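Why the ordering matters can be shown with a small Python model. It assumes, purely for illustration, that options live on a stack manipulated during semantic analysis and that code generation reads whatever is on top of that stack when it runs; the statement encoding and function names are made up for this sketch.

```python
from typing import Dict, List, Optional, Tuple

# A statement is ("push", {...options}), ("pop", None) or ("stmt", "<code>").
Stmt = Tuple[str, object]


def pipelined(stmts: List[Stmt]) -> List[Tuple[str, bool]]:
    """Each statement is analyzed and then immediately code-generated."""
    options: List[Dict[str, bool]] = [{"checks": True}]
    out = []
    for kind, arg in stmts:
        if kind == "push":
            options.append({**options[-1], **arg})
        elif kind == "pop":
            options.pop()
        else:
            # Code generation sees the options active at this statement.
            out.append((arg, options[-1]["checks"]))
    return out


def whole_program(stmts: List[Stmt]) -> List[Tuple[str, bool]]:
    """All statements are analyzed first; code generation runs afterwards."""
    options: List[Dict[str, bool]] = [{"checks": True}]
    analyzed = []
    for kind, arg in stmts:
        if kind == "push":
            options.append({**options[-1], **arg})
        elif kind == "pop":
            options.pop()
        else:
            analyzed.append(arg)
    # By now every push has been popped again, so all statements are
    # generated with the options that were active at the end of analysis.
    return [(s, options[-1]["checks"]) for s in analyzed]


program = [
    ("stmt", "a"),
    ("push", {"checks": False}),
    ("stmt", "b"),
    ("pop", None),
    ("stmt", "c"),
]
```

With `program` as input, `pipelined` generates `b` with checks disabled, whereas `whole_program` generates all three statements with checks enabled, which mirrors the regression described above.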