use whole-program code-generation for all backends #712

zerbina · 2023-05-19T18:54:08Z

Summary

Change the C and JavaScript backends to use whole-program code-
generation in the same way that the VM and IC backends do. In short,
this means that code generation now only runs after all modules that
are part of the program have been semantically analyzed.

This brings the architecture of the C and JS backends closer to that of
the VM backend, and is the first step towards unifying the backend
processing.

The final goal is to have all backends process code in the same way,
with as much as possible of the pre-processing currently performed by
the code generators being moved to a shared, backend-agnostic layer.

Details

Generalize the ModuleGraph pass that the VM backend used for gathering
the AST of each alive module, and move it into its own module. Compared
to the original implementation, the generalized pass:

* doesn't drop declarative nodes (the VM backend doesn't care about
  them, but the C and JS backends do)
* remembers the IdGenerator associated with each module
* doesn't introduce its own module order (that's left to the backend)
* uses the more descriptive SeqMap instead of a raw seq
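
To illustrate the shape of such a collection pass, here is a minimal sketch. All types and routines in it (`Node`, `IdGen`, `CollectedModule`, `openModule`, `collect`) are hypothetical stand-ins rather than the compiler's actual definitions, and a plain `seq` is used where the real pass uses a `SeqMap`.

```nim
# Hypothetical stand-ins; none of these are the compiler's actual types.
type
  Node = object
    text: string
    declarative: bool   # e.g., a type section or a routine declaration
  IdGen = ref object    # plays the role of `IdGenerator`
    module: int
  CollectedModule = object
    idgen: IdGen        # remembered so that the backend can use it later
    stmts: seq[Node]    # all top-level statements, in source order
  ModuleList = object
    modules: seq[CollectedModule]  # indexed by module position

proc openModule(list: var ModuleList, pos: int) =
  ## Registers the module at `pos`, together with its ID generator.
  if list.modules.len <= pos:
    list.modules.setLen(pos + 1)
  list.modules[pos] = CollectedModule(idgen: IdGen(module: pos))

proc collect(list: var ModuleList, pos: int, n: Node) =
  ## Appends `n` unconditionally: no filtering and no reordering.
  list.modules[pos].stmts.add n

var list = ModuleList()
list.openModule(0)
list.collect(0, Node(text: "type T = object", declarative: true))
list.collect(0, Node(text: "echo 1", declarative: false))
```

The point of the sketch is that the pass makes no decisions of its own: declarative nodes are kept, and the order in which modules are later processed is up to the consumer.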

vmbackend is adjusted to work with the generalized collection pass:
the module list produced by the pass is translated into the structure
that the rest of the VM backend still expects, but a Store with a
dedicated ID type is now used instead of a raw seq (preparing for
future improvements).

Using the same naming scheme as vmbackend, the modules cbackend
and jsbackend are introduced -- they implement the code-generation
orchestrators for the C and JavaScript backends, respectively. The
passes integration is removed from cgen and jsgen, as invoking the
code generators is now the responsibility of the orchestrators.

The new orchestrators take the module list produced by the collector
pass and generate the code for it (with jsbackend also writing the
output to disk already). They're very basic at the moment, but in the
future will take on similar responsibilities as vmbackend does
for the VM backend (e.g., dead-code elimination, running transf,
etc.).

Since forwarded procedures don't reach the code generators anymore, the
special handling for them is removed.

Known Issues

Option changes made via .push only had an effect on top-level code
because semantic analysis and code generation happened in a pipelined
manner.

Since this is no longer the case, disabling or enabling checks for top-
level code stops working for now, but the plan is to bring this feature
back in the future.

The `ModuleList` used by the collector pass is no longer reused in the VM backend, but is instead translated to a custom representation (see `produceModules`). In addition, the list with the top-level AST is now discarded after it's passed to code generation, which should noticeably reduce memory pressure, and thus peak memory usage.

Extend the set of routines for interacting with `Store` and `SeqMap`.

Store the modules in a `SeqMap` with their position as the key. This conveys intention better and leaves it to the consumer to use a more fitting structure/representation. For the internal representation, `vmbackend` uses a `Store` for the list of modules, which makes the intention clearer and querying the list a bit more ergonomic.
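
As a rough illustration of why a store with a dedicated ID type conveys intent better than a raw `seq`, consider the following sketch. The `Store` shown here is a simplified stand-in written for this example, not the compiler's actual container, and `ModuleId` is an assumed name.

```nim
type
  ModuleId = distinct uint32   # hypothetical dedicated ID type
  Store[I; T] = object         # simplified stand-in for the compiler's `Store`
    data: seq[T]

proc add[I; T](s: var Store[I, T], item: T): I =
  ## Appends `item` and returns the ID it can be looked up with.
  s.data.add item
  result = I(uint32(s.data.high))

proc `[]`[I; T](s: Store[I, T], id: I): T =
  ## Looks up an item; only an ID of the matching type is accepted.
  s.data[int(uint32(id))]

var modules: Store[ModuleId, string]
discard modules.add("system")
let main = modules.add("main")
doAssert modules[main] == "main"
# `modules[1]` is rejected at compile time: an `int` is not a `ModuleId`
```

Because the ID is a distinct type, an index meant for some other list cannot be used for a module lookup by accident.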

Much like how it works for the VM backend, the C backend now also uses an orchestrator (the new `cbackend` module) that invokes the code generator. The `passes` integration (`myOpen`, `myProcess`, and `myClose`) is removed from `cgen`. Setting up the extra header backend module (used for C header generation) is now the responsibility of the orchestrator. Similar to `vmbackend`, the new `cbackend` also operates on the semantically analyzed AST of the whole program.

The upcoming C code-generation orchestrator needs access to the `IdGenerator` of each module. The generators are now stored with `FullModule`, allowing the orchestrator to retrieve them later. `vmbackend` is adjusted to make use of the module's `IdGenerator`, which is more precise than using the `ModuleGraph`'s one. Ideally, the backend should not introduce any new symbol and type instances (which is what the `IdGenerator`s are required for), but `transf` currently necessitates that.

Since code generation now only takes place *after* the whole program has been semantically analyzed, forwarded procedures and unresolved borrows no longer reach the code generator.

Disabling stack-traces for the system module and all modules it imports differs from what the C code-generator does (stack-traces are only disabled for the `system` module there, but not for the ones it imports). The required access to `PGlobals` via `g.backend` is also a small problem for upcoming `jsgen` refactoring.

Very similar to the introduction of an orchestrator for the C code-generator, but for the JS code-generator. The `passes` integration is removed from `jsgen`, and writing the module to disk and generating the source map are moved to the orchestrator.

The pass should only collect the statements into a list, and not introduce its own decision making.

Instead of extracting the `ModuleList` as part of `generateCode`, it is now passed in as a `sink` parameter. This:

* makes it easier to move away from storing the `ModuleList` as part of the `ModuleGraph`
* moves the mutation to the callsite
* allows for implementing the memory consumption optimization employed by `vmbackend` (which is eventually going to be used for the other backends too) in a much cleaner way
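
A minimal sketch of the `sink` hand-over, assuming a heavily simplified `ModuleList` and a `generateCode` that only produces strings (both hypothetical and unrelated to the real signatures):

```nim
type
  ModuleList = object                 # hypothetical, simplified
    modules: seq[seq[string]]         # top-level statements per module

proc generateCode(mlist: sink ModuleList): seq[string] =
  ## Takes ownership of `mlist`; the orchestrator is free to release the
  ## collected statements once it is done with them.
  for i, stmts in mlist.modules:
    for s in stmts:
      result.add "/* module " & $i & " */ " & s

var collected = ModuleList(modules: @[@["echo 1"], @["echo 2", "echo 3"]])
# The hand-over is explicit at the callsite; `collected` must not be
# relied upon afterwards.
let output = generateCode(move(collected))
```

With ownership transferred this way, the callee decides when the list is dropped, instead of the list staying alive for as long as the caller's data structure does.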

With code generation now always happening after all semantic analysis is done, option changes applied via the `push` and `pop` pragmas no longer apply to top-level statements. As a temporary solution, the feature could be made to work by processing `nkPragma` nodes in the orchestrators, but the planned upcoming changes would render this approach unusable again.

saem

Looks good.

I left a few minor suggestions mostly to do with a typo here or there in comments.

~~I saw a few in the PR desc, I'll tackle those after I take a walk.~~

compiler/backend/collectors.nim

compiler/modules/modulegraphs.nim

tests/overflw/toverflw.nim

saem · 2023-05-19T22:00:48Z

I was going to suggest it, but things are too different right now. We can probably get away with a single backend module that has a single public routine, something like `generateCode*(g: ModuleGraph, mlist: sink ModuleList, backend: TBackend)`. This would evolve over time and likely become an iterator producing progress events. That's all nice to have, though, and the consolidation doesn't buy us enough while things are still not similar enough.

zerbina · 2023-05-20T18:23:17Z

> I was going to suggest it but things are too different right now, but we can probably get away with a single backend module

Yep, that's what I initially tried as part of #550, but the differences in processing were too large for it to make sense at this time. With that said, having a single entry point into the backend is roughly the direction I'm going in.

@saem

Thank you for the review, @saem.

The comment was misleading, as a backend module is not closed for writing after the call to `finalCodegenActions`.

saem · 2023-05-20T18:47:53Z

/merge

github-actions · 2023-05-20T18:48:17Z

Merge requested by: @saem

Contents after the first section break of the PR description have been removed and preserved below:

Notes for Reviewers

split out from compiler: unify the backend processing #550

the change has a large impact on the compiler's architecture

zerbina added 4 commits May 19, 2023 19:17

move the 'collect' pass to a dedicated module

ccb2b75

containers: extend the APIs

274a730

zerbina added the compiler, compiler/backend, and simplification labels May 19, 2023

zerbina added 9 commits May 19, 2023 20:38

cgen: remove handling of forwarded procedures

4e54bcf

add a JS code-generation orchestrator

73466ce

jsgen: remove handling of forwarded procedures

a967e36

collectors: don't drop declarative nodes

bb74161

saem approved these changes May 19, 2023

haxscramper added this to the C backend refactoring milestone May 20, 2023

zerbina modified the milestones: C backend refactoring, JS backend refactoring May 20, 2023

zerbina added 2 commits May 20, 2023 19:36

address review comments

2d8e350

cbackend: clarify the "close module" comment

6757a31

chore-runner bot added this pull request to the merge queue May 20, 2023

Merged via the queue into nim-works:devel with commit f6b9d84 May 20, 2023

zerbina deleted the whole-program-code-generation branch May 22, 2023 21:31

zerbina mentioned this pull request Jun 27, 2023

compiler: unify the backend processing #550

Closed
