use whole-program code-generation for all backends #712

Merged (15 commits), May 20, 2023

zerbina (Collaborator) commented on May 19, 2023

Summary

Change the C and JavaScript backends to use whole-program code
generation in the same way that the VM and IC backends do. In short,
this means that code generation now only runs after all modules that are
part of the program have been semantically analyzed.
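
To make the difference concrete, here is a small Python model (not the compiler's Nim code) of the two processing orders. `pipelined` and `whole_program` are hypothetical names, and the recorded "events" only show when each phase runs for a given module.

```python
from typing import List


def pipelined(modules: List[str]) -> List[str]:
    """Old C/JS model (sketch): each module is handed to the code
    generator right after its own semantic analysis finishes."""
    events: List[str] = []
    for m in modules:
        events.append(f"sem {m}")
        events.append(f"gen {m}")
    return events


def whole_program(modules: List[str]) -> List[str]:
    """New model (sketch): code generation starts only once every
    module of the program has been semantically analyzed."""
    events: List[str] = []
    for m in modules:
        events.append(f"sem {m}")
    for m in modules:
        events.append(f"gen {m}")
    return events


if __name__ == "__main__":
    mods = ["system", "strutils", "main"]
    print(pipelined(mods))
    print(whole_program(mods))
```

In the whole-program order, the code generator can see the complete, analyzed program before emitting anything for any module.
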

This brings the architecture of the C and JS backends closer to that of
the VM backend, and is the first step towards unifying the backend
processing.

The final goal is to have all backends process code in the same way,
with as much of the pre-processing currently performed by the code
generators as possible moved to a shared, backend-agnostic layer.

Details

Generalize the `ModuleGraph` pass that the VM backend used for gathering
the AST of each alive module, and move it into its own module. Compared
to the original implementation, the generalized pass:

  • doesn't drop declarative nodes (the VM backend doesn't care about
    them, but the C and JS backends do)
  • remembers the `IdGenerator` associated with each module
  • doesn't introduce its own module order (that's left to the backend)
  • uses the more descriptive `SeqMap` instead of a raw `seq`
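
The Python sketch below models such a collector pass under stated assumptions: `SeqMap`, `FullModule`, `IdGenerator`, and `Collector` are simplified, hypothetical stand-ins for the compiler's Nim types, reduced to what the list above describes (all statements kept, one ID generator per module, modules keyed by their position, no processing order imposed).

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple


@dataclass
class IdGenerator:
    """Hypothetical per-module ID source."""
    module: int
    counter: int = 0


@dataclass
class FullModule:
    """What the collector remembers per module (illustrative fields)."""
    name: str
    idgen: IdGenerator
    # every top-level statement, declarative ones (type/proc sections) included
    stmts: List[str] = field(default_factory=list)


class SeqMap:
    """Minimal stand-in for a SeqMap: dense storage indexed by an int key."""

    def __init__(self) -> None:
        self._items: List[Optional[FullModule]] = []

    def __setitem__(self, key: int, value: FullModule) -> None:
        if key >= len(self._items):
            self._items.extend([None] * (key + 1 - len(self._items)))
        self._items[key] = value

    def __getitem__(self, key: int) -> FullModule:
        item = self._items[key] if 0 <= key < len(self._items) else None
        if item is None:
            raise KeyError(key)
        return item

    def __contains__(self, key: int) -> bool:
        return 0 <= key < len(self._items) and self._items[key] is not None

    def items(self) -> Iterator[Tuple[int, FullModule]]:
        # yields (position, module) pairs; choosing a processing order is
        # left to the consuming backend
        for key, item in enumerate(self._items):
            if item is not None:
                yield key, item


class Collector:
    """Sketch of the collector pass: it records, it never decides."""

    def __init__(self) -> None:
        self.modules = SeqMap()

    def open_module(self, position: int, name: str, idgen: IdGenerator) -> None:
        self.modules[position] = FullModule(name, idgen)

    def add_stmt(self, position: int, stmt: str) -> None:
        # no filtering: declarative nodes are kept for the C and JS backends
        self.modules[position].stmts.append(stmt)
```
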

`vmbackend` is adjusted to work with the generalized collection pass:
the module list produced by the pass is translated into the structure
that the rest of the VM backend still expects, but a `Store` with a
dedicated ID type is now used instead of a raw `seq` (preparing for
future improvements).
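
A rough Python illustration of a `Store` addressed through a dedicated ID type, together with a translation step in the spirit of `produceModules`. All names and the choice to order modules by position are assumptions made for the sketch; note that `typing.NewType` only separates `ModuleId` from `int` for static type checkers, not at runtime.

```python
from typing import Generic, Iterator, List, NewType, Tuple, TypeVar

T = TypeVar("T")
ModuleId = NewType("ModuleId", int)  # dedicated ID type (hypothetical name)


class Store(Generic[T]):
    """Append-only storage whose items are addressed through ModuleId."""

    def __init__(self) -> None:
        self._data: List[T] = []

    def add(self, item: T) -> ModuleId:
        self._data.append(item)
        return ModuleId(len(self._data) - 1)

    def __getitem__(self, mid: ModuleId) -> T:
        return self._data[mid]

    def __len__(self) -> int:
        return len(self._data)

    def items(self) -> Iterator[Tuple[ModuleId, T]]:
        for i, item in enumerate(self._data):
            yield ModuleId(i), item


def produce_modules(module_list: List[Tuple[int, str]]) -> Store[str]:
    """Hypothetical translation of the collector's (position, module)
    pairs into the VM backend's own representation."""
    store: Store[str] = Store()
    for _position, name in sorted(module_list):  # the backend picks the order
        store.add(name)
    return store
```
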

Using the same naming scheme as `vmbackend`, the modules `cbackend`
and `jsbackend` are introduced: they implement the code-generation
orchestrators for the C and JavaScript backends, respectively. The
`passes` integration is removed from `cgen` and `jsgen`, as invoking the
code generators is now the responsibility of the orchestrators.

The new orchestrators take the module list produced by the collector
pass and generate the code for it (`jsbackend` also already writes the
output to disk). They're very basic at the moment, but in the future
they will take on responsibilities similar to those `vmbackend` has for
the VM backend (e.g., dead-code elimination, running `transf`, etc.).
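
A minimal Python model of what an orchestrator in the style of `jsbackend` does: it receives the already-collected modules, runs a per-module code generator over them, and writes the combined output itself. `gen_js_module`, `js_orchestrator`, the output name `out.js`, and the injected `write` callback are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Module:
    name: str
    stmts: List[str]


def gen_js_module(m: Module) -> str:
    """Stand-in for the per-module code generator (hypothetical)."""
    return "\n".join(f"/* {m.name} */ {s};" for s in m.stmts)


def js_orchestrator(modules: List[Module], write: Callable[[str, str], None]) -> None:
    """The orchestrator, not the passes machinery, decides when the code
    generator runs; every module has already been analyzed at this point.
    Future steps (dead-code elimination, running transf) would go here."""
    chunks = [gen_js_module(m) for m in modules]
    write("out.js", "\n".join(chunks))
```

Injecting `write` keeps the sketch free of file-system access; a real orchestrator would write to disk and, for JS, also produce the source map.
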

Since forwarded procedures don't reach the code generators anymore, the
special handling for them is removed.

Known Issues

Option changes made via `.push` only had an effect on top-level code
because semantic analysis and code generation happened in a pipelined
manner.

Since this is no longer the case, disabling or enabling checks for
top-level code stops working for now, but the plan is to bring this
feature back in the future.
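
The following Python model, a simplification rather than the compiler's actual mechanism, shows why the pipelined order mattered: the code generator reads whatever option state is current when it runs. With a single hypothetical `checks` flag toggled by push/pop, pipelined generation observes the pushed value, while deferred generation only ever observes the final state.

```python
from typing import Dict, List, Tuple


def run(stmts: List[Tuple[str, str]], pipelined: bool) -> Dict[str, bool]:
    """Return, per top-level statement, whether checks were enabled when
    its code was generated. "push" disables checks, "pop" restores them."""
    checks = True
    stack: List[bool] = []
    result: Dict[str, bool] = {}
    pending: List[str] = []
    for kind, name in stmts:
        if kind == "push":
            stack.append(checks)
            checks = False         # e.g. a push that turns checks off
        elif kind == "pop":
            checks = stack.pop()
        elif pipelined:
            result[name] = checks  # generated immediately: sees current state
        else:
            pending.append(name)   # generated later, after all analysis
    for name in pending:
        result[name] = checks      # only the final state is visible now
    return result
```

For the statements `a`, push, `b`, pop, `c`, only the pipelined run reports checks as disabled for `b`.
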

The `ModuleList` used by the collector pass is no longer re-used in
the VM backend, but is instead translated into a custom representation
(see `produceModules`).

In addition, the list with the top-level AST is now discarded after it's
passed to code generation, which should reduce memory pressure, and thus
peak memory usage, quite a bit.
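
Sketched in Python (the function name and the list-of-statements representation are assumptions), the idea is to release each module's top-level statements as soon as the code generator has consumed them, so they can be garbage-collected before later modules are processed.

```python
from typing import Callable, List


def generate_all(modules: List[List[str]], gen: Callable[[List[str]], str]) -> List[str]:
    """Generate code per module, then drop that module's top-level AST."""
    outputs: List[str] = []
    for i in range(len(modules)):
        outputs.append(gen(modules[i]))
        modules[i] = []  # the statements are no longer referenced from here
    return outputs
```

Peak memory then depends less on the AST of modules that were already generated, which is where the expected reduction comes from.
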

Extend the set of routines for interacting with `Store` and `SeqMap`.
Store the modules in a `SeqMap` with their position as the key. This
conveys intention better and leaves it to the consumer to use a more
fitting structure/representation.

For the internal representation, `vmbackend` uses a `Store` for the
list of modules, which makes the intent clearer and querying the list
a bit more ergonomic.

Much like how it works for the VM backend, the C backend now also uses
an orchestrator (the new `cbackend` module) that invokes the code
generator.

The `passes` integration (`myOpen`, `myProcess`, and `myClose`) is
removed from `cgen`. Setting up the extra header backend module (used
for C header generation) is now the responsibility of the orchestrator.

Similar to `vmbackend`, the new `cbackend` also operates on the
semantically analyzed AST of the whole program.

The upcoming C code-generation orchestrator needs access to the
`IdGenerator` of each module. These are now stored in `FullModule`,
allowing the orchestrator to retrieve them later.

`vmbackend` is adjusted to make use of the module's `IdGenerator`, which
is more precise than using the one from the `ModuleGraph`. Ideally, the
backend should not introduce any new symbol or type instances (which is
what the `IdGenerator`s are required for), but `transf` currently
necessitates that.
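
One way to picture why a per-module generator is more precise: in the hypothetical Python model below, an ID is a pair of the owning module and a per-module counter, so a temporary introduced while transforming a module is attributed to that module. `IdGenerator`, `new_id`, and `make_temp` are illustrative names, not the compiler's API.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class IdGenerator:
    """Hypothetical model: IDs are (owning module, per-module counter)."""
    module: int
    next_item: int = 0

    def new_id(self) -> Tuple[int, int]:
        self.next_item += 1
        return (self.module, self.next_item)


def make_temp(idgen: IdGenerator, hint: str) -> str:
    """Sketch of a transformation step that needs a fresh symbol."""
    mod, item = idgen.new_id()
    return f"{hint}_{mod}_{item}"
```

Using a single graph-wide generator instead would hand out IDs that carry no information about which module the new symbol belongs to.
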

Since code generation now only takes place *after* the whole program has
been semantically analyzed, forwarded procedures and unresolved borrows
no longer reach it.

Disabling stack traces for the system module and all modules it imports
differs from what the C code generator does (there, stack traces are only
disabled for the `system` module, but not for the modules it imports).
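
The difference between the two behaviors comes down to which set of modules is affected. A Python sketch of computing "`system` and all modules it imports" as the transitive closure over an import graph (the module names and graph are made up for illustration):

```python
from collections import deque
from typing import Dict, List, Set


def system_and_imports(imports: Dict[str, List[str]], root: str = "system") -> Set[str]:
    """Breadth-first transitive closure of `root` over the import graph."""
    seen: Set[str] = {root}
    queue = deque([root])
    while queue:
        mod = queue.popleft()
        for dep in imports.get(mod, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


graph = {
    "system": ["ansi_c", "memory"],
    "memory": ["ansi_c"],
    "main": ["system", "strutils"],
    "strutils": ["system"],
}
print(sorted(system_and_imports(graph)))
```

The C code generator's behavior corresponds to the set containing `system` alone.
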

The required access to `PGlobals` via `g.backend` is also a small
problem for the upcoming `jsgen` refactoring.

Very similar to the introduction of an orchestrator for the C code
generator, but for the JS code generator.

The `passes` integration is removed from `jsgen`, and writing the module
to disk and generating the source map are moved to the orchestrator.

The pass should only collect the statements into a list, and not
introduce its own decision making.

Instead of extracting the `ModuleList` as part of `generateCode`, it is
now passed in as a `sink` parameter. This:
* makes it easier to move away from storing the `ModuleList` as part of
  the `ModuleGraph`
* moves the mutation to the call site
* allows for implementing the memory consumption optimization employed
  by `vmbackend` (which is eventually going to be used for the other
  backends too) in a much cleaner way
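
A Python approximation of the calling convention described above. Nim's `sink` parameter transfers ownership; Python has no equivalent, so the sketch models it by having the call site take the list out of a greatly reduced, hypothetical `ModuleGraph` and letting `generate_code` consume it. The helper `take_module_list` is invented for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ModuleGraph:
    """Heavily reduced stand-in; only the field relevant here."""
    module_list: Optional[List[str]] = field(default_factory=list)


def take_module_list(graph: ModuleGraph) -> List[str]:
    """The mutation happens at the call site: the graph gives up its list."""
    mlist = graph.module_list
    if mlist is None:
        raise ValueError("module list was already consumed")
    graph.module_list = None
    return mlist


def generate_code(graph: ModuleGraph, mlist: List[str]) -> List[str]:
    """Owns `mlist` and consumes it while generating (graph unused here)."""
    out: List[str] = []
    while mlist:
        out.append(f"gen {mlist.pop(0)}")
    return out


g = ModuleGraph(["system", "main"])
print(generate_code(g, take_module_list(g)))
```

Because the consumer owns the list, it is free to discard entries as it goes, which is what the memory optimization relies on.
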

With code generation now always happening after all semantic analysis is
done, option changes applied via the `push` and `pop` pragmas no longer
apply to top-level statements.

As a temporary solution, the feature could be made to work by processing
`nkPragma` nodes in the orchestrators, but the planned upcoming changes
would render this approach unusable again.
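
For illustration only, the rejected temporary solution could look roughly like this Python sketch: the orchestrator walks a module's collected top-level statements in source order and replays push/pop pragmas on an options stack. The statement encoding and the `checks:off`/`checks:on` payloads are invented for the example.

```python
from typing import List, Tuple


def apply_pragmas(stmts: List[Tuple[str, str]]) -> List[Tuple[str, bool]]:
    """Replay push/pop pragmas over top-level statements (in source order)
    and pair every other statement with the `checks` value in effect."""
    checks = True
    saved: List[bool] = []
    out: List[Tuple[str, bool]] = []
    for kind, payload in stmts:
        if kind == "push":
            saved.append(checks)  # state to restore on the matching pop
            if payload == "checks:off":
                checks = False
            elif payload == "checks:on":
                checks = True
        elif kind == "pop":
            checks = saved.pop()
        else:
            out.append((payload, checks))
    return out
```

This only works while the orchestrator still sees the pragmas interleaved with the top-level statements, which is exactly what the planned changes would no longer guarantee.
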

saem (Collaborator) left a review:
Looks good.

I left a few minor suggestions mostly to do with a typo here or there in comments.

I saw a few in the PR description; I'll tackle those after I take a walk.

Review comments (outdated, resolved) on:
  • compiler/backend/collectors.nim
  • compiler/modules/modulegraphs.nim
  • tests/overflw/toverflw.nim

saem (Collaborator) commented on May 19, 2023

I was going to suggest it but things are too different right now, but we can probably get away with a single backend module that has a single public routine, something like `generateCode*(g: ModuleGraph, mlist: sink ModuleList, backend: TBackend)`. This would evolve over time, likely becoming an iterator producing progress events. But that's all nice-to-have, and that consolidation doesn't buy us enough as things are still not similar enough.
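
A Python sketch of the suggested shape, assuming a `Backend` enum in place of `TBackend` and made-up per-backend generators: one public `generate_code` entry point, written as an iterator that yields a progress event per module, with the per-backend differences kept private.

```python
from enum import Enum
from typing import Callable, Dict, Iterator, List


class Backend(Enum):
    """Stand-in for a `TBackend` selector (member names assumed)."""
    C = "c"
    JS = "js"
    VM = "vm"


# Hypothetical per-backend module generators (the private part).
_GENERATORS: Dict[Backend, Callable[[str], str]] = {
    Backend.C: lambda m: f"{m}.c",
    Backend.JS: lambda m: f"{m}.js",
    Backend.VM: lambda m: f"{m}.bytecode",
}


def generate_code(mlist: List[str], backend: Backend) -> Iterator[str]:
    """Single public entry point that reports progress as it goes."""
    gen = _GENERATORS[backend]
    for m in mlist:
        yield f"generated {gen(m)}"


for event in generate_code(["system", "main"], Backend.C):
    print(event)
```
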

zerbina (Collaborator, Author) commented on May 20, 2023

> I was going to suggest it but things are too different right now, but we can probably get away with a single backend module

Yep, that's what I initially tried as part of #550, but the differences in processing were too large for it to make sense at this time. With that said, having a single entry point into the backend is roughly the direction I'm going in.

Thank you for the review, @saem.

The comment was misleading, as a backend module is not closed for
writing after the call to `finalCodegenActions`.

saem (Collaborator) commented on May 20, 2023

/merge

github-actions commented:

Merge requested by: @saem

Contents after the first section break of the PR description have been removed and preserved below:


Notes for Reviewers

chore-runner (bot) added this pull request to the merge queue on May 20, 2023
Merged via the queue into nim-works:devel with commit f6b9d84 May 20, 2023
zerbina deleted the whole-program-code-generation branch on May 22, 2023 21:31

Labels: compiler/backend (Related to backend system of the compiler), compiler (General compiler tag), simplification (Removal of the old, unused, unnecessary or un/under-specified language features)

3 participants