compiler: unify the pre-code-gen processing #777

zerbina · 2023-06-27T21:25:08Z

Summary

Introduce a unified backend processing pipeline (the process iterator)
based on the implementation of the vmbackend module, and integrate it
into all three code generation orchestrators.

Discovery of alive entities (which automatically includes dead-code
elimination), applying transformations, and various other backend-
agnostic processing are now all implemented via a single, shared
implementation, making them much easier to maintain and adjust.

As a side-effect of the reworked lifted global (globals defined in
procedure via the .global pragma) handling, multiple issues and bugs
with them are fixed:

initialization and destruction order for them is the same across all
backends
JavaScript backend: lifted global use the pre-init mechanism for
lifted globals now, meaning that they are initialized during the
attached-to module's pre-init phase, instead of when control-flow
first reaches the var|let statement
names of both normal and lifted globals defined inside a for-loop
always refer to the same location now, regardless of whether the used
inline iterator contains multiple yield statement
the name of a lifted global defined inside an .inline iterator
always refers to the exact same location across all usages of the
iterator (each inlined occurrence got its own copy, previously)
locals defined in the initializer expression of lifted globals are now
considered by the move analyser and destructor injection. For example,
the s in var g {.global.} = (block: (var s = @[...]; 1)) is
now destroyed

In addition:

constants marked with exportc are now always included in the
generated code, even when not explicitly used
a regression with --header not including the prototypes of
exportc'ed globals is fixed
the VM backend doesn't generate code for .compileTime globals
anymore
custom .dynlib procedure loading logic is run during the data-init
phase of a module, instead of as part of the module-init procedure

Details

The key changes are:

the code generators don't implement discovery of alive entities
anymore
discovery of alive procedures now uses an iterative approach instead
of a recursive one for the C and JS backends (the VM backend already
did)
the code generators are passed the fully transformed AST as input
emitting the definition code for globals is now fully the
responsibility of the orchestrators
the module destructor produced by modulelowering is no longer
modified in the backend
destruction of lifted globals happens in the new "post-destructor"
module-bound operator
calls to the pre-init and post-destructor operators are now directly
emitted into the main procedure
the orchestrators no longer directly call generateMethodDispatcher
locations marked with .global are consistently treated as lifted
globals (even top-level locations) across the backends

In addition, the module-init procedure is now named the same as the
module it belongs to and longer has an "Init" postfix. This gets around
having to introduce a special case for the init procedure in the C code
generator, now that it treats the module-init procedure like a normal
procedure.

The `process` iterator

At the center of the additions is the process inline iterator (which
represents the unified processing pipeline). It implements the discovery
of alive entities, preparing the procedures for code generation (by
invoking transf and applying the MIR passes), generating method
dispatchers, and lifting the pre-init and post-destructor operators.

Whenever interaction with the caller is required, the iterator produces
(yields) an event that contains the information necessary for the caller
to act on it. The caller (the orchestrators) then, if necessary, pre-
processes the newly discovered entities and, if the event is about code
becoming available, invokes the code generator. Newly discovered
entities are communicated through the DiscoverData structure that both
the iterator and caller have access to.

The iterator's implementation and design is complicated by two things:
late dependencies and (to a lesser degree) first-seen-in-module
tracking.

Late Dependencies

A late dependency of a procedure (or code fragment in general) is a
reference to some procedure that is not visible by scanning the MIR
representation. They occur when a code generator emits a call to a
.compilerproc. The general goal is to remove raising late dependencies
from cgen and jsgen, but until then, the process iterator has to
support the case where the caller registers new procedures with
DiscoverData.

First-seen-in-module Tracking

For the inline procedure handling of the C backend to work, it needs to
know about at least one module an inline procedure is used in. The
first-seen-in-module is used for this, but the unified processing
pipeline has to track it.

`cgen` and `cbackend`

The key changes / important things to note are:

emitting procedures into the correct C file section is now done by the
orchestrator
inline procedures are now only transformed a single time (if used),
instead of once for each module they're used in
header file generation (when using the --header option) is fully
managed by the orchestrator
pre-init procedure generation and everything related to lifted
globals is removed from the code generator
.dynlib procedure handling is still mostly handled by the code
generator
all dedicated module-init procedure is removed; they are, for the
largest part, treated as normal procedures now
all dedicated top-level statement processing is removed from the code
generator

Other changes and internal restructurings:

introduce the internal sfTopLevel flag and use it to: emit a
nimTestErrorFlag call at the exit of a flagged procedure,
special casing of emit and asm statements
split genProcAux into startProc and finishProc
move code generation logic for the temporary TNimNode storage into
genDatInitCode and remove genInitCode
remove support for alive information provided by dce.nim, as the
code generator is not responsible anymore for deciding which
procedures need to be processed

Making sure that a duplicate of an inline procedure is emitted into each
C file the procedure is used in doesn't happen automatically anymore and
thus requires special support by the orchestrator.

`jsgen` and `jsbackend`

split genProc into startProc and finishProc
integrate and adjust to the process iterator
remove the DCE related bits from jsgen

`vmbackend`

integrate and adjust to the process iterator
remove the now-obsolete types and procedures

Constants now use the sequence provided by DiscoverData, removing a
usage of the vmdef.LinkState type from vmbackend.

`transf` and MIR processing

introduce a dedicated goIsCompileTime option for mirgen, so that
definitions of .compileTime globals can be properly ignored when
using the VM backend
remove the workarounds related to lifted globals from the mirbridge
and injectdestructors modules
the transformed body of .inline procedures is no longer
unconditionally cached

With the last bits of the new-style DCE (the one implemented by the
dce module) support removed from cgen, two MIR simplification are
possible:

nkConstSection are discarded by mirgen
the def for a procedure doesn't carry the procdef anymore

Duplicated Globals

There was a problem regarding globals in transf. When introducing
fresh symbols in the context of iterator inlining, globals were also
copied. While that did work previously, it no longer can, as the whole
backend processing now expects all module-level globals to be known
prior to transf.

Keeping the original symbol is possible for to-be-lifted globals
(duplicate definitions of those are filtered out by
produceFragmentsForGlobals), but not for module-level globals, as the
injectdestructors pass requires the latter to have only a single
definition. Therefore, module-level globals are still copied, but
they're now associated with the original symbol, and the
rewriteGlobalDefs procedure uses this information to restore the
original symbols after the injectdestructors has run.

This also fixes the pre-existing bug where the name of a global defined
inside a for-loop body referred to different locations when the for-
loop body was duplicated (due to inline iterators with multiple
yield statements).

Other

don't place module-level globals in a module's struct, and reparent
them to the module's init procedure -- this makes them appear as
lifted globals to transf.extractGlobals (what to do with them was
previously up to the used code generator)
remove the ModuleGraph.globalDestructors list and usages thereof;
the unified backend processing renders it obsolete

Misc

implement the merge operation for the Store container type
implement the append operation for SourceMap
add new iterators for the SeqMap and Store types

It's going to be needed for appending MIR fragments.

Don't conflate "choose nimvm branch" with "in compile-time context". There's now a dedicated `GenOption` for communicating whether the input code is meant to be run at compile-time. This fixes `.compileTime` globals being treated as normal globals when using the VM backend.

With the IC backend now only being a front-end for the new-style orchestrator, the special casing is unnecessary and can be removed.

The extra `BProc` parameter is unnecessary.

The definition is emitted into a section that unrelated to the init procedure (and instead relevant to the data-init procedure), so it makes more sense to emit the code in `genDatInitCode`. In addition, the already unused handling for temporary `TNimType` storage is, along with all related state, removed.

Don't generate code for a procedure or constant on its first usage. Instead, only emit a prototype and setup the `TLoc` -- generating the code for the procedures and constants is going to be the responsibility of the orchestrator. Also remove the "is declared" guard for definitions of globals; this prevents erroneous re-definitions from going unnoticed (they will cause a C compiler error instead).

This is all going to be implemented by the new unified backend processing.

Adjust the public interface of the code generator to support the soon- to-be needs of the orchestrator. `genProcAux` is split up into `startProc` and `finishProc`, with `finishProc` returning the generated code instead of emitting it into a C module section. The `genProc` procedure is kept as a convenience wrapper around both, and expects . The init procedure is, for the most part, finally going to be treated as any other procedure, so all the dedicated code generation logic (`genInitCode`, `genInitProc`) is removed. The leftover handling of top-level is also going to be obsolete, and, as a preparation, is thus removed. Together, this removes all direct interactions with the MIR and `transf` from `cgen`.

Revise the interface of the JS code generator in a similar fashion to to how it was done for the C code generator.

The procedure is only used there.

`canonicalize` unnecessarily re-implemented the echo logic.

As a preparation for the handling of lifted globals becoming a part of the common backend processing, remove all the workarounds for lifted globals that are necessitated by the JS backend doing its own thing.

Don't leave handling the `preInit` calls to each orchestrator. Instead, emit the calls as part of `generateMain`, which also automatically takes care of eliding calls to empty `preInit` procedures.

Instead of having to modify the module destructor (which would be a problem for the unified backend processing), the "post-destructor" works similar to the "pre-init" op (which is the inverse of the former). A call to it is inserted after the call to the normal module destructor.

With no users left, the facility is now obsolete and thus removed.

Generalize (with some extensions) the processing logic from `vmbackend` into an inline iterator (`process`), and use this iterator to replace the dead-code elimination and handling of lifted globals previously implemented by `cgen` and `jsgen`. Other changes: - don't introduce new globals (by way of copying symbols) in `transf` - remove support for globals from `ccgstmts.genSingleVar` - export procedures from `cgen` that are now needed by the orchestrator - update various documentation comments - add tests for the now fixed bugs regarding lifted global

The dependency on the enum-to-string procedure used for enum stringification wasn't communicated back to the orchestrator.

A module's 'init' procedure is now treated the same as any other procedure, with no special assembling by the code generator required. The last remaining user of the `initProc` field is the `.dynlib` procedure handling, which is changed to emit manual calls into the dynlib-init section (which is part of the data-init procedure) instead. In the future, generating the code for loading `.dynlib` procedures is going to happen as part of the `process` iterator.

It's no longer necessary now that the `cgen` orchestrator manages the caching.

The init procedure (i.e., module-level code) still needs to be special- cased in some situations: - module-level `emit` and `asm` should keep working for now - a call to `nimTestErrorFlag` needs to be inserted prior to the return Similar to what was done for `jsgen`, the init procedure is marked with a new internal symbol flag (`sfTopLevel`), which `cgen` then checks for.

For nested globals, a copy is introduced (again), but the copy of the symbol is now eliminated right after the `injectdestructors` pass.

The `generatedSyms` set is not used anymore.

A regression local to this PR. Including the flag was accidentally removed when the `process` iterator was integrated.

It's not pretty, but properly implementing it requires the `process` iterator to manage `.dynlib` procedures and globals (which is planned for a separate PR).

zerbina · 2023-07-02T15:25:06Z

It's almost there, only a fix for the regression with .global for module-level definitions is missing now.

This restores the previous behaviour of top-level globals marked with `.global` being initialized in the module's pre-init procedure. The `once` template depends on this behaviour, but whether it's a good idea is a separate question...

Revert "transf: also extract pure globals" and implement the attempted fix properly. Now, module-level `.global`s are neither placed into the module struct nor do they keep the module as the owner, making them appear as a lifted global to `extractGlobals` (without requiring any changes to the latter).

saem

Partial review: did a first pass just on the tests themselves.

Now going over the implementation.

tests/global/tlifted_global_in_loop_body.nim

tests/global/tlifted_global_in_inline_iterator.nim

saem

Basically done reviewing, I wanted to give backends another pass mostly for improvements like docs, simplifications, etc... nothing major/blocking.

I'm good with it merging before hand.

compiler/utils/containers.nim

compiler/sem/transf.nim

compiler/mir/mirbridge.nim

saem · 2023-07-04T02:46:36Z

compiler/backend/backends.nim

+      ## procedures that are queued for code-generation
+    config: BackendConfig
+
+  Queue*[T] = object


it's like a baby kafka. 👶🏽

compiler/backend/backends.nim

Co-authored-by: Saem Ghani <saemghani+github@gmail.com>

saem · 2023-07-04T23:49:29Z

/merge

github-actions · 2023-07-04T23:49:57Z

Merge requested by: @saem

Contents after the first section break of the PR description has been removed and preserved below:

To-do

remove the undocumented "extension loader" .dynlib feature (as a separate PR)

fix the registration of the nimGetProcAddr late dependency

restore basic --header support

for fixing the JS test failure, make a separate PR that introduces dedicated asm and emit statements to the MIR

write up a summary of the differences between this and the superseded PR

write a proper commit message

fix the remaining two issues via separate PRs

Notes for Reviewers

supersedes compiler: unify the backend processing #550 (which contains additional history and information)

review is likely easiest by going over the separate commits first

is the documentation up-to-date?

is the documentation detailed enough (or too detailed)?

saem · 2023-07-05T03:49:35Z

/merge

zerbina added 30 commits June 27, 2023 20:50

containers: add the merge operation for Store

aeed702

sourcemaps: support appending

fcaf1bb

It's going to be needed for appending MIR fragments.

containers: extend

4de55b2

cgen: don't special case methods for IC

3c1784f

With the IC backend now only being a front-end for the new-style orchestrator, the special casing is unnecessary and can be removed.

cgen: clean up code generation logic for constants

2863883

The extra `BProc` parameter is unnecessary.

cgen: remove handling of lifted globals

df5faa6

This is all going to be implemented by the new unified backend processing.

cgen: remove preInitProc

8cf122f

jsgen: prepare for new style DCE

a35087c

jsgen: remove handling of lifted globals

4b0c971

jsgen: revise the interface

e023656

Revise the interface of the JS code generator in a similar fashion to to how it was done for the C code generator.

move genDestroy to modulelowering.nim

832761e

The procedure is only used there.

mirbridge: remove duplicated code

870db01

`canonicalize` unnecessarily re-implemented the echo logic.

remove the workarounds for lifted globals

47857d6

As a preparation for the handling of lifted globals becoming a part of the common backend processing, remove all the workarounds for lifted globals that are necessitated by the JS backend doing its own thing.

transf: introduce transformBodyWithCache

728f32f

modulelowering: small cleanup

e02fd54

backends: explicit preInit calls

329b30f

Don't leave handling the `preInit` calls to each orchestrator. Instead, emit the calls as part of `generateMain`, which also automatically takes care of eliding calls to empty `preInit` procedures.

remove globalDestructors

dd0ebd0

With no users left, the facility is now obsolete and thus removed.

ccgexprs: register the missing late dependency

2bd8126

The dependency on the enum-to-string procedure used for enum stringification wasn't communicated back to the orchestrator.

backends: treat exported constants as alive

a5b35ee

transf: don't cache .inline procedure

60a9a33

It's no longer necessary now that the `cgen` orchestrator manages the caching.

transf: fix the issue with re-defined globals

f5a942a

For nested globals, a copy is introduced (again), but the copy of the symbol is now eliminated right after the `injectdestructors` pass.

tests: update tests that now work

3f652c0

zerbina added 6 commits July 1, 2023 16:54

jsgen: clean-up leftovers

2a9bc79

The `generatedSyms` set is not used anymore.

jsbackend: restore the stack-trace workaround

852cef6

A regression local to this PR. Including the flag was accidentally removed when the `process` iterator was integrated.

Merge branch 'devel' into unify-the-backend-processing

867db36

tests: add a test for unused, exportc'ed constants

a1d6b01

jsbackend: consider non-magics

ebc8b33

backends: basic scanning for .dynlib expression

36cdb89

It's not pretty, but properly implementing it requires the `process` iterator to manage `.dynlib` procedures and globals (which is planned for a separate PR).

zerbina mentioned this pull request Jul 1, 2023

sempass2: disallow built-ins as procvars in tuples #783

Merged

zerbina added 3 commits July 2, 2023 21:33

transf: also extract pure globals

f551754

This restores the previous behaviour of top-level globals marked with `.global` being initialized in the module's pre-init procedure. The `once` template depends on this behaviour, but whether it's a good idea is a separate question...

Merge branch 'devel' into unify-the-backend-processing

0d32e12

zerbina changed the title ~~compiler: unify the backend processing~~ compiler: unify the pre-code-gen processing Jul 3, 2023

zerbina marked this pull request as ready for review July 3, 2023 19:28

saem reviewed Jul 3, 2023

View reviewed changes

tests/global/tlifted_global_in_loop_body.nim Outdated Show resolved Hide resolved

tests/global/tlifted_global_in_inline_iterator.nim Outdated Show resolved Hide resolved

tests/global/tlifted_global_in_inline_iterator.nim Outdated Show resolved Hide resolved

saem approved these changes Jul 4, 2023

View reviewed changes

compiler/backend/backends.nim Outdated Show resolved Hide resolved

mirbridge: make restoreGlobal an inner procedure

7d0c744

zerbina marked this pull request as draft July 4, 2023 18:12

typo and grammar fixes + doc improvements

c6f1476

Co-authored-by: Saem Ghani <saemghani+github@gmail.com>

zerbina marked this pull request as ready for review July 4, 2023 18:20

chore-runner bot added this pull request to the merge queue Jul 4, 2023

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jul 5, 2023

chore-runner bot added this pull request to the merge queue Jul 5, 2023

Merged via the queue into nim-works:devel with commit 659867f Jul 5, 2023

zerbina deleted the unify-the-backend-processing branch July 6, 2023 18:03

zerbina mentioned this pull request Jul 7, 2023

cgen: fix regression with .nodecl consts #785

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

compiler: unify the pre-code-gen processing #777

compiler: unify the pre-code-gen processing #777

zerbina commented Jun 27, 2023 •

edited by github-actions bot

Loading

zerbina commented Jul 2, 2023

saem left a comment

saem left a comment

saem Jul 4, 2023

saem commented Jul 4, 2023

github-actions bot commented Jul 4, 2023

To-do

Notes for Reviewers

saem commented Jul 5, 2023

compiler: unify the pre-code-gen processing #777

compiler: unify the pre-code-gen processing #777

Conversation

zerbina commented Jun 27, 2023 • edited by github-actions bot Loading

Summary

Details

The process iterator

Late Dependencies

First-seen-in-module Tracking

cgen and cbackend

jsgen and jsbackend

vmbackend

transf and MIR processing

Duplicated Globals

Other

Misc

zerbina commented Jul 2, 2023

saem left a comment

Choose a reason for hiding this comment

saem left a comment

Choose a reason for hiding this comment

saem Jul 4, 2023

Choose a reason for hiding this comment

saem commented Jul 4, 2023

github-actions bot commented Jul 4, 2023

To-do

Notes for Reviewers

saem commented Jul 5, 2023

zerbina commented Jun 27, 2023 •

edited by github-actions bot

Loading

The `process` iterator

`cgen` and `cbackend`

`jsgen` and `jsbackend`

`vmbackend`

`transf` and MIR processing