-
Notifications
You must be signed in to change notification settings - Fork 39
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
compiler: unify the pre-code-gen processing #777
compiler: unify the pre-code-gen processing #777
Conversation
It's going to be needed for appending MIR fragments.
Don't conflate "choose nimvm branch" with "in compile-time context". There's now a dedicated `GenOption` for communicating whether the input code is meant to be run at compile-time. This fixes `.compileTime` globals being treated as normal globals when using the VM backend.
With the IC backend now only being a front-end for the new-style orchestrator, the special casing is unnecessary and can be removed.
The extra `BProc` parameter is unnecessary.
The definition is emitted into a section that unrelated to the init procedure (and instead relevant to the data-init procedure), so it makes more sense to emit the code in `genDatInitCode`. In addition, the already unused handling for temporary `TNimType` storage is, along with all related state, removed.
Don't generate code for a procedure or constant on its first usage. Instead, only emit a prototype and setup the `TLoc` -- generating the code for the procedures and constants is going to be the responsibility of the orchestrator. Also remove the "is declared" guard for definitions of globals; this prevents erroneous re-definitions from going unnoticed (they will cause a C compiler error instead).
This is all going to be implemented by the new unified backend processing.
Adjust the public interface of the code generator to support the soon- to-be needs of the orchestrator. `genProcAux` is split up into `startProc` and `finishProc`, with `finishProc` returning the generated code instead of emitting it into a C module section. The `genProc` procedure is kept as a convenience wrapper around both, and expects . The init procedure is, for the most part, finally going to be treated as any other procedure, so all the dedicated code generation logic (`genInitCode`, `genInitProc`) is removed. The leftover handling of top-level is also going to be obsolete, and, as a preparation, is thus removed. Together, this removes all direct interactions with the MIR and `transf` from `cgen`.
Revise the interface of the JS code generator in a similar fashion to to how it was done for the C code generator.
The procedure is only used there.
`canonicalize` unnecessarily re-implemented the echo logic.
As a preparation for the handling of lifted globals becoming a part of the common backend processing, remove all the workarounds for lifted globals that are necessitated by the JS backend doing its own thing.
Don't leave handling the `preInit` calls to each orchestrator. Instead, emit the calls as part of `generateMain`, which also automatically takes care of eliding calls to empty `preInit` procedures.
Instead of having to modify the module destructor (which would be a problem for the unified backend processing), the "post-destructor" works similar to the "pre-init" op (which is the inverse of the former). A call to it is inserted after the call to the normal module destructor.
With no users left, the facility is now obsolete and thus removed.
Generalize (with some extensions) the processing logic from `vmbackend` into an inline iterator (`process`), and use this iterator to replace the dead-code elimination and handling of lifted globals previously implemented by `cgen` and `jsgen`. Other changes: - don't introduce new globals (by way of copying symbols) in `transf` - remove support for globals from `ccgstmts.genSingleVar` - export procedures from `cgen` that are now needed by the orchestrator - update various documentation comments - add tests for the now fixed bugs regarding lifted global
The dependency on the enum-to-string procedure used for enum stringification wasn't communicated back to the orchestrator.
A module's 'init' procedure is now treated the same as any other procedure, with no special assembling by the code generator required. The last remaining user of the `initProc` field is the `.dynlib` procedure handling, which is changed to emit manual calls into the dynlib-init section (which is part of the data-init procedure) instead. In the future, generating the code for loading `.dynlib` procedures is going to happen as part of the `process` iterator.
It's no longer necessary now that the `cgen` orchestrator manages the caching.
The init procedure (i.e., module-level code) still needs to be special- cased in some situations: - module-level `emit` and `asm` should keep working for now - a call to `nimTestErrorFlag` needs to be inserted prior to the return Similar to what was done for `jsgen`, the init procedure is marked with a new internal symbol flag (`sfTopLevel`), which `cgen` then checks for.
For nested globals, a copy is introduced (again), but the copy of the symbol is now eliminated right after the `injectdestructors` pass.
The `generatedSyms` set is not used anymore.
A regression local to this PR. Including the flag was accidentally removed when the `process` iterator was integrated.
It's not pretty, but properly implementing it requires the `process` iterator to manage `.dynlib` procedures and globals (which is planned for a separate PR).
It's almost there, only a fix for the regression with |
This restores the previous behaviour of top-level globals marked with `.global` being initialized in the module's pre-init procedure. The `once` template depends on this behaviour, but whether it's a good idea is a separate question...
Revert "transf: also extract pure globals" and implement the attempted fix properly. Now, module-level `.global`s are neither placed into the module struct nor do they keep the module as the owner, making them appear as a lifted global to `extractGlobals` (without requiring any changes to the latter).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Partial review: did a first pass just on the tests themselves.
Now going over the implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically done reviewing, I wanted to give backends
another pass mostly for improvements like docs, simplifications, etc... nothing major/blocking.
I'm good with it merging before hand.
## procedures that are queued for code-generation | ||
config: BackendConfig | ||
|
||
Queue*[T] = object |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's like a baby kafka. 👶🏽
Co-authored-by: Saem Ghani <saemghani+github@gmail.com>
/merge |
Merge requested by: @saem Contents after the first section break of the PR description has been removed and preserved below:
|
/merge |
Summary
Introduce a unified backend processing pipeline (the
process
iterator)based on the implementation of the
vmbackend
module, and integrate itinto all three code generation orchestrators.
Discovery of alive entities (which automatically includes dead-code
elimination), applying transformations, and various other backend-
agnostic processing are now all implemented via a single, shared
implementation, making them much easier to maintain and adjust.
As a side-effect of the reworked lifted global (globals defined in
procedure via the
.global
pragma) handling, multiple issues and bugswith them are fixed:
backends
lifted globals now, meaning that they are initialized during the
attached-to module's pre-init phase, instead of when control-flow
first reaches the
var|let
statementalways refer to the same location now, regardless of whether the used
inline iterator contains multiple
yield
statement.inline
iteratoralways refers to the exact same location across all usages of the
iterator (each inlined occurrence got its own copy, previously)
considered by the move analyser and destructor injection. For example,
the
s
invar g {.global.} = (block: (var s = @[...]; 1))
isnow destroyed
In addition:
exportc
are now always included in thegenerated code, even when not explicitly used
--header
not including the prototypes ofexportc'ed globals is fixed
.compileTime
globalsanymore
.dynlib
procedure loading logic is run during the data-initphase of a module, instead of as part of the module-init procedure
Details
The key changes are:
anymore
of a recursive one for the C and JS backends (the VM backend already
did)
responsibility of the orchestrators
modulelowering
is no longermodified in the backend
module-bound operator
emitted into the main procedure
generateMethodDispatcher
.global
are consistently treated as liftedglobals (even top-level locations) across the backends
In addition, the module-init procedure is now named the same as the
module it belongs to and longer has an "Init" postfix. This gets around
having to introduce a special case for the init procedure in the C code
generator, now that it treats the module-init procedure like a normal
procedure.
The
process
iteratorAt the center of the additions is the
process
inline iterator (whichrepresents the unified processing pipeline). It implements the discovery
of alive entities, preparing the procedures for code generation (by
invoking
transf
and applying the MIR passes), generating methoddispatchers, and lifting the pre-init and post-destructor operators.
Whenever interaction with the caller is required, the iterator produces
(yields) an event that contains the information necessary for the caller
to act on it. The caller (the orchestrators) then, if necessary, pre-
processes the newly discovered entities and, if the event is about code
becoming available, invokes the code generator. Newly discovered
entities are communicated through the
DiscoverData
structure that boththe iterator and caller have access to.
The iterator's implementation and design is complicated by two things:
late dependencies and (to a lesser degree) first-seen-in-module
tracking.
Late Dependencies
A late dependency of a procedure (or code fragment in general) is a
reference to some procedure that is not visible by scanning the MIR
representation. They occur when a code generator emits a call to a
.compilerproc
. The general goal is to remove raising late dependenciesfrom
cgen
andjsgen
, but until then, theprocess
iterator has tosupport the case where the caller registers new procedures with
DiscoverData
.First-seen-in-module Tracking
For the inline procedure handling of the C backend to work, it needs to
know about at least one module an inline procedure is used in. The
first-seen-in-module is used for this, but the unified processing
pipeline has to track it.
cgen
andcbackend
The key changes / important things to note are:
orchestrator
instead of once for each module they're used in
--header
option) is fullymanaged by the orchestrator
globals is removed from the code generator
.dynlib
procedure handling is still mostly handled by the codegenerator
largest part, treated as normal procedures now
generator
Other changes and internal restructurings:
sfTopLevel
flag and use it to: emit animTestErrorFlag
call at the exit of a flagged procedure,special casing of
emit
andasm
statementsgenProcAux
intostartProc
andfinishProc
TNimNode
storage intogenDatInitCode
and removegenInitCode
dce.nim
, as thecode generator is not responsible anymore for deciding which
procedures need to be processed
Making sure that a duplicate of an inline procedure is emitted into each
C file the procedure is used in doesn't happen automatically anymore and
thus requires special support by the orchestrator.
jsgen
andjsbackend
genProc
intostartProc
andfinishProc
process
iteratorjsgen
vmbackend
process
iteratorConstants now use the sequence provided by
DiscoverData
, removing ausage of the
vmdef.LinkState
type fromvmbackend
.transf
and MIR processinggoIsCompileTime
option formirgen
, so thatdefinitions of
.compileTime
globals can be properly ignored whenusing the VM backend
mirbridge
and
injectdestructors
modules.inline
procedures is no longerunconditionally cached
With the last bits of the new-style DCE (the one implemented by the
dce
module) support removed fromcgen
, two MIR simplification arepossible:
nkConstSection
are discarded bymirgen
def
for a procedure doesn't carry the procdef anymoreDuplicated Globals
There was a problem regarding globals in
transf
. When introducingfresh symbols in the context of iterator inlining, globals were also
copied. While that did work previously, it no longer can, as the whole
backend processing now expects all module-level globals to be known
prior to
transf
.Keeping the original symbol is possible for to-be-lifted globals
(duplicate definitions of those are filtered out by
produceFragmentsForGlobals
), but not for module-level globals, as theinjectdestructors
pass requires the latter to have only a singledefinition. Therefore, module-level globals are still copied, but
they're now associated with the original symbol, and the
rewriteGlobalDefs
procedure uses this information to restore theoriginal symbols after the
injectdestructors
has run.This also fixes the pre-existing bug where the name of a global defined
inside a for-loop body referred to different locations when the for-
loop body was duplicated (due to inline iterators with multiple
yield
statements).Other
them to the module's init procedure -- this makes them appear as
lifted globals to
transf.extractGlobals
(what to do with them waspreviously up to the used code generator)
ModuleGraph.globalDestructors
list and usages thereof;the unified backend processing renders it obsolete
Misc
merge
operation for theStore
container typeappend
operation forSourceMap
SeqMap
andStore
types