-
Notifications
You must be signed in to change notification settings - Fork 39
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
codegen: don't use PSym
for locals
#914
codegen: don't use PSym
for locals
#914
Conversation
This frees up the name `locals` to be used for the storage of local- related data.
`Local` represents static information about a local variable. `Body` stores a list of them, and `cnkLocal` references an element in the list via `LocalId`.
* `treeRepr` now supports `cnkLocal` * to be able to render references-to-local, `render` now requires a `Body` as input * generalize the existing symbol rendering to also work for non-`PSym` symbols
* locals reach `cgirgen` as `PSym`s, meaning that they need to be translated. When appearing in 'def's, the symbol of a local is translated to a `Local`, a `LocalId` is allocated, and a mapping between the `PSym` and `LocalId` is established (via the `localsMap`) * for procedures, `generateIR` is responsible for setting up the `Local`s representing the result variable and parameters * special handling is required for `except T as e`, where `T` is an imported exception type: these also act as definitions of new locals Creating or translating temporaries no longer requires creating new `PSym`s, which also means that there's one less source of `IdGenerator` interaction.
Because parameters cannot easily be distinguished from other locals, `compat.isLvalue` always treats them l-values. This only affects the code generated (without impacting observable behaviour) for object up- and down-conversion when using the C backend.
The procedure implements synchronizing the number of elements between an `OrdinalSeq` and a `Store`, which is something that the code generators are going to need to do.
* instead of having to use a symbol-id-to-register table (`locals`) plus an additional set (`addressTaken`) for marking locals as requiring an indirection, a single `OrdinalSeq` indexed by the `LocalId` is sufficient now. The `OrdinalSeq` stores the current state (`LocalLoc`) for each local in the processed fragment * parameter setup (`genParams`) becomes simpler: no special handling for the result variable and environment parameter is needed anymore * `genSymAsgn` is split into `genAsgnToLocal` and `genAsgnToGlobal` * to reduce the amount of required changes, `genSym` and `genSymAddr` still support locals for now. This should be cleaned up separately
* the new `Loc` type represents the code-generator state of a local. It is made up of the mangled name and the storage description * `Loc` is intended to be eventually used for globals and constants too * computing the storage mode once (`StorageFlags`) and then storing the it in `Loc` is simpler than storing the symbol flags and recomputing the storage mode on each query * `Loc`s for locals are initialized (via `setupLocalLoc`) when a *definition* statement is encountered: these are currently 'def' nodes and 'except' branches for imported exceptions * since `storage` relies on this information, a symbol kind has to be passed to `setupLocalLoc` * `genSymAddr` is restructured into the generic `addrLoc` that works with both `Loc` and `PSym` * the core part of `genSym` is extracted into the generic `accessLoc` that works with both `Loc` and `PSym` * the interface of `genVarInit` is adjusted to work for both `Loc` and `PSym` Once `Loc` is used for all location in the JS code generator, the compatibility and adapter routines can be removed.
* instead of in a `SymbolMap`, the `TLoc` for locals is stored in an `OrdinalSeq` * the special handling to account for locals being defined multiple times for `.inline` procedures becomes obsolete and is removed * `cnkLocal` nodes cannot appear in the construction expression, and they thus don't need special handling in `hashTree` * testing whether a local location is uninitialized no longer relies on `TSym.locId`, but can just inspect the `TLoc.k` * for detecting parameters, `reifiedOpenArray` now requires access to a `BProc` instance * for detecting parameters, the `mapTypeChooser` template accepting a `CgNode` also requires access to a `BProc` instance * the dispatching based on the specific symbol kind (`skParam`, `skTemp`, `skResult`, etc.) becomes obsolete and is removed * the parameter and result variable setup in `cgen` is adjusted * the separate handling of parameters when writing NDI files is removed; parameters are part of the locals now * checking for `skTemp` symbols when writing NDI files is also not necessary anymore: it is replaced with checking whether the local has both a user-provided *and* mangled name.
When incrementally generating the target-language code for a procedure, the respective code generator's `OrdinalSeq` storing the state of the locals needs to be synchronized with the data from `Body`. The new `genPartial` procedure added to both the C and JS code generators takes care of this; the VM code generator doesn't use incremental code generation for procedures and thus doesn't need a `genPartial` procedure.
Temporaries are now rendered differently.
i would prefer i like the idea of |
I'm not set on using the For better readability, only using |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Few minor items that you might want to change; otherwise good to go. Decoupling is going to be a big win.
Admittedly, I did struggle thinking through the broader logical changes, but that's 100% my struggle with focusing these past few days.
@@ -166,6 +166,12 @@ func newSeq*[I; T](x: var OrdinalSeq[I, T], len: int) {.inline.} = | |||
func setLen*[I; T](x: var OrdinalSeq[I, T], len: int) {.inline.} = | |||
setLen(base(x), len) | |||
|
|||
func synchronize*[I; A; B](x: var OrdinalSeq[I, A], s: Store[I, B]) = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
func synchronize*[I; A; B](x: var OrdinalSeq[I, A], s: Store[I, B]) = | |
func rightSize*[I; A; B](x: var OrdinalSeq[I, A], s: Store[I, B]) = |
I think this is slightly more indicative of the change in size and its destructive nature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, maybe I'm misunderstanding, but I don't think it's a destructive operation. s
must be larger than or equal in size to x
, so no existing elements are destroyed/cut off.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
jeez, i seriously misread it.
rightSize
might still fit, but up to you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'll stay with synchronize
for now. To me, rightSize
sounds like a query rather than a modification.
what about |
I like it! |
Only use a single return for the `cnkLocal` case.
/merge Thank you for the review, @seam, and thank you for the |
Merge requested by: @zerbina Contents after the first section break of the PR description has been removed and preserved below:
|
Summary
Use a dedicated intermediate representation for local
var
/let
,parameters, and the result variable in the code generator IR.
This is the first step towards removing all
PSym
usage from thebackend/code-generators. The goal is to further decouple
sem
from thecode generators, making it possible evolve both (especially their IRs)
more independently from each other.
Details
General design
cgir.Local
stores all relevant information about a localBody
CgNode
IR via thecnkLocal
node,which stores a
LocalId
valueIn most cases, they don't require any extra handling, so them having
their own data types and node kinds would only complicate things
Body
of a procedure, the first local slot isalways reserved for the result variable, even if the procedure has no
return type. At least for now, this approach makes it easier to
ad-hoc compute the IDs for parameters from just their position, which
is still needed in some cases
In terms of data representation, a
Local
stores the type, alignment,flags (currently still
TSymFlags
), and the name (which may be unset).Unlike the MIR, no special distinction is made between temporaries and
user-defined locals.
In addition,
Local
also stores whether the corresponding user-definedlocal is immutable (the
isImmutable
field). This additionalinformation is meant to help the code generators optimize: for example,
the JavaScript code generator doesn't use indirection for
let
locals,even if they have their address taken.
For extending the data about locals, each code generator uses an
OrdinalSeq[LocalId, T]
whereT
is a data type tailored to the needsof the respective code generator. This data-oriented design allows for
easy extension without having to modify the base IR itself.
Overview of the changes
All places where
cnkSym
nodes were previously handled are adjusted toalso handle
cnkLocal
nodes. This usually means moving the local/parameter/result specific handling to a new
cnkLocal
branch.Merging two bodies (
cgir.merge
) also has to take locals into account:the list with the locals from the source body is appended to that of the
destination body, and the
cnkLocal
node in the merged source body areupdated.
The code generators (
cgen
andjsgen
) that support incremental code-generation for procedures need to synchronize their
OrdinalSeq
withthe body's
Store
, after a partial procedure was updated. For thispurpose, the relevant code generators are extended with the
genPartial
procedure.
Changes to
CgNode
renderingcgirutils.render
now requires aBody
instead of only aCgNode
the names of
PSym
- andLocal
-based symbols (through use of the newinternal
Symbol
type)In the output, compiler-inserted locals are now rendered as
:aux_$LocalId
(auxiliary local) instead of:tmp_$id
. The--expandArc
-using tests are adjusted accordingly.Initialization of
Local
sThe setup of the
Local
data for aBody
happens incgirgen
:cgirgen
asPSym
s, meaning that they need to betranslated. When appearing in 'def's, the symbol of a local is
translated to a
Local
, aLocalId
is allocated, and a mappingbetween the
PSym
andLocalId
is established (via thelocalsMap
)generateIR
is responsible for setting up theLocal
s representing the result variable and parametersexcept T as e
, whereT
is animported exception type: these also act as definitions of new locals
Creating or translating temporaries no longer requires creating new
PSym
s, which also means that there's one less source ofIdGenerator
interaction in
cgirgen
. TheIdGenerator
s should ideally be sealed atthe end of semantic analysis, and this is a step towardst this goal.
C code-generator changes
SymbolMap
, theTLoc
for locals is stored inan
OrdinalSeq
visited multiple times for
.inline
procedures becomes obsolete andis removed
cnkLocal
nodes cannot appear in construction expressions, andthey thus don't need special handling in
hashTree
TSym.locId
; inspect theTLoc.k
value is enough (locEmpty
signals an empty/uninitialized location)
reifiedOpenArray
now requires access toa
BProc
instancemapTypeChooser
template accepting aCgNode
also requires access to aBProc
instanceskParam
,skTemp
,skResult
, etc.) becomes obsolete and is removedcgen.startProc
isadjusted
removed; parameters are part of the locals now
skTemp
symbols when writing NDI files is also notnecessary anymore: it is replaced with checking whether the local has
both a user-provided and mangled name.
Name mangling for locals with user-provided names stays the same, but
for locals without user-defined names, an underscore (
_
) plus thestringified integer value of the
LocalId
is now used. Within asingle procedure, this simple mangling guarantees unique identifiers.
Because parameters cannot easily be distinguished from other locals,
compat.isLvalue
now always treats them as l-values. This only affectsthe code generated (without impacting observable behaviour) for object
up- and down-conversion.
JS code-generator changes
Loc
type represents the code-generator's state for a local.It is made up of the mangled name and the location's storage
description
locals
field inTProc
is renamedto
defs
Loc
is intended to be eventually used for globals and constants tooStorageFlags
) and then storing theresult in
Loc
is simpler than storing the symbol flags and kind andrecomputing the storage mode on each query
Loc
s for locals are initialized (viasetupLocalLoc
) when adefinition statement is encountered: these are currently 'def' nodes
and 'except' branches for imported exceptions
storage
relies on this information, a symbol kind has to bepassed to
setupLocalLoc
genSymAddr
is restructured into the genericaddrLoc
, which workswith both
Loc
andPSym
genSym
is extracted into the genericaccessLoc
,which works with both
Loc
andPSym
genVarInit
is adjusted to work for bothLoc
andPSym
Once
Loc
is used for all location in the JS code generator, thecompatibility and adapter routines can be removed.
VM code-generator changes
LocalLoc
type represents the code-generator's state for alocal. The type replaces the symbol-id-to-register table (
locals
)and the
addressTaken
setgenParams
) becomes simpler: no special handlingfor the result variable and environment parameter is needed anymore
genSymAsgn
is split intogenAsgnToLocal
andgenAsgnToGlobal
genSym
andgenSymAddr
still support locals for now. This should be cleaned up separately