Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
codegen: don't use
PSym
for locals (#914)
## Summary Use a dedicated intermediate representation for local `var`/`let`, parameters, and the result variable in the code generator IR. This is the first step towards removing all `PSym` usage from the backend/code-generators. The goal is to further decouple `sem` from the code generators, making it possible evolve both (especially their IRs) more independently from each other. ## Details ### General design * `cgir.Local` stores all relevant information about a local * the information about all locals is owned by and stored as part of `Body` * a local is referenced from the `CgNode` IR via the `cnkLocal` node, which stores a `LocalId` value * the result variable and parameters are also treated as locals. In most cases, they don't require any extra handling, so them having their own data types and node kinds would only complicate things * for simplicity, for the `Body` of a procedure, the first local slot is always reserved for the result variable, even if the procedure has no return type. At least for now, this approach makes it easier to ad-hoc compute the IDs for parameters from just their position, which is still needed in some cases In terms of data representation, a `Local` stores the type, alignment, flags (currently still `TSymFlags`), and the name (which may be unset). Unlike the MIR, no special distinction is made between temporaries and user-defined locals. In addition, `Local` also stores whether the corresponding user-defined local is immutable (the `isImmutable` field). This additional information is meant to help the code generators optimize: for example, the JavaScript code generator doesn't use indirection for `let` locals, even if they have their address taken. For extending the data about locals, each code generator uses an `OrdinalSeq[LocalId, T]` where `T` is a data type tailored to the needs of the respective code generator. This data-oriented design allows for easy extension without having to modify the base IR itself. ### Overview of the changes All places where `cnkSym` nodes were previously handled are adjusted to also handle `cnkLocal` nodes. This usually means moving the local/ parameter/result specific handling to a new `cnkLocal` branch. Merging two bodies (`cgir.merge`) also has to take locals into account: the list with the locals from the source body is appended to that of the destination body, and the `cnkLocal` node in the merged source body are updated. The code generators (`cgen` and `jsgen`) that support incremental code- generation for procedures need to synchronize their `OrdinalSeq` with the body's `Store`, after a partial procedure was updated. For this purpose, the relevant code generators are extended with the `genPartial` procedure. ### Changes to `CgNode` rendering * in order to properly render locals, `cgirutils.render` now requires a `Body` instead of only a `CgNode` * the name disambiguation logic is generalized to support disambiguating the names of `PSym`- and `Local`-based symbols (through use of the new internal `Symbol` type) In the output, compiler-inserted locals are now rendered as `:aux_$LocalId` (auxiliary local) instead of `:tmp_$id`. The `--expandArc`-using tests are adjusted accordingly. ### Initialization of `Local`s The setup of the `Local` data for a `Body` happens in `cgirgen`: * locals reach `cgirgen` as `PSym`s, meaning that they need to be translated. When appearing in 'def's, the symbol of a local is translated to a `Local`, a `LocalId` is allocated, and a mapping between the `PSym` and `LocalId` is established (via the `localsMap`) * for procedures, `generateIR` is responsible for setting up the `Local`s representing the result variable and parameters * special handling is required for `except T as e`, where `T` is an imported exception type: these also act as definitions of new locals Creating or translating temporaries no longer requires creating new `PSym`s, which also means that there's one less source of `IdGenerator` interaction in `cgirgen`. The `IdGenerator`s should ideally be sealed at the end of semantic analysis, and this is a step towardst this goal. ### C code-generator changes * instead of in a `SymbolMap`, the `TLoc` for locals is stored in an `OrdinalSeq` * the special handling to account for definitions of locals being visited multiple times for `.inline` procedures becomes obsolete and is removed * `cnkLocal` nodes cannot appear in construction expressions, and they thus don't need special handling in `hashTree` * testing whether a local location is uninitialized no longer relies on `TSym.locId`; inspect the `TLoc.k` value is enough (`locEmpty` signals an empty/uninitialized location) * for detecting parameters, `reifiedOpenArray` now requires access to a `BProc` instance * for detecting parameters, the `mapTypeChooser` template accepting a `CgNode` also requires access to a `BProc` instance * the dispatching based on the specific symbol kind (`skParam`, `skTemp`, `skResult`, etc.) becomes obsolete and is removed * the parameter and result variable setup in `cgen.startProc` is adjusted * the separate handling of parameters when writing NDI files is removed; parameters are part of the locals now * checking for `skTemp` symbols when writing NDI files is also not necessary anymore: it is replaced with checking whether the local has both a user-provided *and* mangled name. Name mangling for locals with user-provided names stays the same, but for locals without user-defined names, an underscore (`_`) plus the stringified integer value of the `LocalId` is now used. Within a single procedure, this simple mangling guarantees unique identifiers. Because parameters cannot easily be distinguished from other locals, `compat.isLvalue` now always treats them as l-values. This only affects the code generated (without impacting observable behaviour) for object up- and down-conversion. ### JS code-generator changes * the new `Loc` type represents the code-generator's state for a local. It is made up of the mangled name and the location's storage description * in order to free up the name, the `locals` field in `TProc` is renamed to `defs` * `Loc` is intended to be eventually used for globals and constants too * computing the storage mode once (`StorageFlags`) and then storing the result in `Loc` is simpler than storing the symbol flags and kind and recomputing the storage mode on each query * `Loc`s for locals are initialized (via `setupLocalLoc`) when a *definition* statement is encountered: these are currently 'def' nodes and 'except' branches for imported exceptions * since `storage` relies on this information, a symbol kind has to be passed to `setupLocalLoc` * `genSymAddr` is restructured into the generic `addrLoc`, which works with both `Loc` and `PSym` * the core part of `genSym` is extracted into the generic `accessLoc`, which works with both `Loc` and `PSym` * the interface of `genVarInit` is adjusted to work for both `Loc` and `PSym` Once `Loc` is used for all location in the JS code generator, the compatibility and adapter routines can be removed. ### VM code-generator changes * the new `LocalLoc` type represents the code-generator's state for a local. The type replaces the symbol-id-to-register table (`locals`) and the `addressTaken` set * parameter setup (`genParams`) becomes simpler: no special handling for the result variable and environment parameter is needed anymore * `genSymAsgn` is split into `genAsgnToLocal` and `genAsgnToGlobal` * to reduce the amount of required changes, `genSym` and `genSymAddr` still support locals for now. This should be cleaned up separately
- Loading branch information