internal: separate `TLoc` from `TSym` and `TType` #790

zerbina · 2023-07-08T21:46:24Z

Summary

Remove the loc fields from both TSym and TType and leave it to
the code generators how and if they use TLoc -- TSym only stores the
external name plus interface flags now. The benefits:

another point of coupling (through data-types) between semantic
analysis and code generation is removed
the size of an TType instance shrinks from 120 to 96 bytes (with a
64-bit target)
the size of an TSym instance shrinks from 168 to 160 bytes

Because of its lode field, TLoc not being defined in the ast_types
module is also an important prerequisite for introducing a code IR for
the code generators.

Details

The key changes:

move TLoc and the associated types only relevant to the C code
generator to cgendata
remove the loc field from TSym and TType -- TSym stores the
external name and interface flags via the new extname and locFlags
fields
as an interim solution, add the locId field to TSym
fix the name slot in the ast of dispatcher storing the wrong symbol
NDI file generation is now the responsibility of the orchestrator

`cgen`

For associating a PSym with a TLoc, multiple Stores shared across
all C modules are used. The TSym.locId is an index into the
SymbolMap (which is a Store underneath) for the respective symbol
kind (globals, constants, procedures, fields). While a single
SymbolMap could be used (for the non-procedure entities), the plan is
to use dedicated types for the entities in the future, and using
separate stores is a preparation for that.

In order to not re-mangle parameters every time when emitting a
prototype, the locs for parameters are stored with their procedure.

Since locals are known to not outlive their surrounding procedure, the
locs for them are stored in a dedicated SymbolMap with the procedure
context -- once a procedure context goes away, the locals' TLocs can
be, and are, freed already.

In general, the C code generator is now more strict with who is
responsible for filling-in loc information: for constants and globals,
it must only happen once during definition. Procedures and fields still
need to support ad-hoc loc setup, and fields need special-casing because
of .inline procedures and how finally clauses work (i.e., they're
duplicated at a too late time).

A table-based approach where the symbol IDs are mapped to TLocs would
have also worked, but measurements showed that it was ~1.4% slower.

NDI files

Instead of directly updating the NDI file with a new entry when
generating a mangled name, as was done previously, adding the entries is
now the responsibility of the orchestrator: entries for whole-program
entities are added after code generation has finished -- entries for
locals once code generation for a procedure has finished.

For NDI file generation to work, the symbol. This was already the case
for all entities except constants, so constants now also store a nkSym
node in their lode field.

`jsgen`

Due to it being less complex than the C code generator, a table-based
approach is okay for the JavaScript code generator. Only the JavaScript
name is relevant to the code generator, so it maps the symbol IDs
directly to strings and doesn't use TLoc. Separate tables are used
for global entities (globals, constants, procedures, and fields),
locals, and parameters.

Store all locs associated with symbols in lookup tables, with `TSym.loc` only being queried (but never written) for the external name or user- provided flags. In order to efficiently associate a `TSym` with a code-generator `TLoc`, each `TSym` stores a one-based index (`locId`) into one of the code generator's lookup tables. A table-based approach would have too, but it's slightly slower (~1.4% when building the compiler itselfs). In general, `cgen` becomes more strict with locs, now requiring globals, constants, and parameters to have their loc set-up once at definition processing time. Procedures, fields, and locals still need to support ad-hoc loc setup. Other changes: - writing the mangled names to NDI files is temporarily disabled - in symbol definitions contexts, `loc.t` is replaced with the querying the symbol's type directly (both are equivalent) - locs for constants now store a symbol node in their `lode` field

Use a `Table` to map symbols to the corresponding JavaScript name, which removes all modifications of `TSym.loc` from `jsgen`. Due the lesser complexity of the JavaScript code generator compared to the C code generator, a table-based approach works well enough.

`TType` doesn't need a `TLoc` field at all, and `TSym` only needs to store the external name (`extname`) plus the interface flags (`locFlags`). The `TLoc`, `TLocKind`, and `TStorageLoc` types are moved to `cgendata`, with `TLocFlag` staying in the `ast` module for now.

The orchestrator is now responsible for writing the mangled names to the NDI files.

This is made possible by them no longer requiring access to an NDI file.

It stored in the name slot the symbol of the method that the dispatcher was derived from, which now causes problems. With the fix, for all routines reaching the backend, `sym.ast[namePos] == sym` *should* be true.

saem

Looking forward to having this merged in, further breaking dependencies like this is great.

Few minor items, they're mostly typographical and non-blocking.

saem · 2023-07-08T21:58:27Z

compiler/ast/ast.nim

@@ -355,7 +349,8 @@ proc copySym*(s: PSym; id: ItemId): PSym =
  result.magic = s.magic
  result.options = s.options
  result.position = s.position
-  result.loc = s.loc
+  result.extname = s.extname
+  result.locFlags = s.locFlags


too big a change to name it extFlags?

a soft suggestion as a fair bit of this is under heavy transition.

saem · 2023-07-08T22:00:08Z

compiler/ast/ast_types.nim

  TLocFlag* = enum
+    # XXX: `TLocFlag` conflates two things:


ah, even more reason to hold off on extFlags until these have been separated.

Yep, I'm going to split the enum up immediately after this PR.

saem · 2023-07-08T22:02:43Z

compiler/ast/ast_types.nim

+                              ## generated name is to be used
+    locFlags*: TLocFlags      ## additional flags that are relevant to code
+                              ## generation
+    locId*: uint32            ## associates the symbol with a loc in the C code


If two symbols can't be associated with the same location, could we piggy back off of itemId?

(I'm guessing this will result in a 'no' once I've read further in the PR)

It would mean we'd have to store something for each symbol (more memory), but that might balance out as we're storing an extra 4 bytes with the current change. The lack of loc information would have to be tracked on the loc data end, which does mean a deref/index-lookup prior to knowing it doesn't exist.

revisiting this comment, I'm pretty sure this wouldn't be a good approach at present, better to defer it as the necessary data and consequent layout are likely to change dramatically.

saem · 2023-07-08T22:11:02Z

compiler/backend/cbackend.nim

@@ -93,6 +94,14 @@ type

 func hash(x: PSym): int = hash(x.id)

+proc writeMangledLocals(p: BProc) =
+  ## Writes the mangled names of `p`'s locals to the module's NDI file.


this is a nice secondary benefit developing over time, as the mangling becomes more and more consistent it's going to make gdb pretty printers' life all that much easier.

compiler/backend/ccgstmts.nim

compiler/backend/ccgtypes.nim

compiler/backend/jsgen.nim

Co-authored-by: Saem Ghani <saemghani+github@gmail.com>

saem · 2023-07-09T19:20:09Z

/merge

github-actions · 2023-07-09T19:20:34Z

Merge requested by: @saem

Contents after the first section break of the PR description has been removed and preserved below:

Notes for Reviewers

blocks compiler: introduce an IR for the code generators #551

zerbina added 7 commits July 8, 2023 20:20

cbackend: restore NDI generation

397e5af

The orchestrator is now responsible for writing the mangled names to the NDI files.

cgen: less broad interface for the mangle procedures

473b194

This is made possible by them no longer requiring access to an NDI file.

cgmeth: fix the dispatcher AST

8af6d34

It stored in the name slot the symbol of the method that the dispatcher was derived from, which now causes problems. With the fix, for all routines reaching the backend, `sym.ast[namePos] == sym` *should* be true.

make the tcan_compile_nim test work

58d906d

zerbina added refactor Implementation refactor compiler/sem Related to semantic-analysis system of the compiler compiler/backend Related to backend system of the compiler labels Jul 8, 2023

zerbina requested a review from saem July 8, 2023 21:46

saem approved these changes Jul 9, 2023

View reviewed changes

apply suggestions from code review

356ee71

Co-authored-by: Saem Ghani <saemghani+github@gmail.com>

chore-runner bot added this pull request to the merge queue Jul 9, 2023

Merged via the queue into nim-works:devel with commit bb4225d Jul 9, 2023

zerbina deleted the split-loc-from-symbol branch July 9, 2023 21:42

zerbina mentioned this pull request Jul 10, 2023

internal: split up TLocFlag #792

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

internal: separate `TLoc` from `TSym` and `TType` #790

internal: separate `TLoc` from `TSym` and `TType` #790

zerbina commented Jul 8, 2023 •

edited by github-actions bot

Loading

saem left a comment

saem Jul 8, 2023

saem Jul 8, 2023

zerbina Jul 9, 2023

saem Jul 8, 2023

saem Jul 9, 2023

saem Jul 8, 2023

saem commented Jul 9, 2023

github-actions bot commented Jul 9, 2023

Notes for Reviewers

internal: separate TLoc from TSym and TType #790

internal: separate TLoc from TSym and TType #790

Conversation

zerbina commented Jul 8, 2023 • edited by github-actions bot Loading

Summary

Details

cgen

NDI files

jsgen

saem left a comment

Choose a reason for hiding this comment

saem Jul 8, 2023

Choose a reason for hiding this comment

saem Jul 8, 2023

Choose a reason for hiding this comment

zerbina Jul 9, 2023

Choose a reason for hiding this comment

saem Jul 8, 2023

Choose a reason for hiding this comment

saem Jul 9, 2023

Choose a reason for hiding this comment

saem Jul 8, 2023

Choose a reason for hiding this comment

saem commented Jul 9, 2023

github-actions bot commented Jul 9, 2023

Notes for Reviewers

internal: separate `TLoc` from `TSym` and `TType` #790

internal: separate `TLoc` from `TSym` and `TType` #790

zerbina commented Jul 8, 2023 •

edited by github-actions bot

Loading

`cgen`

`jsgen`