internal: separate `TLoc` from `TSym` and `TType` #790
Store all locs associated with symbols in lookup tables, with `TSym.loc` only being queried (but never written) for the external name or user-provided flags. In order to efficiently associate a `TSym` with a code-generator `TLoc`, each `TSym` stores a one-based index (`locId`) into one of the code generator's lookup tables. A hash-table-based approach (mapping symbol IDs to locs) would have worked too, but it's slightly slower (~1.4% when building the compiler itself). In general, `cgen` becomes stricter with locs, now requiring globals, constants, and parameters to have their loc set up once, at definition-processing time. Procedures, fields, and locals still need to support ad-hoc loc setup.
Other changes:
- writing the mangled names to NDI files is temporarily disabled
- in symbol-definition contexts, `loc.t` is replaced with querying the symbol's type directly (both are equivalent)
- locs for constants now store a symbol node in their `lode` field
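A minimal sketch of the one-based `locId` idea, written in Python for brevity rather than the compiler's Nim. `Loc`, `Sym`, and `LocStore` are hypothetical stand-ins, not the compiler's actual types; the point is that a one-based index lets the default value `0` mean "no loc yet" without a separate flag.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Loc:
    """Hypothetical stand-in for the C backend's TLoc (only the name is kept)."""
    name: str


@dataclass
class Sym:
    """Hypothetical stand-in for TSym."""
    name: str
    loc_id: int = 0  # one-based; 0 means "no loc associated yet"


@dataclass
class LocStore:
    """Hypothetical append-only store that loc_id indexes into."""
    items: list = field(default_factory=list)

    def add(self, sym: Sym, loc: Loc) -> None:
        assert sym.loc_id == 0, "loc was already set up"
        self.items.append(loc)
        sym.loc_id = len(self.items)  # position of the new item, plus one

    def get(self, sym: Sym) -> Optional[Loc]:
        if sym.loc_id == 0:
            return None
        return self.items[sym.loc_id - 1]


store = LocStore()
x = Sym("x")
assert store.get(x) is None
store.add(x, Loc("x_1"))
assert x.loc_id == 1 and store.get(x).name == "x_1"
```

Keeping the index inside the symbol turns each lookup into a plain array access, which is presumably where the measured advantage over a hash table comes from.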
Use a `Table` to map symbols to the corresponding JavaScript name, which removes all modifications of `TSym.loc` from `jsgen`. Due to the lower complexity of the JavaScript code generator compared to the C code generator, a table-based approach works well enough.
`TType` doesn't need a `TLoc` field at all, and `TSym` only needs to store the external name (`extname`) plus the interface flags (`locFlags`). The `TLoc`, `TLocKind`, and `TStorageLoc` types are moved to `cgendata`, with `TLocFlag` staying in the `ast` module for now.
The orchestrator is now responsible for writing the mangled names to the NDI files.
This is made possible by them no longer requiring access to an NDI file.
The AST of a generated dispatcher stored, in its name slot, the symbol of the method that the dispatcher was derived from, which now causes problems. With the fix, `sym.ast[namePos] == sym` *should* be true for all routines reaching the backend.
Looking forward to having this merged in, further breaking dependencies like this is great.
Few minor items, they're mostly typographical and non-blocking.
```diff
@@ -355,7 +349,8 @@ proc copySym*(s: PSym; id: ItemId): PSym =
   result.magic = s.magic
   result.options = s.options
   result.position = s.position
-  result.loc = s.loc
+  result.extname = s.extname
+  result.locFlags = s.locFlags
```
Too big a change to name it `extFlags`? A soft suggestion, as a fair bit of this is under heavy transition.
```nim
TLocFlag* = enum
  # XXX: `TLocFlag` conflates two things:
```
Ah, even more reason to hold off on `extFlags` until these have been separated.
Yep, I'm going to split the enum up immediately after this PR.
```nim
                          ## generated name is to be used
    locFlags*: TLocFlags  ## additional flags that are relevant to code
                          ## generation
    locId*: uint32        ## associates the symbol with a loc in the C code
```
If two symbols can't be associated with the same location, could we piggyback off of `itemId`? (I'm guessing this will result in a "no" once I've read further in the PR.)
It would mean we'd have to store something for each symbol (more memory), but that might balance out, as we're storing an extra 4 bytes with the current change. The lack of loc information would have to be tracked on the loc-data end, which does mean a deref/index lookup before knowing that it doesn't exist.
revisiting this comment, I'm pretty sure this wouldn't be a good approach at present, better to defer it as the necessary data and consequent layout are likely to change dramatically.
```diff
@@ -93,6 +94,14 @@ type

 func hash(x: PSym): int = hash(x.id)

 proc writeMangledLocals(p: BProc) =
   ## Writes the mangled names of `p`'s locals to the module's NDI file.
```
This is a nice secondary benefit developing over time: as the mangling becomes more and more consistent, it's going to make life that much easier for gdb pretty printers.
Co-authored-by: Saem Ghani <saemghani+github@gmail.com>
/merge
Merge requested by: @saem
Contents after the first section break of the PR description have been removed and preserved below:
Summary
Remove the `loc` fields from both `TSym` and `TType` and leave it to the code generators how and if they use `TLoc` -- `TSym` only stores the external name plus interface flags now. The benefits:
- … analysis and code generation is removed
- a `TType` instance shrinks from 120 to 96 bytes (with a 64-bit target)
- a `TSym` instance shrinks from 168 to 160 bytes

Because of its `lode` field, `TLoc` not being defined in the `ast_types` module is also an important prerequisite for introducing a code IR for the code generators.
Details
The key changes:
- move `TLoc` and the associated types only relevant to the C code generator to `cgendata`
- remove the `loc` field from `TSym` and `TType` -- `TSym` stores the external name and interface flags via the new `extname` and `locFlags` fields
- add the `locId` field to `TSym`
- fix the `ast` of dispatchers storing the wrong symbol

cgen
For associating a `PSym` with a `TLoc`, multiple `Store`s shared across all C modules are used. The `TSym.locId` is an index into the `SymbolMap` (which is a `Store` underneath) for the respective symbol kind (globals, constants, procedures, fields). While a single `SymbolMap` could be used (for the non-procedure entities), the plan is to use dedicated types for the entities in the future, and using separate stores is a preparation for that.
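A sketch of per-kind stores, assuming a loc is reduced to its mangled name. `SymKind`, `SymbolMap`, and `SharedLocs` are hypothetical Python stand-ins; only the idea that `locId` indexes into the store for the symbol's kind comes from the description above.

```python
from dataclasses import dataclass
from enum import Enum, auto


class SymKind(Enum):
    GLOBAL = auto()
    CONST = auto()
    PROC = auto()
    FIELD = auto()


@dataclass
class Sym:
    name: str
    kind: SymKind
    loc_id: int = 0  # one-based index into the map for `kind`; 0 = unset


class SymbolMap:
    """Hypothetical one-based store of locs (here: mangled names) for one kind."""

    def __init__(self):
        self._locs = []

    def add(self, loc):
        self._locs.append(loc)
        return len(self._locs)  # becomes the symbol's loc_id

    def __getitem__(self, loc_id):
        return self._locs[loc_id - 1]


class SharedLocs:
    """Hypothetical container of the stores shared by all C modules."""

    def __init__(self):
        self.maps = {kind: SymbolMap() for kind in SymKind}

    def set_loc(self, sym, loc):
        sym.loc_id = self.maps[sym.kind].add(loc)

    def loc_of(self, sym):
        # a loc_id is only meaningful together with the symbol's kind
        return self.maps[sym.kind][sym.loc_id]
```

A global and a constant can both end up with `loc_id == 1`; the symbol's kind decides which store the index refers to, which is also what makes splitting the stores into dedicated types later straightforward.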
To avoid re-mangling parameters every time a prototype is emitted, the locs for parameters are stored with their procedure.
Since locals are known not to outlive their surrounding procedure, their locs are stored in a dedicated `SymbolMap` in the procedure context -- once a procedure context goes away, the locals' `TLoc`s can be, and are, freed right away.
In general, the C code generator is now stricter about who is responsible for filling in loc information: for constants and globals, it must only happen once, during definition. Procedures and fields still need to support ad-hoc loc setup, and fields need special-casing because of `.inline` procedures and how `finally` clauses work (i.e., they're duplicated too late).
A table-based approach where the symbol IDs are mapped to `TLoc`s would have also worked, but measurements showed that it was ~1.4% slower.
NDI files
Instead of directly updating the NDI file with a new entry when
generating a mangled name, as was done previously, adding the entries is
now the responsibility of the orchestrator: entries for whole-program
entities are added after code generation has finished -- entries for
locals once code generation for a procedure has finished.
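A sketch of the ordering described above, with hypothetical `NdiFile`, `gen_proc`, and `Orchestrator` types and a made-up mangling scheme; the real NDI entry format is not modeled. The code-generator step only returns names, and the orchestrator decides when they are written.

```python
class NdiFile:
    """Hypothetical NDI writer; entries are plain (name, mangled) pairs."""

    def __init__(self):
        self.entries = []

    def add(self, name, mangled):
        self.entries.append((name, mangled))


def gen_proc(proc_name, local_names):
    """Hypothetical codegen step: mangles locals, never touches the NDI file."""
    return [(n, f"{proc_name}_{n}") for n in local_names]


class Orchestrator:
    """Hypothetical driver that owns all NDI writes."""

    def __init__(self, ndi):
        self.ndi = ndi
        self.globals = []  # (name, mangled) of whole-program entities

    def run(self, procs, global_names):
        self.globals = [(g, g + "_g") for g in global_names]
        for proc_name, local_names in procs:
            # locals: written once code generation for the procedure is done
            for name, mangled in gen_proc(proc_name, local_names):
                self.ndi.add(name, mangled)
        # whole-program entities: written after all code generation finished
        for name, mangled in self.globals:
            self.ndi.add(name, mangled)
```

Because `gen_proc` has no access to the NDI file, it could run independently of the writer, which matches the commit note that the generators no longer require an NDI file.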
For NDI file generation to work, the `lode` field of a loc must store the symbol. This was already the case for all entities except constants, so constants now also store an `nkSym` node in their `lode` field.
jsgen
Because it is less complex than the C code generator, a table-based approach is good enough for the JavaScript code generator. Only the JavaScript name is relevant to the code generator, so it maps the symbol IDs directly to strings and doesn't use `TLoc`. Separate tables are used for global entities (globals, constants, procedures, and fields), locals, and parameters.
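A sketch of the JavaScript side: symbol IDs map straight to name strings, with one table for global entities, one for locals, and one for parameters. `Kind`, `JsNames`, and the `base_id` naming scheme are hypothetical.

```python
from enum import Enum, auto


class Kind(Enum):
    GLOBAL = auto()
    CONST = auto()
    PROC = auto()
    FIELD = auto()
    LOCAL = auto()
    PARAM = auto()


GLOBAL_KINDS = {Kind.GLOBAL, Kind.CONST, Kind.PROC, Kind.FIELD}


class JsNames:
    """Hypothetical symbol-ID -> JavaScript-name tables; no TLoc involved."""

    def __init__(self):
        self.globals = {}  # globals, constants, procedures, fields
        self.locals = {}
        self.params = {}

    def table_for(self, kind):
        if kind in GLOBAL_KINDS:
            return self.globals
        return self.locals if kind is Kind.LOCAL else self.params

    def name_of(self, sym_id, kind, base):
        table = self.table_for(kind)
        if sym_id not in table:
            table[sym_id] = f"{base}_{sym_id}"  # made-up naming scheme
        return table[sym_id]
```

The first request for a symbol fixes its name; later requests return the stored string, so `base` only matters once.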