Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

internal: data-oriented storage for TLib #797

Merged
merged 3 commits into from
Jul 15, 2023

Conversation

zerbina
Copy link
Collaborator

@zerbina zerbina commented Jul 15, 2023

Summary

Replace storing TLib objects as refs in TSym with storing all
TLib instances belonging to a module in a sequence that a TSym then
stores an index-like reference into.

This makes serializing the symbols and associated TLib instances
easier: instead of storing a PackedLib for each packed symbol, the
packed symbols now only store a module+index tuple.

Details

For simplicity, the list with the libs for a module is stored in a
separate seq in ModuleGraph. The list cannot be stored in
PContext, as it needs to be accessible after semantic analysis, at
which point PContext is already gone.

While the handle to the TLib instance stored with TSym would
ideally be a single index into the TLib list of the module a
symbol is attached to, the way symbols of aliased structural types
currently work (the aliased type's TSym content is partially
copied) makes this not possible, meaning that a symbol also has to
store the ID of the module the referenced TLib instance is part
of.

Storing the TLib objects for a module happens when the module is
closed. While not strictly necessary at the moment, this is a
preparation for storing a PSym in TLib.name that is only setup
when the module is closed.

Aside from reducing the size of rodfiles, not storing a PackedLib
for each PackedSym also fixes another inefficiency, namely that
each deserialized TSym got a unique, possibly duplicated, TLib
instance.

Summary
=======

Replace storing `TLib` objects as `ref`s in `TSym` with storing all
`TLib` instances belonging to a module in a sequence that a `TSym` then
stores an index into.

This slightly reduces the size of `TSym` and, more importantly, makes
serializing the symbols and associated `TLib` instances easier: instead
of storing a `PackedLib` for *each* packed symbol, the packed symbols
now only store an index into the attached-to module's packed lib list.

Details
=======

For simplicity, the list with the libs for a module is stored in a
separate `seq` in `ModuleGraph`. The list cannot be stored in
`PContext`, as it needs to be accessible after semantic analysis, at
which point `PContext` is already gone.

Storing the `TLib` objects for a module happens when the module is
closed. While not strictly necessary at the moment, this is a
preparation for storing a `PSym` in `TLib.name` that is only set
up once the module is closed.

Aside from reducing the size of rodfiles, not storing a `PackedLib`
for each `PackedSym` also fixes another inefficiency, namely that
each deserialized got a unique, possibly duplicated, `TLib`
instance.
A symbol can now reference the `TLib` belonging to a module that is not
the one the symbol is attached to. This works around an issue with
aliased structural types.

Also, use the ID provided by the module's ID generator instead of the
`skModule`s position. While both represent the same thing (a
``FileIndex``), querying the ID generator is more consistent with how it
works elsewhere.
A module ID isn't necessarily the same before and after loading/storing
a rodfile. Loading and storing of `LibId`s now considers this.
@zerbina zerbina added refactor Implementation refactor compiler/sem Related to semantic-analysis system of the compiler compiler/backend Related to backend system of the compiler labels Jul 15, 2023
@zerbina zerbina requested a review from saem July 15, 2023 15:51
Copy link
Collaborator

@saem saem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the overall approach, I can see it nudging us towards further convergence between existing compiler and IC.

@@ -1615,6 +1615,20 @@ type
name*: Rope
path*: PNode ## can be a string literal!

LibId* = object
## Identifies a ``TLib`` instance. The default value means 'none'.
Copy link
Collaborator

@saem saem Jul 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be worth adding a unknownLibId or the like as a constant below to further 'reserve' the concept of the default value LibId(module: 0, index: 0) as none.


Update: reading more of the PR, there is a similar tuple type for IC, so we wouldn't end up using that const/this object type all that much. Seems less worth it practically speaking, mostly there to signal intent.

# the index is 1-based, so we can directly use the length
result = LibId(module: module.int32, index: g.libs[module].len.uint32)

proc storeLibs*(g: ModuleGraph, module: int) =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
proc storeLibs*(g: ModuleGraph, module: int) =
proc ownTheLibs*(g: ModuleGraph, module: int) =

jk jk 😆

@saem
Copy link
Collaborator

saem commented Jul 15, 2023

/merge

@github-actions
Copy link

Merge requested by: @saem

Contents after the first section break of the PR description has been removed and preserved below:


Notes for Reviewers

@chore-runner chore-runner bot added this pull request to the merge queue Jul 15, 2023
Merged via the queue into nim-works:devel with commit 1793ddd Jul 15, 2023
18 checks passed
@zerbina zerbina deleted the data-oriented-lib-storage branch July 15, 2023 21:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/backend Related to backend system of the compiler compiler/sem Related to semantic-analysis system of the compiler refactor Implementation refactor
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants