perf: greatly improve hash efficiency in computing attributes #158
Overall LGTM, the refactoring for `AttrStore` looks nice 👍
It would be better if we could add benchmark-related code too.
Here are the benchmark results, run from my laptop: With
Compared with std
We can see
More benches with touying.
These two files are not as demanding as tablex 😃
dbed0cc to c3225d5
Some bench results before this change:
Thanks!
Running benchmarks:

```sh
cargo bench           # run all benchmarks
cargo bench -- pretty # run only the pretty-print benchmarks
```
This PR resolves a critical issue where the formatter runs surprisingly slowly on some large documents (e.g., the touying package). @Myriad-Dreamin noticed this earlier but did not disclose it publicly.
After investigating with timing measurements, I found that the bottleneck is the HashMap used in node attribute computation, not Doc creation or pretty printing. In discussions with @Enter-tainer and @Myriad-Dreamin, we found that the hash value of the previously used `SyntaxNode` key is computed recursively over the node's whole subtree, which should be avoided in our preprocessing. Instead, we can use the span as the unique identifier for nodes. For this to work, the root node must be created from `Source`; otherwise, it would have no valid span attached.

With the changes in this PR, the time to format `tablex.typ` (the largest file in our test assets) drops dramatically from ~500ms to ~20ms. Unit tests also run faster, although the improvement is less significant. Benchmarks will be added later.
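To illustrate both points above (why a recursively hashed node is a costly map key, and why nodes need valid spans before spans can serve as keys), here is a self-contained toy model. `Node`, `AttrStore`, `assign_spans`, and `chain` are hypothetical names for this sketch only. This is not typstyle's code or the typst-syntax API, and the "cost" printed at the end is a simple counting model, not a measurement.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified syntax node (a stand-in for `SyntaxNode`).
/// A derived `Hash` on such a type would hash every descendant, so using
/// the node itself as a map key costs O(subtree size) per lookup.
#[derive(Debug, Clone)]
struct Node {
    /// 0 means "detached": no unique span assigned yet.
    span: u64,
    children: Vec<Node>,
}

/// Give every node a unique, non-zero span in pre-order. This only loosely
/// models what creating the tree from a `Source` provides.
fn assign_spans(node: &mut Node, next: &mut u64) {
    *next += 1;
    node.span = *next;
    for child in &mut node.children {
        assign_spans(child, next);
    }
}

/// Hypothetical attribute store keyed by span: each insert or lookup hashes
/// a single `u64` instead of a whole subtree.
#[derive(Default)]
struct AttrStore {
    /// Example attribute: number of nodes in the subtree rooted here.
    subtree_size: HashMap<u64, usize>,
}

impl AttrStore {
    /// One bottom-up pass; returns the size of `node`'s subtree.
    fn compute(&mut self, node: &Node) -> usize {
        let size = 1 + node.children.iter().map(|c| self.compute(c)).sum::<usize>();
        self.subtree_size.insert(node.span, size);
        size
    }
}

/// Build a degenerate tree: a chain of `depth` nested, detached nodes.
fn chain(depth: usize) -> Node {
    let mut node = Node { span: 0, children: Vec::new() };
    for _ in 1..depth {
        node = Node { span: 0, children: vec![node] };
    }
    node
}

fn main() {
    // Detached spans: every node shares key 0, so attributes overwrite each other.
    let detached = chain(100);
    let mut store = AttrStore::default();
    store.compute(&detached);
    println!("distinct keys (detached): {}", store.subtree_size.len());

    // Unique spans: every node gets its own entry.
    let mut tree = chain(100);
    assign_spans(&mut tree, &mut 0);
    let mut store = AttrStore::default();
    store.compute(&tree);
    println!("distinct keys (numbered): {}", store.subtree_size.len());

    // Counting model: keying by a recursively hashed node visits every node
    // in its subtree (the sum of all subtree sizes); keying by span hashes
    // one u64 per node.
    let recursive_visits: usize = store.subtree_size.values().sum();
    println!("recursive-hash node visits: {recursive_visits}");
    println!("span-key hash operations: {}", store.subtree_size.len());
}
```

On this 100-node chain, the detached tree collapses to 1 key while the numbered tree gets 100. The recursive-hash model visits 5050 nodes versus 100 single-`u64` hashes, so the gap grows quadratically with nesting depth in this model.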