
Defer hashing in staged ledger diff application #15980

Open · wants to merge 7 commits into base: compatible
Conversation

@georgeee georgeee (Member) commented Aug 26, 2024

Problem: when performing transaction application, all merkle ledger hashes are recomputed for every account update. This is wasteful for a number of reasons:

  1. Transactions normally contain more than one account update, and a hash computed for one account update is sometimes overwritten by a subsequent account update
  2. The ledger's depth is 35, whereas only around 2^18 records are actually populated. This means that each account update induces a wasteful overhead of at least 17 hashes
  3. When an account is touched in a few transactions of the same block, it gets hashed a few times, whereas in fact only the final hash is truly needed

Solution: defer the computation of hashes in the mask. When an account is added, it is pushed onto the mask's unhashed_accounts list, which is processed on the next access to hashes.

This fix improved performance of handling 9-account-update transactions by ~60% (measured by #14582 on top of #15979 and #15978).
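As an illustration only (not the PR's actual code), the deferred-hashing idea can be sketched in a few lines of OCaml. The names `mask`, `set_account`, `finalize_hashes`, and `merkle_root` here are simplified stand-ins for the real structures in the masking merkle tree module, and `Hashtbl.hash` stands in for the real Merkle hashing:

```ocaml
(* Simplified sketch: account writes are queued in [unhashed_accounts]
   and the (stand-in) root hash is recomputed only on first access. *)
type account = { balance : int }

type mask =
  { mutable accounts : (int * account) list (* location -> account *)
  ; mutable unhashed_accounts : (int * account) list (* newest first *)
  ; mutable root : int option (* None = cached hashes are stale *)
  }

let create () = { accounts = []; unhashed_accounts = []; root = None }

(* O(1): just queue the write and invalidate cached hashes. *)
let set_account m loc acc =
  m.unhashed_accounts <- (loc, acc) :: m.unhashed_accounts ;
  m.root <- None

(* Apply queued writes oldest-first so the latest write per location wins. *)
let finalize_hashes m =
  List.rev m.unhashed_accounts
  |> List.iter (fun (loc, acc) ->
         m.accounts <- (loc, acc) :: List.remove_assoc loc m.accounts ) ;
  m.unhashed_accounts <- []

(* Hash access triggers finalization; [Hashtbl.hash] over the sorted final
   accounts stands in for real Merkle hashing. *)
let merkle_root m =
  ( match m.root with
  | Some _ -> ()
  | None ->
      finalize_hashes m ;
      m.root <-
        Some
          (List.fold_left
             (fun h (loc, a) -> Hashtbl.hash (h, loc, a.balance))
             0
             (List.sort compare m.accounts) ) ) ;
  Option.get m.root
```

With this shape, several updates to the same account within a block cost a single hashing pass instead of one per update; only the hash accessor pays for hashing.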

Explain how you tested your changes:

Checklist:

  • Dependency versions are unchanged
    • Notify Velocity team if dependencies must change in CI
  • Modified the current draft of release notes with details on what is completed or incomplete within this project
  • Document code purpose, how to use it
    • Mention expected invariants, implicit constraints
  • Tests were added for the new behavior
    • Document test purpose, significance of failures
    • Test names should reflect their purpose
  • All tests pass (CI will check this if you didn't)
  • Serialized types are in stable-versioned modules
  • Does this close issues? None

@georgeee georgeee requested a review from a team as a code owner August 26, 2024 12:48
@georgeee georgeee self-assigned this Aug 26, 2024
Problem: when performing transaction application, all merkle ledger
hashes are recomputed for every account update. This is wasteful for a
number of reasons:

  1. Transactions normally contain more than one account update and
     hashes computed by an account update are overwritten by a
     subsequent account update
  2. Ledger's depth is 35, whereas only around 2^18 records are actually
     populated. This means that each account update induces a wasteful
     overhead of at least 17 hashes

Solution: defer the computation of hashes in the mask. When an account is
added, it is pushed onto the mask's `unhashed_accounts` list, which is
processed on the next access to hashes.

This fix improved performance of handling 9-account-update transactions
by ~60% (measured on a laptop).
Defer computation of account hashes to the moment the hashes of the mask
will actually be accessed.

This is useful if the same account gets overwritten a few times in the
same block.
@georgeee georgeee force-pushed the georgeee/defer-hashing-in-staged-ledger-diff-application branch from 03e7683 to d8b44f3 Compare August 26, 2024 13:18
Base automatically changed from georgeee/do-not-recompute-hashes-on-ledger-commit to compatible August 26, 2024 17:23
@georgeee (Member, Author):

!ci-build-me

@volhovm volhovm (Member) left a comment


After spending >1.5h in total looking into this PR:

  • It seems like a cool PR / idea to start with.
  • I have no way of judging if this change is as you intended (unfortunately). This is a dense module describing a non-trivial data structure with close-to-zero documentation. On the positive side, I think I figured out the general sentiment of what you tried to achieve, which is already good.
  • I'd like to see tests for this module
  • I'd like to see explicit type annotations AND comments for most internal functions (think >5-10 lines of code; small helpers can be skipped), added in this PR and those moved around. Reading this for the first time trying to guess what the functions are doing without comments/types is hard.

(offtop rant: open module T.S; module T = T.T; let type t = t; let type a x = t.a x; let rec impl a t = impl T.impl go a. 2 step indentation (really hard to tell where code blocks end), 80 symbol wrapping, no explicit type annotations, almost no comments. This codebase is impossible to review without opening a whole IDE. I hope we're moving towards a more reader-friendly codestyle?.. 💀 )

in
snd @@ List.fold_map ~init:all_parent_paths ~f self_paths

let rec self_path_impl ~element ~depth address =

Note to myself: unchanged.

let%map.Option rest = self_path_impl ~element ~depth parent_address in
el :: rest

let empty_hash =

Note to myself: unchanged.

let empty_hash =
Empty_hashes.extensible_cache (module Hash) ~init_hash:Hash.empty_account

let self_path_get_hash ~hashes ~current_location height address =

Note to myself: unchanged.

*)
type unhashed_account_t = Account.t option * Location.t

let sexp_of_unhashed_account_t =

Why do we need these functions? They seem to be unused.

@@ -93,6 +105,7 @@ module Make (Inputs : Inputs_intf.S) = struct
This is used as a lookup cache. *)
; mutable accumulated : (accumulated_t[@sexp.opaque]) option
; mutable is_committing : bool
; mutable unhashed_accounts : unhashed_account_t list

Why not keep a hashmap here from position to the account value? Again wondering what happens if we update A1 to A2 to A3 in a batch of 2 txs: will both A2 and A3 appear in unhashed_accounts?

let set_inner_hash_at_addr_exn t address hash =
assert_is_attached t ;
assert (Addr.depth address <= t.depth) ;
self_set_hash t address hash

let hashes_and_ancestor t =

Note to myself: mostly unchanged except for the first line. maps_and_ancestor was left as it is, and hashes_and_ancestor additionally runs finalize_hashes before the sub-call.

let finalize_hashes t =
let unhashed_accounts = t.unhashed_accounts in
if not @@ List.is_empty unhashed_accounts then (
t.unhashed_accounts <- [] ;

Is it important to first nullify the existing unhashed_accounts before calling finalize_hashes_do?... Would the other way around work?

It's confusing because finalize_hashes_do also takes t as an argument, so you pass unhashed_accounts into it twice technically. finalize_hashes_do could just read/nullify these internally.
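One plausible reason for clearing the queue before processing (an assumption, not confirmed by the author) is re-entrancy: if processing an account itself touches hashes and re-enters finalization, an already-cleared queue makes the nested call a no-op instead of reprocessing or looping. A stand-alone sketch of that guard, with hypothetical names:

```ocaml
(* Hypothetical illustration: clearing the queue *before* processing makes
   re-entrant calls harmless; clearing it after would reprocess the same
   items on re-entry. *)
let queue = ref [ 1; 2; 3 ]

let processed = ref []

let rec finalize () =
  match !queue with
  | [] -> ()
  | items ->
      queue := [] ; (* clear first *)
      List.iter
        (fun i ->
          processed := i :: !processed ;
          (* processing may itself access hashes, re-entering [finalize];
             since the queue is already empty, this is a no-op *)
          finalize () )
        items
```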

in
let on_snd f (_, a) (_, b) = f a b in
List.stable_sort ~compare:(on_snd Location.compare) unhashed_accounts
|> List.remove_consecutive_duplicates ~which_to_keep:`First

What's happening here? You seem to only keep the first account for each duplicated location. I assume you want to keep the last one instead, that is the last update that was pushed to the list? Or are we prepending to the list?
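For what it's worth, the two readings can be checked mechanically. Assuming updates are prepended to the list (newest at the head), a stable sort on location preserves that newest-first order within each location, so keeping `First` keeps the latest write. A stdlib-only sketch, where `dedup_keep_first` is an illustrative stand-in for Core's `List.remove_consecutive_duplicates ~which_to_keep:`First`:

```ocaml
(* Stand-in for Core's [List.remove_consecutive_duplicates
   ~which_to_keep:`First]: of each run of equal elements, keep the first. *)
let rec dedup_keep_first equal = function
  | a :: b :: rest when equal a b -> dedup_keep_first equal (a :: rest)
  | a :: rest -> a :: dedup_keep_first equal rest
  | [] -> []

(* [updates] has the newest update at the head (prepend order). A stable
   sort on location keeps that newest-first order per location, so the
   first element of each run is the latest write. *)
let latest_per_location updates =
  List.stable_sort (fun (l1, _) (l2, _) -> compare l1 l2) updates
  |> dedup_keep_first (fun (l1, _) (l2, _) -> l1 = l2)
```

E.g. pushing A1, then A2, then A3 for location 1 yields the queue `[(1, "A3"); (1, "A2"); (1, "A1")]`, from which `latest_per_location` keeps `(1, "A3")`.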

@@ -81,6 +81,18 @@ module Make (Inputs : Inputs_intf.S) = struct
*)
}

(** Type for an account that wasn't yet hashed.

General question regarding your comment from the PR description:

When an account is touched in a few transactions of the same block, it gets hashed a few times, whereas in fact only the final hash is truly needed

I'm wondering if this does not change the original application logic. Assume tx1 changes A and B, and tx2 changes C and A. So A1 becomes A2 and then A3, B1 becomes B2, C1 becomes C2. Surely, after tx1 and tx2 the final account state for A is A3, but don't we /at all/ care about the intermediate Merkle tree? Can we just rehash the MT based on A3 B2 C2?

(I suspect yes, just double checking)
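A toy example backs up the "yes": a Merkle root is a pure function of the leaf values, so intermediate roots never feed into the final one. Sketch with a hypothetical 4-leaf tree (`Hashtbl.hash` standing in for the real hash, `root` and `set` are illustrative names):

```ocaml
let h2 a b = Hashtbl.hash (a, b)

(* Root of a fixed 4-leaf toy Merkle tree: a pure function of the leaves. *)
let root = function
  | [ a; b; c; d ] -> h2 (h2 a b) (h2 c d)
  | _ -> invalid_arg "root"

(* Replace leaf [i] with value [v]. *)
let set leaves i v = List.mapi (fun j x -> if i = j then v else x) leaves

let () =
  let l0 = [ "A1"; "B1"; "C1"; "D1" ] in
  (* Incremental: rehash after every update; intermediate roots are unused. *)
  let l1 = set l0 0 "A2" in
  let _r1 = root l1 in
  let l2 = set l1 0 "A3" in
  let _r2 = root l2 in
  let l3 = set l2 1 "B2" in
  (* Deferred: hash once over the final leaves; same root. *)
  assert (root l3 = root [ "A3"; "B2"; "C1"; "D1" ])
```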

update_maps t ~f:(fun maps ->
{ maps with hashes = Map.set maps.hashes ~key:address ~data:hash } )

let path_batch_impl ~fixup_path ~self_lookup ~base_lookup locations =

Note to myself: slightly rewritten to not have t as an argument, and to not pass hashes to the self_lookup parameter (it's passed/embedded into self_lookup on the caller level).
