-
Notifications
You must be signed in to change notification settings - Fork 622
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Resharding] Offline prototype for using Flat Storage to reconstruct trie #9105
Comments
Existing work for state sync mentioned above: #8927 |
Implemented a batched version of flat storage to trie with batch size as 500 MB so as to not overwhelm memory usage. Please see attached PR. I ran this using the storage from a mainnet canary node as that has flat state, and using a standard
Couple of observations
This hopefully means it should be feasible to do a one time migration for the trie and flat storage for validator nodes. We still however need to think about the practicality of archival nodes. |
This also means that we likely can afford to perform resharding by reconstructing trie from flat storage every time. The resharding will be happening in the background and the time constraint is that catchup + resharding fit within one epoch with some healthy margin.
This issue is primarily about the resharding strategy so it should be orthogonal to the archival data. When resharding we keep the old data as is so there won't be a need for any lengthy migrations on archival nodes. Archival nodes are important to consider but they only come into play when considering the storage structure. Let us consider it as part of that workflow. |
As background context, there are three candidate solutions for the resharding problem: (S1) restructure the trie so that shard range corresponds to a trie range; (S2) when resharding, shallow copy the account subtries while prefixing each trie node with account ID or prefix; (S3) when resharding, always reconstruct new tries from flat storage.
(S1) requires a protocol upgrade and data migration; (S2) requires a data migration, and both of these data migrations require rebuilding the entire trie. Iterating through the old trie to rebuild the new trie is infeasible and will not complete within an epoch. Therefore, all three solutions require the ability to reconstruct a trie from flat storage; the only difference being that (S3) requires it permanently and (S1) and (S2) only require it for the initial migration.
For this task, we need to build a prototype, an offline tool to reconstruct tries using flat storage. The goal is to (G1) show that it works; (G2) estimate how long it takes. This will help us make a decision for whether (S3) is good enough for the foreseeable future, or we need to pursue (S1) or (S2).
In addition to the ability to reconstructing tries, we also need the ability to snapshot the flat storage state at the beginning of an epoch.
There is existing work by @Longarithm to use flat storage to perform state sync, so this should already include these abilities.
The text was updated successfully, but these errors were encountered: