Issue summary
We need to rethink our current state storage implementation, not only to improve it but also to comply with the standards required by the protocol. This issue presents our current solution in contrast with Parity's solution, highlighting the changes we should make along with the benefits we would gain.
Current solution
Currently, our state storage solution, used among other things by the runtime, has various responsibilities:
Apply state changes for both tries and child tries (put, set, delete, clear).
The protocol specifies that the state is a collection of keys and values (Polkadot spec). This state must be re-arranged into a trie to calculate the Merkle hash and to generate and verify Merkle proofs. But internally, we always store the state in the form of a trie, even though this is only necessary when calculating the hash or generating proofs.
In our case, our state storage implementation gets tries from memory and, if they are not found, loads them by looking them up in the DB.
In order to do that, we need to feed the storage structure with a DB connection and the in-memory tries structure.
To handle state transitions with transactions we use our current TrieState structure, which contains two in-memory tries used as transient tries to stage the changes and apply them if needed.
Here is a simplified diagram of our architecture:
Parity's solution
Parity built a trie crate that simplifies the basic operations of a trie. The difference with our solution is that they do not store the data in the form of a trie but rather in a key-value database, and they have a TrieDB wrapper exposing trie functions over this database, without keeping a trie built in memory at all times.
Each TrieDB represents a trie through its root node, with all its nodes stored in that database, allowing the same database to be shared by all the tries that need to be represented.
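As a rough illustration (the type and method names below are ours, purely hypothetical, not Gossamer's or Parity's actual APIs), the core idea can be sketched in Go like this: one flat node store shared by all tries, and a cheap TrieDB handle that is just a root hash plus a reference to that store.

```go
package trie

// NodeDB is a flat key-value store that maps node hashes to encoded nodes.
// A single NodeDB can back every trie and child trie in the client.
type NodeDB interface {
	Get(hash []byte) (encodedNode []byte, err error)
	Put(hash, encodedNode []byte) error
}

// TrieDB is a lightweight view of one trie: a root hash plus a reference to
// the shared node store. Creating it loads and decodes nothing.
type TrieDB struct {
	db   NodeDB
	root []byte // root hash identifying the trie this handle represents
}

// NewTrieDB returns a view of the trie rooted at root inside db.
func NewTrieDB(db NodeDB, root []byte) *TrieDB {
	return &TrieDB{db: db, root: root}
}
```

Because a TrieDB is only a handle, opening a trie (or a child trie) at any root is cheap, and many such handles can coexist over the same database.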
This brings several benefits.
Benefits
Simpler and more scalable
Our current solution looks more like spaghetti code, relating different components in a way that is very difficult to follow. That complexity makes the solution hard to reason about and could cause issues in the future if we make the wrong changes.
Parity's solution is more straightforward, easier to understand, and more scalable.
It allows us to implement lazy loading
With our current solution, the only lazy-loading behavior we have is the state storage loading tries from the database and putting them into memory.
That approach has several problems:
It works for tries but not for nodes: when we load a trie from the DB we decode the entire trie with all its nodes, even when we only need a single node's value.
It keeps the decoded trie in memory "forever" because we don't have a caching mechanism.
Even if we implement such a cache, it will work for tries but not for nodes, so the memory savings will be smaller than what we could gain by lazy-loading nodes.
If we want to implement lazy loading of nodes with our current solution, we will have to change the way a node is decoded to avoid decoding its children, and we will end up with duplicated code with different behaviors for tries and nodes.
If we implement a solution similar to Parity's trie crate, our tries will behave like Parity's TrieDB: we will load nodes from the database and decode them only when we need their values, with the ability to cache the most frequently used ones in a simple manner, roughly as sketched below.
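Continuing the same hypothetical sketch (the Node interface and decodeNode below are placeholders, not existing Gossamer code), a lookup only fetches and decodes the nodes on the path from the root to the key:

```go
// Node is the decoded form of a single trie node; the real Polkadot node
// codec is not reproduced here.
type Node interface {
	// ValueFor returns the value and true when the lookup ends at this node.
	ValueFor(key []byte) (value []byte, done bool)
	// ChildFor returns the hash of the child to descend into and the key
	// suffix still to match, or ok=false when the key is absent.
	ChildFor(key []byte) (childHash, rest []byte, ok bool)
}

// decodeNode stands in for the real node decoder.
func decodeNode(encoded []byte) (Node, error) {
	panic("node decoding is out of scope for this sketch")
}

// Get walks the trie one node at a time, fetching and decoding each node only
// when the walk reaches it, instead of decoding the whole trie up front.
func (t *TrieDB) Get(key []byte) ([]byte, error) {
	hash := t.root
	for {
		encoded, err := t.db.Get(hash) // one node read, not a whole-trie read
		if err != nil {
			return nil, err
		}
		node, err := decodeNode(encoded) // decode only this node
		if err != nil {
			return nil, err
		}
		if value, done := node.ValueFor(key); done {
			return value, nil
		}
		childHash, rest, ok := node.ChildFor(key)
		if !ok {
			return nil, nil // key not present in this trie
		}
		hash, key = childHash, rest
	}
}
```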
Reduce memory usage
Since everything will live in the database and only the most frequently used nodes and values will be kept in memory, we will reduce our memory usage significantly.
A way to achieve this with Parity's approach is to have a local node/value cache for every trie, so each trie can cache its nodes independently, solving issue #3623; see the sketch below.
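For example (cachedNodeDB and its naive eviction policy are placeholders, not a concrete design proposal), such a per-trie cache can simply wrap the shared node store from the earlier sketch:

```go
// cachedNodeDB wraps a NodeDB and keeps the most recently fetched encoded
// nodes in memory, so hot paths avoid both the database read and re-decoding.
type cachedNodeDB struct {
	inner    NodeDB
	maxNodes int
	nodes    map[string][]byte // node hash -> encoded node
}

func newCachedNodeDB(inner NodeDB, maxNodes int) *cachedNodeDB {
	return &cachedNodeDB{inner: inner, maxNodes: maxNodes, nodes: make(map[string][]byte)}
}

func (c *cachedNodeDB) Get(hash []byte) ([]byte, error) {
	if encoded, ok := c.nodes[string(hash)]; ok {
		return encoded, nil // cache hit: no database access
	}
	encoded, err := c.inner.Get(hash)
	if err != nil {
		return nil, err
	}
	if len(c.nodes) >= c.maxNodes {
		// naive eviction to bound memory; a real implementation would use LRU
		for k := range c.nodes {
			delete(c.nodes, k)
			break
		}
	}
	c.nodes[string(hash)] = encoded
	return encoded, nil
}

func (c *cachedNodeDB) Put(hash, encodedNode []byte) error {
	return c.inner.Put(hash, encodedNode)
}
```

A trie opened with NewTrieDB(newCachedNodeDB(db, 1024), root) would then cache its own nodes independently of every other trie.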
We stop thinking about trees and start thinking about the state as keys and values.
If we use this solution, we will start saving a collection of key-value pairs instead of saving tries in memory; the "Trie" will be an abstract structure on top of a database that re-arranges this key-value collection into a trie only when it is needed.
In the past, it was difficult to implement things like trie V1 because our internal implementation is too coupled to the Trie structure. This change of mindset will help us decouple the data from its representation, leaving us free to change the Trie structure in the future with less risk of breaking something.
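To make the decoupling concrete (the interface names below are ours and purely illustrative), the rest of the client would only see key-value operations, while everything trie-specific, such as the storage root or proofs for trie V0 versus V1, would live behind a separate view that can change without touching the stored data:

```go
// State is what the rest of the client sees: plain key-value operations, with
// no trie structure exposed.
type State interface {
	GetValue(key []byte) ([]byte, error)
	PutValue(key, value []byte) error
	DeleteValue(key []byte) error
}

// MerkleView is the trie-shaped interpretation of that state, computed on
// demand. Switching from trie V0 to V1 would mean swapping implementations of
// this interface, not migrating the stored data.
type MerkleView interface {
	Root() ([]byte, error)                 // Merkle root of the current state
	Prove(keys [][]byte) ([][]byte, error) // inclusion proof for the given keys
}
```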
We can implement nested transactions more easily
We would stop using our current TrieState implementation to handle transactions and could instead handle them in the same TrieDB, simplifying transaction management as well as our architecture. That would also help us solve issue #3508.
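One possible shape for this (every name below is hypothetical and only illustrates the layering, not a final design) is a stack of change overlays on top of the TrieDB view from the earlier sketch; nesting falls out naturally because each transaction only pushes or pops one overlay, similar in spirit to how Substrate keeps overlayed changes on top of its state backend.

```go
// overlay holds the pending writes of one open transaction.
type overlay struct {
	changes  map[string][]byte // pending writes; a nil value marks a deletion
	previous *overlay          // enclosing transaction, nil for the base layer
}

// transactionalState reads committed data lazily from the trie and pending
// data from the stack of overlays.
type transactionalState struct {
	trie *TrieDB  // committed state, backed by the node database
	head *overlay // innermost open transaction, nil when none is open
}

func (s *transactionalState) BeginTransaction() {
	s.head = &overlay{changes: make(map[string][]byte), previous: s.head}
}

func (s *transactionalState) RollbackTransaction() {
	if s.head != nil {
		s.head = s.head.previous // drop the innermost layer and its changes
	}
}

func (s *transactionalState) CommitTransaction() {
	if s.head == nil {
		return
	}
	if s.head.previous == nil {
		// outermost commit: write the accumulated changes through to the
		// trie / node database (omitted in this sketch), then clear the stack
		s.head = nil
		return
	}
	for k, v := range s.head.changes {
		s.head.previous.changes[k] = v // merge into the enclosing transaction
	}
	s.head = s.head.previous
}

func (s *transactionalState) Get(key []byte) ([]byte, error) {
	for o := s.head; o != nil; o = o.previous {
		if v, ok := o.changes[string(key)]; ok {
			return v, nil // newest pending write (or deletion) wins
		}
	}
	return s.trie.Get(key) // fall back to the committed trie
}
```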
Necessary Changes
I think the best way to implement this is the same way we did with Grandpa: translate the Parity crate to Go and then start using it in our client.
We already have a PR that we can use as a starting point: #3390
Other information and links
Parity's Trie Crate
TrieDB
https://github.com/paritytech/trie/blob/d3e63773ed044a709f719bd6455b9ab1751aae7c/trie-db/src/triedb.rs#L120
Substrate Cache
https://github.com/paritytech/polkadot-sdk/blob/d43d2fe23b12355eed4afa6dd3e971a248269262/substrate/primitives/trie/src/cache/mod.rs#L18