Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add adr-001 for node key refactoring #608
feat: add adr-001 for node key refactoring #608
Changes from 6 commits
4f9a004
8646a9a
6e5b081
4ee7804
9468ee2
0c2d610
1459a18
d8d0bf9
88885f1
5272de4
006d76c
7cc7280
4e6044f
0bf7486
a0dcc0e
ee11dff
d9f7d2e
10184ac
5f94844
85a90e7
8fb87b6
6bbf7f9
204d881
f743311
bf85d92
b064ede
6095bc4
7873267
fb0f182
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is nodeKey computed, I remember it's sth like "version+path"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, it is a just sequenced integer.
for example, we assign the
nodeKey
of the new node astree.nonce + 1
every time when create nodeThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems better to use "version/seq", where seq is only unique inside the version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here, the
nodeKey
is unique globally.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"version/seq" is globally unique as well, seq itself is locally unique within the version. For each version, the nodes are written in a batch anyway, we only need to maintain the seq in memory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to update the left and right node keys?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here,
leftNodeKey
andrightNodeKey
refer to the children node key in the db. If we update the node key asversion|seq
, we should also update left and right node keys, right?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Meaning that every time we have a new version we update all referenced node keys to refer to that version rather than the original version where they were created? That seems really complex. I feel like I'm maybe not understanding something really basic here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yihuang , I agree with you. If we keep the local nonce as
int32
, there is not much increase in storage size because we can remove theversion
from node body writes.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think the size difference should be minimal, assuming version is
int64
:node key,
8bytes
vs8+4bytes
, but with prefix compression, the amortized difference should be smaller, local nonce only use 2 bytes in most cases, so the 10 bytes prefix is shared and compressed away.node body, we can remove the version field, save a
varint(version) + 1 tag byte
, usually 4 to 5bytes for production chains.leftNodeKey
andrightNodeKey
,varint(global nonce)
vsvarint(version << 32 + local nonce)
, orvarint(version) + varint(local nonce) + 1 more byte to tag the extra field
.Say there's 1000000 blocks, and 2000 new nodes for each block, the new global nonce would be
1000000 * 2000
, the two encodings:len(varint(1000000 * 2000)) == 5
len(varint(1000000)) + len(varint(2000)) + 1 == 6
len(varint('uint64', (1000000<<32)+2000)) == 8
I think two fields wins, the difference is only 1 byte, and no bitwise operations:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess one consequence is proof generation is slower, previously we can get child hash directly, now we need to load the child node?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are using the ics23 proof, which requires size, version, and height so I believe it doesn't lead to any delays.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we are using nonce as
nodeKey
, how are we getting data locality when traversing or writing the tree to disk?I thought that the whole idea with the path encoded as a key was that writes would be sequential, contributing to lower frequency of compactions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The nodes are all immutable, the
nodeKey
is sequential, so all the new nodes are written sequentially.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know the tree path describes the data locality well. The problem is how to keep the path when rotating the tree.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aren't we need to split the
leftNodeKey
intoleftNodeVersion
andleftNodeNonce
, likewise forrightNodeKey
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right, I updated the node struct
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need these fields, if we already have
leftNode
andrightNode
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
leftNode
andrightNode
is only meaningful on memory side, we need these fields to get children from the storage side.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An alternative way to migrate is using the state streamer, resync the chain, and recreate the new iavl db with the state changes, then switch.
Iterate iavl tree version by version sounds pretty slow, lots of random db access.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a breaking change, so validators need to update at once (at least 67% of them). So we need a migration process anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's not consensus breaking, since the root hashes don't change, validators don't need to update at the same time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably depends on the migration mechanism. I think the one described below won't guarantee the same IAVL tree.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, today IAVL key is a hash, so this update will change the keys stored in the IAVL, hence it will change an order in the tree, and the tree Merkle Hash.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it can migrate IAVL nodes one to one, change references between them, but the hashes of each node is not changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Migrate a archive node and pruned node will end up with different node keys, but it's probably ok.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inconsistent node nonces could make the state-sync snapshot inconsistent as well?
Then I think using the path as identity is more deterministic than sequential nonce.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the global nonce they would be different but version | local nonce should be same, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, in migration, it depends on the traversal order, this is easy to specify, like preorder+ascending. but when handling the insertion, we should also specify the precise logic of nonce assignments, to allow alternative implementations be compatible with each other.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the inconsistency of nodeKey is not a big problem, the only thing we need is to match the proof.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can test this on a live network with state sync and a simple migration to see if it works to be sure
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it depends on how we export the state sync snapshot I think, if we re-map the nodekey, then the stored one don't matter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the nonce assignment should happens in the
SaveBranch
, which should do a preorder ascending traversal, and prunes the historical branches. So the logic is pretty precise and deterministic here.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you are right
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
version
itself, and just assign the nonce.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nowadays, all the "mainstream" db backends are lsm tree, but it should help there too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm guessng by bTree we refer to IAVL tree? However, I agree that "bTree" seems confusing in this context because the alternative to LSM is the first thing that comes to mind
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One of the major benefits might also be reduced compactions since we always commit semi-sorted data (by nonce) to the underlying LSM.
Since we commit versioned data, there wouldn't be a need for as many merge operations as today.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I will update this part and add more description of LSM