feat: remove orphans #646

cool-develope · 2022-12-08T20:09:22Z

Context

We assume the intermediate versions are unnecessary. By keeping the versions in sequence, we don't need to load all versions and can keep only firstVersion and latestVersion. This will resolve #637.

What does this PR do?

Remove DeleteVersion, DeleteVersions, and DeleteVersionsRange
Add a new API, DeleteVersionsTo
Remove orphans from the storage
Remove LazyLoadVersion

mutable_tree.go

nodedb.go

+	defer itr.Close()
+
+	nversion := int64(0)
+	for ; itr.Valid(); itr.Next() {


cool-develope · 2022-12-08T20:25:41Z

@yihuang, this overlaps with your PR #641. I am open to your opinions; please review it.

lgtm-com · 2022-12-08T20:28:20Z

This pull request introduces 1 alert when merging 946c32d into 992120d - view on LGTM.com

new alerts:

1 for Unreachable statement

Heads-up: LGTM.com's PR analysis will be disabled on the 5th of December, and LGTM.com will be shut down ⏻ completely on the 16th of December 2022. It looks like GitHub code scanning with CodeQL is already set up for this repo, so no further action is needed 🚀. For more information, please check out our post on the GitHub blog.

yihuang · 2022-12-09T01:20:09Z

> We assume the intermediate versions are unnecessary and by keeping the versions in sequence

There's a use case to keep every nth version as snapshots to rebuild any version on the fly by replaying the change set from versiondb or file streamer outputs. I think it should be easy to support the current behaviour.

I believe your prune(diff) algorithm is equivalent to the one in #641, with some small adjustments.
Currently the tree diff implementation looks something like this:

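def find_orphaned_nodes(v1, v2):
    # Placeholder: yield the nodes reachable in version v1 but not in version v2.
    pass

def delete_version(v):
    # Delete every node that version v references but version v+1 does not.
    for orphaned_node in find_orphaned_nodes(v, v+1):
        delete_node(orphaned_node)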

Make two adjustments:

find the successor version dynamically
find the predecessor version, and don't delete nodes whose version is at or below the predecessor version.
Then it'll work with non-contiguous versions.

def delete_version(v):
    predecessor = find_predecessor(v)  # defaults to 0 if there is no predecessor
    successor = find_successor(v)  # if there is no successor, this should return an error
    for orphaned_node in find_orphaned_nodes(v, successor):
        if orphaned_node.version <= predecessor:
            # this node must be referenced by the predecessor version, so don't delete it
            continue
        delete_node(orphaned_node)

The orphaned_node.version <= predecessor check can be further embedded into the tree traversal itself to make it more performant.
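For illustration, here is a minimal runnable sketch of the adjusted delete_version on a toy in-memory model. The Node class, the roots dict, reachable, and example_roots are assumptions made for this example and are not iavl APIs; the set-based find_orphaned_nodes stands in for a real tree diff and would not scale to large trees.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Node:
    key: str
    version: int  # version in which this node was created
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def reachable(root: Optional[Node]) -> Dict[int, Node]:
    """Collect every node reachable from root, keyed by object identity."""
    seen: Dict[int, Node] = {}
    stack = [root]
    while stack:
        node = stack.pop()
        if node is None or id(node) in seen:
            continue
        seen[id(node)] = node
        stack += [node.left, node.right]
    return seen


def find_orphaned_nodes(roots: Dict[int, Node], v1: int, v2: int) -> List[Node]:
    """Nodes referenced by version v1 that version v2 no longer references."""
    old, new = reachable(roots[v1]), reachable(roots[v2])
    return [node for ident, node in old.items() if ident not in new]


def delete_version(roots: Dict[int, Node], v: int) -> List[Node]:
    """Drop version v and return the nodes that are safe to delete."""
    later = [x for x in roots if x > v]
    if not later:
        raise ValueError("cannot delete the latest version: no successor")
    predecessor = max((x for x in roots if x < v), default=0)
    successor = min(later)
    deleted = [
        node
        for node in find_orphaned_nodes(roots, v, successor)
        # nodes at or below the predecessor version are still referenced by it
        if node.version > predecessor
    ]
    del roots[v]
    return deleted


def example_roots() -> Dict[int, Node]:
    """Three versions: v2 rewrites leaf b, v3 rewrites leaf a."""
    a1, b1 = Node("a", 1), Node("b", 1)
    b2, a3 = Node("b", 2), Node("a", 3)
    return {
        1: Node("root", 1, a1, b1),
        2: Node("root", 2, a1, b2),
        3: Node("root", 3, a3, b2),
    }
```

Deleting version 2 from example_roots() removes only the version-2 root: leaf a from version 1 is orphaned by version 3 but is still referenced by version 1, so the predecessor check keeps it.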

The diff algorithm is just an implementation detail. The main question is why support for gap versions is being dropped; I think it's a useful feature. I believe we should at least discuss these two issues (removing orphans and supporting gap versions, maybe the slow startup issue as well) separately, since they can be solved separately.

yihuang · 2022-12-09T01:21:19Z

> It will resolve #637.

I think the startup time issue alone can be solved by lazy-load mode.

mmsqe · 2022-12-09T01:29:00Z

iterator.go

+		return
+	}
+	node := iter.GetNode()
+	iter.nodesToVisit = iter.nodesToVisit[:len(iter.nodesToVisit)-1]


May I ask: does this remove the last node so that the left node can be visited if skipped (when both left and right nodes exist in the last iteration)?

yihuang · 2022-12-09

I think this is to pop the visited node from the stack?

cool-develope · 2022-12-09

Right, it simulates pre-order DFS, so it assumes the root of the subtree is already visited and pops it from the stack.
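As a rough sketch of the stack behaviour discussed in this thread: a pre-order iterator can keep the current node on top of an explicit stack, pop it when advancing, and push its children only when the subtree is not skipped. The Python names below (Node, PreOrderIterator, walk) are hypothetical and only mirror the idea; they are not the iterator.go implementation.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Node:
    key: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


class PreOrderIterator:
    """Pre-order DFS where the top of the stack is the node being visited."""

    def __init__(self, root: Optional[Node]) -> None:
        self.nodes_to_visit: List[Node] = [root] if root is not None else []

    def valid(self) -> bool:
        return bool(self.nodes_to_visit)

    def get_node(self) -> Node:
        return self.nodes_to_visit[-1]

    def next(self, skip_subtree: bool = False) -> None:
        # The current node is the root of its subtree and has been visited,
        # so pop it before deciding whether to descend.
        node = self.nodes_to_visit.pop()
        if skip_subtree:
            return
        # Push right first so that the left child is visited first.
        if node.right is not None:
            self.nodes_to_visit.append(node.right)
        if node.left is not None:
            self.nodes_to_visit.append(node.left)


def walk(root: Optional[Node], skip_keys: Iterable[str] = ()) -> List[str]:
    """Return visited keys, skipping the subtrees below any key in skip_keys."""
    skip = set(skip_keys)
    visited = []
    it = PreOrderIterator(root)
    while it.valid():
        node = it.get_node()
        visited.append(node.key)
        it.next(skip_subtree=node.key in skip)
    return visited
```

With a tree d(b(a, c), e), walking without skips visits d, b, a, c, e, while skipping at b still visits b itself but none of its children.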

tac0turtle · 2022-12-09T09:16:42Z

I would argue we should reduce the feature set and offload things like old commitments outside the node. Something like commitment streaming would solve this issue and would reduce the complexity of the code we maintain.

Secondly, the SDK feature of intermittent states was removed a while ago, and I don't think anyone outside of crypto.com has come asking for it back.

Thirdly, there is a long-term objective of experimenting with different tree algorithms. The more we pare down the feature set of iavl, the easier it will be to swap trees for research in the future. Also, this feature wasn't present in adr40, so I'm surprised we need to keep it here.

yihuang · 2022-12-09T09:46:21Z

> I would argue we should reduce the feature set and offload things like old commitments outside the node. Something like commitment streaming would solve this issue and would reduce the complexity of the code we maintain.

> Secondly, the SDK feature of intermittent states was removed a while ago, and I don't think anyone outside of crypto.com has come asking for it back.

> Thirdly, there is a long-term objective of experimenting with different tree algorithms. The more we pare down the feature set of iavl, the easier it will be to swap trees for research in the future. Also, this feature wasn't present in adr40, so I'm surprised we need to keep it here.

I understand the desire to simplify things here; I just don't see supporting this feature making other things too complicated. But yeah, maybe in the long run people do need alternative implementations, as long as we keep the root hashes compatible.

cool-develope · 2022-12-09T13:20:46Z

tbh, I don't like the term lazy loading. I think we should keep the primary functionality as small as possible and provide fuller extra functionality on top; the typical case of extra functionality is import/export.
For example, we can solve the version gap issue by extending import/export: we can add a fromVersion parameter to export so that it exports only the new nodes that were updated after fromVersion. In this way, we can separate the storage into operating storage and snapshot storage.
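A minimal sketch of what an export with a fromVersion cutoff could look like, on a toy tree. It assumes a copy-on-write tree in which a child is never newer than its parent, so a whole subtree can be pruned once its root is at or below the cutoff. Node and export_since are hypothetical names for this example, not the iavl export API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Node:
    key: str
    version: int  # version in which this node was created
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def export_since(root: Optional[Node], from_version: int) -> List[Node]:
    """Return, in pre-order, the nodes created after from_version."""
    exported: List[Node] = []
    stack = [root]
    while stack:
        node = stack.pop()
        # Assumption: children are never newer than their parent, so an old
        # node means the whole subtree already exists in the earlier snapshot.
        if node is None or node.version <= from_version:
            continue
        exported.append(node)
        stack.append(node.right)
        stack.append(node.left)
    return exported
```

The nodes skipped by the cutoff would have to come from a snapshot taken at or before fromVersion, which is the operating storage versus snapshot storage split described above.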

yihuang · 2022-12-12T03:07:35Z

nodedb.go

+		}
+		pNode := prevIter.GetNode()
+
+		if orgNode != nil && bytes.Equal(pNode.hash, orgNode.hash) {


Are you sure pre-order traversal is correct for this? I'm still trying to reason about the correctness of this algorithm; it seems we need an in-order traversal here, which is ordered by node keys.
What's important here is that curIter and prevIter must visit the sequence of orgNodes in the same order, right?
Or we can keep the set of orgNodes in memory, so we can traverse the trees in whatever order.

yihuang · 2022-12-12

But we do need pre-order to skip the subtree early on, so we need to check a branch node twice, both pre-order and in-order.
For example, if the branch node matches the current orgNode, we want to check it in pre-order to avoid traversing into the subtree.
But if the branch node's left child matches the current orgNode, the branch node needs to be checked again against the new orgNode, because it could be a match too.

cool-develope · 2022-12-12

Pre-order or in-order is not important here, right? Because we only care about the order of leaves. I don't like keeping orgNodes or orphans in memory; those amounts may be enormous.

yihuang · 2022-12-12

No, orgNode could be a branch node.

yihuang · 2022-12-12T03:23:49Z

It seems that always deleting versions from the beginning is not optimal, because in that case we always do a full diff without a predecessor to limit the traversal space. Just an intuition though; it needs closer analysis.

cool-develope · 2022-12-12T15:06:16Z

> It seems that always deleting versions from the beginning is not optimal, because in that case we always do a full diff without a predecessor to limit the traversal space. Just an intuition though; it needs closer analysis.

Yeah, you are right. It was designed before the iterator implementation; I will restructure it.

cool-develope · 2022-12-12T21:09:17Z

> > It seems that always deleting versions from the beginning is not optimal, because in that case we always do a full diff without a predecessor to limit the traversal space. Just an intuition though; it needs closer analysis.

> Yeah, you are right. It was designed before the iterator implementation; I will restructure it.

It seems like there is no way for now; maybe there will be a way using the new node key (version + path).

yihuang · 2022-12-19T04:11:00Z

With the new node key format, we can do the diff with two simple db iterations, and:

if the nonce is assigned in node.key order, we can do the two iterations simultaneously with constant memory.
otherwise, we can do the two iterations separately, keeping the set of shared node keys in temporary memory.
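One possible reading of the two options above, sketched on plain lists of (version, nonce) tuples rather than a real db iterator. Both function names and the tuple layout are assumptions for illustration only: diff_sorted walks two sorted key streams in lockstep with constant extra memory, and diff_with_set is the fallback that holds one side's keys in a temporary set.

```python
from typing import Iterable, Iterator, List, Tuple

NodeKey = Tuple[int, int]  # hypothetical (version, nonce) layout

_DONE = object()  # sentinel marking an exhausted iterator


def diff_sorted(old_keys: Iterable[NodeKey], new_keys: Iterable[NodeKey]) -> Iterator[NodeKey]:
    """Yield keys present in old_keys but not in new_keys.

    Both inputs must be sorted ascending. Only one key per stream is held
    at a time, so the extra memory is constant.
    """
    new_iter = iter(new_keys)
    current = next(new_iter, _DONE)
    for key in old_keys:
        # Advance the new stream until it catches up with the old key.
        while current is not _DONE and current < key:
            current = next(new_iter, _DONE)
        if current is _DONE or current != key:
            yield key


def diff_with_set(old_keys: Iterable[NodeKey], new_keys: Iterable[NodeKey]) -> List[NodeKey]:
    """Same result without any ordering requirement, using temporary memory."""
    shared = set(new_keys)
    return [key for key in old_keys if key not in shared]
```

The sorted variant only works if both iterations return keys in one consistent order, which is the condition stated for the constant-memory case.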

tac0turtle · 2023-01-02T11:31:08Z

@cool-develope should we review and merge this before #650?

cool-develope · 2023-01-03T12:50:02Z

> @cool-develope should we review and merge this before #650?

No, #650 fully includes this implementation.
I want to close this PR when merging #650 (this one is for review purposes), or we can merge this first.

yihuang · 2023-01-04T02:04:37Z

@cool-develope can you address the traversal order issue? I used the temporary map approach to extract state changes here, but followed your diff algorithm in general.

yihuang · 2023-01-10T17:01:30Z

> > For example, the current node (say N1) in prevIter doesn't match the current orgNode, but one node in its left branch matches it and causes orgNode to advance to N1; N1 won't be checked anymore, because the N1 in prevIter is already checked.

> No, if N1 is not matched, then we keep traversing within N1's subtree. The skip is applied only if we meet the same node as orgNode.

Say N1 doesn't match the current orgNode O1, but N1's left child matches O1. Is it possible that a future orgNode will be N1? In that case, N1 is already checked and won't match again. I hope it's not possible, but we need to prove it.

cool-develope · 2023-01-10T17:07:10Z

> Say N1 doesn't match the current orgNode O1, but N1's left child matches O1. Is it possible that a future orgNode will be N1? In that case, N1 is already checked and won't match again. I hope it's not possible, but we need to prove it.

No, it is not possible; there is no overlap between orgNodes. Just think of them as a group of subtrees. The orgNodes should be a group of subtrees, right?

yihuang · 2023-01-10T17:46:51Z

> > Say N1 doesn't match the current orgNode O1, but N1's left child matches O1. Is it possible that a future orgNode will be N1? In that case, N1 is already checked and won't match again. I hope it's not possible, but we need to prove it.

> No, it is not possible; there is no overlap between orgNodes. Just think of them as a group of subtrees. The orgNodes should be a group of subtrees, right?

Yeah, I guess that makes sense.

yihuang · 2023-01-11T01:51:26Z

I guess the reasoning goes like this:

The sequence of orgNodes is a list of root nodes of disjoint sub-trees, because when traversing curIter, the sub-tree is skipped when its root node is found.
A sequence of root nodes of disjoint sub-trees is visited in the same order in different versions with pre-order traversal. Is there a more formal argument for this assumption?
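To make the reasoning concrete, here is a small sketch of the two-iterator diff as it is described in this thread, on a toy copy-on-write tree. It is an approximation under stated assumptions: shared nodes are the same Python object in both versions, so identity stands in for the hash comparison, and Node, PreOrderIterator, and traverse_orphans are hypothetical names rather than the nodedb.go code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Node:
    key: str
    version: int  # version in which this node was created
    left: Optional["Node"] = None
    right: Optional["Node"] = None


class PreOrderIterator:
    def __init__(self, root: Optional[Node]) -> None:
        self.stack: List[Node] = [root] if root is not None else []

    def valid(self) -> bool:
        return bool(self.stack)

    def get_node(self) -> Node:
        return self.stack[-1]

    def next(self, skip_subtree: bool = False) -> None:
        node = self.stack.pop()
        if skip_subtree:
            return
        for child in (node.right, node.left):  # left ends up on top
            if child is not None:
                self.stack.append(child)


def traverse_orphans(prev_root: Node, cur_root: Node, prev_version: int) -> List[Node]:
    """Nodes of the previous tree that the current tree no longer references."""
    orphans: List[Node] = []
    cur_it, prev_it = PreOrderIterator(cur_root), PreOrderIterator(prev_root)
    org_node: Optional[Node] = None
    while prev_it.valid():
        # Find the next shared subtree root in the current tree: the first
        # node in pre-order that was created at or before prev_version.
        while org_node is None and cur_it.valid():
            node = cur_it.get_node()
            if node.version <= prev_version:
                org_node = node
                cur_it.next(skip_subtree=True)
            else:
                cur_it.next()
        p_node = prev_it.get_node()
        if org_node is not None and p_node is org_node:
            # Shared subtree: keep it and do not descend into it.
            prev_it.next(skip_subtree=True)
            org_node = None
        else:
            orphans.append(p_node)
            prev_it.next()
    return orphans
```

The sketch relies on exactly the property asked about above: both iterators must reach the shared subtree roots in the same relative order, otherwise a shared node could be compared against the wrong orgNode and reported as an orphan.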

yihuang · 2023-01-11T02:17:22Z

nodedb.go

+		}
+		pNode := prevIter.GetNode()
+
+		if orgNode != nil && bytes.Equal(pNode.hash, orgNode.hash) {


Is it more efficient to do an extra pointer comparison here: (pNode == orgNode || bytes.Equal(pNode.hash, orgNode.hash))?

cool-develope · 2023-01-18

That's a good point, if they are from cache.

cool-develope · 2023-01-11T13:38:35Z

> I guess the reasoning goes like this:

> The sequence of orgNodes is a list of root nodes of disjoint sub-trees, because when traversing curIter, the sub-tree is skipped when its root node is found.

> A sequence of root nodes of disjoint sub-trees is visited in the same order in different versions with pre-order traversal. Is there a more formal argument for this assumption?

No, looks good. To say it again, the ordering method is not important here.

yihuang · 2023-01-11T13:53:02Z

> > I guess the reasoning goes like this:

> > The sequence of orgNodes is a list of root nodes of disjoint sub-trees, because when traversing curIter, the sub-tree is skipped when its root node is found.

> > A sequence of root nodes of disjoint sub-trees is visited in the same order in different versions with pre-order traversal. Is there a more formal argument for this assumption?

> No, looks good. To say it again, the ordering method is not important here.

Order is important here: if the orgNodes are visited in a different order in the two versions, that's definitely problematic, although that seems not to be the case here.

cool-develope · 2023-01-11T13:55:28Z

> Order is important here: if the orgNodes are visited in a different order in the two versions, that's definitely problematic, although that seems not to be the case here.

Sorry, you are right. I mean the traversal order is not important; for example, an in-order traversal of prevIter will also work.

yihuang · 2023-01-11T13:56:57Z

> > Order is important here: if the orgNodes are visited in a different order in the two versions, that's definitely problematic, although that seems not to be the case here.

> Sorry, you are right. I mean the traversal order is not important; for example, an in-order traversal of prevIter will also work.

Yeah, agreed, as long as they visit the orgNodes in the same order.

kocubinski · 2023-01-17T23:24:48Z

Seems generally OK but I think we should have some tests for DeleteVersionsTo and traverseOrphans. I'm still proving to myself that that algorithm works as expected. Thorough coverage of edge cases would help.

kocubinski

Tested, audited, LGTM. I still think we should have some basic tests of traverseOrphans though: roughly what I did in my branch but more formal, since I just compared STDOUT to my expectations. I can commit some of these later this week if you want.

It will probably also be useful to run some benchmark tests on larger data sets validating inputs/outputs at a high level. More like fuzz testing.

yihuang

I'm not sure about dropping the support for deleting versions in the middle, but other than that, LGTM.

cool-develope requested a review from a team as a code owner December 8, 2022 20:09

github-advanced-security bot found potential problems Dec 8, 2022

mutable_tree.go: Fixed

nodedb.go

	defer itr.Close()

	nversion := int64(0)
	for ; itr.Valid(); itr.Next() {

Check warning (Code scanning / CodeQL): Unreachable statement. This statement is unreachable.

cool-develope requested a review from tac0turtle December 8, 2022 20:26

mmsqe reviewed Dec 9, 2022

mmsqe mentioned this pull request Dec 9, 2022

optim: skipping more subtrees for pruning mode crypto-com/python-iavl#9

Merged

yihuang reviewed Dec 12, 2022

yihuang mentioned this pull request Dec 12, 2022

simplify tree diff algorithm crypto-com/python-iavl#12

Merged

yihuang mentioned this pull request Dec 13, 2022

feat: delete orphan nodes by traversing trees #641

Closed

cool-develope mentioned this pull request Dec 22, 2022

feat: refactor the node key as version + path #650

Closed

tac0turtle assigned aaronc and testinginprod Jan 2, 2023

yihuang mentioned this pull request Jan 3, 2023

feat: Add API TraverseStateChanges to extract state changes from iavl versions #654

Merged

cool-develope added 3 commits January 3, 2023 09:16

remove orphans

12caa26

update CHANGELOG

57f3a5f

fix lint issues

2a6eedc

unlock mechanism

6ecc833

yihuang reviewed Jan 11, 2023

yihuang added a commit to yihuang/iavl that referenced this pull request Jan 11, 2023

Optimize diff algorithm with insightes from cosmos#646

a1ee7c6

yihuang added a commit to yihuang/iavl that referenced this pull request Jan 11, 2023

Optimize diff algorithm with insightes from cosmos#646

fee0d57

fix test

kocubinski reviewed Jan 17, 2023

nodedb.go (resolved)

kocubinski reviewed Jan 17, 2023

nodedb.go (outdated, resolved)

kocubinski reviewed Jan 18, 2023

iterator.go (outdated, resolved)

cool-develope added 2 commits January 18, 2023 13:27

fix conflicts

8fa7018

resolve conflicts

bb1050f

kocubinski approved these changes Jan 18, 2023

cool-develope mentioned this pull request Jan 19, 2023

feat: refactor the export traversal order as pre-order #662

Closed

cool-develope requested a review from yihuang January 19, 2023 13:13

yihuang approved these changes Jan 19, 2023

kocubinski mentioned this pull request Jan 19, 2023

feat: Add unit test for traverseOrphans #663

Merged

cool-develope merged commit bc94180 into master Jan 23, 2023

cool-develope deleted the 592/remove_orphans branch January 23, 2023 17:24

yihuang added a commit to yihuang/iavl that referenced this pull request Jan 26, 2023

Optimize diff algorithm with insightes from cosmos#646

21d1932

fix test rename try to lower memory usage fix lint

tac0turtle added a commit that referenced this pull request Jan 27, 2023

perf: Optimize diff algorithm with insights from #646 (#658)

c90c009

Co-authored-by: Marko <marbar3778@yahoo.com>

mergify bot pushed a commit that referenced this pull request Jan 27, 2023

perf: Optimize diff algorithm with insights from #646 (#658)

c78396a

Co-authored-by: Marko <marbar3778@yahoo.com> (cherry picked from commit c90c009)

tac0turtle pushed a commit that referenced this pull request Jan 27, 2023

perf: Optimize diff algorithm with insights from #646 (backport #658) (…

a5366e7

…#665) Co-authored-by: yihuang <huang@crypto.com>

ankurdotb pushed a commit to cheqd/iavl that referenced this pull request Feb 28, 2023

perf: Optimize diff algorithm with insights from cosmos#646 (backport c…

73e787d

…osmos#658) (cosmos#665) Co-authored-by: yihuang <huang@crypto.com>

coderabbitai bot mentioned this pull request May 13, 2024

chore: changelog onto master #946

Merged
