feat: delete orphan nodes by traversing trees #641

yihuang · 2022-12-05T10:23:33Z

Don't need the orphan bookkeepings.

Consequences

Positives

don't need to store and maintain the orphan records.
- smaller db size
- faster set/delete operations

Negatives

The pruning operation should be slower than current approach, especially pruning by range, the theoretical complexity should be the same, O(N) where N is the size of orphaned nodes, but the constant factor is heavier, the new approach need to load the nodes, while the old approach just iterate orphan records and delete nodes by hashes.

Alternative

If it's considered too controversial to downgrade performance of online pruning, an alternative is to just provide an option to not storing the orphan records for archive nodes, who can always do pruning offline using the algorithm described here.

cool-develope · 2022-12-06T14:09:45Z

@yihuang , this PR looks like it conflicted with my work. And your approach is not working on the current iavl structure. I think it is possible under the assumption of keeping always the range versions, right?

yihuang · 2022-12-06T14:12:18Z

@yihuang , this PR looks like it conflicted with my work. And your approach is not working on the current iavl structure. I think it is possible under the assumption of keeping always the range versions, right?

it do works on current iavl structure, I think it's agnostic to node key format, I'm able to do round-trip test on our testnet production db using the python implementation.
What do you mean by range versions?

cool-develope · 2022-12-06T14:13:30Z

for example, we are keeping the versions of 3, 5, 7, 9.
In that case, you can remove version 5? or if we are keeping the versions of 5 to 10, you can remove the version 8?

yihuang · 2022-12-06T14:14:13Z

for example, we are keeping the versions of 3, 5, 7, 9. In that case, you can remove version 5?

yes, by diff 5 and 7, delete orphaned nodes whose version > 3.

cool-develope · 2022-12-06T14:20:00Z

for example, we are keeping the versions of 3, 5, 7, 9. In that case, you can remove version 5?

yes, by diff 5 and 7, delete orphaned nodes whose version > 3.

I think it doesn't work properly, for example we have version 2 node in version 5, this node is removed in version 7, so it become orphaned, when will remove this node (version 2)?

yihuang · 2022-12-06T14:21:37Z

for example, we are keeping the versions of 3, 5, 7, 9. In that case, you can remove version 5?

yes, by diff 5 and 7, delete orphaned nodes whose version > 3.

I think it doesn't work properly, for example we have version 2 node in version 5, this node is removed in version 7, so it become orphaned, when will remove this node (version 2)?

yeah, you have a point, if the version 2 is already deleted.

To make it work, we need to check if the version of orphaned nodes still exists, which will further slow down the process. but it's not an issue in the default mode that loads all the versions into memory on startup.

cool-develope · 2022-12-06T14:23:58Z

yeah, you have a point.

This is why the current orphans keep fromVersion and toVersion fields, To remove orphans completely, we should assume keeping only the range of versions (for example 1000 - 1100, 1100 is the latest version, 1000 is the oldest available version)

cool-develope · 2022-12-06T14:25:18Z

please refer cosmos/cosmos-sdk#12989

yihuang · 2022-12-06T14:28:13Z

yeah, you have a point.

This is why the current orphans keep fromVersion and toVersion fields, To remove orphans completely, we should assume keeping only the range of versions (for example 1000 - 1100, 1100 is the latest version, 1000 is the oldest available version)

What do you think if we delete the orphaned nodes whose version is removed, maybe not so bad if the versions are cached in memory.

cool-develope · 2022-12-06T14:34:18Z

What do you think if we delete the orphaned nodes whose version is removed, maybe not so bad if the versions are cached in memory.

not sure for me, we already had same issues when load all versions in the archive node, so I am suggesting keep only the first and last version instead of versions map, for pruning it will provide only can delete the first version. Like this way we can delete the oldest versions one by one

cool-develope · 2022-12-06T14:37:31Z

anyhow, I am in preparing of the separated PR to remove orphans, I am worried we are on same topic

yihuang · 2022-12-06T14:43:54Z

What do you think if we delete the orphaned nodes whose version is removed, maybe not so bad if the versions are cached in memory.

not sure for me, we already had same issues when load all versions in the archive node, so I am suggesting keep only the first and last version instead of versions map

The lazy mode?

for pruning it will provide only can delete the first version. Like this way we can delete the oldest versions one by one

delete the first version is a special case in this PR, the previous version will be 0, and it'll simply delete all orphaned nodes.

cool-develope · 2022-12-06T15:20:00Z

tbh, the lazy mode is unclear to me. When will trigger the lazy load or function?

yihuang · 2022-12-06T21:43:26Z

tbh, the lazy mode is unclear to me. When will trigger the lazy load or function?

Whenever you need the root information of a particular version, get from cache or load from db?

yihuang · 2022-12-07T02:19:24Z

for example, we are keeping the versions of 3, 5, 7, 9. In that case, you can remove version 5?

yes, by diff 5 and 7, delete orphaned nodes whose version > 3.

I think it doesn't work properly, for example we have version 2 node in version 5, this node is removed in version 7, so it become orphaned, when will remove this node (version 2)?

After multiple failures to reproduce the issue, I realize the issue don't exists, the reasoning is like this:

If a node is created in version 2 and referenced by version 5, it must also be referenced by all the versions between 2 and 5.
so checking the node version against 3 is enough, because 3 must have referenced the node too, and the node will eventually be deleted when version 3 is deleted.

mmsqe · 2022-12-07T02:37:04Z

diff.go

+	if l.height <= 0 {
+		panic("already at leaf layer")
+	}
+	nodes := make([]*Node, 0, len(l.nodes)*2+len(l.pendingNodes))


not sure if need this allocate since it's assign nodes = ... later

do you mean nodes = append(nodes, ...?

yea seems optional, just not sure why keep capacity when assign

yea seems optional, just not sure why keep capacity when assign

capacity is to reduce the number of reallocations during the append.

yihuang · 2022-12-07T03:35:58Z

@yihuang , this PR looks like it conflicted with my work.

Sorry, I don't know about that, I thought you are working for solutions for the new node key format.
Do you think this approach make sense, what's the difference with your plan?

cool-develope · 2022-12-07T11:42:31Z

Sorry, I don't know about that, I thought you are working for solutions for the new node key format.
Do you think this approach make sense, what's the difference with your plan?

Never mind, I'd like to refactor the diff algorithm.

// Traverse the subtree with a given node as the root.
func (ndb *nodeDB) traverseTree(hash []byte, fn func(node *Node) (bool, error)) error {
	if len(hash) == 0 {
		return nil
	}

	node, err := ndb.GetNode(hash)
	if err != nil {
		return err
	}

	stop, err := fn(node)
	if err != nil || stop {
		return err
	}

	if node.leftHash != nil {
		if err := ndb.traverseTree(node.leftHash, fn); err != nil {
			return err
		}
	}
	if node.rightHash != nil {
		if err := ndb.traverseTree(node.rightHash, fn); err != nil {
			return err
		}
	}

	return nil
}


func (ndb *nodeDB) deleteOrphans(version int64) ([][]byte, error) {
	nRoot, err := ndb.getRoot(version + 1)
	if err != nil {
		return nil, err
	}

	originalNodes := make([]*Node, 0)
	if err := ndb.traverseTree(nRoot, func(node *Node) (bool, error) {
		if node.version > version {
			return false, nil
		}
		originalNodes = append(originalNodes, node)
		return true, nil
	}); err != nil {
		return nil, err
	}

	cRoot, err := ndb.getRoot(version)
	if err != nil {
		return nil, err
	}
	
	index := 0
	if err := ndb.traverseTree(cRoot, func(node *Node) (bool, error) {
		if index < len(originalNodes) && bytes.Equal(node.hash, originalNodes[index].hash) {
			index++
			return true, nil
		}
		if err = ndb.batch.Delete(ndb.nodeKey(node.hash)); err != nil {
			return true, err
		}
		return false, nil
	}); err != nil {
		return nil, err
	}
	return orphans, err
}

Loading all orphans in memory is so expensive, and also your implementation looks complicated.

cool-develope · 2022-12-07T11:49:57Z

FYI, removing orphans will impact node key refactoring, this is why I am interested in your PR.

yihuang · 2022-12-07T14:35:09Z

Loading all orphans in memory is so expensive, and also your implementation looks complicated.

My algorithm is mainly try to skip the common subtrees early on, so we only need to traverse the branches contains actual differences.

- removed the orphan bookkeepings

yihuang · 2022-12-09T09:24:19Z

To help understanding the algorithm, i created some visualizations:

The example iavl tree, with versions 4 and 5, the nodes are labaled as $VERSION: $KEY\nheight: $HEIGHT:
example-tree.pdf

The pruning graph demonstrate the result of deleting version 4, it contains all the nodes loaded by the algorithm, the deleted nodes have dotted lines, you can notice there are two nodes coming from version 2 and 3 which are not deleted:
pruning-graph.pdf

Another one done on our production testnet db:
graph-production2.pdf

tac0turtle · 2022-12-09T12:08:55Z

curious to hear if there is a performance improvement with this over what exists?

yihuang · 2022-12-09T15:34:53Z

curious to hear if there is a performance improvement with this over what exists?

The pruning operation is definitely slower than current one, which just iterate the maintained orphan records, the new methods need to load some nodes to partially traverse the tree.
The main advantages are:

db size reduction, don't need to store the orphan records (24% reduction in my test on our testnet rocksdb node).
main operations faster, don't need to calculate and insert the orphan records.

tac0turtle · 2022-12-09T16:15:45Z

curious to hear if there is a performance improvement with this over what exists?

The pruning operation is definitely slower than current one, which just iterate the maintained orphan records, the new methods need to load some nodes to partially traverse the tree. The main advantages are:

db size reduction, don't need to store the orphan records (24% reduction in my test on our testnet rocksdb node).

main operations faster, don't need to calculate and insert the orphan records.

can you provide benchmarks on the slow down. I think there is some margin of slow down for a trade off that could be accepted and with fast node system it shouldn't make a difference

yihuang · 2022-12-09T16:38:28Z

curious to hear if there is a performance improvement with this over what exists?

The pruning operation is definitely slower than current one, which just iterate the maintained orphan records, the new methods need to load some nodes to partially traverse the tree. The main advantages are:

db size reduction, don't need to store the orphan records (24% reduction in my test on our testnet rocksdb node).

main operations faster, don't need to calculate and insert the orphan records.

can you provide benchmarks on the slow down. I think there is some margin of slow down for a trade off that could be accepted and with fast node system it shouldn't make a difference

sure, I'll do them next week, the basic intuition is it depends by a lot on the node cache, if we are pruning a recent version whose orphaned nodes are still hot in the node cache, then it'll be pretty fast.

yihuang · 2022-12-13T02:20:18Z

@tac0turtle @cool-develope I'm closing this PR for following reasons:

@cool-develope 's diff algorithm in feat: remove orphans #646 is simpler, I've tried it in python-iavl, both algorithm visits the same number of nodes, but feat: remove orphans #646 is simpler to implement. Although I think there's some issues with tree visit order which I've left comments in the PR.
We can already start deleting orphan records now for archive nodes even without this PR, when needed we can always create separate offline tools to do the pruning.
we can focus the discussion in feat: remove orphans #646, about the best long term solution.

yihuang force-pushed the diff-tree branch from 881da21 to 3566366 Compare December 5, 2022 10:25

yihuang changed the title ~~delete orphan nodes by diff tree~~ feat: delete orphan nodes by diff tree Dec 5, 2022

yihuang changed the title ~~feat: delete orphan nodes by diff tree~~ feat: delete orphan nodes by traversing trees Dec 5, 2022

yihuang mentioned this pull request Dec 7, 2022

fix: pruning don't delete all orphaned nodes crypto-com/python-iavl#8

Closed

mmsqe reviewed Dec 7, 2022

View reviewed changes

yihuang marked this pull request as ready for review December 7, 2022 08:12

yihuang requested a review from a team as a code owner December 7, 2022 08:12

yihuang force-pushed the diff-tree branch from f2b7ede to 76b7155 Compare December 7, 2022 08:14

feat: delete orphan nodes by diff tree

24425e7

- removed the orphan bookkeepings

yihuang force-pushed the diff-tree branch from bc39f25 to 24425e7 Compare December 8, 2022 02:51

yihuang added a commit to yihuang/iavl that referenced this pull request Dec 8, 2022

feat: delete orphan nodes by diff tree (backport: cosmos#641)

014c98c

- removed the orphan bookkeepings

yihuang added 2 commits December 8, 2022 17:42

cleanup and save some disk hits

9129183

filter subtrees traversal with predecessor

b85400a

cool-develope mentioned this pull request Dec 8, 2022

feat: remove orphans #646

Merged

diff options

80c252e

yihuang force-pushed the diff-tree branch from d0f6ea4 to 80c252e Compare December 9, 2022 06:21

yihuang closed this Dec 13, 2022

yihuang deleted the diff-tree branch December 13, 2022 06:34

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: delete orphan nodes by traversing trees #641

feat: delete orphan nodes by traversing trees #641

yihuang commented Dec 5, 2022 •

edited

Loading

cool-develope commented Dec 6, 2022

yihuang commented Dec 6, 2022 •

edited

Loading

cool-develope commented Dec 6, 2022 •

edited

Loading

yihuang commented Dec 6, 2022 •

edited

Loading

cool-develope commented Dec 6, 2022

yihuang commented Dec 6, 2022 •

edited

Loading

cool-develope commented Dec 6, 2022

cool-develope commented Dec 6, 2022

yihuang commented Dec 6, 2022

cool-develope commented Dec 6, 2022 •

edited

Loading

cool-develope commented Dec 6, 2022

yihuang commented Dec 6, 2022 •

edited

Loading

cool-develope commented Dec 6, 2022

yihuang commented Dec 6, 2022

yihuang commented Dec 7, 2022 •

edited

Loading

mmsqe Dec 7, 2022

yihuang Dec 7, 2022 •

edited

Loading

mmsqe Dec 7, 2022

yihuang Dec 7, 2022

yihuang commented Dec 7, 2022 •

edited

Loading

cool-develope commented Dec 7, 2022

cool-develope commented Dec 7, 2022

yihuang commented Dec 7, 2022

yihuang commented Dec 9, 2022 •

edited

Loading

tac0turtle commented Dec 9, 2022

yihuang commented Dec 9, 2022 •

edited

Loading

tac0turtle commented Dec 9, 2022

yihuang commented Dec 9, 2022

yihuang commented Dec 13, 2022 •

edited

Loading

feat: delete orphan nodes by traversing trees #641

feat: delete orphan nodes by traversing trees #641

Conversation

yihuang commented Dec 5, 2022 • edited Loading

Consequences

Positives

Negatives

Alternative

cool-develope commented Dec 6, 2022

yihuang commented Dec 6, 2022 • edited Loading

cool-develope commented Dec 6, 2022 • edited Loading

yihuang commented Dec 6, 2022 • edited Loading

cool-develope commented Dec 6, 2022

yihuang commented Dec 6, 2022 • edited Loading

cool-develope commented Dec 6, 2022

cool-develope commented Dec 6, 2022

yihuang commented Dec 6, 2022

cool-develope commented Dec 6, 2022 • edited Loading

cool-develope commented Dec 6, 2022

yihuang commented Dec 6, 2022 • edited Loading

cool-develope commented Dec 6, 2022

yihuang commented Dec 6, 2022

yihuang commented Dec 7, 2022 • edited Loading

mmsqe Dec 7, 2022

Choose a reason for hiding this comment

yihuang Dec 7, 2022 • edited Loading

Choose a reason for hiding this comment

mmsqe Dec 7, 2022

Choose a reason for hiding this comment

yihuang Dec 7, 2022

Choose a reason for hiding this comment

yihuang commented Dec 7, 2022 • edited Loading

cool-develope commented Dec 7, 2022

cool-develope commented Dec 7, 2022

yihuang commented Dec 7, 2022

yihuang commented Dec 9, 2022 • edited Loading

tac0turtle commented Dec 9, 2022

yihuang commented Dec 9, 2022 • edited Loading

tac0turtle commented Dec 9, 2022

yihuang commented Dec 9, 2022

yihuang commented Dec 13, 2022 • edited Loading

yihuang commented Dec 5, 2022 •

edited

Loading

yihuang commented Dec 6, 2022 •

edited

Loading

cool-develope commented Dec 6, 2022 •

edited

Loading

yihuang commented Dec 6, 2022 •

edited

Loading

yihuang commented Dec 6, 2022 •

edited

Loading

cool-develope commented Dec 6, 2022 •

edited

Loading

yihuang commented Dec 6, 2022 •

edited

Loading

yihuang commented Dec 7, 2022 •

edited

Loading

yihuang Dec 7, 2022 •

edited

Loading

yihuang commented Dec 7, 2022 •

edited

Loading

yihuang commented Dec 9, 2022 •

edited

Loading

yihuang commented Dec 9, 2022 •

edited

Loading

yihuang commented Dec 13, 2022 •

edited

Loading