
core/state, trie: fix trie flush order for proper pruning #25581

Merged (1 commit) on Aug 23, 2022


@karalabe (Member) commented Aug 23, 2022

The trie commit rework PR made an unfortunate assumption (https://github.com/ethereum/go-ethereum/pull/25320/files#diff-8348a172e9fd3d3eb93c445d4ca58b8753b0f6d626c7af7db3b30b820d0788daR772) that the flush order doesn't matter across storage tries and the account trie.

Unfortunately, this is false, because even though we do reference counting internally, the trie dirty cache drips trie nodes to disk when it's full, and the drip order is the insertion order. If the insertion order doesn't adhere to a strict child -> parent relationship, then we can end up with dangling storage trie nodes in the dirty cache, and missing subtries on disk.
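The drip-order problem can be illustrated with a toy model. The Go sketch below is hypothetical and much simpler than geth's real dirty cache (no hashing, no reference counting, and a node-count limit instead of a byte size): nodes are written to disk oldest-first once the cache exceeds limit, and danglingOnDisk reports any persisted node whose child never reached disk. The names node, flushFIFO and danglingOnDisk are made up for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// node is a hypothetical trie node: a name plus the names of the nodes it
// references (e.g. an account leaf referencing its storage trie root).
type node struct {
	name     string
	children []string
}

// flushFIFO models a dirty cache that holds at most limit nodes. Nodes are
// inserted in the given order and, whenever the cache grows past limit, the
// oldest one is written to disk. It returns what is on disk afterwards,
// without any final commit.
func flushFIFO(order []node, limit int) map[string]node {
	disk := make(map[string]node)
	var queue []node
	for _, n := range order {
		queue = append(queue, n)
		for len(queue) > limit {
			disk[queue[0].name] = queue[0]
			queue = queue[1:]
		}
	}
	return disk
}

// danglingOnDisk lists "parent->child" pairs where a persisted node
// references a child that never reached disk.
func danglingOnDisk(disk map[string]node) []string {
	var bad []string
	for name, n := range disk {
		for _, c := range n.children {
			if _, ok := disk[c]; !ok {
				bad = append(bad, name+"->"+c)
			}
		}
	}
	sort.Strings(bad)
	return bad
}

func main() {
	s := node{name: "S"}
	a := node{name: "A", children: []string{"S"}}
	r := node{name: "R", children: []string{"A"}}

	// Child -> parent order: anything that reaches disk is complete.
	fmt.Println(danglingOnDisk(flushFIFO([]node{s, a, r}, 1))) // []
	// Parent before child: A is dripped while S is still only in memory.
	fmt.Println(danglingOnDisk(flushFIFO([]node{a, r, s}, 2))) // [A->S]
}
```

With a FIFO drip, a strict child -> parent insertion order guarantees that every child is written no later than its parent, so no persisted node can point at a node that exists only in memory.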

As long as Geth keeps running, this is fine, because the dirty cache won't just flat out drop the data and will eventually leak it to disk. On shutdown, however, the unreferenced dangling storage nodes are dropped, so after a restart some nodes might be missing from disk.

E.g.

  1. R -> A -> S: this is our "new trie", which we shove into the dirty cache.
  2. But because the account / storage insertion order is wrong, we insert the nodes as A, R, S.
  3. The dirty cache hits its limit (2), so we need to flush the oldest item, A.
  4. We terminate Geth, so we flush the latest trie R, which references A. A is already on disk, so the flush does not descend any further.
  5. Geth exits, losing S: it was still only in the dirty cache, yet A on disk references it.
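The five steps above can be replayed in a hypothetical Go toy model (not geth's trie.Database; cache, insert, commit, shutdown and run are names made up for illustration). It assumes, as the example does, a dirty cache limit of 2 nodes, FIFO dripping, and a shutdown commit that only walks through nodes still in the dirty cache and then drops whatever is left.

```go
package main

import (
	"fmt"
	"sort"
)

// cache is a hypothetical toy dirty cache, not geth's trie.Database.
type cache struct {
	limit int
	order []string            // FIFO insertion order of dirty nodes
	dirty map[string][]string // dirty node -> referenced children
	disk  map[string][]string // persisted node -> referenced children
}

func newCache(limit int) *cache {
	return &cache{limit: limit, dirty: map[string][]string{}, disk: map[string][]string{}}
}

// insert adds a dirty node and drips the oldest nodes to disk once the
// cache holds more than limit of them.
func (c *cache) insert(name string, children ...string) {
	c.dirty[name] = children
	c.order = append(c.order, name)
	for len(c.order) > c.limit {
		oldest := c.order[0]
		c.order = c.order[1:]
		c.disk[oldest] = c.dirty[oldest]
		delete(c.dirty, oldest)
	}
}

// commit persists the dirty nodes reachable from name, children first. It
// only descends through dirty nodes: a node already on disk is assumed to
// be complete, so its subtree is not visited.
func (c *cache) commit(name string) {
	children, ok := c.dirty[name]
	if !ok {
		return
	}
	for _, ch := range children {
		c.commit(ch)
	}
	c.disk[name] = children
	delete(c.dirty, name)
}

// shutdown commits the latest root, then drops everything still dirty.
func (c *cache) shutdown(root string) {
	c.commit(root)
	c.dirty = map[string][]string{}
	c.order = nil
}

// missing lists the nodes reachable from root that are absent from disk.
func (c *cache) missing(root string) []string {
	var out []string
	var walk func(string)
	walk = func(n string) {
		children, ok := c.disk[n]
		if !ok {
			out = append(out, n)
			return
		}
		for _, ch := range children {
			walk(ch)
		}
	}
	walk(root)
	sort.Strings(out)
	return out
}

// run replays the example trie R -> A -> S with a dirty cache limit of 2.
func run(order []string) []string {
	refs := map[string][]string{"R": {"A"}, "A": {"S"}, "S": nil}
	c := newCache(2)
	for _, n := range order {
		c.insert(n, refs[n]...)
	}
	c.shutdown("R")
	return c.missing("R")
}

func main() {
	fmt.Println(run([]string{"A", "R", "S"})) // [S]: the order from the example
	fmt.Println(run([]string{"S", "A", "R"})) // []: child -> parent order
}
```

With the child -> parent order S, A, R, the drip persists S first, and the shutdown commit of R walks through the still-dirty A, so nothing is lost.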

This PR fixes the insertion order so all storage tries are flushed first and only then the account trie. It also adds a test which creates a bunch of contracts with random slots, flushes them partially to disk and terminates. If the account trie is not added to the dirty cache last, the partial flush will result in data loss.

@karalabe added this to the 1.10.23 milestone Aug 23, 2022
@rjl493456442 (Member) left a comment


lgtm

@holiman (Contributor) left a comment


LGTM!

@karalabe merged commit 9ed10b9 into ethereum:master Aug 23, 2022
sidhujag pushed a commit to syscoin/go-ethereum that referenced this pull request Aug 24, 2022
core/state, trie: fix trie flush order for proper pruning
@PatrickAlphaC

Ran into some weird errors rewinding with debug.setHead("0xblock-number-in-hex"), so I just ended up resyncing from scratch.

I kept getting:

Unhandled trie error: missing trie node

This happened while it was resyncing, and it spooked me, so I just deleted the chaindata folder.

(For anyone else upgrading from 1.10.22 -> 1.10.23)
