[core] Race condition on diffLayer #22540
Conversation
Force-pushed from 40700cc to 70a8d2c
Could you provide some more info on how you encountered it? Do you have a stack trace? Was it during a test? Your change does two things, so any more info about how you encountered this would likely clear this up for me.
@holiman Sorry - it was late and I was too brief. I found it the usual way, with `-race`. I'll try to look through my tmux history for the `-race` stack traces.
The PR is definitely problematic because it serializes reads in the snapshots, and even keeps the lock held during disk access. If the underlying issue is the `origin` field, the correct fix is to extract it while holding the lock and use the local copy afterwards:

```go
// Check the bloom filter first whether there's even a point in reaching into
// all the maps in all the layers below
dl.lock.RLock()
hit := dl.diffed.Contains(storageBloomHasher{accountHash, storageHash})
if !hit {
	hit = dl.diffed.Contains(destructBloomHasher(accountHash))
}
var origin *diskLayer
if !hit {
	origin = dl.origin // extract origin while holding the lock
}
dl.lock.RUnlock()
// If the bloom filter misses, don't even bother with traversing the memory
// diff layers, reach straight into the bottom persistent disk layer
if origin != nil {
	snapshotBloomStorageMissMeter.Mark(1)
	return origin.Storage(accountHash, storageHash)
}
// The bloom filter hit, start poking in the internal maps
return dl.storage(accountHash, storageHash, 0)
```

Would this solve the issue @fxfactorial?
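The locking pattern proposed above — copy the shared `origin` pointer while holding the read lock, release the lock, then use the local copy — can be sketched in isolation. This is a hypothetical, stripped-down model (the types and methods are simplified stand-ins, not the real `diffLayer` API):

```go
package main

import (
	"fmt"
	"sync"
)

// diskLayer is a stand-in for the real snapshot disk layer.
type diskLayer struct{ id int }

func (d *diskLayer) Storage() string { return fmt.Sprintf("data-from-layer-%d", d.id) }

// diffLayer holds a mutable origin pointer guarded by lock.
type diffLayer struct {
	lock   sync.RWMutex
	origin *diskLayer
}

// storage copies origin while holding the read lock, then uses the
// copy after unlocking, so a concurrent flatten cannot race the read
// and the lock is never held across the (slow) disk access.
func (dl *diffLayer) storage() string {
	dl.lock.RLock()
	origin := dl.origin // extract origin while holding the lock
	dl.lock.RUnlock()
	return origin.Storage()
}

// flatten rewrites origin under the write lock, mimicking the
// mutation at difflayer.go:223 that triggered the race.
func (dl *diffLayer) flatten(next *diskLayer) {
	dl.lock.Lock()
	dl.origin = next
	dl.lock.Unlock()
}

func main() {
	dl := &diffLayer{origin: &diskLayer{id: 1}}
	fmt.Println(dl.storage()) // data-from-layer-1
	dl.flatten(&diskLayer{id: 2})
	fmt.Println(dl.storage()) // data-from-layer-2
}
```

Holding only the short critical section (pointer copy) instead of the whole disk read is what keeps snapshot reads concurrent.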
Though I guess we'd need to look through the code now, because the account and whatnot accessors will use the same patterns as the faulty storage above. We definitely need the same fix in `AccountRLP` as well.
I think that would suffice. There are 1-2 more accesses into `dl.origin` that need the same treatment.
Yes - I think so - I can clean up the other spots as well if you like (lmk where to look, I see you mentioned some places).
@fxfactorial Do you want to fix this? Would be nice to get it merged.
Force-pushed from 70a8d2c to 671a360
@holiman Force pushed - covered the AccountRLP method as well.
LGTM, but please remove the iterate.sh, probably an accidental addition :)
Force-pushed from 671a360 to 71e8df6
LGTM, thanks!
@karalabe Please merge if you think this fix is OK.
Force-pushed from 71e8df6 to 7e2d908
@karalabe ping - anything else for merge?
SGTM
Cherry pick bug fixes from upstream for snapshots, which will enable higher transaction throughput. It also enables snapshots by default (which is one of the commits pulled from upstream).

Upstream commits included:
- 68754f3 cmd/utils: grant snapshot cache to trie if disabled (ethereum#21416)
- 3ee91b9 core/state/snapshot: reduce disk layer depth during generation
- a15d71a core/state/snapshot: stop generator if it hits missing trie nodes (ethereum#21649)
- 43c278c core/state: disable snapshot iteration if it's not fully constructed (ethereum#21682)
- b63e3c3 core: improve snapshot journal recovery (ethereum#21594)
- e640267 core/state/snapshot: fix journal recovery from generating old journal (ethereum#21775)
- 7b7b327 core/state/snapshot: update generator marker in sync with flushes
- 167ff56 core/state/snapshot: gethring -> gathering typo (ethereum#22104)
- d2e1b17 snapshot, trie: fixed typos, mostly in snapshot pkg (ethereum#22133)
- c4deebb core/state/snapshot: add generation logs to storage too
- 5e9f5ca core/state/snapshot: write snapshot generator in batch (ethereum#22163)
- 18145ad core/state: maintain one more diff layer (ethereum#21730)
- 04a7226 snapshot: merge loops for better performance (ethereum#22160)
- 994cdc6 cmd/utils: enable snapshots by default
- 9ec3329 core/state/snapshot: ensure Cap retains a min number of layers
- 52e5c38 core/state: copy the snap when copying the state (ethereum#22340)
- a31f6d5 core/state/snapshot: fix panic on missing parent
- 61ff3e8 core/state/snapshot, ethdb: track deletions more accurately (ethereum#22582)
- c79fc20 core/state/snapshot: fix data race in diff layer (ethereum#22540)

Other changes: Commit f9b5530 (not from upstream) fixes an incorrect default DatabaseCache value due to an earlier bad merge.

Tested: Automated tests; testing on a private testnet.

Backwards compatibility: Enabling snapshots by default is a breaking change in terms of the CLI flags, but will not cause backwards incompatibility between the node and other nodes.
Co-authored-by: Péter Szilágyi <peterke@gmail.com>
Co-authored-by: gary rong <garyrong0905@gmail.com>
Co-authored-by: Melvin Junhee Woo <melvin.woo@groundx.xyz>
Co-authored-by: Martin Holst Swende <martin@swende.se>
Co-authored-by: Edgar Aroutiounian <edgar.factorial@gmail.com>
I encountered this race condition and it happens at difflayer.go:223, with the write `dl.origin = origin`, while `.Storage` reads `dl.origin` at `return dl.origin.Storage(accountHash, storageHash)` in difflayer.go.
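The conflict described above is a classic read/write race: one goroutine writes `dl.origin` under the write lock while another reads it with no lock at all. Besides copying the pointer under the read lock (the fix this PR takes), another way to make such a read safe is to publish the pointer atomically. This is only a sketch with hypothetical simplified types, not the geth code, and it assumes Go 1.19+ for `atomic.Pointer`:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// diskLayer is a stand-in for the real snapshot disk layer.
type diskLayer struct{ id int }

// diffLayer publishes its origin pointer atomically, so readers
// never need to take a mutex for it.
type diffLayer struct {
	origin atomic.Pointer[diskLayer] // requires Go 1.19+
}

// storage performs the formerly racy read safely via Load.
func (dl *diffLayer) storage() int {
	return dl.origin.Load().id
}

// flatten performs the write (the difflayer.go:223 role) via Store.
func (dl *diffLayer) flatten(next *diskLayer) {
	dl.origin.Store(next)
}

func main() {
	dl := &diffLayer{}
	dl.flatten(&diskLayer{id: 1})
	fmt.Println(dl.storage()) // 1
	dl.flatten(&diskLayer{id: 2})
	fmt.Println(dl.storage()) // 2
}
```

The lock-based copy is the smaller change for existing code that already guards `origin` with a RWMutex; the atomic pointer avoids the lock entirely but changes the field's type everywhere it is touched.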