-
Notifications
You must be signed in to change notification settings - Fork 20.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
eth/protocols/snap: snap sync testing #22179
Conversation
I think this shows two errors, or, maybe let's call them "discussion points".
|
a90a015
to
31c5d21
Compare
I pushed another fix, now there's only one remaining stall, in |
@karalabe in 01e4846 , I added revertals to storage requests when they fail due to bad proofs. The testcases in this PR hit both those clauses individually, hence why they were added. However, the same scenario applies for all types of requests, code, trie etc. Should we add revertals for all other cases when a response is invalid? And if so, maybe we should do it in a more generic fashion, instead of adding them one by one? |
Another question. We do it in
We do not do it in
|
eth/protocols/snap/sync.go
Outdated
size := common.StorageSize(len(hashes) * common.HashLength) | ||
for _, account := range accounts { | ||
size += common.StorageSize(len(account)) | ||
} | ||
for _, node := range proof { | ||
size += common.StorageSize(len(node)) | ||
} | ||
logger := peer.logger.New("reqid", id) | ||
logger := peer.Log().New("reqid", peer.ID()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is wrong, pls revert. We need the request id (in id
), not the peer's id.
f26251b
to
f85a106
Compare
Last remaining blocker:
.. For now. And the question about the bloom filter above |
@rjl493456442 do you have any good ideas on what the correct fix for the prover would be? The case is basically a very very small trie, consisting of a shortnode as root, iiuc. |
@@ -622,6 +629,7 @@ func (s *Syncer) loadSyncStatus() { | |||
log.Debug("Scheduled account sync task", "from", task.Next, "last", task.Last) | |||
} | |||
s.tasks = progress.Tasks | |||
s.snapped = len(s.tasks) == 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might be a dumb comment, but everywhere else we need to hold a lock for accessing s
especially s.snapped
. The only instance I can find where loadSyncStatus()
is called is in Line 528, but we don't hold the lock there anymore
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah it's a bit funky. loadSyncStatus
touches a lot of internals in an unsafe way, and right after, the code accesses s.tasks
.
s.loadSyncStatus()
if len(s.tasks) == 0 && s.healer.scheduler.Pending() == 0 {
So probably that lock-holding should be extended to cover that portion too.
I don't think it's a real issue, however, because I don't think we enter there in a concurrent way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIK loadSyncStatus is only called when you create the thing ,so there's no concurrency at that point.
0c3df19
to
be0ab3b
Compare
Rebased on top of @rjl493456442 's panic fix |
caf1e3c
to
88dc6ee
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SGTM, although would be nice to look into how we could speed it up, .Currently it's enormously slow, even with parallel execution.
* eth/protocols/snap: make timeout configurable * eth/protocols/snap: snap sync testing * eth/protocols/snap: test to trigger panic * eth/protocols/snap: fix race condition on timeouts * eth/protocols/snap: return error on cancelled sync * squashme: updates + test causing panic + properly serve accounts in order * eth/protocols/snap: revert failing storage response * eth/protocols/snap: revert on bad responses (storage, code) * eth/protocols/snap: fix account handling stall * eth/protocols/snap: fix remaining revertal-issues * eth/protocols/snap: timeouthandler for bytecode requests * eth/protocols/snap: debugging + fix log message * eth/protocols/snap: fix misspelliings in docs * eth/protocols/snap: fix race in bytecode handling * eth/protocols/snap: undo deduplication of storage roots * synctests: refactor + minify panic testcase * eth/protocols/snap: minor polishes * eth: minor polishes to make logs more useful * eth/protocols/snap: remove excessive logs from the test runs * eth/protocols/snap: stress tests with concurrency * eth/protocols/snap: further fixes to test cancel channel handling * eth/protocols/snap: extend test timeouts on CI Co-authored-by: Péter Szilágyi <peterke@gmail.com>
This is a work in progress - the tests now just hang, which isn't ideal. However, the tests do enable some fairly deep testing of the snap protocol, and the reason I rebased it again and am working on it, is to see if it can be used to trigger #22172.
Also, in general, I think it would be good to have these kinds of fairly high-level protocol tests.