Conversation
Use `snapshot_sync` as logging target
More docs, comments, todos
…try to fix snapshot startup problems Docs, todos, comments
Log client version on peer connect
…ock seen Currently we look for peers that seed snapshots that are close to the highest block seen on the network (where "close" means within 30k blocks). When a node starts up we wait for some time (5sec, increased here to 10sec) to let peers connect and if we have found a suitable peer to sync a snapshot from at the end of that delay, we start the download; if none is found and --warp-barrier is used we stall, otherwise we start a slow-sync. When looking for a suitable snapshot, we use the highest block seen on the network to check if a peer has a snapshot that is within 30k blocks of that highest block number. This means that in a situation where all available snapshots are older than that, we will often fail to start a snapshot at all. What's worse is that the longer we delay starting a snapshot sync (to let more peers connect, in the hope of finding a good snapshot), the more likely we are to have seen a high block and thus the less likely we become to accept a snapshot. This commit removes the comparison with the highest block number entirely and picks the best snapshot we find in 10sec.
…f they happen to send us a duplicate chunk (just ignore the chunk and keep going) Resolve some documentation todos, add more
Check if our own block is beyond the given warp barrier (can happen after we've completed a warp sync but are not quite yet synced up to the tip) and if so, don't sync. More docs, resolve todos. Dial down some `sync` debug level logging to trace
… work item Use a HashSet instead of a Vec and remove items from the set as chunks are processed. Calculate and store the total number of chunks in the `Snapshot` struct instead of counting pending chunks each time.
I left a couple of comments above. One thing I didn't mention is the change to accept far-away snapshots. I'm not quite convinced this would be a good idea: as a user, if I warp-sync my node I would expect to be able to use it a few minutes after (syncing to the top of the chain). Without any limits, I could actually spend quite some time syncing a snapshot and then a couple of days syncing to the top of the chain. And if there's a new and better snapshot while I'm syncing the old one, I'm basically screwed.
Yeah, this is the most important change here. Hopefully they will not be "far far away". The idea is: use the best possible snapshot we can find.
That would be nice, but that is simply not the reality today. Obviously we can hope that after the "Pause pruning while snapshotting" PR things will improve, but I just don't think we should expect a return to the good old days where many nodes produce snapshots every 5k blocks. Today this is not the case. I also think that the previous logic was a bit wonky, where the choice of peer to download from was highly dependent on seeing a high block on the chain. This led to a lot of head scratching wondering why a decent snapshot was rejected all of a sudden. I do not want to revert to that.
Why do you think that? You will not see the new better snapshot for sure (this was the case before this PR too), but once you've restored the one you were working on you could always restart the node with a higher `--warp-barrier`.
@@ -676,7 +676,6 @@ impl<C> Service<C> where C: SnapshotClient + ChainInfo {
 		} else if manifest.state_hashes.contains(&hash) {
 			true
 		} else {
-			warn!(target: "snapshot", "Hash of the content of {:?} not present in the manifest block/state hashes.", path);
why removed?
Because it's not an abnormal condition, it's expected when we start syncing.
To elaborate on that: when we see that there was a partial restoration going on before us, we try to re-use the chunks we already have on disk. This warning looks scary to users but it's not really something we need to worry about I think: it just means we can't use the chunk because we're now working on a different snapshot.
I can put it back as a `trace!` perhaps?
Is that a yes? Put the log back as a `trace!`?
either way is fine :)
ethcore/sync/src/chain/mod.rs
 const HEADERS_TIMEOUT: Duration = Duration::from_secs(15);
 const BODIES_TIMEOUT: Duration = Duration::from_secs(20);
 const RECEIPTS_TIMEOUT: Duration = Duration::from_secs(10);
 const FORK_HEADER_TIMEOUT: Duration = Duration::from_secs(3);
-const SNAPSHOT_MANIFEST_TIMEOUT: Duration = Duration::from_secs(5);
+/// Max time to wait for a peer to send us a snapshot manifest.
+const SNAPSHOT_MANIFEST_TIMEOUT: Duration = Duration::from_secs(15);
What is an average manifest size? Why is the timeout tripled?
The manifest files are fairly small, ~100-150kb. It's an rlp-encoded `ManifestData`, containing the hashes of the block/state chunks you see on disk in your `snapshot/` folder.
Why tripled? I simply wanted to give all possible snapshotting peers a decent chance to be considered.
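For context, here is a rough sketch of what such a manifest holds; the field names are recalled from ethcore's snapshot module and should be treated as an approximation, not the exact definition:

```rust
use ethereum_types::H256;

// Approximate shape of the rlp-encoded manifest a peer sends back; the
// authoritative definition lives in ethcore's snapshot module.
pub struct ManifestData {
    pub version: u64,            // snapshot format version
    pub state_hashes: Vec<H256>, // hashes of the state chunks on disk
    pub block_hashes: Vec<H256>, // hashes of the block chunks on disk
    pub state_root: H256,        // state root the restored state must match
    pub block_number: u64,       // block the snapshot was taken at
    pub block_hash: H256,        // hash of that block
}
```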
I think you're mistaken: this is a timeout for the Snapshot Manifest packet to arrive from a peer after it has been asked.
It kind of boils down to the same thing though, right?

- a manifest request is sent to a peer and the `asking` is set to `SnapshotManifest`
- every now and then we check if they are still in `PeerAsking::SnapshotManifest`
- when the peer responds with the manifest rlp we reset the `asking`
- …unless of course the timeout is triggered before they respond, in which case they are disconnected

Did I get that right?
Either way I updated the doc string, not sure why it's not showing up here.
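To make the flow described above concrete, here is a minimal sketch of the request/timeout bookkeeping; the types and method names are made up for illustration and are not the actual ethcore-sync code:

```rust
use std::time::{Duration, Instant};

const SNAPSHOT_MANIFEST_TIMEOUT: Duration = Duration::from_secs(5);

#[derive(PartialEq)]
enum PeerAsking { Nothing, SnapshotManifest }

struct Peer { asking: PeerAsking, ask_time: Instant }

impl Peer {
    /// Ask the peer for a snapshot manifest (the actual packet send is elided).
    fn request_manifest(&mut self) {
        self.asking = PeerAsking::SnapshotManifest;
        self.ask_time = Instant::now();
    }

    /// Checked periodically; `true` means the request timed out and the peer
    /// should be disconnected.
    fn manifest_request_timed_out(&self) -> bool {
        self.asking == PeerAsking::SnapshotManifest
            && self.ask_time.elapsed() > SNAPSHOT_MANIFEST_TIMEOUT
    }

    /// Called when the manifest rlp arrives: reset `asking`.
    fn on_manifest_received(&mut self) {
        self.asking = PeerAsking::Nothing;
    }
}
```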
I thought you were mistaken with `WAIT_PEERS_TIMEOUT`, which is used to wait and see if there are more than 3 peers with the same snapshot, and go ahead anyway after the timeout.
The `SNAPSHOT_MANIFEST_TIMEOUT` is just a timeout on the request, which in most cases will be fulfilled before this timeout anyway, since the manifest is quite small. So increasing it shouldn't change the behavior at all.
trace!(target: "sync", "Ignored unknown chunk: {:x}", hash);
Err(())

self.pending_block_chunks.take(&hash)
this removes `hash` from the `pending_block_chunks`, it wasn't present before
Right, this is what I tried to describe under "Avoid excessive looping over `Vec`s". Before, the `pending_*_chunks` were `Vec`s, initialized once, not mutated during the restoration, and used to look up hashes (like here) and to calculate the total number of chunks. I refactored it to use a `HashSet` and remove chunks as we go. Here it also saves a `clone()`.
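Roughly the difference, as a standalone sketch with hypothetical function names (the real code lives in the sync chunk-validation path):

```rust
use std::collections::HashSet;
use ethereum_types::H256;

// Before (sketch): the pending hashes lived in an immutable Vec, so looking
// one up meant scanning the Vec and cloning the matching hash out of it.
fn next_work_item_vec(pending: &[H256], hash: &H256) -> Option<H256> {
    pending.iter().find(|h| *h == hash).cloned()
}

// After (sketch): `HashSet::take` removes and returns the hash in one step,
// so the set shrinks as chunks are processed and no clone is needed.
fn next_work_item_set(pending: &mut HashSet<H256>, hash: &H256) -> Option<H256> {
    pending.take(hash)
}
```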
/// memory usage down.
// See https://github.com/paritytech/parity-common/issues/255
#[ignore_malloc_size_of = "no impl for IndexSet (yet)"]
pending_block_chunks: IndexSet<H256>,
I'm not sure it's worth the extra dependency on `indexmap`. The only benefit is removing the chunk from the set when it's done.
Looks good, I'm just still not convinced changing the snapshot manifest timeout from 5s to 15s is necessary though.
@ngotchac reverted to 5 sec. :)
@s3krit this one may be tricky to backport, just a heads up.
* master: (21 commits)
  ropsten #6631425 foundation #8798209 (#11201)
  Update list of bootnodes for xDai chain (#11236)
  ethcore/res: add mordor testnet configuration (#11200)
  [chain specs]: activate `Istanbul` on mainnet (#11228)
  [builtin]: support `multiple prices and activations` in chain spec (#11039)
  Insert explicit warning into the panic hook (#11225)
  Snapshot restoration overhaul (#11219)
  Fix docker centos build (#11226)
  retry on gitlab system failures (#11222)
  Update bootnodes. (#11203)
  Use provided usd-per-eth value if an endpoint is specified (#11209)
  Use a lock instead of atomics for snapshot Progress (#11197)
  [informant]: `MillisecondDuration` -> `as_millis()` (#11211)
  Step duration map configuration parameter ported from the POA Network fork (#10902)
  Upgrade jsonrpc to latest (#11206)
  [export hardcoded sync]: use debug for `H256` (#11204)
  Pause pruning while snapshotting (#11178)
  Type annotation for next_key() matching of json filter options (#11192)
  Crypto primitives removed from ethkey (#11174)
  Made ecrecover implementation trait public (#11188)
  ...
* Comments and todos
  Use `snapshot_sync` as logging target
* fix compilation
* More todos, more logs
* Fix picking snapshot peer: prefer the one with the highest block number
  More docs, comments, todos
* Adjust WAIT_PEERS_TIMEOUT to be a multiple of MAINTAIN_SYNC_TIMER to try to fix snapshot startup problems
  Docs, todos, comments
* Tabs
* Formatting
* Don't build new rlp::EMPTY_LIST_RLP instances
* Dial down debug logging
* Don't warn about missing hashes in the manifest: it's normal
  Log client version on peer connect
* Cleanup
* Do not skip snapshots further away than 30k blocks from the highest block seen
  Currently we look for peers that seed snapshots that are close to the highest block seen on the network (where "close" means within 30k blocks). When a node starts up we wait for some time (5sec, increased here to 10sec) to let peers connect and if we have found a suitable peer to sync a snapshot from at the end of that delay, we start the download; if none is found and --warp-barrier is used we stall, otherwise we start a slow-sync.
  When looking for a suitable snapshot, we use the highest block seen on the network to check if a peer has a snapshot that is within 30k blocks of that highest block number. This means that in a situation where all available snapshots are older than that, we will often fail to start a snapshot at all. What's worse is that the longer we delay starting a snapshot sync (to let more peers connect, in the hope of finding a good snapshot), the more likely we are to have seen a high block and thus the less likely we become to accept a snapshot.
  This commit removes the comparison with the highest block number entirely and picks the best snapshot we find in 10sec.
* lockfile
* Add a `ChunkType::Dupe` variant so that we do not disconnect a peer if they happen to send us a duplicate chunk (just ignore the chunk and keep going)
  Resolve some documentation todos, add more
* tweak log message
* Don't warp sync twice
  Check if our own block is beyond the given warp barrier (can happen after we've completed a warp sync but are not quite yet synced up to the tip) and if so, don't sync.
  More docs, resolve todos. Dial down some `sync` debug level logging to trace
* Avoid iterating over all snapshot block/state hashes to find the next work item
  Use a HashSet instead of a Vec and remove items from the set as chunks are processed. Calculate and store the total number of chunks in the `Snapshot` struct instead of counting pending chunks each time.
* Address review grumbles
* Log correct number of bytes written to disk
* Revert ChunkType::Dup change
* whitespace grumble
* Cleanup debugging code
* Fix docs
* Fix import and a typo
* Fix test impl
* Use `indexmap::IndexSet` to ensure chunk hashes are accessed in order
* Revert increased SNAPSHOT_MANIFEST_TIMEOUT: 5sec should be enough
* [CI] check evmbin build (#11096)
* Correct EIP-712 encoding (#11092)
* [client]: Fix for incorrectly dropped consensus messages (#11082) (#11086)
* Update hardcoded headers (foundation, classic, kovan, xdai, ewc, ...) (#11053)
* Add cargo-remote dir to .gitignore (?)
* Update light client headers: ropsten 6631425 foundation 8798209 (#11201)
* Update list of bootnodes for xDai chain (#11236)
* ethcore/res: add mordor testnet configuration (#11200)
* [chain specs]: activate Istanbul on mainnet (#11228)
* [builtin]: support multiple prices and activations in chain spec (#11039)
* [receipt]: add sender & receiver to RichReceipts (#11179)
* [ethcore/builtin]: do not panic in blake2pricer on short input (#11180)
* Made ecrecover implementation trait public (#11188)
* Fix docker centos build (#11226)
* Update MIX bootnodes. (#11203)
* Insert explicit warning into the panic hook (#11225)
* Use provided usd-per-eth value if an endpoint is specified (#11209)
* Cleanup stratum a bit (#11161)
* Add Constantinople EIPs to the dev (instant_seal) config (#10809) (already backported)
* util Host: fix a double Read Lock bug in fn Host::session_readable() (#11175)
* ethcore client: fix a double Read Lock bug in fn Client::logs() (#11172)
* Type annotation for next_key() matching of json filter options (#11192)
* Upgrade jsonrpc to latest (#11206)
* [dependencies]: jsonrpc 14.0.1 (#11183)
* Upgrade to jsonrpc v14 (#11151)
* Switching sccache from local to Redis (#10971)
* Snapshot restoration overhaul (#11219)
* Add new line after writing block to hex file. (#10984)
* Pause pruning while snapshotting (#11178)
* Change how RPCs eth_call and eth_estimateGas handle "Pending" (#11127)
* Fix block detail updating (#11015)
* Make InstantSeal Instant again #11186
* Filter out some bad ropsten warp snapshots (#11247)
This PR does several things and is quite noisy.
The general thrust is to make snapshot syncing easier to understand and debug, while making a few tweaks to better adapt to the current harsh snapshotting environment.
Separate snapshot sync logging from general sync (`-l snapshot_sync`) and improve logging messages in general.
Add documentation and comments where missing/lacking.
Ensure we pick the snapshot with the highest block number whenever there's a tie.
Selecting a peer to sync a snapshot from gives preference to snapshots that are seeded by 3 or more peers. On today's ethereum it is rare that nodes have snapshots from the same point in the chain, which means we almost always have a single peer to download from. Before this PR we did not sort the peers based on how recent their snapshot is when there's a tie (i.e. a single peer has a snapshot). Fixed here; a minimal sketch of the tie-breaking idea follows below.
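A self-contained illustration of that tie-breaking, with hypothetical types and block numbers (the real logic lives in the sync peer-selection code):

```rust
#[derive(Debug)]
struct SnapshotPeer {
    peer_id: usize,
    snapshot_block: u64, // block number the peer's snapshot was taken at
}

// Among the candidate peers, prefer the one seeding the most recent snapshot.
fn pick_peer(candidates: &[SnapshotPeer]) -> Option<&SnapshotPeer> {
    candidates.iter().max_by_key(|p| p.snapshot_block)
}

fn main() {
    let peers = vec![
        SnapshotPeer { peer_id: 1, snapshot_block: 8_650_000 },
        SnapshotPeer { peer_id: 2, snapshot_block: 8_700_000 },
    ];
    // Prints peer 2, whose snapshot is closest to the tip.
    println!("{:?}", pick_peer(&peers));
}
```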
Do not skip snapshots further away than 30k blocks from the highest block seen.
Currently we look for peers that seed snapshots that are close to the highest block seen on the network (where "close" means within 30k blocks of the tip). When a node starts up we wait for some time (5sec, increased here to 10sec) to let peers connect. If we find a suitable peer to sync a snapshot from at the end of that delay, we start the download; if none is found we start slow-syncing (or, if `--warp-barrier` is used, we stall).
Before this PR, when looking for a suitable snapshot we used the highest block seen on the network to check if a peer has a snapshot within 30k blocks of that highest block number. This means that in a situation where all available snapshots are old, we will often fail to start a snapshot at all. What's worse, the longer we delay starting a snapshot sync (to let more peers connect, in the hope of finding a good snapshot), the more likely we are to have seen a high block and thus the less likely we become to accept a snapshot. This behaviour is quite surprising and tricky to reason about.
Here this highest-seen block number comparison is removed and we switch to something simpler: pick the best suitable snapshot we find within 10sec.
A small but possibly controversial change: before this PR, when a remote peer sends us a snapshot chunk we've already seen, we disconnect the peer and stop the download. After this PR we ignore the chunk and keep going instead (change). I do not think this is problematic, but I have never seen a dupe happen so can't really be sure.
Don't warp-sync twice.
With the highest-seen block number criteria gone, a node can end up using a snapshot that is a fair bit away from the tip, which means that after completion it needs to slow-sync for quite a while to catch up. Before this change it could happen that a restored node finds a new snapshot and starts warp syncing all over again; this PR adds a check to see if our own best block is higher than the `--warp-barrier` and, if so, does not start another sync (a minimal sketch of this check follows at the end of this description).
Avoid excessive looping over `Vec`s.
Use a `HashSet` instead of a `Vec` to store the pending chunks and remove items from the set as chunks are processed. Calculate and store the total number of chunks in the `Snapshot` struct instead of counting pending chunks each time.
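As promised above, a minimal sketch of the "don't warp-sync twice" check; the names and call site are hypothetical, only the idea matches the PR:

```rust
// Hypothetical illustration: once our own best block is already past the
// user-supplied --warp-barrier, don't start another warp sync; slow-sync instead.
fn should_warp_sync(our_best_block: u64, warp_barrier: Option<u64>) -> bool {
    match warp_barrier {
        Some(barrier) => our_best_block < barrier,
        // No barrier given: leave the decision to the usual heuristics (elided).
        None => true,
    }
}

fn main() {
    // Already restored past the barrier: keep slow-syncing to the tip.
    assert!(!should_warp_sync(8_700_123, Some(8_600_000)));
    // Fresh node far below the barrier: warp sync is still wanted.
    assert!(should_warp_sync(1_000, Some(8_600_000)));
}
```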