Conversation
Use `snapshot_sync` as logging target
More docs, comments, todos
…try to fix snapshot startup problems Docs, todos, comments
Log client version on peer connect
…ock seen Currently we look for peers that seed snapshots that are close to the highest block seen on the network (where "close" means within 30k blocks). When a node starts up we wait for some time (5sec, increased here to 10sec) to let peers connect and if we have found a suitable peer to sync a snapshot from at the end of that delay, we start the download; if none is found and --warp-barrier is used we stall, otherwise we start a slow-sync. When looking for a suitable snapshot, we use the highest block seen on the network to check if a peer has a snapshot that is within 30k blocks of that highest block number. This means that in a situation where all available snapshots are older than that, we will often fail to start a snapshot at all. What's worse is that the longer we delay starting a snapshot sync (to let more peers connect, in the hope of finding a good snapshot), the more likely we are to have seen a high block and thus the less likely we become to accept a snapshot. This commit removes the comparison with the highest block number entirely and picks the best snapshot we find in 10sec.
…f they happen to send us a duplicate chunk (just ignore the chunk and keep going) Resolve some documentation todos, add more
Check if our own block is beyond the given warp barrier (can happen after we've completed a warp sync but are not quite yet synced up to the tip) and if so, don't sync. More docs, resolve todos. Dial down some `sync` debug level logging to trace
… work item Use a HashSet instead of a Vec and remove items from the set as chunks are processed. Calculate and store the total number of chunks in the `Snapshot` struct instead of counting pending chunks each time.
I left a couple of comments above. One thing I didn't mention is the change to accept far-away snapshots. I'm not quite convinced this would be a good idea: as a user, if I warp-sync my node I would expect to be able to use it a few minutes after (syncing to the top of the chain). Without any limits, I could actually spend quite some time syncing a snapshot and then a couple of days syncing to the top of the chain. And if there's a new and better snapshot while I'm syncing the old one, I'm basically screwed.
Yeah, this is the most important change here. Hopefully they will not be "far far away". The idea is: use the best possible snapshot we can find.
That would be nice, but that is simply not the reality today. Obviously we can hope that after the "Pause pruning while snapshotting" PR things will improve, but I just don't think we should expect a return to the good old days where many nodes produce snapshots every 5k blocks. Today this is not the case. I also think that the previous logic was a bit wonky, where the choice of peer to download from was highly dependent on seeing a high block on the chain. This led to a lot of head scratching wondering why a decent snapshot was rejected all of a sudden. I do not want to revert to that.
Why do you think that? You will not see the new better snapshot for sure (this was the case before this PR too), but once you've restored the one you were working on you could always restart the node with a higher `--warp-barrier`.
@@ -676,7 +676,6 @@ impl<C> Service<C> where C: SnapshotClient + ChainInfo {
 		} else if manifest.state_hashes.contains(&hash) {
 			true
 		} else {
-			warn!(target: "snapshot", "Hash of the content of {:?} not present in the manifest block/state hashes.", path);
why removed?
Because it's not an abnormal condition, it's expected when we start syncing.
To elaborate on that: when we see that there was a partial restoration going on before us, we try to re-use the chunks we already have on disk. This warning looks scary to users but it's not really something we need to worry about I think: it just means we can't use the chunk because we're now working on a different snapshot.
I can put it back as a `trace!` perhaps?
Is that a yes? Put the log back as a `trace!`?
either way is fine :)
ethcore/sync/src/chain/mod.rs
 const HEADERS_TIMEOUT: Duration = Duration::from_secs(15);
 const BODIES_TIMEOUT: Duration = Duration::from_secs(20);
 const RECEIPTS_TIMEOUT: Duration = Duration::from_secs(10);
 const FORK_HEADER_TIMEOUT: Duration = Duration::from_secs(3);
-const SNAPSHOT_MANIFEST_TIMEOUT: Duration = Duration::from_secs(5);
+/// Max time to wait for a peer to send us a snapshot manifest.
+const SNAPSHOT_MANIFEST_TIMEOUT: Duration = Duration::from_secs(15);
What is an average manifest size? Why is the timeout tripled?
The manifest files are fairly small, ~100-150kb. It's an rlp-encoded `ManifestData`, containing the hashes of the block/state chunks you see on disk in your `snapshot/` folder.
Why tripled? I simply wanted to give all possible snapshotting peers a decent chance to be considered.
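For context, here is a rough sketch of what such a manifest holds; the field names are recalled from ethcore's snapshot module and should be treated as an approximation, not the exact definition:

```rust
use ethereum_types::H256;

// Approximate shape of the rlp-encoded manifest a peer sends back; the
// authoritative definition lives in ethcore's snapshot module.
pub struct ManifestData {
    pub version: u64,            // snapshot format version
    pub state_hashes: Vec<H256>, // hashes of the state chunks on disk
    pub block_hashes: Vec<H256>, // hashes of the block chunks on disk
    pub state_root: H256,        // state root the restored state must match
    pub block_number: u64,       // block the snapshot was taken at
    pub block_hash: H256,        // hash of that block
}
```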
I think you're mistaken: this is a timeout for the Snapshot Manifest packet to arrive from a peer after it has been asked.
It kind of boils down to the same thing though, right?

- a manifest request is sent to a peer and the `asking` is set to `SnapshotManifest`
- every now and then we check if they are still in `PeerAsking::SnapshotManifest`
- when the peer responds with the manifest rlp we reset the `asking`
- …unless of course the timeout is triggered before they respond, in which case they are disconnected

Did I get that right?
Either way I updated the doc string, not sure why it's not showing up here.
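To make the flow described above concrete, here is a minimal sketch of the request/timeout bookkeeping; the types and method names are made up for illustration and are not the actual ethcore-sync code:

```rust
use std::time::{Duration, Instant};

const SNAPSHOT_MANIFEST_TIMEOUT: Duration = Duration::from_secs(5);

#[derive(PartialEq)]
enum PeerAsking { Nothing, SnapshotManifest }

struct Peer { asking: PeerAsking, ask_time: Instant }

impl Peer {
    /// Ask the peer for a snapshot manifest (the actual packet send is elided).
    fn request_manifest(&mut self) {
        self.asking = PeerAsking::SnapshotManifest;
        self.ask_time = Instant::now();
    }

    /// Checked periodically; `true` means the request timed out and the peer
    /// should be disconnected.
    fn manifest_request_timed_out(&self) -> bool {
        self.asking == PeerAsking::SnapshotManifest
            && self.ask_time.elapsed() > SNAPSHOT_MANIFEST_TIMEOUT
    }

    /// Called when the manifest rlp arrives: reset `asking`.
    fn on_manifest_received(&mut self) {
        self.asking = PeerAsking::Nothing;
    }
}
```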
I thought you were mistaken with `WAIT_PEERS_TIMEOUT`, which is used to wait and see if there are more than 3 peers with the same snapshot, and go ahead anyway after the timeout.
The `SNAPSHOT_MANIFEST_TIMEOUT` is just a timeout on the request, which in most cases will be fulfilled before this timeout anyway, since the manifest is quite small. So increasing it shouldn't change the behavior at all.
trace!(target: "sync", "Ignored unknown chunk: {:x}", hash);
Err(())

self.pending_block_chunks.take(&hash)
this removes `hash` from the `pending_block_chunks`, it wasn't present before
Right, this is what I tried to describe under "Avoid excessive looping over `Vec`s". Before, the `pending_*_chunks` were `Vec`s, initialized once, not mutated during the restoration, and used to look up hashes (like here) and to calculate the total number of chunks. I refactored it to use a `HashSet` and remove chunks as we go. Here it also saves a `clone()`.
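Roughly the difference, as a standalone sketch with hypothetical function names (the real code lives in the sync chunk-validation path):

```rust
use std::collections::HashSet;
use ethereum_types::H256;

// Before (sketch): the pending hashes lived in an immutable Vec, so looking
// one up meant scanning the Vec and cloning the matching hash out of it.
fn next_work_item_vec(pending: &[H256], hash: &H256) -> Option<H256> {
    pending.iter().find(|h| *h == hash).cloned()
}

// After (sketch): `HashSet::take` removes and returns the hash in one step,
// so the set shrinks as chunks are processed and no clone is needed.
fn next_work_item_set(pending: &mut HashSet<H256>, hash: &H256) -> Option<H256> {
    pending.take(hash)
}
```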
/// memory usage down.
// See https://github.com/paritytech/parity-common/issues/255
#[ignore_malloc_size_of = "no impl for IndexSet (yet)"]
pending_block_chunks: IndexSet<H256>,
I'm not sure it's worth the extra dependency on `indexmap`. The only benefit is removing the chunk from the set when it's done.
Looks good, I'm just still not convinced changing the snapshot manifest timeout from 5s to 15s is necessary though.
@ngotchac reverted to 5 sec. :)
@s3krit this one may be tricky to backport, just a heads up.
* master: (21 commits)
  ropsten #6631425 foundation #8798209 (#11201)
  Update list of bootnodes for xDai chain (#11236)
  ethcore/res: add mordor testnet configuration (#11200)
  [chain specs]: activate `Istanbul` on mainnet (#11228)
  [builtin]: support `multiple prices and activations` in chain spec (#11039)
  Insert explicit warning into the panic hook (#11225)
  Snapshot restoration overhaul (#11219)
  Fix docker centos build (#11226)
  retry on gitlab system failures (#11222)
  Update bootnodes. (#11203)
  Use provided usd-per-eth value if an endpoint is specified (#11209)
  Use a lock instead of atomics for snapshot Progress (#11197)
  [informant]: `MillisecondDuration` -> `as_millis()` (#11211)
  Step duration map configuration parameter ported from the POA Network fork (#10902)
  Upgrade jsonrpc to latest (#11206)
  [export hardcoded sync]: use debug for `H256` (#11204)
  Pause pruning while snapshotting (#11178)
  Type annotation for next_key() matching of json filter options (#11192)
  Crypto primitives removed from ethkey (#11174)
  Made ecrecover implementation trait public (#11188)
  ...
* Comments and todos
  Use `snapshot_sync` as logging target
* fix compilation
* More todos, more logs
* Fix picking snapshot peer: prefer the one with the highest block number
  More docs, comments, todos
* Adjust WAIT_PEERS_TIMEOUT to be a multiple of MAINTAIN_SYNC_TIMER to try to fix snapshot startup problems
  Docs, todos, comments
* Tabs
* Formatting
* Don't build new rlp::EMPTY_LIST_RLP instances
* Dial down debug logging
* Don't warn about missing hashes in the manifest: it's normal
  Log client version on peer connect
* Cleanup
* Do not skip snapshots further away than 30k blocks from the highest block seen
  Currently we look for peers that seed snapshots that are close to the highest block seen on the network (where "close" means within 30k blocks). When a node starts up we wait for some time (5sec, increased here to 10sec) to let peers connect and if we have found a suitable peer to sync a snapshot from at the end of that delay, we start the download; if none is found and --warp-barrier is used we stall, otherwise we start a slow-sync.
  When looking for a suitable snapshot, we use the highest block seen on the network to check if a peer has a snapshot that is within 30k blocks of that highest block number. This means that in a situation where all available snapshots are older than that, we will often fail to start a snapshot at all. What's worse is that the longer we delay starting a snapshot sync (to let more peers connect, in the hope of finding a good snapshot), the more likely we are to have seen a high block and thus the less likely we become to accept a snapshot.
  This commit removes the comparison with the highest block number entirely and picks the best snapshot we find in 10sec.
* lockfile
* Add a `ChunkType::Dupe` variant so that we do not disconnect a peer if they happen to send us a duplicate chunk (just ignore the chunk and keep going)
  Resolve some documentation todos, add more
* tweak log message
* Don't warp sync twice
  Check if our own block is beyond the given warp barrier (can happen after we've completed a warp sync but are not quite yet synced up to the tip) and if so, don't sync.
  More docs, resolve todos. Dial down some `sync` debug level logging to trace
* Avoid iterating over all snapshot block/state hashes to find the next work item
  Use a HashSet instead of a Vec and remove items from the set as chunks are processed. Calculate and store the total number of chunks in the `Snapshot` struct instead of counting pending chunks each time.
* Address review grumbles
* Log correct number of bytes written to disk
* Revert ChunkType::Dup change
* whitespace grumble
* Cleanup debugging code
* Fix docs
* Fix import and a typo
* Fix test impl
* Use `indexmap::IndexSet` to ensure chunk hashes are accessed in order
* Revert increased SNAPSHOT_MANIFEST_TIMEOUT: 5sec should be enough
* [CI] check evmbin build (#11096)
* Correct EIP-712 encoding (#11092)
* [client]: Fix for incorrectly dropped consensus messages (#11082) (#11086)
* Update hardcoded headers (foundation, classic, kovan, xdai, ewc, ...) (#11053)
* Add cargo-remote dir to .gitignore (?)
* Update light client headers: ropsten 6631425 foundation 8798209 (#11201)
* Update list of bootnodes for xDai chain (#11236)
* ethcore/res: add mordor testnet configuration (#11200)
* [chain specs]: activate Istanbul on mainnet (#11228)
* [builtin]: support multiple prices and activations in chain spec (#11039)
* [receipt]: add sender & receiver to RichReceipts (#11179)
* [ethcore/builtin]: do not panic in blake2pricer on short input (#11180)
* Made ecrecover implementation trait public (#11188)
* Fix docker centos build (#11226)
* Update MIX bootnodes. (#11203)
* Insert explicit warning into the panic hook (#11225)
* Use provided usd-per-eth value if an endpoint is specified (#11209)
* Cleanup stratum a bit (#11161)
* Add Constantinople EIPs to the dev (instant_seal) config (#10809) (already backported)
* util Host: fix a double Read Lock bug in fn Host::session_readable() (#11175)
* ethcore client: fix a double Read Lock bug in fn Client::logs() (#11172)
* Type annotation for next_key() matching of json filter options (#11192)
* Upgrade jsonrpc to latest (#11206)
* [dependencies]: jsonrpc 14.0.1 (#11183)
* Upgrade to jsonrpc v14 (#11151)
* Switching sccache from local to Redis (#10971)
* Snapshot restoration overhaul (#11219)
* Add new line after writing block to hex file. (#10984)
* Pause pruning while snapshotting (#11178)
* Change how RPCs eth_call and eth_estimateGas handle "Pending" (#11127)
* Fix block detail updating (#11015)
* Make InstantSeal Instant again #11186
* Filter out some bad ropsten warp snapshots (#11247)
This PR does several things and is quite noisy.
The general thrust is to make snapshot syncing easier to understand and debug, while making a few tweaks to better adapt to the current harsh snapshotting environment.
Separate snapshot sync logging from general sync (`-l snapshot_sync`) and improve logging messages in general.
Add documentation and comments where missing/lacking.
Ensure we pick the snapshot with the highest block number whenever there's a tie.
Selecting a peer to sync a snapshot from gives preference to snapshots that are seeded by 3 or more peers. On today's ethereum it is rare that nodes have snapshots from the same point in the chain, which means we almost always have a single peer to download from. Before this PR we did not sort the peers based on how recent their snapshot is when there's a tie (i.e. a single peer has a snapshot). Fixed here; a minimal sketch of the tie-breaking idea follows below.
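A self-contained illustration of that tie-breaking, with hypothetical types and block numbers (the real logic lives in the sync peer-selection code):

```rust
#[derive(Debug)]
struct SnapshotPeer {
    peer_id: usize,
    snapshot_block: u64, // block number the peer's snapshot was taken at
}

// Among the candidate peers, prefer the one seeding the most recent snapshot.
fn pick_peer(candidates: &[SnapshotPeer]) -> Option<&SnapshotPeer> {
    candidates.iter().max_by_key(|p| p.snapshot_block)
}

fn main() {
    let peers = vec![
        SnapshotPeer { peer_id: 1, snapshot_block: 8_650_000 },
        SnapshotPeer { peer_id: 2, snapshot_block: 8_700_000 },
    ];
    // Prints peer 2, whose snapshot is closest to the tip.
    println!("{:?}", pick_peer(&peers));
}
```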
Do not skip snapshots further away than 30k blocks from the highest block seen.
Currently we look for peers that seed snapshots that are close to the highest block seen on the network (where "close" means within 30k blocks of the tip). When a node starts up we wait for some time (5sec, increased here to 10sec) to let peers connect. If we find a suitable peer to sync a snapshot from at the end of that delay, we start the download; if none is found we start slow-syncing (or, if `--warp-barrier` is used, we stall).
Before this PR, when looking for a suitable snapshot we used the highest block seen on the network to check if a peer has a snapshot within 30k blocks of that highest block number. This means that in a situation where all available snapshots are old, we will often fail to start a snapshot at all. What's worse, the longer we delay starting a snapshot sync (to let more peers connect, in the hope of finding a good snapshot), the more likely we are to have seen a high block and thus the less likely we become to accept a snapshot. This behaviour is quite surprising and tricky to reason about.
Here this highest-seen block number comparison is removed and we switch to something simpler: pick the best suitable snapshot we find within 10sec.
A small but possibly controversial change: before this PR, when a remote peer sends us a snapshot chunk we've already seen, we disconnect the peer and stop the download. After this PR we ignore the chunk and keep going instead (change). I do not think this is problematic, but I have never seen a dupe happen so can't really be sure.
Don't warp-sync twice.
With the highest-seen block number criteria gone, a node can end up using a snapshot that is a fair bit away from the tip, which means that after completion it needs to slow-sync for quite a while to catch up. Before this change it could happen that a restored node finds a new snapshot and starts warp syncing all over again; this PR adds a check to see if our own best block is higher than the `--warp-barrier` and, if so, does not start another sync (a minimal sketch of this check follows at the end of this description).
Avoid excessive looping over `Vec`s.
Use a `HashSet` instead of a `Vec` to store the pending chunks and remove items from the set as chunks are processed. Calculate and store the total number of chunks in the `Snapshot` struct instead of counting pending chunks each time.
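As promised above, a minimal sketch of the "don't warp-sync twice" check; the names and call site are hypothetical, only the idea matches the PR:

```rust
// Hypothetical illustration: once our own best block is already past the
// user-supplied --warp-barrier, don't start another warp sync; slow-sync instead.
fn should_warp_sync(our_best_block: u64, warp_barrier: Option<u64>) -> bool {
    match warp_barrier {
        Some(barrier) => our_best_block < barrier,
        // No barrier given: leave the decision to the usual heuristics (elided).
        None => true,
    }
}

fn main() {
    // Already restored past the barrier: keep slow-syncing to the tip.
    assert!(!should_warp_sync(8_700_123, Some(8_600_000)));
    // Fresh node far below the barrier: warp sync is still wanted.
    assert!(should_warp_sync(1_000, Some(8_600_000)));
}
```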