Skip to content
This repository has been archived by the owner on Nov 6, 2020. It is now read-only.

ethcore: fix detection of major import #9552

Merged
merged 2 commits into from
Oct 8, 2018
Merged

ethcore: fix detection of major import #9552

merged 2 commits into from
Oct 8, 2018

Conversation

andresilva
Copy link
Contributor

After a sync round completes successfully we call Chainsync::reset, which in turn will set the sync state to the initial state by calling ChainSync::get_init_state. If warp sync is enabled the initial state will be set to WaitingPeers, which would be incorrect since it would make the client try to start a warp restore and also leads to incorrect detection of major import state. The fix is to force the state to Idle after the sync is complete. Also reverted changes introduced in #9112. I tested this by hammering the eth_syncing RPC and comparing the results to current master. On current master whenever a new block is imported eth_syncing would temporarily return that the node is syncing.

Fixes #9428.

@andresilva andresilva added A0-pleasereview 🤓 Pull request needs code review. B0-patchthis 🕷 M4-core ⛓ Core client code / Rust. labels Sep 13, 2018
@5chdn 5chdn added this to the 2.2 milestone Sep 13, 2018
This was referenced Sep 13, 2018
@debris debris added A8-looksgood 🦄 Pull request is reviewed well. and removed A0-pleasereview 🤓 Pull request needs code review. labels Sep 14, 2018
@ngotchac
Copy link
Contributor

LGTM, but I wonder if it wouldn't make more sense to pass a state to reset then, instead of overwriting it instantly @andresilva ?

Copy link
Collaborator

@tomusdrw tomusdrw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure it will break warp-barrier. If WaitingForPeers is not the expected state we should change get_init_state or use a different method in reset

Resetting to WaitingForPeers should be fine if there is a timeout set that will fall back to regular sync if we can't warp.

@@ -700,6 +700,7 @@ impl ChainSync {
fn complete_sync(&mut self, io: &mut SyncIo) {
trace!(target: "sync", "Sync complete");
self.reset(io);
self.state = SyncState::Idle;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reset is setting the state to ChainSync::get_init_state(self.warp_sync, io.chain()); already. Setting it to Idle here will break warp-barrier.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oups, you're right @tomusdrw !

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right it does break the warp barrier, but I also don't think the current behavior of setting to get_init_state is correct, that's why I forced it to Idle. Every time we're synced with the tip of the chain we go into WaitingPeers state which tries to trigger a snapshot sync. If you monitor eth_syncing it goes into syncing state briefly every time a new block is imported.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to set it to get_init_state until WAIT_PEERS_TIMEOUT is elapsed, and force it to idle afterwards?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this line: https://github.com/paritytech/parity-ethereum/blob/master/ethcore/sync/src/chain/mod.rs#L451 should be modified as you are saying, set to Idle when WAIT_PEERS_TIMEOUT is elapsed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay I was wrong. This doesn't break the warp-barrier, it keeps on searching for the given snapshot until it finds it. It doesn't break the warp barrier because it will be stuck in WaitingPeers state and therefore will never complete any sync round (i.e. complete_sync won't be called which would force the state to Idle). It does break the informant because of the changes I made to is_major_importing, it will print as if the node was syncing while it is actually searching for the snapshot. Though I'm not sure what is the correct semantics here? When eth_syncing returns false should it mean that the node just isn't syncing, or that it's not syncing because it doesn't need to? I.e. it is already synced to the tip of the chain.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I guess if we're in complete_sync, then we've already reached warp barrier. We can make reset take an additional state param and pass either ChainSync::get_init_state(self.warp_sync, io.chain()) or SyncState::Idle, but I don't see it as blocking this PR.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the fix should be moved from complete_sync() to reset() – it's brittle to have the state first set in reset() and then clobbered right afterwards in complete_sync().
I am new to this code and might be completely wrong here, but it seems like we're conflating state with configuration (WarpSync::Enabled). Maybe I'm making the wrong assumption here, but isn't it true that once warp syncing is done we want it turned off until the node is restarted? Or are there cases when a node reaches the tip of the chain at one point t and starts syncing normally and then at t + n need to go back to warp syncing? If not, then Idle would be the correct state as soon as complete_sync is called?
In other words: if we're beyond the warp barrier or at the tip, calling complete_sync() should put us in Idle (but I'd move that logic to reset() if possible).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or are there cases when a node reaches the tip of the chain at one point t and starts syncing normally and then at t + n need to go back to warp syncing?

Yes, we can switch back to warp syncing e.g. if our node went offline for a while (or just slow) and is too far away (SNAPSHOT_RESTORE_THRESHOLD) from the current tip (see maybe_start_snapshot_sync, which is called regularly on timer).

/// be considered a syncing state.
pub fn is_major_importing_or_waiting(sync_state: Option<SyncState>, queue_info: BlockQueueInfo, waiting_is_syncing_state: bool) -> bool {
/// Check if client is during major sync or during block import.
pub fn is_major_importing(sync_state: Option<SyncState>, queue_info: BlockQueueInfo) -> bool {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the flag was introduced quite recently, no? WaitingForPeers is the initial state in case we are attempting warp sync, and we don't really want to start regular sync just yet (we first wait for at least couple of warp-peers before falling back to regular sync).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it was added recently (#9112), but it's not necessary if we don't switch into WaitingPeers state on the beginning of each sync round, and in that case we should always consider WaitingPeers to be a syncing state.

@5chdn 5chdn added A4-gotissues 💥 Pull request is reviewed and has significant issues which must be addressed. and removed A8-looksgood 🦄 Pull request is reviewed well. labels Sep 14, 2018
@5chdn
Copy link
Contributor

5chdn commented Sep 18, 2018

Please rebase on master, sorry for the inconvenience.

@5chdn 5chdn added A0-pleasereview 🤓 Pull request needs code review. and removed A4-gotissues 💥 Pull request is reviewed and has significant issues which must be addressed. labels Sep 25, 2018
@@ -700,6 +700,7 @@ impl ChainSync {
fn complete_sync(&mut self, io: &mut SyncIo) {
trace!(target: "sync", "Sync complete");
self.reset(io);
self.state = SyncState::Idle;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I guess if we're in complete_sync, then we've already reached warp barrier. We can make reset take an additional state param and pass either ChainSync::get_init_state(self.warp_sync, io.chain()) or SyncState::Idle, but I don't see it as blocking this PR.

@ordian ordian added A8-looksgood 🦄 Pull request is reviewed well. and removed A0-pleasereview 🤓 Pull request needs code review. labels Sep 25, 2018
@andresilva
Copy link
Contributor Author

Added state parameter to the reset method, defaults to initial state if None is provided. Rebased over master.

@5chdn 5chdn merged commit dc14cce into master Oct 8, 2018
@5chdn 5chdn deleted the andre/fix-sync-state branch October 8, 2018 11:04
5chdn pushed a commit that referenced this pull request Oct 8, 2018
* sync: set state to idle after sync is completed

* sync: refactor sync reset
dvdplm added a commit that referenced this pull request Oct 9, 2018
…mon-deps

* origin/master:
  fix (light/provider) : Make `read_only executions` read-only (#9591)
  ethcore: fix detection of major import (#9552)
  return 0 on error (#9705)
  ethcore: delay ropsten hardfork (#9704)
  make instantSeal engine backwards compatible, closes #9696 (#9700)
  Implement CREATE2 gas changes and fix some potential overflowing (#9694)
  Don't hash the init_code of CREATE. (#9688)
  ethcore: minor optimization of modexp by using LR exponentiation (#9697)
  removed redundant clone before each block import (#9683)
5chdn added a commit that referenced this pull request Oct 9, 2018
* parity-version: bump beta to 2.1.2

* docs(rpc): push the branch along with tags (#9578)

* docs(rpc): push the branch along with tags

* ci: remove old rpc docs script

* Remove snapcraft clean (#9585)

* Revert " add snapcraft package image (master) (#9584)"

This reverts commit ceaedbb.

* Update package-snap.sh

* Update .gitlab-ci.yml

* ci: fix regex 🙄 (#9597)

* docs(rpc): annotate tag with the provided message (#9601)

* Update ropsten.json (#9602)

* HF in POA Sokol (2018-09-19) (#9607)

poanetwork/poa-chain-spec#86

* fix(network): don't disconnect reserved peers (#9608)

The priority of && and || was borked.

* fix failing node-table tests on mac os, closes #9632 (#9633)

* ethcore-io retries failed work steal (#9651)

* ethcore-io uses newer version of crossbeam && retries failed work steal

* ethcore-io non-mio service uses newer crossbeam

* remove master from releasable branches (#9655)

* remove master from releasable branches

need backporting in beta 
fix https://gitlab.parity.io/parity/parity-ethereum/-/jobs/101065 etc

* add except for snap packages for master

* Test fix for windows cache name... (#9658)

* Test fix for windows cache name...

* Fix variable name.

* fix(light_fetch): avoid race with BlockNumber::Latest (#9665)

* Calculate sha3 instead of sha256 for push-release. (#9673)

* Calculate sha3 instead of sha256 for push-release.

* Add pushes to the script.

* Hardfork the testnets (#9562)

* ethcore: propose hardfork block number 4230000 for ropsten

* ethcore: propose hardfork block number 9000000 for kovan

* ethcore: enable kip-4 and kip-6 on kovan

* etcore: bump kovan hardfork to block 9.2M

* ethcore: fix ropsten constantinople block number to 4.2M

* ethcore: disable difficulty_test_ropsten until ethereum/tests are updated upstream

* ci: fix push script (#9679)

* ci: fix push script

* Fix copying & running on windows.

* CI: Remove unnecessary pipes (#9681)

* ci: reduce gitlab pipelines significantly

* ci: build pipeline for PR

* ci: remove dead weight

* ci: remove github release script

* ci: remove forever broken aura tests

* ci: add random stuff to the end of the pipes

* ci: add wind and mac to the end of the pipe

* ci: remove snap artifacts

* ci: (re)move dockerfiles

* ci: clarify job names

* ci: add cargo audit job

* ci: make audit script executable

* ci: ignore snap and docker files for rust check

* ci: simplify audit script

* ci: rename misc to optional

* ci: add publish script to releaseable branches

* ci: more verbose cp command for windows build

* ci: fix weird binary checksum logic in push script

* ci: fix regex in push script for windows

* ci: simplify gitlab caching

* docs: align README with ci changes

* ci: specify default cargo target dir

* ci: print verbose environment

* ci: proper naming of scripts

* ci: restore docker files

* ci: use docker hub file

* ci: use cargo home instead of cargo target dir

* ci: touch random rust file to trigger real builds

* ci: set cargo target dir for audit script

* ci: remove temp file

* ci: don't export the cargo target dir in the audit script

* ci: fix windows unbound variable

* docs: fix gitlab badge path

* rename deprecated gitlab ci variables

https://docs.gitlab.com/ee/ci/variables/#9-0-renaming

* ci: fix git compare for nightly builds

* test: skip c++ example for all platforms but linux

* ci: add random rust file to trigger tests

* ci: remove random rust file

* disable cpp lib test for mac, win and beta (#9686)

* cleanup ci merge

* ci: fix tests

* fix bad-block reporting no reason (#9638)

* ethcore: fix detection of major import (#9552)

* sync: set state to idle after sync is completed

* sync: refactor sync reset

* Don't hash the init_code of CREATE. (#9688)

* Docker: run as parity user (#9689)

* Implement CREATE2 gas changes and fix some potential overflowing (#9694)

* Implement CREATE2 gas changes and fix some potential overflowing

* Ignore create2 state tests

* Split CREATE and CREATE2 in gasometer

* Generalize rounding (x + 31) / 32 to to_word_size

* make instantSeal engine backwards compatible, closes #9696 (#9700)

* ethcore: delay ropsten hardfork (#9704)

* fix (light/provider) : Make `read_only executions` read-only (#9591)

* `ExecutionsRequest` from light-clients as read-only

This changes so all `ExecutionRequests` from light-clients are executed
as read-only which the `virtual``flag == true ensures.

This boost up the current transaction to always succeed

Note, this only affects `eth_estimateGas` and `eth_call` AFAIK.

* grumbles(revert renaming) : TransactionProof

* grumbles(trace) : remove incorrect trace

* grumbles(state/prove_tx) : explicit `virt`

Remove the boolean flag to determine that a `state::prove_transaction`
whether it should be executed in a virtual context or not.

Because of that also rename the function to
`state::prove_transction_virtual` to make more clear

* CI: Skip docs job for nightly (#9693)

* ci: force-tag wiki changes

* ci: force-tag wiki changes

* ci: skip docs job for master and nightly

* ci: revert docs job checking for nightly tag

* ci: exclude docs job from nightly builds in gitlab script
5chdn pushed a commit that referenced this pull request Oct 9, 2018
* sync: set state to idle after sync is completed

* sync: refactor sync reset
5chdn added a commit that referenced this pull request Oct 10, 2018
* parity-version: bump stable to 2.0.7

* HF in POA Sokol (2018-09-19) (#9607)

poanetwork/poa-chain-spec#86

* fix failing node-table tests on mac os, closes #9632 (#9633)

* fix(light_fetch): avoid race with BlockNumber::Latest (#9665)

* CI: Remove unnecessary pipes (#9681)

* ci: reduce gitlab pipelines significantly

* ci: build pipeline for PR

* ci: remove dead weight

* ci: remove github release script

* ci: remove forever broken aura tests

* ci: add random stuff to the end of the pipes

* ci: add wind and mac to the end of the pipe

* ci: remove snap artifacts

* ci: (re)move dockerfiles

* ci: clarify job names

* ci: add cargo audit job

* ci: make audit script executable

* ci: ignore snap and docker files for rust check

* ci: simplify audit script

* ci: rename misc to optional

* ci: add publish script to releaseable branches

* ci: more verbose cp command for windows build

* ci: fix weird binary checksum logic in push script

* ci: fix regex in push script for windows

* ci: simplify gitlab caching

* docs: align README with ci changes

* ci: specify default cargo target dir

* ci: print verbose environment

* ci: proper naming of scripts

* ci: restore docker files

* ci: use docker hub file

* ci: use cargo home instead of cargo target dir

* ci: touch random rust file to trigger real builds

* ci: set cargo target dir for audit script

* ci: remove temp file

* ci: don't export the cargo target dir in the audit script

* ci: fix windows unbound variable

* docs: fix gitlab badge path

* rename deprecated gitlab ci variables

https://docs.gitlab.com/ee/ci/variables/#9-0-renaming

* ci: fix git compare for nightly builds

* test: skip c++ example for all platforms but linux

* ci: add random rust file to trigger tests

* ci: remove random rust file

* disable cpp lib test for mac, win and beta (#9686)

* cleanup ci merge

* parity: bump clib

* ci: fix tests

* ci: disable c++ example

* Docker: run as parity user (#9689)

* CI: Skip docs job for nightly (#9693)

* ci: force-tag wiki changes

* ci: force-tag wiki changes

* ci: skip docs job for master and nightly

* ci: revert docs job checking for nightly tag

* ci: exclude docs job from nightly builds in gitlab script

* fix (light/provider) : Make `read_only executions` read-only (#9591)

* `ExecutionsRequest` from light-clients as read-only

This changes so all `ExecutionRequests` from light-clients are executed
as read-only which the `virtual``flag == true ensures.

This boost up the current transaction to always succeed

Note, this only affects `eth_estimateGas` and `eth_call` AFAIK.

* grumbles(revert renaming) : TransactionProof

* grumbles(trace) : remove incorrect trace

* grumbles(state/prove_tx) : explicit `virt`

Remove the boolean flag to determine that a `state::prove_transaction`
whether it should be executed in a virtual context or not.

Because of that also rename the function to
`state::prove_transction_virtual` to make more clear

* ethcore: fix detection of major import (#9552)

* sync: set state to idle after sync is completed

* sync: refactor sync reset

* parity: revert clib bump and fix tests

* Fix path to parity.h (#9274)

* Fix path to parity.h

* Fix other paths as well

* ethcore-io retries failed work steal (#9651)

* ethcore-io uses newer version of crossbeam && retries failed work steal

* ethcore-io non-mio service uses newer crossbeam
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A8-looksgood 🦄 Pull request is reviewed well. M4-core ⛓ Core client code / Rust.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

When importing new blocks Parity goes into sync mode very briefly, giving false ethstats info
7 participants