Skip to content
This repository has been archived by the owner on Nov 6, 2020. It is now read-only.

Fix light client informant while syncing #9932

Merged
merged 6 commits into from
Nov 26, 2018
Merged

Conversation

ngotchac
Copy link
Contributor

From #9385 , the informant wasn't displaying correctly the syncing state. This is because the state of LightSync had deadlock issues, and the default to is_major_importing was set to false if the lock couldn't be acquired. This lead to false logs in the informant.

In this PR, is_idle is added, and gets updated everytime the state is updated, such as is_major_importing can check this boolean instead of waiting for state to be unlocked.

@ngotchac ngotchac added A0-pleasereview 🤓 Pull request needs code review. M4-core ⛓ Core client code / Rust. labels Nov 16, 2018
const EMPTY_QUEUE: usize = 3;

let queue_info = self.client.as_light_client().queue_info();
let is_verifying = queue_info.unverified_queue_size + queue_info.verified_queue_size > EMPTY_QUEUE;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder whether there was a reason in #5002 not to take into account verified_queue_size
https://github.com/paritytech/parity-ethereum/pull/5002/files#diff-e8c26f6188b2b873a7467abd3041b843R584
@rphmeier? Although that was a long time ago.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about that. But from my local testing, it lead to wrong information being displayed (Import logs too soon)

ethcore/sync/src/light_sync/mod.rs Show resolved Hide resolved
ethcore/sync/src/light_sync/mod.rs Show resolved Hide resolved
ethcore/sync/src/light_sync/mod.rs Outdated Show resolved Hide resolved
@ordian ordian requested a review from cheme November 22, 2018 08:52
@ordian ordian added this to the 2.3 milestone Nov 22, 2018
@ngotchac ngotchac mentioned this pull request Nov 22, 2018
Copy link
Contributor

@cheme cheme left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, Basically we are kicking verification queues mutexes and a new mutex instead of the state mutex, that should totally works, and indeed it works well (at least during 4hours when the deadlock happens usually in less than one).
So it is yet another additional mutex, but until a more impacting refacto to reduce those, this PR is very welcome and nice work (thanks for fixing this 👍 ) .

I also did put a short comment with a suggestion (but it is just a suggestion).

/// A wrapper around the SyncState that makes sure to
/// update the giving reference to `is_idle`
#[derive(Debug)]
struct SyncStateWrapper {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not really sure about the usefulness of this struct, it seems that set function is only called once in set_state of LightClient, since fields are private.
Ok, I got it, it avoid bad use in future client inner implementation, maybe then we could move is_idle to it:

struct SyncStateWrapper {
  state: Mutex::new(SyncState::Idle),
  is_idle: Mutex::new(true),
}

and keep deref to state. Then we would have an easier set function (without handle ref).
This could be a bad idea if we need to keep is_idle mutex open (does not seems so).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The thing is that you want to keep an open SyncState Mutex while is_idle still being accessible. With the suggested changes, you cannot have a lock on SyncStateWrapper (otherwise is_idle wouldn't be accessible), so you would have to expose the inner SyncState lock, which could be used to directly change the state, without updating is_idle. Does that make sense?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sense, my suggestion was your second hypothesis (no mutex around SyncStateWrapper) , it does need to exposes the inner lock, on the other hand it keeps both locks together.
There is maybe less chance to mistake with current version, a mistake would be to pass any &mut bool to set_state which would be clearly intentional.

@5chdn 5chdn added A8-looksgood 🦄 Pull request is reviewed well. and removed A0-pleasereview 🤓 Pull request needs code review. labels Nov 25, 2018
@ordian ordian added B1-patch-beta 🕷🕷 B0-patch-stable 🕷 Pull request should also be back-ported to the stable branch. labels Nov 26, 2018
@ordian ordian merged commit 0d59319 into master Nov 26, 2018
@ordian ordian deleted the ng-pip-informant-fix branch November 26, 2018 11:05
@5chdn 5chdn mentioned this pull request Nov 27, 2018
12 tasks
5chdn pushed a commit that referenced this pull request Nov 27, 2018
* Add `is_idle` to LightSync to check importing status

* Use SyncStateWrapper to make sure is_idle gets updates

* Update is_major_import to use verified queue size as well

* Add comment for `is_idle`

* Add Debug to `SyncStateWrapper`

* `fn get` -> `fn into_inner`
@5chdn 5chdn mentioned this pull request Nov 27, 2018
10 tasks
5chdn pushed a commit that referenced this pull request Nov 27, 2018
* Add `is_idle` to LightSync to check importing status

* Use SyncStateWrapper to make sure is_idle gets updates

* Update is_major_import to use verified queue size as well

* Add comment for `is_idle`

* Add Debug to `SyncStateWrapper`

* `fn get` -> `fn into_inner`
5chdn added a commit that referenced this pull request Nov 28, 2018
* version: bump stable to 2.1.7

* Adjust requests costs for light client (#9925)

* PIP Table Cost relative to average peers instead of max peers

* Add tracing in PIP new_cost_table

* Update stat peer_count

* Use number of leeching peers for Light serve costs

* Fix test::light_params_load_share_depends_on_max_peers (wrong type)

* Remove (now) useless test

* Remove `load_share` from LightParams.Config
Prevent div. by 0

* Add LEECHER_COUNT_FACTOR

* PR Grumble: u64 to u32 for f64 casting

* Prevent u32 overflow for avg_peer_count

* Add tests for LightSync::Statistics

* Fix empty steps (#9939)

* Don't send empty step twice or empty step then block.

* Perform basic validation of locally sealed blocks.

* Don't include empty step twice.

* prevent silent errors in daemon mode, closes #9367 (#9946)

* Fix light client informant while syncing (#9932)

* Add `is_idle` to LightSync to check importing status

* Use SyncStateWrapper to make sure is_idle gets updates

* Update is_major_import to use verified queue size as well

* Add comment for `is_idle`

* Add Debug to `SyncStateWrapper`

* `fn get` -> `fn into_inner`

*  ci: rearrange pipeline by logic (#9970)

* ci: rearrange pipeline by logic

* ci: rename docs script

* Add readiness check for docker container (#9804)

* Update Dockerfile

Since parity is built for "mission critical use", I thought other operators may see the need for this.

Adding the `curl` and `jq` commands allows for an extremely simple health check to be usable in container orchestrators.

For example. Here is a health check for a parity docker container running in Kubernetes.

This can be setup as a readiness Probe that would prevent clustered nodes that aren't ready from serving traffic.

```bash
#!/bin/bash

ETH_SYNCING=$(curl -X POST --data '{"jsonrpc":"2.0","method":"eth_syncing","params":[],"id":1}' http://localhost:8545 -H 'Content-Type: application/json')
RESULT=$(echo "$ETH_SYNCING | jq -r .result)

if [ "$RESULT" == "false" ]; then
  echo "Parity is ready to start accepting traffic"
  exit 0
else
  echo "Parity is still syncing the blockchain"
  exit 1
fi
```

* add sync check script

* Fix docker script (#9854)


* Dockerfile: change source path of the newly added check_sync.sh (#9869)

* Do not use the home directory as the working dir in docker (#9834)

* Do not create a home directory.

* Re-add -m flag

* fix docker build (#9971)

* bump smallvec to 0.6 in ethcore-light, ethstore and whisper (#9588)

* bump smallvec to 0.6 in ethcore-light, ethstore and whisper

* bump transaction-pool

* Fix test.

* patch cargo to use tokio-proto from git repo

this makes sure we no longer depend on smallvec 0.2.1 which is
affected by servo/rust-smallvec#96

* use patched version of untrusted 0.5.1

* ci: allow audit to fail
5chdn added a commit that referenced this pull request Nov 29, 2018
* version: bump beta to 2.2.2

* Add experimental RPCs flag (#9928)

* WiP

* Enable experimental RPCs.

* Keep existing blocks when restoring a Snapshot (#8643)

* Rename db_restore => client

* First step: make it compile!

* Second step: working implementation!

* Refactoring

* Fix tests

* PR Grumbles

* PR Grumbles WIP

* Migrate ancient blocks interating backward

* Early return in block migration if snapshot is aborted

* Remove RwLock getter (PR Grumble I)

* Remove dependency on `Client`: only used Traits

* Add test for recovering aborted snapshot recovery

* Add test for migrating old blocks

* Fix build

* PR Grumble I

* PR Grumble II

* PR Grumble III

* PR Grumble IV

* PR Grumble V

* PR Grumble VI

* Fix one test

* Fix test

* PR Grumble

* PR Grumbles

* PR Grumbles II

* Fix tests

* Release RwLock earlier

* Revert Cargo.lock

* Update _update ancient block_ logic: set local in `commit`

* Update typo in ethcore/src/snapshot/service.rs

Co-Authored-By: ngotchac <ngotchac@gmail.com>

* Adjust requests costs for light client (#9925)

* PIP Table Cost relative to average peers instead of max peers

* Add tracing in PIP new_cost_table

* Update stat peer_count

* Use number of leeching peers for Light serve costs

* Fix test::light_params_load_share_depends_on_max_peers (wrong type)

* Remove (now) useless test

* Remove `load_share` from LightParams.Config
Prevent div. by 0

* Add LEECHER_COUNT_FACTOR

* PR Grumble: u64 to u32 for f64 casting

* Prevent u32 overflow for avg_peer_count

* Add tests for LightSync::Statistics

* Fix empty steps (#9939)

* Don't send empty step twice or empty step then block.

* Perform basic validation of locally sealed blocks.

* Don't include empty step twice.

* prevent silent errors in daemon mode, closes #9367 (#9946)

* Fix a deadlock (#9952)

* Update informant:
  - decimal in Mgas/s
  - print every 5s (not randomly between 5s and 10s)

* Fix dead-lock in `blockchain.rs`

* Update locks ordering

* Fix light client informant while syncing (#9932)

* Add `is_idle` to LightSync to check importing status

* Use SyncStateWrapper to make sure is_idle gets updates

* Update is_major_import to use verified queue size as well

* Add comment for `is_idle`

* Add Debug to `SyncStateWrapper`

* `fn get` -> `fn into_inner`

*  ci: rearrange pipeline by logic (#9970)

* ci: rearrange pipeline by logic

* ci: rename docs script

* fix docker build (#9971)

* Deny unknown fields for chainspec (#9972)

* Add deny_unknown_fields to chainspec

* Add tests and fix existing one

* Remove serde_ignored dependency for chainspec

* Fix rpc test eth chain spec

* Fix starting_nonce_test spec

* Improve block and transaction propagation (#9954)

* Refactor sync to add priority tasks.

* Send priority tasks notifications.

* Propagate blocks, optimize transactions.

* Implement transaction propagation. Use sync_channel.

* Tone down info.

* Prevent deadlock by not waiting forever for sync lock.

* Fix lock order.

* Don't use sync_channel to prevent deadlocks.

* Fix tests.

* Fix unstable peers and slowness in sync (#9967)

* Don't sync all peers after each response

* Update formating

* Fix tests: add `continue_sync` to `Sync_step`

* Update ethcore/sync/src/chain/mod.rs

Co-Authored-By: ngotchac <ngotchac@gmail.com>

* fix rpc middlewares

* fix Cargo.lock

* json: resolve merge in spec

* rpc: fix starting_nonce_test

* ci: allow nightl job to fail
niklasad1 pushed a commit that referenced this pull request Dec 16, 2018
* Add `is_idle` to LightSync to check importing status

* Use SyncStateWrapper to make sure is_idle gets updates

* Update is_major_import to use verified queue size as well

* Add comment for `is_idle`

* Add Debug to `SyncStateWrapper`

* `fn get` -> `fn into_inner`
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A8-looksgood 🦄 Pull request is reviewed well. B0-patch-stable 🕷 Pull request should also be back-ported to the stable branch. M4-core ⛓ Core client code / Rust.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants