Skip to content
This repository has been archived by the owner on Nov 6, 2020. It is now read-only.

Fix deadlock in network-devp2p #11013

Merged
merged 2 commits into from
Sep 2, 2019

Conversation

AtkinsChang
Copy link
Contributor

Fix two deadlock in devp2p module.

Reproduce

deadlock will occur in 5 minutes with this docker-compose.yaml.

version: '3'
services:
  parity:
    image: parity/parity:latest
    command: --chain kovan --unsafe-expose --jsonrpc-apis=all --node-key=test --no-warp

  rpc_client:
    image: alpine
    command:
      - sh
      - -c
      - |-
        apk --no-cache add curl jq
        while :
          do
          curl --data '[
            {"method":"parity_addReservedPeer","params":["enode://5ff409de959e62e822273a53e42a603a81676538f58b8c708014ff057e48d2ec15c1273ff963ea162ccba8358587143e5d694564eeffd0cb3f9520272fb3c48e@parity:30303"],"id":1,"jsonrpc":"2.0"},
            {"method":"eth_syncing","params":[],"id":2,"jsonrpc":"2.0"}
          ]' -H "Content-Type: application/json" -X POST parity:8545 2>/dev/null | jq
        done
    depends_on:
      - parity

Explanation

In fn connect_peers(&self, io: &IoContext<NetworkIoMessage>), try to acquire mutex for Session while holding r-lock for reserved_peers:
https://github.com/paritytech/parity-ethereum/blob/00124b5a4bf8a38ad6c660b462a5e5addf13cab3/util/network-devp2p/src/host.rs#L576
https://github.com/paritytech/parity-ethereum/blob/00124b5a4bf8a38ad6c660b462a5e5addf13cab3/util/network-devp2p/src/host.rs#L603-L604

in fn session_readable(&self, token: StreamToken, io: &IoContext<NetworkIoMessage>), there are two code path trying to acquire r-lock for reserved_peers while holding mutex for Session:

In the situation that:

  • thread A is in connect_peers holding r-lock for reserved_peers and waiting the session mutex
  • thread B is in session_readable holding mutex for session and trying to acquire r-lock for reserved_peers

if there is another thread C acquiring w-lock for reserved_peers, thread B will be blocked until there are no more writers which hold / waiting for the lock cause the dead lock among three threads.

Another deadlock is cause by reserved_nodes and node in fn update_nodes(&self, _io: &IoContext<NetworkIoMessage>, node_changes: TableUpdates) and fn connect_peers(&self, io: &IoContext<NetworkIoMessage>).

@parity-cla-bot
Copy link

It looks like @AtkinsChang signed our Contributor License Agreement. 👍

Many thanks,

Parity Technologies CLA Bot

@jam10o-new jam10o-new added A0-pleasereview 🤓 Pull request needs code review. M4-core ⛓ Core client code / Rust. labels Aug 31, 2019
@dvdplm dvdplm requested a review from ngotchac August 31, 2019 07:19
@dvdplm
Copy link
Collaborator

dvdplm commented Sep 1, 2019

Deadlocks for me in less than a minute.

@dvdplm dvdplm added B0-patch-stable 🕷 Pull request should also be back-ported to the stable branch. B1-patch-beta 🕷🕷 labels Sep 1, 2019
@dvdplm dvdplm requested a review from tomusdrw September 1, 2019 16:58
Copy link
Collaborator

@tomusdrw tomusdrw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing catches, thanks a lot!

@@ -722,12 +722,13 @@ impl Host {
let session_result = session.lock().readable(io, &self.info.read());
match session_result {
Err(e) => {
let reserved_nodes = self.reserved_nodes.read();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any other occurences where we need reserved_nodes and session.lock()?

As a rule of thumb, we try to keep the lock order in the same order as fields declaration. Here we lock something that is not part of the original struct, but since it's retrieve from self.sessions I'd say that we should keep the lock ordering as well.

Otherwise let's put a comment on top of sessions: and note what order the session.lock() should be acquired in.

Copy link
Collaborator

@dvdplm dvdplm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great stuff. I just tested the branch against kovan and cannot make it deadlock. Let's wait for @ngotchac's review and then merge.

Copy link
Contributor

@ngotchac ngotchac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch !

@dvdplm dvdplm merged commit 05f9606 into openethereum:master Sep 2, 2019
ordian added a commit to ordian/parity that referenced this pull request Sep 2, 2019
* master:
  EIP 1108: Reduce alt_bn128 precompile gas costs (openethereum#11008)
  Fix deadlock in `network-devp2p` (openethereum#11013)
dvdplm added a commit that referenced this pull request Sep 2, 2019
* master:
  EIP 1108: Reduce alt_bn128 precompile gas costs (#11008)
  Fix deadlock in `network-devp2p` (#11013)
  Implement EIP-1283 reenable transition, EIP-1706 and EIP-2200 (#10191)
dvdplm added a commit that referenced this pull request Sep 3, 2019
…he-right-place

* master:
  Extract snapshot to own crate (#11010)
  Edit publish-onchain.sh to use https (#11016)
  EIP 1108: Reduce alt_bn128 precompile gas costs (#11008)
  Fix deadlock in `network-devp2p` (#11013)
  Implement EIP-1283 reenable transition, EIP-1706 and EIP-2200 (#10191)
  EIP 1884 Re-pricing of trie-size dependent operations (#10992)
s3krit pushed a commit that referenced this pull request Sep 11, 2019
* Fix deadlock in `network-devp2p`

* Add note for locking order in `network-devp2p`
s3krit pushed a commit that referenced this pull request Sep 11, 2019
* Fix deadlock in `network-devp2p`

* Add note for locking order in `network-devp2p`
This was referenced Sep 12, 2019
s3krit added a commit that referenced this pull request Sep 12, 2019
* add more tx tests (#11038)
* Fix parallel transactions race-condition (#10995)
* Add blake2_f precompile (#11017)
* [trace] introduce trace failed to Ext (#11019)
* Edit publish-onchain.sh to use https (#11016)
* Fix deadlock in network-devp2p (#11013)
* EIP 1108: Reduce alt_bn128 precompile gas costs (#11008)
* xDai chain support and nodes list update (#10989)
* EIP 2028: transaction gas lowered from 68 to 16 (#10987)
* EIP-1344 Add CHAINID op-code (#10983)
* manual publish jobs for releases, no changes for nightlies (#10977)
* [blooms-db] Fix benchmarks (#10974)
* Verify transaction against its block during import (#10954)
* Better error message for rpc gas price errors (#10931)
* Fix fork choice (#10837)
* Fix compilation on recent nightlies (#10991)
s3krit added a commit that referenced this pull request Sep 12, 2019
* add more tx tests (#11038)
* Fix parallel transactions race-condition (#10995)
* Add blake2_f precompile (#11017)
* [trace] introduce trace failed to Ext (#11019)
* Edit publish-onchain.sh to use https (#11016)
* Fix deadlock in network-devp2p (#11013)
* EIP 1108: Reduce alt_bn128 precompile gas costs (#11008)
* xDai chain support and nodes list update (#10989)
* EIP 2028: transaction gas lowered from 68 to 16 (#10987)
* EIP-1344 Add CHAINID op-code (#10983)
* manual publish jobs for releases, no changes for nightlies (#10977)
* [blooms-db] Fix benchmarks (#10974)
* Verify transaction against its block during import (#10954)
* Better error message for rpc gas price errors (#10931)
* tx-pool: accept local tx with higher gas price when pool full (#10901)
* Fix fork choice (#10837)
* Cleanup unused vm dependencies (#10787)
* Fix compilation on recent nightlies (#10991)
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A0-pleasereview 🤓 Pull request needs code review. B0-patch-stable 🕷 Pull request should also be back-ported to the stable branch. M4-core ⛓ Core client code / Rust.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants