This repository has been archived by the owner on Nov 6, 2020. It is now read-only.

Losing peers and falling out of sync #10626

Closed
dafky2000 opened this issue May 3, 2019 · 6 comments
Labels
F2-bug 🐞 The client fails to follow expected behavior. M4-core ⛓ Core client code / Rust.

Comments

@dafky2000

  • Parity Ethereum version: 2.4.5
  • Operating system: Linux
  • Installation: Arch Package Repo
  • Fully synchronized: yes
  • Network: ethereum
  • Restarted: yes

I have been losing peers since yesterday morning (approx. 08:00-09:00 UTC). Parity starts up fine and syncs to the latest block, then regularly drops peers and falls out of sync.

Possibly related: sometimes syncing stops completely with 0 blocks queued ("Qed"). During these pauses, peers seem to accumulate normally, but syncing remains "stuck". I have not caught this since enabling trace logging.

In both cases, restarting Parity makes everything resume correctly for a short time.
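The "stuck" state described above (peers connected, 0 blocks queued, no progress) can be told apart from a plain loss of peers by watching the best block and the peer count together. A minimal sketch, assuming you already sample both values periodically (for example via the standard eth_blockNumber and net_peerCount JSON-RPC methods); the Sample type, the is_stuck function, and the 10-minute window are hypothetical choices for illustration, not part of Parity:

```rust
use std::time::Duration;

/// One observation of the node. In practice these values could come from
/// polling eth_blockNumber and net_peerCount over JSON-RPC; the polling
/// itself is omitted and this struct is hypothetical.
#[derive(Clone, Copy, Debug)]
struct Sample {
    at: Duration,    // time since monitoring started
    best_block: u64, // highest block the node reports
    peers: u32,      // connected peer count
}

/// True if the best block has not advanced for at least `window` while the
/// node still reports at least `min_peers` peers: the "stuck with peers"
/// state rather than a plain loss of peers.
fn is_stuck(samples: &[Sample], window: Duration, min_peers: u32) -> bool {
    let latest = match samples.last() {
        Some(s) => *s,
        None => return false,
    };
    // Walk backwards over the trailing run of samples that share the latest
    // best block; the last element visited is where that run began.
    let run_start = samples
        .iter()
        .rev()
        .take_while(|s| s.best_block == latest.best_block)
        .last()
        .copied()
        .unwrap_or(latest);
    latest.peers >= min_peers && latest.at.saturating_sub(run_start.at) >= window
}

fn main() {
    let samples = [
        Sample { at: Duration::from_secs(0), best_block: 100, peers: 30 },
        Sample { at: Duration::from_secs(60), best_block: 101, peers: 35 },
        Sample { at: Duration::from_secs(700), best_block: 101, peers: 40 },
    ];
    // More than ten minutes without a new block while holding 40 peers.
    println!("stuck: {}", is_stuck(&samples, Duration::from_secs(600), 25));
}
```

A check like this only detects the symptom; it could drive the same kind of automatic restart the reporters fell back on, but it says nothing about the cause.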


Full archive + tracing started with:

parity --base-path /mnt/hardraid0/io.parity.ethereum/ --mode active --tracing on --pruning archive --db-compaction ssd --cache-size 8192 --min-peers 128 --max-peers 256 --no-periodic-snapshot --jsonrpc-interface 10.0.0.4 --ws-interface 10.0.0.4 --ws-hosts all -l sync=trace
@jam10o-new jam10o-new added F2-bug 🐞 The client fails to follow expected behavior. M4-core ⛓ Core client code / Rust. labels May 3, 2019
@jam10o-new jam10o-new added this to the 2.6 milestone May 3, 2019
@mewwts

mewwts commented May 3, 2019

Having this issue on 2.4.5, running parity --jsonrpc-threads=4 --jsonrpc-server-threads=4 with the following config file:

[parity]
auto_update = "none"
no_download = true
base_path = "/mnt/parity-data/parity-data/"
no_persistent_txqueue = true

[network]
nat = "extip:x"
warp = false
allow_ips = "public"
min_peers = 40
max_peers = 80

[ipc]
disable = true

[rpc]
interface = "all"
apis = ["safe"]

[websockets]
interface = "all"
apis = ["safe"]

[dapps]
disable = true

[footprint]
tracing = "on"
pruning = "archive"
cache_size = 12288

These nodes are running on 4 vCPUs, 16GB ram, ubuntu 18.04 on Digital Ocean.

@mewwts

mewwts commented May 5, 2019

I wonder if there could be a memory leak somewhere. Here is the memory consumption on two 16GB nodes:
[Screenshots (2019-05-05): DigitalOcean memory graphs for the parity-3 and parity-2 nodes]

Where memory drops to 0, the node has been restarted by supervisor.
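One way to test the leak hypothesis is to log the node's resident memory over time rather than relying only on the droplet graphs. On Linux, /proc/<pid>/status contains a VmRSS line giving resident memory in kB. A minimal sketch; the parse_vm_rss_kb function and the PID-argument handling are hypothetical:

```rust
use std::fs;

/// Extract the resident set size (in kB) from the text of /proc/<pid>/status.
/// The relevant line looks like "VmRSS:\t  123456 kB".
fn parse_vm_rss_kb(status: &str) -> Option<u64> {
    status
        .lines()
        .find(|line| line.starts_with("VmRSS:"))
        .and_then(|line| line.split_whitespace().nth(1))
        .and_then(|kb| kb.parse().ok())
}

fn main() {
    // Hypothetical usage: pass the parity PID as the first argument;
    // without one, this reads its own status via /proc/self.
    let pid = std::env::args().nth(1).unwrap_or_else(|| "self".to_string());
    match fs::read_to_string(format!("/proc/{}/status", pid)) {
        Ok(status) => match parse_vm_rss_kb(&status) {
            Some(kb) => println!("VmRSS: {} MiB", kb / 1024),
            None => eprintln!("no VmRSS line found"),
        },
        Err(e) => eprintln!("could not read /proc/{}/status: {}", pid, e),
    }
}
```

Run periodically (say, once a minute) against the Parity PID and chart the values. A steady climb between restarts is consistent with a leak, but on its own it does not rule out caches that are still filling up.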

@jam10o-new
Contributor

#10371: that is a known issue with an as-yet unidentified cause.

@sbwdlihao

Same problem here:

  • Parity Ethereum version: Parity-Ethereum/v2.5.1-beta-adabd81-20190514/x86_64-linux-gnu/rustc1.35.0
  • Operating system: CentOS Linux release 7.5.1804 (Core)
  • Installation: Build from source
  • Fully synchronized: yes
  • Network: ethereum
  • CPU: 8 core
  • Memory: 16G
  • SSD: yes

Startup log:

2019-06-05 11:02:35  Starting Parity-Ethereum/v2.5.1-beta-adabd81-20190514/x86_64-linux-gnu/rustc1.35.0
2019-06-05 11:02:35  Keys path /data/parity/keys/ethereum
2019-06-05 11:02:35  DB path /data/parity/chains/ethereum/db/906a34e69aec8c0d
2019-06-05 11:02:35  State DB configuration: archive +Trace
2019-06-05 11:02:35  Operating mode: active
2019-06-05 11:02:36  Configured for Ethereum using Ethash engine
2019-06-05 11:02:38  Listening for new connections on 127.0.0.1:8546.

Log with '-l sync=debug'

2019-06-05 10:52:01  IO Worker #0 DEBUG sync  58 -> Invalid packet 0
2019-06-05 10:52:49  IO Worker #0 DEBUG sync  Wasn't able to finish transaction propagation within a deadline.
....
2019-06-05 10:52:49  IO Worker #1 DEBUG sync  Unexpected packet 2 from unregistered peer: 286:Geth/v5.5.2-be43774/linux/go1.9.7
2019-06-05 10:53:45  IO Worker #1 DEBUG sync  Wasn't able to finish transaction propagation within a deadline.

Peers were lost several times, and each time, before the peers dropped, the IO worker spent a long time on transaction propagation until it hit the deadline.

@sbwdlihao

I solved my problem with the following changes:

  • Use ESSD on Aliyun, which gives a 5-10x IOPS improvement over standard SSD
  • Lower the RPC request rate
  • Set disable_periodic = true in the config
  • Change PROPAGATE_TIMEOUT_INTERVAL in mod.rs from 5 seconds to 20 seconds, then rebuild
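The "Wasn't able to finish transaction propagation within a deadline" log lines and the PROPAGATE_TIMEOUT_INTERVAL change both point to propagation being a time-boxed loop that gives up when its budget runs out. A minimal sketch of that idea, not Parity's actual code: the propagate function, the per-peer send costs, and the simulated clock (used instead of Instant so the result is deterministic) are all hypothetical. It illustrates why, if each send is slowed down (for example by I/O contention), a 5-second budget can leave peers unserved while 20 seconds does not:

```rust
use std::time::Duration;

/// Outcome of one (simulated) propagation round.
#[derive(Debug, PartialEq)]
enum Propagation {
    /// Every peer was served within the budget.
    Finished { peers_served: usize },
    /// The budget ran out first; the remaining peers were skipped.
    DeadlineHit { peers_served: usize },
}

/// Hypothetical model: `send_costs[i]` is how long sending to peer `i` takes.
/// Peers are served in order until the next send would overrun `timeout`.
fn propagate(send_costs: &[Duration], timeout: Duration) -> Propagation {
    let mut elapsed = Duration::ZERO;
    for (served, cost) in send_costs.iter().enumerate() {
        if elapsed + *cost > timeout {
            // This is the point at which the node in this thread logs
            // "Wasn't able to finish transaction propagation within a deadline".
            return Propagation::DeadlineHit { peers_served: served };
        }
        elapsed += *cost;
    }
    Propagation::Finished { peers_served: send_costs.len() }
}

fn main() {
    // 128 peers (the first reporter's --min-peers), each taking a
    // hypothetical 100 ms to serve.
    let costs = vec![Duration::from_millis(100); 128];
    for secs in [5, 20] {
        let result = propagate(&costs, Duration::from_secs(secs));
        println!("timeout {}s: {:?}", secs, result);
    }
}
```

With these made-up numbers, the 5-second budget stops after 50 peers (DeadlineHit { peers_served: 50 }), while 20 seconds covers all 128. A longer timeout trades a longer-blocked IO worker for completeness, which would explain why it helped here without removing the underlying slowness.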

@adria0

adria0 commented Jul 27, 2020

Closing issue due to its stale state.

@adria0 adria0 closed this as completed Jul 27, 2020

7 participants
@mewwts @dafky2000 @ordian @sbwdlihao @adria0 @jam10o-new and others