This repository has been archived by the owner on Nov 6, 2020. It is now read-only.

parity ethereum client doesn't always shutdown gracefully #10364

Closed
c0deright opened this issue Feb 15, 2019 · 17 comments
Labels
F2-bug 🐞 The client fails to follow expected behavior. M4-core ⛓ Core client code / Rust. P5-sometimesoon 🌲 Issue is worth doing soon.

@c0deright

  • Parity Ethereum version: 2.2.10-stable
  • Operating system: Linux
  • Installation: binary
  • Fully synchronized: yes
  • Network: ethereum
  • Restarted: yes

Sometimes stopping parity results in the issue described here: #9101 (comment)

Sometimes, even with shutdown tracing turned on, the process exits immediately when parity is stopped and nothing about a shutdown is logged at all.

So we have 3 outcomes when stopping parity:

  • clean shutdown (nothing being logged at all, almost instant)
  • clean shutdown (shutdown being logged, taking 1-10 seconds)
  • unclean shutdown (Shutdown is taking longer than expected / Shutdown timeout reached, exiting uncleanly)

How to debug this further?

@c0deright
Author

2019-02-15 12:25:03  Verifier #1 INFO import  Imported #7223288 0xe31c…3dcf (84 txs, 7.99 Mgas, 675 ms, 14.60 KiB)
2019-02-15 12:25:08  IO Worker #3 INFO import     5/ 5 peers     18 MiB chain  115 MiB db  0 bytes queue   43 KiB sync  RPC:  0 conn,    0 req/s, 3244 µs
2019-02-15 12:25:23  Verifier #1 INFO import  Imported #7223289 0x70d4…8380 (103 txs, 7.99 Mgas, 1599 ms, 16.89 KiB)
2019-02-15 12:25:38  IO Worker #2 INFO import     5/ 5 peers     18 MiB chain  115 MiB db  0 bytes queue   43 KiB sync  RPC:  0 conn,    0 req/s, 3244 µs
2019-02-15 12:25:57  main INFO parity_ethereum::run  Finishing work, please wait...
2019-02-15 12:25:57  main TRACE shutdown  [IoService] Closing...
2019-02-15 12:25:57   TRACE shutdown  [IoWorker] Closing...
2019-02-15 12:25:57   TRACE shutdown  [IoWorker] Closed
2019-02-15 12:25:57   TRACE shutdown  [IoWorker] Closing...
2019-02-15 12:25:57   TRACE shutdown  [IoWorker] Closed
2019-02-15 12:25:57   TRACE shutdown  [IoWorker] Closing...
2019-02-15 12:25:57   TRACE shutdown  [IoWorker] Closed
2019-02-15 12:25:57   TRACE shutdown  [IoWorker] Closing...
2019-02-15 12:25:57   TRACE shutdown  [IoWorker] Closed
2019-02-15 12:25:57  main TRACE shutdown  [IoService] Closed.
2019-02-15 12:26:57  main WARN parity_ethereum::run  Shutdown is taking longer than expected.
2019-02-15 12:30:57  main WARN parity_ethereum::run  Shutdown timeout reached, exiting uncleanly.
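One way to start debugging is to record which of the three outcomes each stop produced. The following is a minimal sketch in Rust, not part of parity: the `ShutdownOutcome` enum and `classify_shutdown` function are hypothetical names, and the sketch assumes the text passed in covers a single run of the node. It relies only on the two message strings visible in the log above.

```rust
/// The three shutdown outcomes described in this issue (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum ShutdownOutcome {
    /// No shutdown message was logged at all.
    Silent,
    /// Shutdown was logged and the timeout message never appeared.
    Logged,
    /// "Shutdown timeout reached, exiting uncleanly." was logged.
    Unclean,
}

/// Classify the log of a single run by the messages quoted in this thread.
pub fn classify_shutdown(log: &str) -> ShutdownOutcome {
    let mut outcome = ShutdownOutcome::Silent;
    for line in log.lines() {
        // The timeout message is decisive, so stop at the first match.
        if line.contains("Shutdown timeout reached, exiting uncleanly") {
            return ShutdownOutcome::Unclean;
        }
        // First message the node prints once it starts shutting down.
        if line.contains("Finishing work, please wait") {
            outcome = ShutdownOutcome::Logged;
        }
    }
    outcome
}
```

Note that `Silent` only means no shutdown message was found; from the log alone it cannot be told apart from a process that was killed externally.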

@jam10o-new jam10o-new added F3-annoyance 💩 The client behaves within expectations, however this “expected behaviour” itself is at issue. M4-core ⛓ Core client code / Rust. labels Feb 15, 2019
@jam10o-new jam10o-new added this to the 2.4 milestone Feb 15, 2019
@5chdn 5chdn modified the milestones: 2.4, 2.5 Feb 21, 2019
@soc1c soc1c modified the milestones: 2.5, 2.6 Apr 2, 2019
@jam10o-new jam10o-new added P5-sometimesoon 🌲 Issue is worth doing soon. F2-bug 🐞 The client fails to follow expected behavior. and removed F3-annoyance 💩 The client behaves within expectations, however this “expected behaviour” itself is at issue. labels May 10, 2019
@jam10o-new
Contributor

People are still seeing this issue regularly in recent versions - unclean shutdowns are leading to many more reports of db corruption so bumping priority here 😥

@tzapu
Contributor

tzapu commented May 28, 2019

i can confirm this happens regularly to at least 4 parity archive instances we run with the following config

--auto-update=none
--base-path=/paritydb
--mode=active
--tracing=on
--pruning=archive
--db-compaction=ssd
--scale-verifiers
--num-verifiers=6
--jsonrpc-server-threads=5
--jsonrpc-threads=5
--cache-size=22000
--min-peers=100
--max-peers=1000
--jsonrpc-hosts=all
--jsonrpc-interface=all
--ws-interface=all
--tx-queue-mem-limit=2048
--tx-queue-size=2000000

@ghost ghost mentioned this issue Jun 1, 2019
@dvdplm
Collaborator

dvdplm commented Jun 6, 2019

This is probably fixed by #10689

@dvdplm
Collaborator

dvdplm commented Jun 12, 2019

There is probably one more of these bugs to root out. I've seen this happen with --chain kovan when the snapshotting service is running. With some extra logging added it looks like this:

2019-06-12 14:04:10  main TRACE shutdown  [IoService] Closed.
2019-06-12 14:04:10  main TRACE shutdown  ClientService dropped
2019-06-12 14:04:10  main TRACE shutdown  RPC dropped
2019-06-12 14:04:10  main TRACE shutdown  KeepAlive dropped
2019-06-12 14:04:10  main TRACE shutdown  Informant shut down
2019-06-12 14:04:10  main TRACE shutdown  Informant dropped
2019-06-12 14:04:10  main TRACE shutdown  Client dropped
2019-06-12 14:04:10  main TRACE shutdown  Waiting for refs to Client to shutdown, strong_count=19, weak_count=Some(13)
2019-06-12 14:04:10  jsonrpc-eventloop-1 TRACE shutdown  [IoService] Closing...
2019-06-12 14:04:10   TRACE shutdown  [IoWorker] Closing...
2019-06-12 14:04:10   TRACE shutdown  [IoWorker] Closed
2019-06-12 14:04:10   TRACE shutdown  [IoWorker] Closing...
2019-06-12 14:04:10   TRACE shutdown  [IoWorker] Closed
2019-06-12 14:04:10   TRACE shutdown  [IoWorker] Closing...
2019-06-12 14:04:10   TRACE shutdown  [IoWorker] Closed
2019-06-12 14:04:10   TRACE shutdown  [IoWorker] Closing...
2019-06-12 14:04:10   TRACE shutdown  [IoWorker] Closed
2019-06-12 14:04:10  jsonrpc-eventloop-1 TRACE shutdown  [IoService] Closed.
2019-06-12 14:04:11  main TRACE shutdown  Waiting for client to drop, strong_count=2, weak_count=Some(5)
2019-06-12 14:04:12  main TRACE shutdown  Waiting for client to drop, strong_count=2, weak_count=Some(5)
…
2019-06-12 14:05:10  main WARN parity_ethereum::run  Shutdown is taking longer than expected.
…
2019-06-12 14:05:11  main TRACE shutdown  Waiting for client to drop, strong_count=2, weak_count=Some(5)
2019-06-12 14:05:12  main TRACE shutdown  Waiting for client to drop, strong_count=2, weak_count=Some(5)
2019-06-12 14:05:13  main TRACE shutdown  Waiting for client to drop, strong_count=2, weak_count=Some(5)
…
2019-06-12 14:09:10  main WARN parity_ethereum::run  Shutdown timeout reached, exiting uncleanly.
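The pattern visible in this log (poll the reference count once per second, warn after about one minute, give up after about five) can be sketched with the standard library alone. This is an illustration and not the code in parity: `wait_for_drop` and its parameters are hypothetical, and the durations are read off the timestamps above rather than taken from the source.

```rust
use std::sync::Weak;
use std::thread;
use std::time::{Duration, Instant};

/// Poll until every strong reference behind `weak` is gone or `timeout` elapses.
/// Returns true if the value was dropped, false if the timeout was reached.
pub fn wait_for_drop<T>(
    weak: &Weak<T>,
    warn_after: Duration,
    timeout: Duration,
    poll: Duration,
) -> bool {
    let start = Instant::now();
    let mut warned = false;
    loop {
        // Number of `Arc` clones still alive somewhere in the process.
        let strong = weak.strong_count();
        if strong == 0 {
            return true;
        }
        let elapsed = start.elapsed();
        if elapsed >= timeout {
            eprintln!("Shutdown timeout reached, exiting uncleanly.");
            return false;
        }
        if !warned && elapsed >= warn_after {
            eprintln!("Shutdown is taking longer than expected.");
            warned = true;
        }
        eprintln!("Waiting for client to drop, strong_count={}", strong);
        thread::sleep(poll);
    }
}
```

With this shape, a count that stays at the same non-zero value for the whole wait, as `strong_count=2` does above, suggests that some other component still holds a clone and never releases it; the loop itself cannot say which one.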

@zet-tech

The problem is still present in current stable. When will it be merged?

@jam10o-new
Contributor

@zet-tech it should be resolved in 2.4.8 and 2.5.3 - it was merged into those releases - if you still have issues with shutdowns the source of the issue may be different

@zet-tech

It is present in 2.4.8, and the issue is definitely related to RPC. When I bind only to localhost and there are no RPC calls, restarts work correctly. But when I bind parity to a remote IP and it gets requests from our other software (even one second is enough, which means 5-10 requests, only eth_getWork and eth_getBlockByNumber), a restart is not possible and the process is killed. This results in DB corruption much more often than it should (as often as once per 10 restarts), and we were forced to move to GETH in production due to this problem.

@dvdplm
Collaborator

dvdplm commented Jun 28, 2019

One thing worth noting about any of these shutdown problems is that different bugs can cause the same symptom. We recently fixed one instance where shutdown would fail while the node was taking a snapshot.
RPC usage causing deadlock during shutdown is quite possibly a distinct bug.

@dvdplm
Collaborator

dvdplm commented Jun 29, 2019

@zet-tech That sounds really bad. I have tried to reproduce the shutdown problem after RPC on the latest master and could not see a problem. I'd need your assistance to debug this further.

  • if you have the possibility to try your setup with a master build that'd be great
  • can you share your configuration toml file with us so I can replicate your setup more closely?
  • I'm not sure what you mean by "bind only to localhost"/"when I bind parity to remote IP", can you elaborate?

Thanks!

@ordian ordian modified the milestones: 2.6, 2.7 Jul 12, 2019
@CorentinPacaud

Can anyone confirm this issue has been fixed ?

@dvdplm
Collaborator

dvdplm commented Sep 24, 2019

Can anyone confirm this issue has been fixed ?

As mentioned above, there are possibly several other causes with the same symptom. We have fixed a few but there might be others. FWIW, we have not experienced or, afaik, had reports of shutdown issues for several months now.

@CorentinPacaud

Can anyone confirm this issue has been fixed ?

As mentioned above, there are possibly several other causes with the same symptom. We have fixed a few but there might be others. FWIW, we have not experienced or, afaik, had reports of shutdown issues for several months now.

So, after my server automatically restarted this weekend, I can confirm that the parity server restarts normally with pm2. No errors.
Thx

@zet-tech

I just installed 2.6.4. Problem still occurs, exactly as before.

Answering previous questions:
1). outdated
2).
eth.txt

I removed the IP address because it is public IP.

3). By "bind only to localhost" I meant that if there are no RPC calls to parity, the error does not occur. But even one RPC call is enough to prevent parity from shutting down.
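For anyone trying to reproduce this, a few JSON-RPC calls can be sent by hand before stopping the node. The sketch below uses only the Rust standard library; `jsonrpc_http_request` and `send` are hypothetical helper names, and the address in the usage note (`127.0.0.1:8545`) is an assumption about the node's JSON-RPC endpoint, not something taken from the attached configuration.

```rust
use std::io::{Read, Write};
use std::net::TcpStream;

/// Build a raw HTTP/1.1 POST carrying one JSON-RPC 2.0 call.
/// `params` must already be a JSON array, e.g. `[]` or `["latest",false]`.
pub fn jsonrpc_http_request(host: &str, id: u64, method: &str, params: &str) -> String {
    let body = format!(
        r#"{{"jsonrpc":"2.0","id":{},"method":"{}","params":{}}}"#,
        id, method, params
    );
    format!(
        "POST / HTTP/1.1\r\nHost: {}\r\nContent-Type: application/json\r\nContent-Length: {}\r\nConnection: close\r\n\r\n{}",
        host,
        body.len(),
        body
    )
}

/// Send one request and read the whole response.
/// `Connection: close` asks the server to end the stream after replying.
pub fn send(addr: &str, request: &str) -> std::io::Result<String> {
    let mut stream = TcpStream::connect(addr)?;
    stream.write_all(request.as_bytes())?;
    let mut response = String::new();
    stream.read_to_string(&mut response)?;
    Ok(response)
}
```

A reproduction attempt would call `send("127.0.0.1:8545", &jsonrpc_http_request("127.0.0.1:8545", 1, "eth_getBlockByNumber", r#"["latest",false]"#))` a handful of times, do the same for `eth_getWork` with `[]` as params, then stop the node and check the shutdown log.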

@zet-tech

zet-tech commented Dec 6, 2019

2.5.10 solves our restart problem.

@c0deright
Author

Sorry, forgot to mention that I didn't observe ungraceful shutdowns with v2.6.5, now running v2.6.6.

@zet-tech Don't forget to upgrade to at least v2.5.11 before Istanbul fork at the weekend: https://github.com/paritytech/parity-ethereum/releases/tag/v2.5.11

@zet-tech

zet-tech commented Dec 6, 2019 via email

@dvdplm dvdplm closed this as completed Dec 6, 2019

9 participants