Archive node loses peers (stop syncing blocks) #10724

APshenkin · 2019-06-05T20:57:29Z

Parity Ethereum version: 2.4.6
Operating system: Linux
Installation: docker
Fully synchronized: yes
Network: ethereum mainnet
Restarted: yes

Machine spec: 8 CPU 64GB ram, parity cache-size= 32GB

We have launched archive node and complete its sync. Today we update the node from 2.4.5. to 2.4.6 and run some software that fetch blocks information.

It's working, but peers connection is unstable and after some time node stop syncing blocks (it just displays Syncing 0.00 0.00 blk/s )

Below is examples of peers pump and down logs and last blocks (reversed order)

peers info



2019-06-05 20:32:52 UTC   12/25 peers     60 MiB chain   17 MiB db  0 bytes queue   79 KiB sync  RPC:  1 conn,    1 req/s, 34380 µs
--
2019-06-05 20:32:12 UTC   10/25 peers     54 MiB chain   17 MiB db  0 bytes queue   79 KiB sync  RPC:  1 conn,    1 req/s, 32475 µs
2019-06-05 20:31:40 UTC   10/25 peers     53 MiB chain   17 MiB db  0 bytes queue   79 KiB sync  RPC:  1 conn,    1 req/s, 32475 µs
2019-06-05 20:31:05 UTC    9/25 peers     53 MiB chain   17 MiB db  0 bytes queue   79 KiB sync  RPC:  1 conn,    1 req/s, 24494 µs
2019-06-05 20:30:34 UTC    9/25 peers     53 MiB chain   17 MiB db  0 bytes queue   79 KiB sync  RPC:  1 conn,    1 req/s, 17362 µs
2019-06-05 20:29:55 UTC    9/25 peers     53 MiB chain   17 MiB db  0 bytes queue   79 KiB sync  RPC:  1 conn,    1 req/s, 18661 µs
2019-06-05 20:29:20 UTC    9/25 peers     53 MiB chain   17 MiB db  0 bytes queue   79 KiB sync  RPC:  1 conn,    7 req/s, 17362 µs
2019-06-05 20:28:49 UTC    9/25 peers     52 MiB chain   16 MiB db  0 bytes queue   82 KiB sync  RPC:  1 conn,    1 req/s, 32642 µs
2019-06-05 20:28:15 UTC   10/25 peers     49 MiB chain   16 MiB db  0 bytes queue   79 KiB sync  RPC:  1 conn,    1 req/s, 31118 µs
2019-06-05 20:27:41 UTC    6/25 peers     48 MiB chain   16 MiB db  0 bytes queue   79 KiB sync  RPC:  1 conn,    2 req/s, 25000 µs
2019-06-05 20:27:05 UTC    5/25 peers     48 MiB chain   16 MiB db  0 bytes queue   79 KiB sync  RPC:  1 conn,    0 req/s, 25000 µs
2019-06-05 20:26:33 UTC    5/25 peers     48 MiB chain   16 MiB db  0 bytes queue   79 KiB sync  RPC:  1 conn,    1 req/s, 20223 µs
2019-06-05 20:26:00 UTC    5/25 peers     48 MiB chain   16 MiB db  0 bytes queue   80 KiB sync  RPC:  1 conn,    1 req/s, 33003 µs
2019-06-05 20:26:00 UTC    4/25 peers     48 MiB chain   16 MiB db  0 bytes queue   80 KiB sync  RPC:  1 conn,    1 req/s, 33003 µs
2019-06-05 20:25:11 UTC    5/25 peers     48 MiB chain   16 MiB db  0 bytes queue   79 KiB sync  RPC:  1 conn,    0 req/s, 33558 µs
2019-06-05 20:24:33 UTC    6/25 peers     47 MiB chain   16 MiB db  0 bytes queue   79 KiB sync  RPC:  1 conn,    0 req/s, 33558 µs
2019-06-05 20:23:53 UTC    8/25 peers     47 MiB chain   16 MiB db  0 bytes queue   79 KiB sync  RPC:  1 conn,    1 req/s, 30569 µs
2019-06-05 20:23:53 UTC    8/25 peers     47 MiB chain   16 MiB db  0 bytes queue   79 KiB sync  RPC:  1 conn,    1 req/s, 30569 µs
2019-06-05 20:23:11 UTC    8/25 peers     47 MiB chain   16 MiB db  0 bytes queue   79 KiB sync  RPC:  1 conn,    1 req/s, 26161 µs
2019-06-05 20:22:43 UTC Updated conversion rate to Ξ1 = US$248.89 (19132568 wei/gas)
2019-06-05 20:22:40 UTC    7/25 peers     47 MiB chain   16 MiB db  0 bytes queue   79 KiB sync  RPC:  1 conn,    1 req/s, 17618 µs
2019-06-05 20:22:16 UTC    8/25 peers     47 MiB chain   16 MiB db  0 bytes queue   79 KiB sync  RPC:  1 conn,    0 req/s, 15189 µs
2019-06-05 20:21:37 UTC    8/25 peers     46 MiB chain   16 MiB db  0 bytes queue   79 KiB sync  RPC:  1 conn,    4 req/s, 21464 µs
2019-06-05 20:21:02 UTC    7/25 peers     44 MiB chain   15 MiB db  0 bytes queue   79 KiB sync  RPC:  1 conn,    0 req/s, 25221 µs
2019-06-05 20:21:02 UTC    7/25 peers     44 MiB chain   15 MiB db  0 bytes queue   79 KiB sync  RPC:  1 conn,    0 req/s, 25221 µs
2019-06-05 20:20:31 UTC    7/25 peers     43 MiB chain   15 MiB db  0 bytes queue   79 KiB sync  RPC:  1 conn,    0 req/s, 25221 µs
2019-06-05 20:19:50 UTC    5/25 peers     43 MiB chain   15 MiB db  0 bytes queue   79 KiB sync  RPC:  1 conn,    3 req/s, 25221 µs

stuck on block


2019-06-05 20:36:00 UTC Syncing #7901361 0xd3df…ad5c     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed  #7895893   21/25 peers     62 MiB chain   17 MiB db  0 bytes queue  443 MiB sync  RPC:  1 conn,    0 req/s, 46705 µs
2019-06-05 20:35:55 UTC Syncing #7901361 0xd3df…ad5c     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed  #7893988   21/25 peers     62 MiB chain   17 MiB db  0 bytes queue  355 MiB sync  RPC:  1 conn,    1 req/s, 46705 µs
2019-06-05 20:35:50 UTC Syncing #7901361 0xd3df…ad5c     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed  #7893988   21/25 peers     62 MiB chain   17 MiB db  0 bytes queue  176 MiB sync  RPC:  1 conn,    1 req/s, 46705 µs
2019-06-05 20:35:45 UTC Syncing #7901361 0xd3df…ad5c     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed  #7892083   20/25 peers     62 MiB chain   17 MiB db  0 bytes queue  147 MiB sync  RPC:  1 conn,    1 req/s, 38701 µs
2019-06-05 20:35:40 UTC Syncing #7901361 0xd3df…ad5c     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed  #7885225   21/25 peers     62 MiB chain   17 MiB db  0 bytes queue  551 MiB sync  RPC:  1 conn,    0 req/s, 38701 µs
2019-06-05 20:35:35 UTC Syncing #7901361 0xd3df…ad5c     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed  #7885225   19/25 peers     62 MiB chain   17 MiB db  0 bytes queue  401 MiB sync  RPC:  1 conn,    1 req/s, 38701 µs
2019-06-05 20:35:30 UTC Syncing #7901361 0xd3df…ad5c     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed  #7885225   19/25 peers     62 MiB chain   17 MiB db  0 bytes queue  318 MiB sync  RPC:  1 conn,    1 req/s, 38701 µs
2019-06-05 20:35:25 UTC Syncing #7901361 0xd3df…ad5c     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed  #7885225   19/25 peers     62 MiB chain   17 MiB db  0 bytes queue  212 MiB sync  RPC:  1 conn,    0 req/s, 33990 µs
2019-06-05 20:35:20 UTC Syncing #7901361 0xd3df…ad5c     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed  #7884971   17/25 peers     62 MiB chain   17 MiB db  0 bytes queue  140 MiB sync  RPC:  1 conn,    1 req/s, 33990 µs
2019-06-05 20:35:15 UTC Syncing #7901361 0xd3df…ad5c     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed  #7884971   16/25 peers     61 MiB chain   17 MiB db  0 bytes queue   60 MiB sync  RPC:  1 conn,    2 req/s, 33990 µs
2019-06-05 20:35:10 UTC Syncing #7901361 0xd3df…ad5c     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed  #7884971   18/25 peers     61 MiB chain   17 MiB db  0 bytes queue   87 KiB sync  RPC:  1 conn,    1 req/s, 33990 µs
2019-06-05 20:35:08 UTC Syncing #7901361 0xd3df…ad5c     0.60 blk/s   59.2 tx/s    4.8 Mgas/s      0+    0 Qed  #7884971   17/25 peers     61 MiB chain   17 MiB db  0 bytes queue   79 KiB sync  RPC:  1 conn,    2 req/s, 32341 µs
2019-06-05 20:35:01 UTC Reorg to #7901360 0xba42…c2a0 (0x8d58…2732 #7901358 0x9dc9…e404 0x3b25…123b)
2019-06-05 20:35:00 UTC Syncing #7901359 0x8d58…2732     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    1 Qed  #7901360   13/25 peers     61 MiB chain   17 MiB db   98 KiB queue   80 KiB sync  RPC:  1 conn,    0 req/s, 33990 µs
2019-06-05 20:34:55 UTC Syncing #7901359 0x8d58…2732     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed  #7901358   11/25 peers     61 MiB chain   17 MiB db  0 bytes queue   81 KiB sync  RPC:  1 conn,    1 req/s, 29244 µs
2019-06-05 20:34:51 UTC Syncing #7901359 0x8d58…2732     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed  #7901358   11/25 peers     61 MiB chain   17 MiB db  0 bytes queue   79 KiB sync  RPC:  1 conn,    0 req/s, 29244 µs
2019-06-05 20:34:46 UTC Syncing #7901359 0x8d58…2732     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed  #7901358   11/25 peers     61 MiB chain   17 MiB db  0 bytes queue   79 KiB sync  RPC:  1 conn,    0 req/s, 29244 µs
2019-06-05 20:34:45 UTC Syncing #7901359 0x8d58…2732     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed  #7901358   11/25 peers     61 MiB chain   17 MiB db  0 bytes queue   79 KiB sync  RPC:  1 conn,    0 req/s, 29244 µs
2019-06-05 20:34:45 UTC Syncing #7901359 0x8d58…2732     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed  #7901358   10/25 peers     61 MiB chain   17 MiB db  0 bytes queue   79 KiB sync  RPC:  1 conn,    0 req/s, 29244 µs
2019-06-05 20:34:44 UTC Syncing #7901359 0x8d58…2732     0.06 blk/s    6.6 tx/s    0.4 Mgas/s      0+    0 Qed  #7901358   11/25 peers     61 MiB chain   17 MiB db  0 bytes queue   79 KiB sync  RPC:  1 conn,    1 req/s, 29244 µs
2019-06-05 20:34:30 UTC Imported #7901359 0x8d58…2732 (102 txs, 7.99 Mgas, 709 ms, 29.14 KiB)
2019-06-05 20:34:09 UTC Imported #7901358 0x9dc9…e404 (136 txs, 7.99 Mgas, 632 ms, 34.54 KiB)
2019-06-05 20:34:09 UTC   10/25 peers     61 MiB chain   17 MiB db  0 bytes queue   79 KiB sync  RPC:  1 conn,    0 req/s, 23838 µs
2019-06-05 20:33:43 UTC Imported #7901357 0xec21…a3fd (118 txs, 8.00 Mgas, 563 ms, 28.45 KiB)

Also we have usual full node and it's not faced this issue (it's just continue syncing normaly)
After archive node restart it continuous syncing

Is it a bug? Or maybe we can do something to prevent restarting every two hours?

Thanks in advance

The text was updated successfully, but these errors were encountered:

APshenkin · 2019-06-05T21:20:22Z

So found another strange behaviour. It looks like archive node take all 32GB cache. And when it uses all cache then it stucks

jam10o-new · 2019-06-05T21:29:04Z

@APshenkin if you specify a cache_size, parity will try to use all of that as cache, that is expected. Some memory will also be used beyond that during normal operation.

Things I am concerned about -
The node getting stuck after its cache fills
The large reorg that happened right before your node got stuck

APshenkin · 2019-06-05T21:34:51Z

Yes I'm agreed, but it strange, that it stucks each time when cache fills (this was several times today already)

Feel free to ask any other questions

Will post some info here if this will occur again

APshenkin · 2019-06-06T06:38:57Z

@joshua-mir So this occurs again


2019-06-06 02:10:45 UTC Syncing #7902878 0x4a3b…110b     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed    #49400   24/25 peers    410 MiB chain   49 MiB db  0 bytes queue   50 MiB sync  RPC:  1 conn,    2 req/s, 44139 µs
--
2019-06-06 02:10:40 UTC Syncing #7902878 0x4a3b…110b     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed    #49400   24/25 peers    402 MiB chain   49 MiB db  0 bytes queue   45 MiB sync  RPC:  1 conn,    1 req/s, 44139 µs
2019-06-06 02:10:35 UTC Syncing #7902878 0x4a3b…110b     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed    #49400   23/25 peers    381 MiB chain   49 MiB db  0 bytes queue   28 MiB sync  RPC:  1 conn,    0 req/s, 44139 µs
2019-06-06 02:10:30 UTC Syncing #7902878 0x4a3b…110b     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed    #46733   23/25 peers    366 MiB chain   49 MiB db  0 bytes queue   18 MiB sync  RPC:  1 conn,    1 req/s, 44139 µs
2019-06-06 02:10:25 UTC Syncing #7902878 0x4a3b…110b     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed    #32382   20/25 peers    351 MiB chain   49 MiB db  0 bytes queue   16 MiB sync  RPC:  1 conn,    0 req/s, 44139 µs
2019-06-06 02:10:20 UTC Syncing #7902878 0x4a3b…110b     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed    #57149   22/25 peers    341 MiB chain   49 MiB db  0 bytes queue   24 MiB sync  RPC:  1 conn,    2 req/s, 44139 µs
2019-06-06 02:10:15 UTC Syncing #7902878 0x4a3b…110b     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed    #55119   17/25 peers    340 MiB chain   49 MiB db  0 bytes queue   18 MiB sync  RPC:  1 conn,    1 req/s, 44139 µs
2019-06-06 02:10:10 UTC Syncing #7902878 0x4a3b…110b     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed    #47370   18/25 peers    337 MiB chain   49 MiB db  0 bytes queue   17 MiB sync  RPC:  1 conn,    0 req/s, 44139 µs
2019-06-06 02:10:05 UTC Syncing #7902878 0x4a3b…110b     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed    #36835   18/25 peers    334 MiB chain   49 MiB db  0 bytes queue   18 MiB sync  RPC:  1 conn,    1 req/s, 44139 µs
2019-06-06 02:10:00 UTC Syncing #7902878 0x4a3b…110b     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed    #54610   18/25 peers    333 MiB chain   49 MiB db  0 bytes queue   29 MiB sync  RPC:  1 conn,    0 req/s, 44139 µs
2019-06-06 02:09:55 UTC Syncing #7902878 0x4a3b…110b     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed    #54610   12/25 peers    328 MiB chain   49 MiB db  0 bytes queue   27 MiB sync  RPC:  1 conn,    1 req/s, 44139 µs
2019-06-06 02:09:50 UTC Syncing #7902878 0x4a3b…110b     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed    #54610   11/25 peers    328 MiB chain   49 MiB db  0 bytes queue   13 MiB sync  RPC:  1 conn,    1 req/s, 44139 µs
2019-06-06 02:09:45 UTC Syncing #7902878 0x4a3b…110b     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed    #50805   10/25 peers    327 MiB chain   49 MiB db  0 bytes queue   11 MiB sync  RPC:  1 conn,    0 req/s, 44102 µs
2019-06-06 02:09:40 UTC Syncing #7902878 0x4a3b…110b     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed    #45214    9/25 peers    327 MiB chain   49 MiB db  0 bytes queue   12 MiB sync  RPC:  1 conn,    1 req/s, 44102 µs
2019-06-06 02:09:35 UTC Syncing #7902878 0x4a3b…110b     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed    #38751    9/25 peers    326 MiB chain   49 MiB db  0 bytes queue    9 MiB sync  RPC:  1 conn,    0 req/s, 43872 µs
2019-06-06 02:09:30 UTC Syncing #7902878 0x4a3b…110b     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed    #35941    8/25 peers    326 MiB chain   49 MiB db  0 bytes queue    7 MiB sync  RPC:  1 conn,    1 req/s, 43872 µs
2019-06-06 02:09:25 UTC Syncing #7902878 0x4a3b…110b     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed    #29726    6/25 peers    325 MiB chain   49 MiB db  0 bytes queue    8 MiB sync  RPC:  1 conn,    0 req/s, 43800 µs
2019-06-06 02:09:20 UTC Syncing #7902878 0x4a3b…110b     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed    #25527    6/25 peers    324 MiB chain   49 MiB db  0 bytes queue    5 MiB sync  RPC:  1 conn,    2 req/s, 43800 µs
2019-06-06 02:09:15 UTC Syncing #7902878 0x4a3b…110b     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed    #19685    4/25 peers    324 MiB chain   49 MiB db  0 bytes queue    4 MiB sync  RPC:  1 conn,    2 req/s, 43800 µs
2019-06-06 02:09:10 UTC Syncing #7902878 0x4a3b…110b     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed    #16025    4/25 peers    324 MiB chain   49 MiB db  0 bytes queue    3 MiB sync  RPC:  1 conn,    1 req/s, 43800 µs
2019-06-06 02:09:05 UTC Syncing #7902878 0x4a3b…110b     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed    #11811    4/25 peers    324 MiB chain   49 MiB db  0 bytes queue    2 MiB sync  RPC:  1 conn,    1 req/s, 43800 µs
2019-06-06 02:09:00 UTC Syncing #7902878 0x4a3b…110b     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed     #9525    4/25 peers    324 MiB chain   49 MiB db  0 bytes queue    2 MiB sync  RPC:  1 conn,    1 req/s, 43328 µs
2019-06-06 02:08:55 UTC Syncing #7902878 0x4a3b…110b     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed     #6731    3/25 peers    324 MiB chain   49 MiB db  0 bytes queue    1 MiB sync  RPC:  1 conn,    1 req/s, 43328 µs
2019-06-06 02:08:50 UTC Syncing #7902878 0x4a3b…110b     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed     #4462    3/25 peers    324 MiB chain   49 MiB db  0 bytes queue    1 MiB sync  RPC:  1 conn,    2 req/s, 40224 µs
2019-06-06 02:08:45 UTC Syncing #7902878 0x4a3b…110b     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed      #896    5/25 peers    320 MiB chain   49 MiB db  0 bytes queue    1 MiB sync  RPC:  1 conn,    1 req/s, 40224 µs
2019-06-06 02:08:40 UTC Syncing #7902878 0x4a3b…110b     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed        #0    3/25 peers    320 MiB chain   49 MiB db  0 bytes queue   79 KiB sync  RPC:  1 conn,    1 req/s, 40224 µs
2019-06-06 02:08:35 UTC Syncing #7902878 0x4a3b…110b     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed  #7902859    3/25 peers    319 MiB chain   49 MiB db  0 bytes queue   79 KiB sync  RPC:  1 conn,    0 req/s, 33841 µs
2019-06-06 02:08:10 UTC    0/25 peers    319 MiB chain   49 MiB db  0 bytes queue   79 KiB sync  RPC:  1 conn,    0 req/s, 33841 µs
2019-06-06 02:08:00 UTC Imported #7902878 0x4a3b…110b (184 txs, 6.98 Mgas, 688 ms, 24.98 KiB)
2019-06-06 02:07:46 UTC Imported #7902877 0xb5de…c514 (35 txs, 3.82 Mgas, 224 ms, 7.07 KiB) + another 1 block(s) containing 207 tx(s)
2019-06-06 02:07:31 UTC    1/25 peers    318 MiB chain   49 MiB db  0 bytes queue   79 KiB sync  RPC:  1 conn,    0 req/s, 33841 µs
2019-06-06 02:07:20 UTC Imported #7902874 0xc496…7548 (129 txs, 7.99 Mgas, 3102 ms, 20.17 KiB) + another 6 block(s) containing 765 tx(s)
2019-06-06 02:07:07 UTC Reorg to #7902867 0xd356…d615 (0x8082…84ab #7902866 0x794a…7153 )
2019-06-06 02:06:55 UTC    0/25 peers    318 MiB chain   48 MiB db  0 bytes queue   79 KiB sync  RPC:  1 conn,    0 req/s, 38258 µs
2019-06-06 02:06:24 UTC    1/25 peers    318 MiB chain   48 MiB db  0 bytes queue   79 KiB sync  RPC:  1 conn,    1 req/s, 38258 µs
2019-06-06 02:05:32 UTC Imported #7902867 0x8082…84ab (62 txs, 7.99 Mgas, 415 ms, 7.37 KiB)
2019-06-06 02:05:31 UTC    4/25 peers    318 MiB chain   48 MiB db  0 bytes queue   79 KiB sync  RPC:  1 conn,    3 req/s, 30119 µs
2019-06-06 02:05:31 UTC    4/25 peers    318 MiB chain   48 MiB db  0 bytes queue   79 KiB sync  RPC:  1 conn,    3 req/s, 30119 µs
2019-06-06 02:05:31 UTC Imported #7902866 0x794a…7153 (138 txs, 7.99 Mgas, 582 ms, 20.54 KiB)
2019-06-06 02:05:19 UTC Imported #7902862 0x91d9…c746 (93 txs, 8.01 Mgas, 894 ms, 15.32 KiB) + another 1 block(s) containing 139 tx(s)
2019-06-06 02:05:04 UTC Imported #7902860 0x34ce…49cc (203 txs, 8.00 Mgas, 564 ms, 34.38 KiB)
2019-06-06 02:04:40 UTC    4/25 peers    317 MiB chain   48 MiB db  0 bytes queue   79 KiB sync  RPC:  1 conn,    2 req/s, 21974 µs
2019-06-06 02:04:40 UTC    4/25 peers    317 MiB chain   48 MiB db  0 bytes queue   79 KiB sync  RPC:  1 conn,    2 req/s, 21974 µs
2019-06-06 02:04:40 UTC Imported #7902859 0x66cb…b7a0 (150 txs, 8.00 Mgas, 572 ms, 18.83 KiB)
2019-06-06 02:04:26 UTC Imported #7902858 0xbaed…6107 (242 txs, 7.92 Mgas, 504 ms, 36.30 KiB)
2019-06-06 02:04:11 UTC    4/25 peers    317 MiB chain   48 MiB db  0 bytes queue   79 KiB sync  RPC:  1 conn,    1 req/s, 18733 µs

And in this case parity wasn't use all memory

APshenkin · 2019-06-06T06:58:11Z

Also maybe it will help somehow: We use trace_replayBlockTransactions API for each new generated block.

jam10o-new · 2019-06-06T07:01:57Z

that's an extremely heavy rpc method - it could very well be one of the causes of the problem here if you see it taking a while to return a response.

APshenkin · 2019-06-06T07:04:47Z

No, It's completes very fast. E.g. after syncing node after restart it completes ~ 150 success request per minute

APshenkin · 2019-06-06T10:16:54Z

It's strange, but looks like increase disk space solve the issue.
There were ~120 GB free space. After increasing to ~ 500 GB issue was solved.
Our parity node doesn't stuck 4 hours already (previosly it stucks in 1-2 hours)

Also it's strange, but after increating disk space Active memory value was droped down 🤔

Will continue monitoring

jam10o-new · 2019-06-06T10:22:58Z

Just to warn you, 500gb will not be sufficient for an archive node (once you are in sync) - you will need upwards of 2tb. You don't want to hit a full disk either because it's a known issue that full disks can cause database corruption.

APshenkin · 2019-06-06T10:24:25Z

Our node is already synced up 😄
Current disk size 3.328Tb

parity folder size 2.6Tb

jam10o-new · 2019-06-06T10:25:30Z

Ah, you increased by 500gb, not to 500gb, misunderstood that 😅

APshenkin · 2019-06-06T10:26:09Z

Yes, sorry for that 🤷‍♂️

APshenkin · 2019-06-06T16:02:12Z

So 9 hours without stopping blocks import!
I still think that it's strange behaviour (there was ~128 GB free disk space) but parity started working well only after increasing this value to ~ 500GB

lacasian · 2019-06-07T11:08:15Z

So 9 hours without stopping blocks import!
I still think that it's strange behaviour (there was ~128 GB free disk space) but parity started working well only after increasing this value to ~ 500GB

We've got more than 1TB free and I've seen parity getting stuck this morning. Doesn't seem like the free space has any influence on the issue.

APshenkin · 2019-06-07T23:03:49Z

Agreed. Today we rebooted our node and after this we noticed, that after blocks syncing it lost all peers and stucks again (

dafky2000 · 2019-06-12T11:46:38Z

Related to #10626. I am having the exact same issue - 800GB free disk on 2 x 2TB SSD in raid0.

iFA88 · 2019-06-17T05:23:17Z

Hi! I have like the same issue with my archive node, but after some time the peers come back and the parity prints some empty newlines instead of status message.

Here is the log, the issue starts around 2019-06-17 04:21:49, please take a look:
https://www.fusionsolutions.io/doc/ethlog.tar.gz

This happens weekly 1-2 times.

iFA88 · 2019-06-17T19:25:10Z

Today the node has stopped about 10 times. I have added reversed peer which points to my other parity node on the same server, so have "survived" many peer drop period. But now has been stopped syncing:

2019-06-17 21:19:57  Syncing #7977726 0x4195…29fe     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+   20 Qed  #7977745   42/256 peers    482 MiB chain   80 MiB db    3 MiB queue  155 KiB sync  RPC:  0 conn,   97 req/s,
2019-06-17 21:19:57  Syncing #7977726 0x4195…29fe     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+   20 Qed  #7977745   42/256 peers    482 MiB chain   80 MiB db    3 MiB queue  155 KiB sync  RPC:  0 conn,   97 req/s,

htop says that the node uses a lot of disk r/w:

NI   PID USER      PRI  VIRT   RES CPU% MEM%   TIME+     DISK R/W NLWP Command
  0  5744 nodes      20 15.5G 12.1G 31.0 19.2  3h23:18  136.54 M/s   66 parity --reserved-p

Only the restart helps, if the devs needs any trace i can help.

grbIzl · 2019-06-20T11:25:30Z

@iFA88 @APshenkin can you please reproduce the issue with this trace enabled: -l sync=trace (add this to your Parity's startup parameters)? Thanks!

iFA88 · 2019-06-20T14:19:11Z

@iFA88 @APshenkin can you please reproduce the issue with this trace enabled: -l sync=trace (add this to your Parity's startup parameters)? Thanks!

Yes, I ran now with that parameter.

When i use the following parameters --interface local --no-discovery --reserved-only and give reversed peer to my other full node, then it works fine.

UPDATE
May this flag is not an option.. that generates too much logs (2 mb / min) no one ever will read that, even i don't see if hangs or not.

grbIzl · 2019-06-21T08:54:59Z

May this flag is not an option.. that generates too much logs (2 mb / min) no one ever will read that, even i don't see if hangs or not.

I will try to manage it.

iFA88 · 2019-06-21T15:25:08Z

About 24 hour log: https://www.fusionsolutions.io/doc/paritylog.tar.gz (1.2gb)
The node is currently in sync, i don't see when the node loses the sync during the time.

grbIzl · 2019-06-21T18:00:05Z

@iFA88 Thanks for the log! As I can see, the sync process in it is happening without stuck:

The logging ends at 2019-06-21 17:21:40
The last imported block before it was at:
2019-06-21 17:21:33 Verifier ChainConfig #11 INFO import Imported #8002255 0x63ba…08ed (181 txs, 7.99 Mgas, 1412 ms, 33.55 KiB)
Your node seems to be fully synced (the last block on mainnet right now is Trie #8,002,908)

iFA88 · 2019-06-21T18:11:42Z

I have written now a script what gets the current block in every minute from my nodes, example:
[2019-06-21 20:05:36.065820] OK [archive: 8002961] [full: 8002961] [peers: 30/50]
When its stuck we can see them easy.

iFA88 · 2019-06-24T08:28:17Z

Hey, I think i have reproduced a little sister of the issue:
https://www.fusionsolutions.io/doc/archivelog.tar.gz
Little sister, because in 5 mins the sync has continued, but the peers has been dropped/disconnected. I hope it helps something. Previous when the sync has been stopped and all peer dropped, then I got only peers who not synced full for hours.

grbIzl · 2019-07-16T15:15:11Z

@iFA88 can you please help with testing of the next patch? This time I'd like to apply the patch to your full node, the binary is here: https://gitlab.parity.io/parity/parity-ethereum/-/jobs/185203/artifacts/download Please keep the previously patched version for the archive node (if possible) and run both nodes with the same trace parameters as before (sync for archive, usual logging for full). In this build (except patch itself) one more log message added, that can spam your full node's log a little bit. I hope, that it's ok for you.

iFA88 · 2019-07-17T08:58:17Z

So its something went very wrong..
What I have now:
archiveNode => specParity (Parity-Ethereum/v2.7.0-nightly-b82b0ad-20190709/x86_64-linux-gnu/rustc1.36.0)
fullNode => specialFullParity (Parity-Ethereum/v2.7.0-nightly-5277eb3-20190716/x86_64-linux-gnu/rustc1.36.0)

Logs:
https://www.fusionsolutions.io/doc/speclog.tar.gz

Result:
The fullNode works with few new log output. The archive node runs for few mins and quits, I have tried restart again and again, but nothing helps.

grbIzl · 2019-07-17T10:26:25Z

@iFA88 Thank again! Your help is invaluable! Is it possible to launch a short log session with both nodes patched and sync trace enabled on both as well? Just run them both and wait 1-2 mins

iFA88 · 2019-07-17T10:34:51Z

Here is:
https://www.fusionsolutions.io/doc/specissue2.tar.gz

grbIzl · 2019-07-17T10:51:54Z

@iFA88 you're saying, that archive node quits? Can you please explain in more details? Does it panic? How does it look like? From the logs it seems, that the node was simply shut down after several seconds :-(

iFA88 · 2019-07-17T11:32:16Z

:( https://gist.github.com/iFA88/10222ceb5e1ac5d62b32258804d670bd

iFA88 · 2019-07-17T11:38:01Z

I will try recover my parity database. May that was for that I can only shut down the parity with SIGKILL..

UPDATE:
I need resync from scratch, it will take about 10 days when sync issue doesn't appears.

grbIzl · 2019-07-17T12:18:29Z

I need resync from scratch, it will take about 10 days when sync issue doesn't appears.

:-( I would also recommend check your disk and memory, in order to make sure, that it was not the reason for the corruption (fsck and memtest).

iFA88 · 2019-08-02T13:21:53Z

Still syncing.. It is on #5805037 block now.. At least going without issue :)

grbIzl · 2019-08-13T10:21:00Z

Hey @iFA88 ! Has the sync completed?

iFA88 · 2019-08-13T10:46:09Z

@grbIzl Still syncing.. best block is 6743002 now.

iFA88 · 2019-08-13T11:06:49Z

Good that the node has cached many blocks, because the archive node drops lot of times my full node. I database is now 1.7T, so about 15% missing from the full size.. about 6-10 days.

mnaboka · 2019-08-20T16:17:11Z

hey @APshenkin , i am facing the exact same problem with parity.
I noticed you have a nice graph of cache usage , would you mind sharing what tools did you use for this graph. Thanks!

iFA88 · 2019-08-24T14:54:18Z

I got critical error which i can not handle during syncing.
2019-08-24 16:50:16 Block import failed for #7418712 (0xd32f…fab9): Parent not found (0xeef9…5f22)
I have put all log to the archive file and i have also put the rocksdb LOG file:
https://www.fusionsolutions.io/doc/archiveEth.tar.gz

Update:
The database has anyhow corrupted. Now they try to repair it. Disks and ram are okay.

Update 2:
There is no chance to repair the database, i need sync from the beginning.. Tried several times, but got only this:

2019-08-24 20:11:05  DB has been previously marked as corrupted, attempting repair
2019-08-24 23:47:35  DB corrupted: Invalid argument: You have to open all column families. Column families not opened: col6, col5, col1, col0, col3, col4, col2, attempting repair

grbIzl · 2019-08-26T14:07:50Z

I got critical error which i can not handle during syncing.
2019-08-24 16:50:16 Block import failed for #7418712 (0xd32f…fab9): Parent not found (0xeef9…5f22)
I have put all log to the archive file and i have also put the rocksdb LOG file:
https://www.fusionsolutions.io/doc/archiveEth.tar.gz

It's very sad to hear :-( It seems, that you bumped the bug in rocksDB implementation facebook/rocksdb#3509 We see its sometimes, but still not able to address. I'm thinking, what I can do in order to help with resolution, but frankly there are not so many options available now (and all of them are not quick)

iFA88 · 2019-09-14T08:31:28Z

Greetings! My archive node still syncing, but now the same issue has happened on my full node:

https://www.fusionsolutions.io/doc/ethlog.tar.gz

The issue has started on 2019-09-14 00:12:22 The node has dropped all peers and then probably they had requested ancient blocks from the new peers..

This is the first time that this happened with my full node.
Parity-Ethereum/v2.5.7-stable-6bd7db9-20190829/x86_64-linux-gnu/rustc1.36.0

grbIzl · 2019-09-16T13:06:56Z

The issue has started on 2019-09-14 00:12:22 The node has dropped all peers and then probably they had requested ancient blocks from the new peers..

Yes, I see this stuck in logs, but I cannot say, what caused it without corresponding log targets enabled. I suggest for the sake of simplicity to track this problem with the full node separately and create another issue, if it repeats.

iFA88 · 2019-09-16T13:55:10Z

Sadly it is very hard to reproduce with the full node. I think the problem is the same for both, If my archive node finish with the sync then i can apply trace outputs. I need at least 30 days for that.. :(

iFA88 · 2019-10-23T14:29:25Z

My archive node has finished syncing (Parity-Ethereum/v2.5.9-stable-06c7096-20190926/x86_64-linux-gnu/rustc1.36.0) After 20 min after the sync was done the issue has arrived. We can now talk about how we can debug this bug.

grbIzl · 2019-10-23T15:42:31Z

My archive node has finished syncing (Parity-Ethereum/v2.5.9-stable-06c7096-20190926/x86_64-linux-gnu/rustc1.36.0) After 20 min after the sync was done the issue has arrived. We can now talk about how we can debug this bug.

Ok. For the start, can you please confirm, which exact versions run on your both nodes (full and archive)? Some time ago I made patches for both of them, did you apply these patches? If not, please don't for now. I'll try to figure out first, if it's possible to help you with backup.

iFA88 · 2019-10-23T16:24:56Z

I run Parity-Ethereum/v2.5.9-stable-06c7096-20190926/x86_64-linux-gnu/rustc1.36.0 for all nodes. I can backup my full node, but I can not do any backup for my archive node because it is 3.2t. Yes I can apply patched clients and run it.

grbIzl · 2019-10-24T09:05:17Z

@iFA88 would it be ok for you to use our backup (made and backuped on Parity archive node)? We use 2.5.8 there, but the diff doesn't affect the db

iFA88 · 2019-10-24T09:06:38Z

When i understand correctly, you want download my archive node db and use it in your local?

grbIzl · 2019-10-24T09:10:25Z

Actually vice versa. If anything happens on your archive node, you may download backup from us and apply it on your node.

iFA88 · 2019-10-24T09:14:58Z

Alright, please, let we talk on gitter.

APshenkin changed the title ~~Archive node loses peers (stop sincyng blocks)~~ Archive node loses peers (stop syncing blocks) Jun 5, 2019

jam10o-new added F3-annoyance 💩 The client behaves within expectations, however this “expected behaviour” itself is at issue. M4-core ⛓ Core client code / Rust. labels Jun 5, 2019

jam10o-new added F2-bug 🐞 The client fails to follow expected behavior. and removed F3-annoyance 💩 The client behaves within expectations, however this “expected behaviour” itself is at issue. labels Jun 6, 2019

ysbae0928 mentioned this issue Jul 17, 2019

Synchronization is not working. #10882

Closed

This was referenced Nov 15, 2019

Treat blocks, that already downloaded, as synced #11264

Merged

Issues during block sync #11265

Merged

adria0 added the A3-stale 🍃 Pull request did not receive any updates in a long time. No review needed at this stage. Close it. label Jul 27, 2020

adria0 closed this as completed Jul 27, 2020

Archive node loses peers (stop syncing blocks) #10724

Archive node loses peers (stop syncing blocks) #10724

Comments

APshenkin commented Jun 5, 2019

APshenkin commented Jun 5, 2019

jam10o-new commented Jun 5, 2019

APshenkin commented Jun 5, 2019

APshenkin commented Jun 6, 2019

APshenkin commented Jun 6, 2019 • edited Loading

jam10o-new commented Jun 6, 2019

APshenkin commented Jun 6, 2019 • edited Loading

APshenkin commented Jun 6, 2019

jam10o-new commented Jun 6, 2019

APshenkin commented Jun 6, 2019 • edited Loading

jam10o-new commented Jun 6, 2019

APshenkin commented Jun 6, 2019

APshenkin commented Jun 6, 2019 • edited Loading

lacasian commented Jun 7, 2019

APshenkin commented Jun 7, 2019

dafky2000 commented Jun 12, 2019

iFA88 commented Jun 17, 2019

iFA88 commented Jun 17, 2019

grbIzl commented Jun 20, 2019

iFA88 commented Jun 20, 2019 • edited Loading

grbIzl commented Jun 21, 2019

iFA88 commented Jun 21, 2019

grbIzl commented Jun 21, 2019

iFA88 commented Jun 21, 2019

iFA88 commented Jun 24, 2019

grbIzl commented Jul 16, 2019

iFA88 commented Jul 17, 2019 • edited Loading

grbIzl commented Jul 17, 2019

iFA88 commented Jul 17, 2019

grbIzl commented Jul 17, 2019

iFA88 commented Jul 17, 2019

iFA88 commented Jul 17, 2019 • edited Loading

grbIzl commented Jul 17, 2019

iFA88 commented Aug 2, 2019 • edited Loading

grbIzl commented Aug 13, 2019

iFA88 commented Aug 13, 2019

iFA88 commented Aug 13, 2019

mnaboka commented Aug 20, 2019

iFA88 commented Aug 24, 2019 • edited Loading

grbIzl commented Aug 26, 2019

iFA88 commented Sep 14, 2019

grbIzl commented Sep 16, 2019

iFA88 commented Sep 16, 2019

iFA88 commented Oct 23, 2019

grbIzl commented Oct 23, 2019

iFA88 commented Oct 23, 2019

grbIzl commented Oct 24, 2019

iFA88 commented Oct 24, 2019

grbIzl commented Oct 24, 2019

iFA88 commented Oct 24, 2019

APshenkin commented Jun 6, 2019 •

edited

Loading

APshenkin commented Jun 6, 2019 •

edited

Loading

APshenkin commented Jun 6, 2019 •

edited

Loading

APshenkin commented Jun 6, 2019 •

edited

Loading

iFA88 commented Jun 20, 2019 •

edited

Loading

iFA88 commented Jul 17, 2019 •

edited

Loading

iFA88 commented Jul 17, 2019 •

edited

Loading

iFA88 commented Aug 2, 2019 •

edited

Loading

iFA88 commented Aug 24, 2019 •

edited

Loading