Skip to content
This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Low peer count, dropping to 0: "Handler initialization process is too long" #425

Closed
soc1c opened this issue Sep 9, 2019 · 11 comments
Closed

Comments

@soc1c
Copy link
Contributor

soc1c commented Sep 9, 2019

2019-09-09 13:11:11 Imported #243357 (0x475e…d709)                                                                                                                                                                                                                                                                [1366/1579]
2019-09-09 13:11:11 Starting parachain attestation session on top of parent 0x475e055e6cf897817dabec87d149e3b20a5c4e00775cb042143d451c52f6d709. Local parachain duty is None                                                                                                                                                 
2019-09-09 13:11:12 Imported #243358 (0x0dca…63f1)                                                                                                                                                                                                                                                                           
2019-09-09 13:11:12 Starting parachain attestation session on top of parent 0x0dca7a69ca6518f7935e313fcf3ca698f551b2af9d2d210d0d206b311a1063f1. Local parachain duty is None                                                                                                                                                 
2019-09-09 13:11:13 Imported #243359 (0xcad6…66e4)                                                                                                                                                                                            
2019-09-09 13:11:13 Starting parachain attestation session on top of parent 0xcad60164500529dcd83fefffb06762a12fceacde41e593d763e22c764ebb66e4. Local parachain duty is None                                                                  
2019-09-09 13:11:33 Handler initialization process is too long with PeerId("QmWtqkR1SSpCevMfxtdwbD1FJpBHRvX3x4PoFk4VEzUWJJ")                                                                                                                  
2019-09-09 13:11:33 Handler initialization process is too long with PeerId("QmSJhfx5Hta9PKhbByxkFz8j4D3anmf94Mi8uD4tsGkkvQ")                                                                                                                  
2019-09-09 13:11:33 Handler initialization process is too long with PeerId("QmchDJtEGiEWf7Ag58HNoTg9jSGzxkSZ23VgmF6xiLKKsZ")                                                                                                                  
2019-09-09 13:11:33 Idle (2 peers), best: #243359 (0xcad6…66e4), finalized #204216 (0x8743…6ec7), ⬇ 0 ⬆ 0                                                                                                                                     
2019-09-09 13:11:33 Idle (2 peers), best: #243359 (0xcad6…66e4), finalized #204216 (0x8743…6ec7), ⬇ 0 ⬆ 0                                                                                                                                     
2019-09-09 13:11:33 Idle (2 peers), best: #243359 (0xcad6…66e4), finalized #204216 (0x8743…6ec7), ⬇ 0 ⬆ 0                                                                                                                                     
2019-09-09 13:11:33 Idle (2 peers), best: #243359 (0xcad6…66e4), finalized #204216 (0x8743…6ec7), ⬇ 0 ⬆ 0                                                                                                                                     
2019-09-09 13:11:33 Idle (2 peers), best: #243359 (0xcad6…66e4), finalized #204216 (0x8743…6ec7), ⬇ 0 ⬆ 0                                                                                                                                     
2019-09-09 13:11:34 Imported #243360 (0x0071…93ec)                                                                                                                                                                                            
2019-09-09 13:11:35 Starting parachain attestation session on top of parent 0x00717c498ce9c0a134d914fe894eaaeb86a0d5d542285381ffbc2be411e893ec. Local parachain duty is None                                                                  
2019-09-09 13:11:36 Imported #243361 (0x6010…1390)                                                                                                                                                                                            
2019-09-09 13:11:36 Starting parachain attestation session on top of parent 0x6010f2eaaedcbf270fe874703778f48b241ac04a81aa6a3fcb19cdc7b8af1390. Local parachain duty is None                                                                  
2019-09-09 13:11:37 Imported #243362 (0xb961…95bc)                                                                                                                                                                                            
2019-09-09 13:11:37 Starting parachain attestation session on top of parent 0xb96199ef2375b798311d79fdfabe5d72c780b899a04a06cbb3f1b2ba3d0395bc. Local parachain duty is None                                                                  
2019-09-09 13:11:38 Imported #243363 (0x7ed5…c2c4)                                                                                                                                                                                            
2019-09-09 13:11:38 Starting parachain attestation session on top of parent 0x7ed55271d8a7806470657f0f3d1ddcbb7d3fd5d747ff8dc4e1a17503f0e9c2c4. Local parachain duty is None                                                                  
2019-09-09 13:11:58 Handler initialization process is too long with PeerId("QmW9PtASCawTG2SzsTnqEasaAv1FVjM7HJ56dZFyBKBuiC")                                                                                                                  
2019-09-09 13:11:58 Handler initialization process is too long with PeerId("QmchDJtEGiEWf7Ag58HNoTg9jSGzxkSZ23VgmF6xiLKKsZ")                                                                                                                  
2019-09-09 13:11:58 Handler initialization process is too long with PeerId("QmeUNz2M6pfgWYUMxySWDjTRWgpXUkB1dCr4qm6nFmEtST")                                                                                                                  
2019-09-09 13:11:58 Handler initialization process is too long with PeerId("QmeCit3Nif4VfNqrEJsdYHZGcKzRCnZvGxg6hha1iNj4mk")                                                                                                                  
2019-09-09 13:11:58 Idle (2 peers), best: #243363 (0x7ed5…c2c4), finalized #204216 (0x8743…6ec7), ⬇ 0 ⬆ 19 B/s                                                                                                                                
2019-09-09 13:11:58 Idle (2 peers), best: #243363 (0x7ed5…c2c4), finalized #204216 (0x8743…6ec7), ⬇ 0 ⬆ 19 B/s                                                                                                                                
2019-09-09 13:11:58 Idle (2 peers), best: #243363 (0x7ed5…c2c4), finalized #204216 (0x8743…6ec7), ⬇ 0 ⬆ 19 B/s                                                                                                                                
2019-09-09 13:11:58 Idle (2 peers), best: #243363 (0x7ed5…c2c4), finalized #204216 (0x8743…6ec7), ⬇ 0 ⬆ 19 B/s                                                                                                                                
2019-09-09 13:11:58 Idle (2 peers), best: #243363 (0x7ed5…c2c4), finalized #204216 (0x8743…6ec7), ⬇ 0 ⬆ 19 B/s                                                                                                                                
2019-09-09 13:11:59 Reorg from #243363,0x7ed5…c2c4 to #243363,0xe102…2561, common ancestor #243362,0xb961…95bc                                                                                                                                
2019-09-09 13:11:59 Imported #243363 (0xe102…2561)                                                                                                                                                                                            
2019-09-09 13:11:59 Starting parachain attestation session on top of parent 0xe10286d7216f25ce6f4c0f5091f56f0db7c91aa3d4ff7523fc4de7ca0b952561. Local parachain duty is None                                                                  
2019-09-09 13:12:00 Imported #243364 (0x1b3d…cd58)                                                                                                                                                                                            
2019-09-09 13:12:00 Starting parachain attestation session on top of parent 0x1b3d2a7a377552aa172506aac8cff7699b1a8418f80e18bfd2cdab5c10ebcd58. Local parachain duty is None                                                                  
2019-09-09 13:12:01 Imported #243365 (0x8bc7…62f0)                                                                                                                                                                                            
2019-09-09 13:12:02 Starting parachain attestation session on top of parent 0x8bc7ce7a9956346ad5e8402e3bb159a194d7d9fb2bf77c92708a913dc60c62f0. Local parachain duty is None                                                                  
2019-09-09 13:12:03 Imported #243366 (0x9f2d…2b95)                                                                                                                                                                                            
2019-09-09 13:12:03 Starting parachain attestation session on top of parent 0x9f2dc9f9f219ff115e3b6bd48e7c71c064704bc95a83f0e6c28f5797969e2b95. Local parachain duty is None                                                                  
2019-09-09 13:12:04 Imported #243367 (0x973a…98f2)                                                                                                                                                                                            
2019-09-09 13:12:04 Starting parachain attestation session on top of parent 0x973a3d8642e1917d30b235242f9ba020ced18ab3c652b8bd7e89367a165498f2. Local parachain duty is None                                                                  
2019-09-09 13:12:29 Handler initialization process is too long with PeerId("QmR2Jdd4V93yzTDyX5gKQ7PE4M2KsbK3vjC8RvXCnzqHU6")                                                                                                                                                    
2019-09-09 13:12:29 Handler initialization process is too long with PeerId("QmYXkwkJAg3fAnGEwLEzhiWCqxrf11M7KmZDHrnwhzzng4")                                                                                                                                                                                                 
2019-09-09 13:12:29 Handler initialization process is too long with PeerId("QmdCgjyRf3aKvHv7FCFA6S6Tk4mJaJ5aX2Kidf2FP4Kb8H")                                                                                                                                                    
2019-09-09 13:12:29 Libp2p => Query for [18, 32, 118, 152, 186, 116, 187, 158, 152, 28, 147, 85, 197, 59, 98, 23, 192, 135, 163, 247, 232, 208, 244, 91, 160, 153, 176, 126, 203, 80, 107, 213, 191, 85] timed out with 4 results                                                                                            
2019-09-09 13:12:29 Idle (1 peers), best: #243367 (0x973a…98f2), finalized #204216 (0x8743…6ec7), ⬇ 0 ⬆ 0                                                                                                                                                                       
2019-09-09 13:12:29 Idle (1 peers), best: #243367 (0x973a…98f2), finalized #204216 (0x8743…6ec7), ⬇ 0 ⬆ 0                                                                                                                                                                       
2019-09-09 13:12:29 Idle (1 peers), best: #243367 (0x973a…98f2), finalized #204216 (0x8743…6ec7), ⬇ 0 ⬆ 0                                                                                                                                                                       
2019-09-09 13:12:29 Idle (1 peers), best: #243367 (0x973a…98f2), finalized #204216 (0x8743…6ec7), ⬇ 0 ⬆ 0                                                                                                                                                                       
2019-09-09 13:12:29 Idle (1 peers), best: #243367 (0x973a…98f2), finalized #204216 (0x8743…6ec7), ⬇ 0 ⬆ 0                                                                                                                                                                                                                    
2019-09-09 13:12:29 Idle (1 peers), best: #243367 (0x973a…98f2), finalized #204216 (0x8743…6ec7), ⬇ 0 ⬆ 0                                                                                                                                                                                                                    
2019-09-09 13:12:31 Imported #243368 (0x9199…946e)                                                                                                            
2019-09-09 13:12:31 Starting parachain attestation session on top of parent 0x9199afb56ea8d5dd7c28ad32d4410d62c57d33d01c41801b4d4d0883ba70946e. Local parachain duty is None                                                                                                                                                 
2019-09-09 13:12:32 Imported #243369 (0xa9f3…b4ed)                                                                                                            
2019-09-09 13:12:32 Starting parachain attestation session on top of parent 0xa9f3ce99726782b7b761f56dd317d02b12b012bdf690ea033ebedea234a5b4ed. Local parachain duty is None                                                                                                                                                 
2019-09-09 13:12:33 Imported #243370 (0x191b…5ae5)                                                                                                            
2019-09-09 13:12:33 Starting parachain attestation session on top of parent 0x191b9ea6706075bb70425c17be63a1b53b7e86ae8f4301b6915b65f4027c5ae5. Local parachain duty is None                                                                                                                                                 
2019-09-09 13:12:35 Imported #243371 (0xf8fc…30fe)                                                                                                                                                                                                                                                                           
2019-09-09 13:12:35 Starting parachain attestation session on top of parent 0xf8fca1998def048057e8ceb30e316abb86e949c5bdd075c15d2037e2b05f30fe. Local parachain duty is None                                                                                                                                                 
2019-09-09 13:12:36 Imported #243372 (0xfc0d…2d3d)                                                                                                                                                                                                                                                                           
2019-09-09 13:12:36 Starting parachain attestation session on top of parent 0xfc0da8e79ba5fa6db0ac4c451e488bc15e68c9e792792ec501945dd3968f2d3d. Local parachain duty is None                                                                                                                                                 
2019-09-09 13:12:37 Imported #243373 (0xb46a…93c1)                                                                                                            
2019-09-09 13:12:37 Starting parachain attestation session on top of parent 0xb46a445927ad520bb313971c3e4f3eb1cac51295b022c23c5c15dcd7668e93c1. Local parachain duty is None                                                                                                                                                 
2019-09-09 13:13:07 Handler initialization process is too long with PeerId("QmeUNz2M6pfgWYUMxySWDjTRWgpXUkB1dCr4qm6nFmEtST")                                                                                                                                                                                                 
2019-09-09 13:13:07 Idle (0 peers), best: #243373 (0xb46a…93c1), finalized #204216 (0x8743…6ec7), ⬇ 0 ⬆ 0                                                                                                                                                                                                                    
2019-09-09 13:13:08 Idle (0 peers), best: #243373 (0xb46a…93c1), finalized #204216 (0x8743…6ec7), ⬇ 0 ⬆ 0                                                                                                                                                                                                                    
2019-09-09 13:13:08 Idle (0 peers), best: #243373 (0xb46a…93c1), finalized #204216 (0x8743…6ec7), ⬇ 0 ⬆ 0                                                                                                                                                                                                                    
2019-09-09 13:13:08 Idle (0 peers), best: #243373 (0xb46a…93c1), finalized #204216 (0x8743…6ec7), ⬇ 0 ⬆ 0                                                                                                                                                                                                                    
2019-09-09 13:13:08 Idle (0 peers), best: #243373 (0xb46a…93c1), finalized #204216 (0x8743…6ec7), ⬇ 0 ⬆ 0                                                                                                                                                                                                                    
2019-09-09 13:13:08 Idle (0 peers), best: #243373 (0xb46a…93c1), finalized #204216 (0x8743…6ec7), ⬇ 0 ⬆ 0                                                                                                                                                                                                                    
2019-09-09 13:13:08 Idle (0 peers), best: #243373 (0xb46a…93c1), finalized #204216 (0x8743…6ec7), ⬇ 0 ⬆ 0                                                                                                                                                                                                                    
2019-09-09 13:13:08 Idle (0 peers), best: #243373 (0xb46a…93c1), finalized #204216 (0x8743…6ec7), ⬇ 3.5kiB/s ⬆ 1.8kiB/s                                                                                                                                                                                                      
2019-09-09 13:13:13 Idle (4 peers), best: #243377 (0x591f…0d7a), finalized #204216 (0x8743…6ec7), ⬇ 16.3kiB/s ⬆ 9.6kiB/s                                                                                                                                                                                                     
2019-09-09 13:13:15 Imported #243380 (0xa901…0639)                                                                                                            
2019-09-09 13:13:15 Starting parachain attestation session on top of parent 0xa901cf19f0a8a654bc119661fbee1a70764db8817bc8791e21dc53988be80639. Local parachain duty is None                                                                                                                                                 
2019-09-09 13:13:21 Idle (1 peers), best: #243380 (0xa901…0639), finalized #204216 (0x8743…6ec7), ⬇ 0.9kiB/s ⬆ 0.1kiB/s                                                                                                                                                                                                      
2019-09-09 13:13:22 Imported #243381 (0x0c7e…5af2)                                                                                                            
2019-09-09 13:13:22 Starting parachain attestation session on top of parent 0x0c7ea60e78ecbea34135efc5354b616ee11259a7d2515256028876d36fc45af2. Local parachain duty is None                                                                                                                                                 
2019-09-09 13:13:28 Handler initialization process is too long with PeerId("QmZ3wqG9vxJY8eWxNGEWSDf9MtYtti3D5jrRT5He2BWrEp")                                                                                                                                                                                                 
2019-09-09 13:13:28 Idle (5 peers), best: #243381 (0x0c7e…5af2), finalized #204216 (0x8743…6ec7), ⬇ 1.3kiB/s ⬆ 0.8kiB/s                                                                                                                                                                                                      
2019-09-09 13:13:28 Idle (5 peers), best: #243381 (0x0c7e…5af2), finalized #204216 (0x8743…6ec7), ⬇ 1.3kiB/s ⬆ 0.8kiB/s   

Running latest 0.5.1: 60155e8 on Kusama

[user@host ~]$ polkadot --version
polkadot 0.5.1-60155e89-x86_64-linux-gnu

[user@host ~]$ uname -a
Linux 5.2.11-arch1-1-ARCH #1 SMP PREEMPT Thu Aug 29 08:09:36 UTC 2019 x86_64 GNU/Linux
@bkchr
Copy link
Member

bkchr commented Sep 13, 2019

CC @tomaka

@JohnnySheffield
Copy link

Observing a similar issue, peer count dropped from 110 to 0, a lot of Handler initialization process is too long and Writing a telemetry log took more than 50ms.

Running from docker image parity/polkadot:v0.6.2

Logs attached:
polkadot-28-99kzg.log

@tomaka
Copy link
Contributor

tomaka commented Oct 22, 2019

Handler initialization process is too long and Writing a telemetry log took more than 50ms both indicate that these operations took longer than expected.
However these operations (especially the telemetry log) don't require any lock, except maybe a very short internal Mutex as an implementation detail of sending on a channel.

I'm not sure what is happening here, except for all CPUs being very busy.

@JohnnySheffield
Copy link

JohnnySheffield commented Oct 22, 2019

I've left the node for a bit and it recovered, peer count is up, caught up with the network. From the metrics i can see there is enough cores available, attaching the screenshot where the cpu and network usage drop is visible while the Handler initialization process is too long issues were present.
Screenshot from 2019-10-22 11-50-13

@tomaka
Copy link
Contributor

tomaka commented Oct 22, 2019

Do you have an idea how to reproduce the issue, or is it just random?
Are you validating? Or just a regular node?

@JohnnySheffield
Copy link

No idea how to reproduce, the node was running normally, then it started to misbehave, then recovered. While googling for the errors, found this issue. It's an regular, non-validating node.

@tomaka
Copy link
Contributor

tomaka commented Oct 22, 2019

I strongly suspect that this is not a networking issue but an issue coming from the fact that all CPUs suddenly are busy.

This issue is quite worrisome, but since we don't know the cause, I don't really know how to go forward except adding diagnostics everywhere.

@tomaka
Copy link
Contributor

tomaka commented Oct 22, 2019

One possible way to debug would be to apply this: https://github.com/tomaka/futures-diagnose-exec/

@gavofyork
Copy link
Member

Is this still a problem?

@tomaka
Copy link
Contributor

tomaka commented Nov 13, 2019

Yes. It happens every 3-6 hours or so and lasts a minute.
I haven't been able to figure out where this comes from so far, but I suspect a call to the WASM engine from the networking future that takes a long time.

@gavofyork
Copy link
Member

Please upgrade to 0.7 series and reopen if still an issue.

imstar15 pushed a commit to imstar15/polkadot that referenced this issue Aug 25, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants