Geth signing stops after a period of time #16406
Comments
Quick recap on sealing in Clique:
There's a slight race condition possible if your in-turn signer is offline. E.g. you have 3 signers, and the in-turn one goes offline. In that case you need both of the other signers to work in lockstep to progress the chain. But since both are out-of-turn, they will both mine blocks with difficulty 1. Given the right timing, those blocks will be mined simultaneously, and each signer will be stuck in its own little world. Could you provide some more information about your setup, maybe the logs from each individual signer when they get stuck? That might shed some light on the issue. |
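To make the stall described above concrete, here is a minimal sketch of the Clique rule (EIP-225) that a signer may seal at most one block out of every floor(N/2)+1 consecutive blocks. The function names, the signer labels A, B, C and the block history are hypothetical and chosen only for illustration; this is not geth's code, and it assumes neither node switches to the other's equal-difficulty block.

```python
def signer_limit(n_signers):
    """A signer may seal at most one block in any window of this many blocks."""
    return n_signers // 2 + 1


def may_sign(number, signers, signer, sealed_by):
    """Check the 'signed recently' rule for one node's view of the chain.

    sealed_by maps block number -> signer for the chain this node follows.
    """
    limit = signer_limit(len(signers))
    window = range(max(1, number - limit + 1), number)
    return all(sealed_by.get(n) != signer for n in window)


signers = ["A", "B", "C"]

# Hypothetical history: A sealed block 2 out of turn and then went offline.
shared = {1: "B", 2: "A"}

# Block 3 is A's turn, so B and C are both out-of-turn and both eligible.
# Suppose they seal competing blocks at the same moment and each keeps its own.
view_b = {**shared, 3: "B"}
view_c = {**shared, 3: "C"}

# On its own chain, each signer sealed the previous block, so neither may
# seal block 4: both report that they signed recently and wait.
stalled = (not may_sign(4, signers, "B", view_b)
           and not may_sign(4, signers, "C", view_c))
print("stalled:", stalled)

# If C adopted B's block 3 instead, C would be allowed to seal block 4.
print("C may extend B's chain:", may_sign(4, signers, "C", view_b))
```

In this sketch the chain only moves again once one signer follows the other's block, or once the third signer comes back and seals on top of either head.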
Thanks for getting back to me. Sure, no problem, I can provide log output. Should I set verbosity to 4 and output to a file? My setup is basically 5 AWS EC2 instances, each running with Clique, with a genesis created with puppeth, plus a dedicated bootnode server and 1 standard geth install with nothing fancy, just the usual RPC options, much the same as the example signer command above. Just running with verbosity 4, I see this
At the moment all 3 signers have stopped. Is there anywhere in particular I can check to see why signers fall over, or a specific kind of error to look for, when nothing is coming up on the screen? I can send the 4 logs of output to you somehow if you need to see them. |
I just stopped and started all 3 AWS signers and the 1 signer I have locally. After starting them back up I see the block numbers in the output, but they are all frozen like this. It seems like 2 are stuck on block 2172 and 2 are stuck on block 2171.
signer 1 (AWS)
signer 2 (local instance)
node 3 (AWS)
node 4 (AWS)
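One way to tell competing blocks from plain lag is to group the nodes by the number and hash of their latest block. A rough sketch follows, assuming the heads were collected by hand, for example with the standard `eth_getBlockByNumber` JSON-RPC call on each node. The function names are hypothetical, the hashes are placeholders, and which node sits at which height is an assumption; only the two heights come from the report above.

```python
from collections import defaultdict


def group_by_head(heads):
    """Group node names by the (block number, block hash) of their head."""
    groups = defaultdict(list)
    for node, head in heads.items():
        groups[head].append(node)
    return {head: sorted(nodes) for head, nodes in groups.items()}


def describe(heads):
    """Give a coarse reading of how the nodes' heads relate to each other."""
    groups = group_by_head(heads)
    if len(groups) == 1:
        return "all nodes share one head"
    heights = {number for number, _hash in groups}
    if len(heights) == 1:
        return "same height, different hashes: competing blocks"
    return "different heights: compare hashes at the lower height next"


# Hypothetical values: placeholder hashes, assumed node-to-height assignment.
heads = {
    "signer 1": (2172, "0xaa"),
    "signer 2": (2171, "0xbb"),
    "node 3": (2172, "0xaa"),
    "node 4": (2171, "0xbb"),
}

print(group_by_head(heads))
print(describe(heads))
```

If the nodes at the higher block report a different hash for the lower block than the nodes stuck there, the two groups are on separate forks rather than one group simply lagging behind.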
|
Ref clique-seal of v1.8.11, I think there is no effective mechanism to |
I'm also facing the same issue with 3 signers. Will the possibility of the occurrence of the race condition mentioned be reduced with more signers? |
Experiencing the same issue with 6 sealers. I restarted each node, but now I'm getting stuck in
Btw, some nodes are stuck on 488677 and others are on 488676, the same issue as @lyhbarry. Also, after 1 hour, the issue with local transactions showed up again
Any ideas? My current version is:
I don't know why the
error happens, but since it is the same issue as @lyhbarry's, I'm pretty sure that error puts the chain out of sync and now the signers can't continue. Is there any information I can provide? The blockchain was running for 2 months without any issues. Also, I see this message in one signer:
And this information
|
@REPTILEHAUS it appears that you have some sort of network partition, since two sealers are on |
This issue has been automatically closed because there has been no response to our request for more information from the original author. With only the information that is currently in the issue, we don't have enough information to take action. Please reach out if you have more relevant information or answers to our questions so that we can investigate further. |
I have 5 clients set up (1 boot, 1 geth, 3 signers using Clique). Most of the time they work with no problem and mining starts; sometimes mining just stops with no warning and no issue/error message.
Another time I noticed that after coming to consensus on an additional external signer, mining also stops. I have tried to find information on the order in which signers take turns and what can stop mining from happening. From looking at the code, it seems that signers are chosen more by timing than randomly, but can a signer who misses their "turn" for whatever reason cripple or stop the other nodes from signing/validating blocks? I would hazard a guess that this is not the case, but I can't seem to track down the issue and it is rendering my dev network useless. I know I'm not giving much to go on, but there are no errors like the usual ones such as "bad propagated block". All that happens is signing stops and everyone is on "signed recently, waiting to sign again".
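On the question of turn order, a small sketch may help. It assumes the scheme described in EIP-225: signers are sorted by address, the in-turn signer for a block is picked by block number modulo the signer count, in-turn blocks carry difficulty 2 and out-of-turn blocks difficulty 1. The 500 ms per-signer wiggle and the floor(N/2)+1 multiplier reflect geth's Clique implementation as I understand it; the function names and signer labels are hypothetical.

```python
DIFF_IN_TURN = 2   # difficulty of a block sealed by the in-turn signer
DIFF_NO_TURN = 1   # difficulty of a block sealed by any other signer
WIGGLE_MS = 500    # assumed per-signer random delay budget, as in geth


def in_turn_signer(number, signers):
    """Pick the in-turn signer for a block from the sorted signer list."""
    ordered = sorted(signers)
    return ordered[number % len(ordered)]


def difficulty(number, signers, signer):
    """Difficulty a given signer would put in the header of this block."""
    if in_turn_signer(number, signers) == signer:
        return DIFF_IN_TURN
    return DIFF_NO_TURN


def max_out_of_turn_delay_ms(n_signers):
    """Upper bound on the extra random delay an out-of-turn signer adds."""
    return (n_signers // 2 + 1) * WIGGLE_MS


def schedule(first, count, signers):
    """List (block number, in-turn signer) pairs for a run of blocks."""
    return [(n, in_turn_signer(n, signers)) for n in range(first, first + count)]


signers = ["C", "A", "B"]  # order of the input does not matter, it is sorted
print(schedule(1, 4, signers))
print(difficulty(3, signers, "A"), difficulty(3, signers, "B"))
print(max_out_of_turn_delay_ms(len(signers)))
```

Under these assumptions a missed turn alone should not halt the chain, because an eligible out-of-turn signer steps in after its random delay. A halt needs every remaining signer to be blocked by the "signed recently" rule on the chain it follows.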
For what it's worth, I'm running my signers with this