
Geth Clique Block Period 1, Fork of Chain inconsistency #21191

Closed
cyrilnavessamuel opened this issue Jun 8, 2020 · 9 comments

Comments

@cyrilnavessamuel

Hi there,

I am posting this bug, which was already discussed in the now-inactive #18402.

The bug I report here is a slight modification of the one posted in #18402.

I have also posted it on Stack Exchange, where it got little response: https://ethereum.stackexchange.com/questions/83357/geth-clique-block-period-1-fork-of-chain-inconsistency

System information

Geth version: 1.9.12
OS & Version: Linux
Clique PoA
Processor: Intel i7 - 3770
CPU: 8
RAM : 16 GB
TXPOOL.GlobalSlots: 100000000
Block Period 1 or 2 (Faster block times)
No of Sealers/ Nodes : 4
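
For context, a Clique genesis matching these settings might look roughly like this. This is a sketch, not the reporter's actual file: only the 1-second period and the ~250M gas limit come from this thread; the chainId, fork-block fields, and the extradata placeholder are assumptions.

```json
{
  "config": {
    "chainId": 1337,
    "homesteadBlock": 0,
    "eip150Block": 0,
    "eip155Block": 0,
    "eip158Block": 0,
    "byzantiumBlock": 0,
    "constantinopleBlock": 0,
    "petersburgBlock": 0,
    "clique": { "period": 1, "epoch": 30000 }
  },
  "difficulty": "1",
  "gasLimit": "0xEE6B280",
  "extradata": "0x<32 vanity bytes, 4 signer addresses, 65 seal bytes>",
  "alloc": {}
}
```

Here `0xEE6B280` is 250,000,000, i.e. the 250M gas limit discussed below.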

Expected behaviour

With block period 1 / 2 / 3 (shorter block times), a fork is created but it should be resolved, or at least reorged away.

Actual behaviour

In my private Clique network with 4 nodes (A, B, C, D) I noticed a fork in the chain with block period 1.

I noticed that it sometimes happens with block period 1 & 2.

I noticed that the fork happens at block 1678. Nodes A & D share one chain, while Nodes B & C share the other.

At block 1677, I noticed the difference in data between the 2 chains:

  1. The block hashes are different
  2. The block on one chain is an uncle block, while on the other chain it has 5000 txs included
  3. Both blocks have the same difficulty, 2, which means they were mined in turn
  4. A further complication: I noticed it was the same sealer who sealed both blocks.

This results in a fork of the network and, eventually, a stall; no reorg can resolve this deadlock.

Steps to reproduce the behaviour

  1. Set up a PoA Clique network with block period 1 / 2
  2. No of Nodes/Sealers: 4
  3. Send transactions rapidly to invoke a smart contract method, e.g. with the client available at: https://github.com/cyrilnavessamuel/ethereumissue
  4. You will notice the fork when the nodes stop mining; debug the issue by looking at the latest block header.

Backtrace

Latest block 1678 details in Node A:
[screenshot 1]
At block 1677 the fork occurred, with Node A & Node D on one fork and Node B & Node C on the other.
Notice:
  1. the difference in block hash,
  2. the different no of txs per block (one fork has 0 txs, the other 4791 txs),
  3. the same difficulty on both forks.
Fork 1 details: Node A & Node D
[screenshot 2]
Fork 2 details: Node B & Node C
[screenshot 3]
Note: the Fork 2 image differs because printing all 4791 txs made it too large.

@holiman
Contributor

holiman commented Jun 18, 2020

The curious thing here is that one miner mined two different variants of block 1676, one empty and the other with 4791 transactions. It could be that there's a race here somewhere, since the default behaviour of the miner is to start mining an empty block, and then update the work as transactions come in.

@holiman
Contributor

holiman commented Jun 18, 2020

A block time of 1 second, combined with a gas limit of 250M filled with thousands of transactions, probably hits some corner case in the miner.

@karalabe
Member

Your gas limit seems insanely large (250M), and your block time quite small (1-3s). How much time do nodes need to actually process one such block? If the block processing time exceeds the period, you can end up in very strange scenarios where signers start to race with themselves between importing and mining.
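
For anyone gathering the logs being asked for here, one way to gauge per-block processing time is to grep geth's import lines. The log format below is an assumption based on geth's default `Imported new chain segment` output, and `node_a.log` is a hypothetical filename; on a real node you would grep your actual log file.

```shell
# Write a hypothetical log excerpt to a file so the example is self-contained;
# on a real node, skip this and point grep at your node's log file.
cat > node_a.log <<'EOF'
INFO [06-08|12:00:01.120] Imported new chain segment  blocks=1 txs=4791 mgas=241.613 elapsed=1.208s
INFO [06-08|12:00:02.030] Imported new chain segment  blocks=1 txs=0    mgas=0.000   elapsed=2.113ms
EOF

# Pull out just the tx count and elapsed processing time per imported block.
grep 'Imported new chain segment' node_a.log | grep -o 'txs=[^ ]*\|elapsed=[^ ]*'
```

If `elapsed` regularly approaches or exceeds the 1-second period, that supports the race scenario described above.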

@karalabe
Member

Perhaps you could share normal operational logs, so we can see how heavy subsequent blocks are, how much time they take, etc.

@cyrilnavessamuel
Author

Thanks Martin and Peter for looking into the issue. I'll try to recreate the scenario and share the operational logs of it.
Cheers

@cyrilnavessamuel
Author

Hello All,

Sorry for the time it took to reproduce the issue.

This time I reproduced it with the same configuration (block period 1), with three scripts sending concurrently to 3 nodes of a 4-node blockchain network. The fork happened at block 109. Although the fork happened and the chain got stuck there, the same node sealing two variants of a block was not reproduced.

Node A & Node B on one Fork
[screenshot: Node A & Node B]

Node C & Node D on the other fork:
[screenshot: Node C & Node D]

Please find attached the log files of the 4 nodes in a zip file on Google Drive, since the file is somewhat big.

https://drive.google.com/file/d/18E-0DHueFt3OOYb51Y7s0M5tYEGzPZvV/view?usp=sharing

Thanks for looking,
Cheers

@fnaticwang

This problem has been around for almost a year and we still haven't solved it.

@sambacha

sambacha commented Jul 27, 2020

Try the following changes:

Genesis file changes

change gasLimit to: 1DCD6000
delete: "petersburgBlock": 0,

period time

*where* `OUT_OF_TURN_DELAY_MULTIPLIER` == 500ms
*where* `MIN_OUT_OF_TURN_DELAY` == your_desired_period (in this case `2000ms`)

`${G_PERIOD} == MIN_OUT_OF_TURN_DELAY + rand(SIGNER_COUNT * OUT_OF_TURN_DELAY_MULTIPLIER)`

so that you set `period: ${G_PERIOD}` in your genesis file.
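
Worked through with the numbers from this thread, the formula above gives a period in the 2-4 second range. This is a sketch of the suggested calculation only: the 4-sealer count comes from this issue, and note that Clique's genesis `period` field is specified in whole seconds, so a millisecond result still needs rounding.

```python
import random

OUT_OF_TURN_DELAY_MULTIPLIER = 500  # ms, per the suggestion above
MIN_OUT_OF_TURN_DELAY = 2000        # ms, the desired base period
SIGNER_COUNT = 4                    # sealers in this thread's network

random.seed(0)  # deterministic for illustration only

# G_PERIOD = MIN_OUT_OF_TURN_DELAY + rand(SIGNER_COUNT * OUT_OF_TURN_DELAY_MULTIPLIER)
g_period_ms = MIN_OUT_OF_TURN_DELAY + random.randrange(
    SIGNER_COUNT * OUT_OF_TURN_DELAY_MULTIPLIER
)
print(g_period_ms)               # somewhere in [2000, 4000) ms
print(round(g_period_ms / 1000)) # nearest whole second for the genesis "period"
```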

Make sure you're closing connections as well:
netstat -a | grep 8545 | wc -l

additional

Clique out-of-turn sealing is a known issue; see the Goerli testnet.

@holiman
Contributor

holiman commented Aug 20, 2020

I think the root problem here is

No of Nodes/ Sealers : 4

If you'd had 5 instead, the situation would be less prone to chain ties (2 sealers vs 2 sealers), and you'd have a natural tie-breaker.
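
This point can be sanity-checked against Clique's difficulty rules (DIFF_INTURN = 2, DIFF_NOTURN = 1, per EIP-225). The sketch below is my own illustration, not code from this thread: with 4 sealers split 2-vs-2, both forks keep sealing in-turn blocks, so their total difficulties advance in lockstep and neither side ever wins the fork choice.

```python
DIFF_INTURN, DIFF_NOTURN = 2, 1  # block difficulties from EIP-225 (Clique)

def total_difficulty(sealed_in_turn):
    """Total difficulty of a run of blocks, given which were sealed in-turn."""
    return sum(DIFF_INTURN if in_turn else DIFF_NOTURN for in_turn in sealed_in_turn)

# 4 sealers, 2-vs-2 split: each fork's two sealers alternate in-turn slots,
# so every block on both forks carries difficulty 2 and the totals stay tied.
fork_a = [True] * 10
fork_b = [True] * 10
print(total_difficulty(fork_a) == total_difficulty(fork_b))  # True: permanent tie

# With 5 sealers, no 2-way split is even, so one side accumulates more
# in-turn (difficulty-2) blocks over time and the minority fork reorgs away.
```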
