Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support BFT mining coordinator being temporarily stopped while syncing #7657

Merged
merged 10 commits into from
Oct 9, 2024

Conversation

matthew1001
Copy link
Contributor

@matthew1001 matthew1001 commented Sep 20, 2024

PR description

When a new BFT node starts syncing against an existing chain, its BFT block timer tries to propose blocks even though it does not have the current state of the chain. This can result in exceptions such as:

2024-09-20 10:32:37.379+01:00 | EthScheduler-Services-5 (importBlock) | INFO  | FullImportBlockStep | Import reached block 200 (0x39dfd73363ef73d12463cdd6522c095e9ee341eadd231bb31e72c70bd2b405fd), - Mg/s, Peers: 3
2024-09-20 10:32:37.552+01:00 | BftProcessorExecutor-QBFT-0 | ERROR | AbstractBlockProcessor | failed persisting block
java.lang.RuntimeException: World State Root does not match expected value, header 0xffc6762fcbdd6e3416691f76ca2937b59d38c33933606f6de9bd19cb0cfa297d calculated 0x66495ad973c71710d0e556a763896866114018e30485123776764484f3a44b55
        at org.hyperledger.besu.ethereum.trie.diffbased.common.worldview.DiffBasedWorldState.verifyWorldStateRoot(DiffBasedWorldState.java:227)
        at org.hyperledger.besu.ethereum.trie.diffbased.common.worldview.DiffBasedWorldState.persist(DiffBasedWorldState.java:185)
        at org.hyperledger.besu.ethereum.mainnet.AbstractBlockProcessor.processBlock(AbstractBlockProcessor.java:233)
        at org.hyperledger.besu.ethereum.mainnet.BlockProcessor.processBlock(BlockProcessor.java:78)
        at org.hyperledger.besu.ethereum.MainnetBlockValidator.processBlock(MainnetBlockValidator.java:246)
        at org.hyperledger.besu.ethereum.MainnetBlockValidator.validateAndProcessBlock(MainnetBlockValidator.java:163)
        at org.hyperledger.besu.ethereum.MainnetBlockValidator.validateAndProcessBlock(MainnetBlockValidator.java:107)
        at org.hyperledger.besu.consensus.qbft.validation.ProposalPayloadValidator.validateBlock(ProposalPayloadValidator.java:100)
        at org.hyperledger.besu.consensus.qbft.validation.ProposalPayloadValidator.validate(ProposalPayloadValidator.java:84)
        at org.hyperledger.besu.consensus.qbft.validation.ProposalValidator.validate(ProposalValidator.java:103)
        at org.hyperledger.besu.consensus.qbft.validation.MessageValidator.validateProposal(MessageValidator.java:133)
        at org.hyperledger.besu.consensus.qbft.statemachine.RoundState.setProposedBlock(RoundState.java:87)
        at org.hyperledger.besu.consensus.qbft.statemachine.QbftRound.updateStateWithProposedBlock(QbftRound.java:265)
        at org.hyperledger.besu.consensus.qbft.statemachine.QbftRound.updateStateWithProposalAndTransmit(QbftRound.java:194)
        at org.hyperledger.besu.consensus.qbft.statemachine.QbftRound.createAndSendProposalMessage(QbftRound.java:140)
        at org.hyperledger.besu.consensus.qbft.statemachine.QbftBlockHeightManager.handleBlockTimerExpiry(QbftBlockHeightManager.java:140)
        at org.hyperledger.besu.consensus.common.bft.statemachine.BaseBftController.handleBlockTimerExpiry(BaseBftController.java:168)
2024-09-20 10:32:37.554+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | MainnetBlockValidator | Failed to process block 234 (0x54c2787f9ac73aaa38a37bac1c00d5d615591bb7fc5cc1b4ef3d25f1f320db79): Optional[World State Root does not match expected value, header 0xffc6762fcbdd6e3416691f76ca2937b59d38c33933606f6de9bd19cb0cfa297d calculated 0x66495ad973c71710d0e556a763896866114018e30485123776764484f3a44b55], caused by java.lang.RuntimeException: World State Root does not match expected value, header 0xffc6762fcbdd6e3416691f76ca2937b59d38c33933606f6de9bd19cb0cfa297d calculated 0x66495ad973c71710d0e556a763896866114018e30485123776764484f3a44b55

BFT nodes should not try to propose blocks until they are in sync with the rest of the chain. This PR makes the following changes:

  1. Updates the BftProcessor so that the existing stop() call isn't necessarily terminal. Today it only appears to be used on shutdown. This PR makes it possible to stop() a BftProcessor in such a way that it can be restarted again without the node restarting
  2. Updates the BftMiningCoordinator to restart the BftProcessor when it the coordinator is restarted

The result is that a node can begin doing BFT work (e.g. in the case where it is a single node with nobody to sync with, or where it is a validator for a brand new chain), but then stop its BFT work if it determines that it has a sync peer and starts downloading from it. Once syncing has completed the BFT work can then resume, this time with up-to-date state for the chain.

…g happens

Signed-off-by: Matthew Whitehead <matthew1001@gmail.com>
Signed-off-by: Matthew Whitehead <matthew1001@gmail.com>
Copy link
Contributor

@fab-10 fab-10 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, please add a CHANGELOG entry, and does it make sense to add some tests?

@matthew1001
Copy link
Contributor Author

Yes good point, I'll add both shortly @fab-10

Signed-off-by: Matthew Whitehead <matthew1001@gmail.com>
Signed-off-by: Matthew Whitehead <matthew1001@gmail.com>
Signed-off-by: Matthew Whitehead <matthew1001@gmail.com>
Signed-off-by: Matthew Whitehead <matthew1001@gmail.com>
@matthew1001
Copy link
Contributor Author

Tests & changelog entry added @fab-10

matthew1001 and others added 3 commits October 9, 2024 11:36
…common/bft/BftEventQueue.java

Co-authored-by: Sally MacFarlane <macfarla.github@gmail.com>
Signed-off-by: Matt Whitehead <matthew1001@hotmail.com>
…common/bft/BftProcessor.java

Co-authored-by: Sally MacFarlane <macfarla.github@gmail.com>
Signed-off-by: Matt Whitehead <matthew1001@hotmail.com>
Signed-off-by: Matt Whitehead <matthew.whitehead@kaleido.io>
@matthew1001 matthew1001 merged commit 03a0cfa into hyperledger:main Oct 9, 2024
43 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants