fix obtaining deposits after connection loss #3943
Conversation
When an error occurs during Eth1 deposits import, the already imported blocks are kept while the connection to the EL is re-established. However, the corresponding merkleizer is not persisted, leading to any future deposits no longer being properly imported. This is quite common when syncing a fresh Nimbus instance against an already-synced Geth EL. Fixed by persisting the head merkleizer together with the blocks.
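The role of the persisted state can be illustrated with a toy incremental accumulator in Python (a hypothetical stand-in for the real deposit merkleizer, not the actual Nim implementation): once the accumulator's internal state is lost, replaying only the newly arriving deposits yields a different root than feeding the full history.

```python
import hashlib

class Merkleizer:
    """Toy incremental accumulator standing in for the deposit merkleizer:
    it folds each deposit into a running hash, so its state summarizes
    every deposit imported so far."""
    def __init__(self, state: bytes = b"\x00" * 32):
        self.state = state

    def add(self, deposit: bytes) -> None:
        self.state = hashlib.sha256(self.state + deposit).digest()

    def copy(self) -> "Merkleizer":
        return Merkleizer(self.state)

deposits_before = [b"deposit-1", b"deposit-2"]  # imported before the disconnect
deposits_after = [b"deposit-3"]                 # arriving after reconnection

# Uninterrupted import: one merkleizer sees every deposit.
full = Merkleizer()
for d in deposits_before + deposits_after:
    full.add(d)

# Connection loss with the head merkleizer persisted (the fix):
persisted = Merkleizer()
for d in deposits_before:
    persisted.add(d)
resumed = persisted.copy()  # state survives the reconnect
for d in deposits_after:
    resumed.add(d)

# Connection loss without persisting it (the bug): the imported blocks
# are kept, but a fresh merkleizer only ever sees the new deposits.
fresh = Merkleizer()
for d in deposits_after:
    fresh.add(d)

assert resumed.state == full.state  # persisted state: roots match
assert fresh.state != full.state    # lost state: roots diverge
```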
The head merkleizer is not kept around on purpose, because the web3 providers are known to miss certain deposit events from time to time, which causes the merkleizer to get infected with invalid values. These are eventually discovered, the monitor is restarted, and we continue from the known safe finalized state.
Yes, that case still works. But in the case where there are no invalid values, there are still regular disconnects of the websocket connection every couple of minutes, and because there is nothing invalid in there, the sync starts from the previous head (instead of from the finalized head). In this case, the old head merkleizer still needs to be available. Memory consumption should be similar, whether it is living in a
The scenario you are describing goes through this flow:

```nim
if m.latestEth1Block.isSome and m.depositsChain.blocks.len > 0:
  let needsReset = m.depositsChain.hasConsensusViolation or (block:
    let
      lastKnownBlock = m.depositsChain.blocks.peekLast
      matchingBlockAtNewProvider = awaitWithRetries(
        m.dataProvider.getBlockByNumber lastKnownBlock.number)

    lastKnownBlock.voteData.block_hash.asBlockHash != matchingBlockAtNewProvider.hash)

  if needsReset:
    m.depositsChain.clear()
    m.latestEth1Block = none(FullBlockId)
```
and then, because the deposit chain is now empty, syncing restarts from the finalized state:

```nim
if shouldProcessDeposits and m.depositsChain.blocks.len == 0:
  let startBlock = awaitWithRetries(
    m.dataProvider.getBlockByHash(
      m.depositsChain.finalizedBlockHash.asBlockHash))

  m.depositsChain.addBlock Eth1Block(
    number: Eth1BlockNumber startBlock.number,
    timestamp: Eth1BlockTimestamp startBlock.timestamp,
    voteData: eth1DataFromMerkleizer(
      m.depositsChain.finalizedBlockHash,
      m.depositsChain.finalizedDepositsMerkleizer))
  eth1SyncedTo = Eth1BlockNumber startBlock.number

  eth1_synced_head.set eth1SyncedTo.toGaugeValue
  eth1_finalized_head.set eth1SyncedTo.toGaugeValue
  eth1_finalized_deposits.set(
    m.depositsChain.finalizedDepositsMerkleizer.getChunkCount.toGaugeValue)

  m.depositsChain.headMerkleizer = copy m.finalizedDepositsMerkleizer

  debug "Starting Eth1 syncing", `from` = shortLog(m.depositsChain.blocks[0])
```

The fix is for the case where `needsReset` is false.
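The reset decision in the flow above can be modeled with a minimal Python sketch (the `Chain`/`Provider` types here are hypothetical stand-ins for the Nim ones, under the assumption that only a consensus violation or a diverged history at the new provider forces a restart from the finalized state):

```python
from dataclasses import dataclass

@dataclass
class Block:
    number: int
    hash: str

@dataclass
class Chain:
    blocks: list
    has_consensus_violation: bool = False

class Provider:
    """Minimal stand-in for the EL data provider: looks up blocks by number."""
    def __init__(self, blocks_by_number: dict):
        self.blocks = blocks_by_number

    def block_by_number(self, n: int) -> Block:
        return self.blocks[n]

def needs_reset(chain: Chain, provider: Provider) -> bool:
    # Reset only on a consensus violation, or when the last known block
    # no longer matches what the newly connected provider reports.
    if chain.has_consensus_violation:
        return True
    last = chain.blocks[-1]
    return provider.block_by_number(last.number).hash != last.hash

# Same history at the new provider: resume from the previous head,
# which is exactly the case that needs the persisted head merkleizer.
chain = Chain(blocks=[Block(1, "0xaa"), Block(2, "0xbb")])
same = Provider({1: Block(1, "0xaa"), 2: Block(2, "0xbb")})
assert needs_reset(chain, same) is False

# Diverged history: clear the chain and restart from the finalized state.
forked = Provider({2: Block(2, "0xcc")})
assert needs_reset(chain, forked) is True
```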
Alright, thank you for the explanation. It's also curious why your WebSocket connection was dying so frequently. Have you tried to look into the root cause behind this?
#3944: The use of nested `awaitWithRetries` calls would have resulted in an unexpected number of retries (3×3). We now use a regular `await` in the outer layer to avoid the problem.

#3943: The new code has an invariant that the `headMerkleizer` field in the `Eth1Chain` is always kept in sync with the blocks stored in the chain. This invariant is now enforced better by doing the necessary merkleizer updates in the `Eth1Chain.addBlock`, `Eth1Chain.init` and `Eth1Chain.reset` functions.
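The retry-multiplication problem from #3944 is easy to demonstrate with a generic sketch (a hypothetical `with_retries` helper standing in for `awaitWithRetries`; the counts are illustrative, assuming 3 attempts per layer):

```python
calls = {"count": 0}

def with_retries(f, attempts: int = 3):
    """Hypothetical stand-in for awaitWithRetries: call f up to
    `attempts` times, re-raising the last failure."""
    def wrapped():
        for i in range(attempts):
            try:
                return f()
            except ConnectionError:
                if i == attempts - 1:
                    raise
    return wrapped

def flaky_request():
    # A request that always fails, to count worst-case attempts.
    calls["count"] += 1
    raise ConnectionError("EL unreachable")

inner = with_retries(flaky_request)  # inner layer: up to 3 attempts
outer = with_retries(inner)          # nested outer layer: 3 x 3 attempts

try:
    outer()
except ConnectionError:
    pass

assert calls["count"] == 9  # nesting multiplies the retries
```

Replacing the outer `with_retries` with a plain call (the analogue of using a regular `await`) keeps the worst case at the intended 3 attempts.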