Spurious checkpoint verifier JoinError panic on zebrad shutdown #1576
Comments
@teor2345 this is not a priority, but if you get some time, can you provide some hints on how to implement the checkpoint rollback?
@oxarbitrage when I originally wrote the ticket, I thought we needed to preserve the "whole checkpoint" invariant. But we're shutting down the syncer, and it's the only component that cares about checkpoints. So we only need to preserve the "continuous blocks" and "complete block" invariants, which are much weaker.
We've already implemented these invariants:
continuous blocks: make sure blocks are committed in order
The FinalizedState queues non-continuous blocks and does not write them to disk:
pub fn queue_and_commit_finalized(&mut self, queued: QueuedFinalized) {
    let prev_hash = queued.0.block.header.previous_block_hash;
    let height = queued.0.height;
    self.queued_by_prev_hash.insert(prev_hash, queued);
    while let Some(queued_block) = self.queued_by_prev_hash.remove(&self.finalized_tip_hash()) {
        self.commit_finalized(queued_block);
        ...
    }
    ...
}
We also assert if we try to commit any blocks to disk out of order:
// Assert that callers (including unit tests) get the chain order correct
if self.is_empty(hash_by_height) {
    assert_eq!(
        block::Hash([0; 32]),
        block.header.previous_block_hash,
        "the first block added to an empty state must be a genesis block"
    );
    assert_eq!(
        block::Height(0),
        height,
        "cannot commit genesis: invalid height"
    );
} else {
    assert_eq!(
        self.finalized_tip_height()
            .expect("state must have a genesis block committed")
            + 1,
        Some(height),
        "committed block height must be 1 more than the finalized tip height"
    );
    assert_eq!(
        self.finalized_tip_hash(),
        block.header.previous_block_hash,
        "committed block must be a child of the finalized tip"
    );
}
complete block: write the data for each block in a database transaction that covers all column families
let batch = prepare_commit();
let result = self.db.write(batch).map(|()| hash);
So we can just ignore this panic as well. If any blocks are incomplete, the database will roll them back.
I've edited the ticket, and added some tests to make sure
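To make the "continuous blocks" invariant concrete, here is a minimal, std-only sketch of the queue-then-commit pattern described in this comment. It is not Zebra's code: `Block`, `ToyFinalizedState`, `GENESIS_PREVIOUS_HASH`, `committed_heights`, and `queued_len` are hypothetical names, hashes are plain `u64` values, and "disk" is just a `Vec`. It only illustrates why out-of-order blocks stay queued until their parent is committed.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a block: only the fields needed for ordering.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Block {
    pub hash: u64,
    pub previous_block_hash: u64,
    pub height: u32,
}

/// Parent hash of the genesis block in this sketch (an assumption).
pub const GENESIS_PREVIOUS_HASH: u64 = 0;

/// Toy finalized state: committed blocks plus a queue keyed by parent hash.
#[derive(Default)]
pub struct ToyFinalizedState {
    committed: Vec<Block>,
    queued_by_prev_hash: HashMap<u64, Block>,
}

impl ToyFinalizedState {
    /// Hash of the last committed block, or the genesis parent hash if empty.
    pub fn finalized_tip_hash(&self) -> u64 {
        self.committed
            .last()
            .map_or(GENESIS_PREVIOUS_HASH, |block| block.hash)
    }

    /// Queue `block`, then commit every queued block that extends the tip.
    pub fn queue_and_commit_finalized(&mut self, block: Block) {
        self.queued_by_prev_hash
            .insert(block.previous_block_hash, block);
        loop {
            let tip = self.finalized_tip_hash();
            match self.queued_by_prev_hash.remove(&tip) {
                Some(next) => self.commit_finalized(next),
                None => break,
            }
        }
    }

    /// Commit one block, asserting that callers got the chain order right.
    fn commit_finalized(&mut self, block: Block) {
        let expected_height = self.committed.last().map_or(0, |tip| tip.height + 1);
        assert_eq!(
            expected_height, block.height,
            "committed block height must be 1 more than the finalized tip height"
        );
        assert_eq!(
            self.finalized_tip_hash(),
            block.previous_block_hash,
            "committed block must be a child of the finalized tip"
        );
        self.committed.push(block);
    }

    /// Heights of committed blocks, in commit order.
    pub fn committed_heights(&self) -> Vec<u32> {
        self.committed.iter().map(|block| block.height).collect()
    }

    /// Number of blocks still waiting for their parent.
    pub fn queued_len(&self) -> usize {
        self.queued_by_prev_hash.len()
    }
}
```

In this sketch, submitting heights 2, 1, 0 in that order commits nothing until the genesis block arrives, and then commits all three in height order. If the process stops early, the queued blocks are simply never written.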
Version
zebrad 1.0.0-alpha.0
Commit a418284 "Create the global span immediately after activating tracing" from PR #1568
Platform
Linux ... 5.4.83 #1-NixOS SMP Fri Dec 11 12:23:33 UTC 2020 x86_64 GNU/Linux
Description
zebrad can panic with a spurious "commit_finalized_block should not panic: JoinError::Cancelled" error in the checkpointer on shutdown.
I tried this:
Shutting down zebrad using Control-C (SIGINT)
I expected to see this happen:
zebrad exits without panicking
Instead, this happened:
zebrad panics with a spurious "commit_finalized_block should not panic: JoinError::Cancelled" in the CheckpointVerifier.
This invariant is stronger than required, because the syncer is shutting down.
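As a rough illustration of the weaker invariant, the sketch below treats a cancelled commit task as a normal shutdown and only panics when the task itself panicked. It is std-only and does not use Zebra's or tokio's types: `TaskEnd`, `CommitError`, and `commit_result` are hypothetical names. Assuming the JoinError comes from tokio, real code would tell the two cases apart with `JoinError::is_cancelled()` and `JoinError::is_panic()`.

```rust
use std::fmt;

/// Hypothetical model of how a spawned commit task can end.
#[derive(Debug, PartialEq, Eq)]
pub enum TaskEnd<T> {
    /// The task ran to completion and produced a value.
    Finished(T),
    /// The runtime cancelled the task, for example during shutdown.
    Cancelled,
    /// The task panicked; the string is the panic message.
    Panicked(String),
}

/// Hypothetical error returned to the caller instead of panicking.
#[derive(Debug, PartialEq, Eq)]
pub enum CommitError {
    ShuttingDown,
}

impl fmt::Display for CommitError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            CommitError::ShuttingDown => write!(f, "commit task cancelled: shutting down"),
        }
    }
}

/// Map the end of a commit task to a result.
///
/// Cancellation becomes an ordinary error, because nothing depends on the
/// commit finishing once the syncer is shutting down. A panic inside the
/// task is still a bug, so it is propagated.
pub fn commit_result<T>(end: TaskEnd<T>) -> Result<T, CommitError> {
    match end {
        TaskEnd::Finished(value) => Ok(value),
        TaskEnd::Cancelled => Err(CommitError::ShuttingDown),
        TaskEnd::Panicked(message) => {
            panic!("commit_finalized_block should not panic: {}", message)
        }
    }
}
```

The design choice here is to keep the panic for genuine bugs while letting callers drop a `ShuttingDown` error silently.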
The FinalizedState already ensures the necessary invariants:
Solution
zebrad, and then restarts it (rather than using debug_stop_at_height)
Restarting in the middle of a checkpoint:
height parameter to the restart_stop_at_height test
0, zebra_consensus::MAX_CHECKPOINT_HEIGHT_GAP / 2, and zebra_consensus::MAX_CHECKPOINT_HEIGHT_GAP
That way, we'll test that Zebra can restart:
Commands
zebrad start from a debug build, using the default config
Related Issues
We should re-test the fix for this issue after #1351, because it will change the interrupt handler so it works even when zebrad is busy.
Spurious MustUseOneshotSender panic on zebrad shutdown #1574 is a similar issue in the network code
Logs