Fix bug in freezer DB storage of randao_mixes #3011

Closed
michaelsproul opened this issue Feb 9, 2022 · 1 comment
Labels: bug, database

Comments

@michaelsproul
Member

Description

There's a bug lurking in the database code that can cause occasional database corruption. It doesn't happen consistently, but when it does, it seems to result in a zero hash (0x00) appearing in the randao_mixes array.

The case I'm investigating presented as corruption at slot 135168 on Prater:

Feb 08 07:31:26.461 ERRO State reconstruction failed             error: HotColdDBError(BlockReplayBlockError(HeaderInvalid { reason: ParentBlockRootMismatch { state: 0x373eb699eae0110e474671cab72d5c6ca666d4a6f5a5a356f2af89039ad98382, block: 0xabf45deec98af2873a04d352ebbc54eac35d00c8157fea27b21f9adc2446233b } })), service: beacon

Oddly, the first corrupt state actually occurs much earlier. I found that the state at slot 12288 was corrupt using this (fish) script:

for i in (seq 0 2048 135168)
    set checksum (curl -s -H "Accept: application/octet-stream" "http://localhost:5052/eth/v2/debug/beacon/states/$i" | sha256sum)
    echo "$i: $checksum"
end

Diffing the corrupt state at slot 12288 against the real state reveals a 0x00 value in the randao_mixes at index 320. This is interesting because that corresponds to epoch 320, i.e. 64 epochs prior to slot 12288 (epoch 384).
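The arithmetic behind that observation can be checked with a small sketch. It assumes the mainnet preset values SLOTS_PER_EPOCH = 32 and EPOCHS_PER_HISTORICAL_VECTOR = 65536 (Prater uses the same preset). The function names are made up for illustration and are not Lighthouse APIs.

```rust
/// Assumed preset constants (mainnet preset, also used by Prater).
const SLOTS_PER_EPOCH: u64 = 32;
const EPOCHS_PER_HISTORICAL_VECTOR: u64 = 65_536;

/// Epoch that contains `slot`.
fn epoch_of(slot: u64) -> u64 {
    slot / SLOTS_PER_EPOCH
}

/// Index in `randao_mixes` that holds the mix for `epoch`.
fn randao_index(epoch: u64) -> u64 {
    epoch % EPOCHS_PER_HISTORICAL_VECTOR
}

/// Number of epochs between the state's own epoch and the epoch stored at `index`.
/// Returns `None` if the vector has already wrapped around (an index then no
/// longer identifies a single epoch) or if `index` lies ahead of the state.
fn epochs_behind(state_slot: u64, index: u64) -> Option<u64> {
    let state_epoch = epoch_of(state_slot);
    if state_epoch >= EPOCHS_PER_HISTORICAL_VECTOR || index > state_epoch {
        return None;
    }
    Some(state_epoch - index)
}
```

Under these assumptions, epoch_of(12288) is 384 and epochs_behind(12288, 320) is Some(64), matching the figures above. Because epoch 384 is far below 65536, the vector has not wrapped yet, so index 320 maps directly to epoch 320.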

I think the bug must be in store_updated_vector, which is responsible for writing the randao mixes in the flat format used by the database:

pub fn store_updated_vector<F: Field<E>, E: EthSpec, S: KeyValueStore<E>>(
    field: F,
    store: &S,
    state: &BeaconState<E>,
    spec: &ChainSpec,
    ops: &mut Vec<KeyValueStoreOp>,
) -> Result<(), Error> {
    let chunk_size = F::chunk_size();
    let (start_vindex, end_vindex) = F::start_and_end_vindex(state.slot(), spec);
    let start_cindex = start_vindex / chunk_size;
    let end_cindex = end_vindex / chunk_size;

    // Store the genesis value if we have access to it, and it hasn't been stored already.
    if F::slot_needs_genesis_value(state.slot(), spec) {
        let genesis_value = F::extract_genesis_value(state, spec)?;
        F::check_and_store_genesis_value(store, genesis_value, ops)?;
    }

    // Start by iterating backwards from the last chunk, storing new chunks in the database.
    // Stop once a chunk in the database matches what we were about to store, this indicates
    // that a previously stored state has already filled-in a portion of the indices covered.
    let full_range_checked = store_range(
        field,
        (start_cindex..=end_cindex).rev(),
        start_vindex,
        end_vindex,
        store,
        state,
        spec,
        ops,
    )?;

    // If the previous `store_range` did not check the entire range, it may be the case that the
    // state's vector includes elements at low vector indices that are not yet stored in the
    // database, so run another `store_range` to ensure these values are also stored.
    if !full_range_checked {
        store_range(
            field,
            start_cindex..end_cindex,
            start_vindex,
            end_vindex,
            store,
            state,
            spec,
            ops,
        )?;
    }

    Ok(())
}
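To make the two-pass control flow easier to reason about, here is a stripped-down toy model of it. Everything in it is an assumption made for illustration: the in-memory ToyStore, the CHUNK_SIZE of 4, the u64 values, the build_chunk helper, the half-open index range and the use of 0 as the "not yet filled" placeholder. It writes straight into a map instead of pushing KeyValueStoreOp entries, skips the genesis-value handling, and is not the Lighthouse implementation.

```rust
use std::collections::HashMap;

/// Hypothetical chunk size; the real value comes from `F::chunk_size()`.
const CHUNK_SIZE: usize = 4;

/// Toy stand-in for the freezer DB column: chunk index -> chunk contents.
type ToyStore = HashMap<usize, Vec<u64>>;

/// Build the chunk at `cindex`: start from the stored chunk (or zeros if none)
/// and overwrite only the vector indices in `start_vindex..end_vindex`.
fn build_chunk(
    existing: Option<&Vec<u64>>,
    cindex: usize,
    start_vindex: usize,
    end_vindex: usize,
    state_vector: &[u64],
) -> Vec<u64> {
    let mut chunk = existing.cloned().unwrap_or_else(|| vec![0; CHUNK_SIZE]);
    for offset in 0..CHUNK_SIZE {
        let vindex = cindex * CHUNK_SIZE + offset;
        if vindex >= start_vindex && vindex < end_vindex {
            chunk[offset] = state_vector[vindex];
        }
    }
    chunk
}

/// Walk `range`, writing chunks until one already matches what is stored.
/// Returns `true` only if every chunk in `range` was visited.
fn store_range(
    range: impl Iterator<Item = usize>,
    start_vindex: usize,
    end_vindex: usize,
    state_vector: &[u64],
    store: &mut ToyStore,
) -> bool {
    for cindex in range {
        let new_chunk =
            build_chunk(store.get(&cindex), cindex, start_vindex, end_vindex, state_vector);
        if store.get(&cindex) == Some(&new_chunk) {
            return false; // an earlier state already stored this chunk
        }
        store.insert(cindex, new_chunk);
    }
    true
}

/// Two-pass update: backwards from the last chunk, then forwards over the low
/// chunks if the backwards pass stopped early. Returns the first pass's result.
fn store_updated_vector(
    start_vindex: usize,
    end_vindex: usize,
    state_vector: &[u64],
    store: &mut ToyStore,
) -> bool {
    let start_cindex = start_vindex / CHUNK_SIZE;
    let end_cindex = end_vindex / CHUNK_SIZE;
    let full_range_checked = store_range(
        (start_cindex..=end_cindex).rev(),
        start_vindex,
        end_vindex,
        state_vector,
        store,
    );
    if !full_range_checked {
        // Forward pass; like the original, it excludes `end_cindex`.
        store_range(start_cindex..end_cindex, start_vindex, end_vindex, state_vector, store);
    }
    full_range_checked
}
```

In this toy, storing a 10-element vector and then an 11-element extension of it rewrites only the last chunk: the backwards pass stops at the first chunk that already matches, and the forwards pass stops immediately for the same reason. Note that both passes bail out at the first matching chunk, so any chunk lying between two matching ones is never inspected. Whether that is related to the corruption described here is not established.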

It's possible that we're somehow re-writing the old state at 12288, which inappropriately zeroes some entries and corrupts all subsequent states. I don't think the corruption can occur the first time state 12288 is written, otherwise it would have failed the block root check at that point or shortly after.

Will update this issue with more info soon.

@michaelsproul
Member Author

Closing in favour of hierarchical state diffs, which deletes this part of the database 🎉
