-
Notifications
You must be signed in to change notification settings - Fork 42
State tracking for double sign protection #193
Conversation
…d so regressions don't trigger a double sign
struct LastSignData { | ||
pub height: i64, | ||
pub round: i64, | ||
pub step: i8, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can these ever actually be negative?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not entirely sure. I have a vague memory that in Tendermint sometimes these values get set at -1 as a marker.
But I haven't actually looked through all the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also those are the types in the Tendermint implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I know the only thing that can be negative is the Proposal.POLRound (can be -1 as @zmanian mentioned). The other fields can not be negative. But yeah the types are all signed in the tendermint golang code ... Not sure we need to follow this tho.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might suggest using unsigned integers for anything that isn't explicitly negative, and potentially Option
in lieu of -1
Sync to disk on every signature
Help on why the integration tests are failing would be appreciated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll look into why the tests are failing:
- test_handle_and_sign_proposal
- test_handle_and_sign_vote
Co-Authored-By: zmanian <zaki@manian.org>
Some(ref p) => Some(ConsensusState { | ||
height: p.height, | ||
round: p.round, | ||
step: 3, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I think it would be better to use constants / enums for this. See https://github.com/tendermint/tendermint/blob/411bc5e49fcc3dda866ae799ba2ceda0084f80d4/privval/file.go#L17-L35
pub fn sync_to_disk(&mut self) -> std::io::Result<()> { | ||
self.file | ||
.write_all(serde_json::to_string(&self.data).unwrap().as_ref())?; | ||
self.file.sync_all()?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be made a bit more atomic by writing the new state to a separate file (e.g. .chainid_priv_validator_state.json.tmp
) then replacing/overwriting the other file (e.g. with fs::rename
)
Re: integration tests, is it possible they're triggering double signing and that's causing it to abort the connection? I'd suggest adding |
What initially looked like some encoding problem was a file perms problem (should be fixed in 74f6c5a) |
OK, not quite. Still debugging/investigating. |
Interestingly, the tests pass if I manually remove the file and then fail when the |
Would definitely recommend making a new file (i.e. |
Most likely the file should be deleted at the end of every test |
I tried adding fs::remove_file("test_chain_id_priv_validator_state.json").unwrap(); to the end of the integration tests but this doesn't seem be helping or deleting the file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be OK to merge this and address a few things in a follow-up PR. Namely:
Here is my concern. It's not unusual to see regression errors in the logs on Tendermint. But I am pretty sure with our current error handling a regression error here is going to halt the Tendermint to KMS connection |
@@ -38,7 +38,7 @@ jobs: | |||
command: | | |||
rustc --version | |||
cargo --version | |||
cargo test --all --all-features | |||
cargo test --all --all-features -- --test-threads 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A simpler (or faster) way to make sure tests-files do not interfere with each other would be to use different chain-ids per test
Have you experienced that? Can we reproduce this somehow (ideally with a test)? Feel free to dismiss my review / approve. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll approve this with the caveat that we should probably be doing proper atomic writes to update the state file. It appears failure to do this already manifested as a bug in this PR where two versions of the state wound up concatenated in the same file.
There's a crate which provides an atomic write API: https://crates.io/crates/atomicwrites
I can submit a PR to switch to it.
Starting to work on tracking state in the file system for each client to prevent double signing.
This is basically duplicating the approach taken inside of tendermint where were we track forward round progression as a measure of preventing a double sign.