This repository has been archived by the owner on Jun 3, 2020. It is now read-only.

State tracking for double sign protection #193

Merged
merged 20 commits into from
Mar 8, 2019

Conversation

zmanian
Contributor

@zmanian zmanian commented Mar 4, 2019

Starting to work on tracking state in the file system for each client to prevent double signing.

This basically duplicates the approach taken inside Tendermint, where we track forward round progression as a way of preventing a double sign.
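As a rough illustration of the forward-progression idea, here is a minimal sketch. It reuses the `height`/`round`/`step` field types from this PR, but `StateError` and `check_and_update` are hypothetical names, not the PR's actual API. It also refuses any repeat of the same height/round/step, which is a simplification: the real check may additionally compare what is being signed.

```rust
use std::cmp::Ordering;

/// Position in consensus. The derived `Ord` compares fields in declaration
/// order: height first, then round, then step.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct ConsensusState {
    pub height: i64,
    pub round: i64,
    pub step: i8,
}

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum StateError {
    /// The request is behind the last signed position.
    Regression,
    /// The request is at exactly the last signed position.
    DoubleSign,
}

/// Accept `new` only if it is strictly ahead of `last`, then record it.
pub fn check_and_update(
    last: &mut ConsensusState,
    new: ConsensusState,
) -> Result<(), StateError> {
    match new.cmp(&*last) {
        Ordering::Greater => {
            *last = new;
            Ok(())
        }
        Ordering::Equal => Err(StateError::DoubleSign),
        Ordering::Less => Err(StateError::Regression),
    }
}
```

A signer would call `check_and_update` before producing a signature and persist the updated state before releasing it.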

@zmanian zmanian requested review from liamsi and tarcieri March 6, 2019 16:03
@zmanian zmanian marked this pull request as ready for review March 6, 2019 16:03
struct LastSignData {
    pub height: i64,
    pub round: i64,
    pub step: i8,
Contributor

Can these ever actually be negative?

Contributor Author

Not entirely sure. I have a vague memory that in Tendermint these values sometimes get set to -1 as a marker.

But I haven't actually looked through all the code.

Contributor Author

Also those are the types in the Tendermint implementation.

Contributor

As far as I know, the only thing that can be negative is Proposal.POLRound (it can be -1, as @zmanian mentioned). The other fields cannot be negative. But yeah, the types are all signed in the Tendermint Go code ... Not sure we need to follow this, though.

Contributor

I might suggest using unsigned integers for anything that isn't explicitly negative, and potentially Option in lieu of -1
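A minimal sketch of what that suggestion could look like, assuming the wire format still carries signed integers with -1 as the "unset" marker. The unsigned field widths and the `pol_round_from_wire` helper are hypothetical and not part of this PR.

```rust
/// Last-signed state with unsigned fields, since none of these
/// values is expected to be negative.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct LastSignData {
    pub height: u64,
    pub round: u64,
    pub step: u8,
}

/// Convert a signed wire value in which -1 means "unset" into an `Option`.
/// Any other negative value is rejected rather than silently accepted.
pub fn pol_round_from_wire(raw: i64) -> Result<Option<u64>, String> {
    match raw {
        -1 => Ok(None),
        r if r >= 0 => Ok(Some(r as u64)),
        r => Err(format!("invalid POL round: {}", r)),
    }
}
```

Doing the conversion once at the decoding boundary keeps the -1 sentinel out of the rest of the code.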

@zmanian
Contributor Author

zmanian commented Mar 6, 2019

Help on why the integration tests are failing would be appreciated.

Contributor

@liamsi liamsi left a comment

I'll look into why the tests are failing:

  • test_handle_and_sign_proposal
  • test_handle_and_sign_vote

src/last_sign_state.rs Outdated
Co-Authored-By: zmanian <zaki@manian.org>
Some(ref p) => Some(ConsensusState {
    height: p.height,
    round: p.round,
    step: 3,
Contributor

pub fn sync_to_disk(&mut self) -> std::io::Result<()> {
    self.file
        .write_all(serde_json::to_string(&self.data).unwrap().as_ref())?;
    self.file.sync_all()?;
Contributor

@tarcieri tarcieri Mar 6, 2019

This could be made a bit more atomic by writing the new state to a separate file (e.g. .chainid_priv_validator_state.json.tmp) then replacing/overwriting the other file (e.g. with fs::rename)
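The write-then-rename pattern described here can be sketched with the standard library alone. The function name and the `.tmp` suffix scheme are assumptions made for illustration. The rename is only expected to be atomic when both paths are on the same filesystem, and a fully durable version would also sync the parent directory.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Write `contents` to `path` by way of a temporary sibling file, so a
/// reader sees either the old state or the new one, never a partial write.
pub fn write_state_atomically(path: &Path, contents: &[u8]) -> io::Result<()> {
    // Hypothetical naming scheme: append ".tmp" to the target file name.
    let mut tmp_name = path.as_os_str().to_owned();
    tmp_name.push(".tmp");
    let tmp_path = PathBuf::from(tmp_name);

    // File::create truncates any stale temporary file left by a crash.
    let mut tmp = File::create(&tmp_path)?;
    tmp.write_all(contents)?;
    // Flush the data to disk before the rename makes it visible.
    tmp.sync_all()?;
    drop(tmp);

    // Replaces `path` if it already exists.
    fs::rename(&tmp_path, path)
}
```

Calling this twice with different contents leaves only the second version in the file, rather than two concatenated documents.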

@tarcieri
Contributor

tarcieri commented Mar 6, 2019

Re: integration tests, is it possible they're triggering double signing and that's causing it to abort the connection?

I'd suggest adding warn! debug messages if double signing is ever encountered

@liamsi
Contributor

liamsi commented Mar 6, 2019

What initially looked like some encoding problem was a file perms problem (should be fixed in 74f6c5a)

@liamsi
Contributor

liamsi commented Mar 6, 2019

OK, not quite. Still debugging/investigating.

@liamsi
Contributor

liamsi commented Mar 6, 2019

Interestingly, the tests pass if I manually remove the file and fail when test_chain_id_priv_validator_state.json is present. It looks like the code appends to the empty state, because the file's content after test_handle_and_sign_vote runs successfully is:
{"height":0,"round":0,"step":0,"block_id":null}{"height":12345,"round":2,"step":6,"block_id":"736F6D6520686173683030303030303030303030303030303030303030303030"}

@tarcieri
Contributor

tarcieri commented Mar 6, 2019

Would definitely recommend making a new file (i.e. File::create() and overwriting the old one). It will prevent any issues like that in perpetuity.
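To make the difference concrete, here is a small sketch contrasting the two behaviours. Both function names are hypothetical. The first mirrors the shape of the `sync_to_disk` snippet reviewed earlier (repeated `write_all` calls on a handle that stays open), which is one plausible way the concatenated output could arise; the second re-creates the file on each save.

```rust
use std::fs::File;
use std::io::{self, Write};
use std::path::Path;

/// Writes through a long-lived handle. The cursor stays at the end of the
/// previous write, so each call lands after the earlier contents.
pub fn save_with_held_handle(file: &mut File, json: &str) -> io::Result<()> {
    file.write_all(json.as_bytes())?;
    file.sync_all()
}

/// Re-creates the file on every save. File::create truncates an existing
/// file, so only the latest state remains.
pub fn save_truncating(path: &Path, json: &str) -> io::Result<()> {
    let mut file = File::create(path)?;
    file.write_all(json.as_bytes())?;
    file.sync_all()
}
```

Truncating in place is still not atomic: a crash between the truncate and the write can leave an empty file, which is what the temporary-file-and-rename suggestion addresses.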

@zmanian
Contributor Author

zmanian commented Mar 6, 2019

Most likely the file should be deleted at the end of every test

@zmanian
Contributor Author

zmanian commented Mar 7, 2019

I tried adding

fs::remove_file("test_chain_id_priv_validator_state.json").unwrap();

to the end of the integration tests, but this doesn't seem to be helping or deleting the file.

Contributor

@liamsi liamsi left a comment

@zmanian
Contributor Author

zmanian commented Mar 7, 2019

Here is my concern.

It's not unusual to see regression errors in the logs on Tendermint. But I am pretty sure with our current error handling a regression error here is going to halt the Tendermint to KMS connection

@@ -38,7 +38,7 @@ jobs:
 command: |
 rustc --version
 cargo --version
-cargo test --all --all-features
+cargo test --all --all-features -- --test-threads 1
Contributor

A simpler (or faster) way to make sure test files do not interfere with each other would be to use a different chain ID per test.
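A sketch of that idea, assuming the state file name is derived from the chain ID as the `test_chain_id_priv_validator_state.json` name in this thread suggests. The helper `state_file_for` is hypothetical.

```rust
use std::path::PathBuf;

/// Derive the state file name from the chain ID, following the
/// `<chain_id>_priv_validator_state.json` pattern seen in this thread.
pub fn state_file_for(chain_id: &str) -> PathBuf {
    PathBuf::from(format!("{}_priv_validator_state.json", chain_id))
}
```

With one chain ID per test (for example, one for the vote test and another for the proposal test), each test reads and writes its own file, so the tests could run in parallel without sharing state.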

@liamsi
Contributor

liamsi commented Mar 7, 2019

It's not unusual to see regression errors in the logs on Tendermint. But I am pretty sure with our current error handling a regression error here is going to halt the Tendermint to KMS connection

Have you experienced that? Can we reproduce this somehow (ideally with a test)? Feel free to dismiss my review / approve.

Contributor

@tarcieri tarcieri left a comment

I'll approve this with the caveat that we should probably be doing proper atomic writes to update the state file. It appears failure to do this already manifested as a bug in this PR where two versions of the state wound up concatenated in the same file.

There's a crate which provides an atomic write API: https://crates.io/crates/atomicwrites

I can submit a PR to switch to it.

@tarcieri tarcieri merged commit 52b7295 into master Mar 8, 2019
@tarcieri tarcieri deleted the zaki/priv_validator_state branch March 8, 2019 22:27
This was referenced Mar 10, 2019
3 participants