Skip to content
This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Wire up candidate backing, approval-voting to disputes #3348

Merged
29 commits merged into from
Jul 9, 2021
Merged

Conversation

rphmeier
Copy link
Contributor

  1. Candidate backing informs the dispute coordinator of every imported backing statement, even those produced locally.
  2. Approval voting informs the dispute coordinator of every produced approval vote.
  3. Approval voting informs the dispute coordinator to issue local dispute statements when encountering an invalid candidate, whether that's due to failed recovery or failed validation.

@rphmeier rphmeier added A0-please_review Pull request needs code review. B0-silent Changes should not be mentioned in any release notes C1-low PR touches the given topic and has a low impact on builders. labels Jun 22, 2021
Comment on lines 2266 to 2267
// REVIEW: should timeouts and internal execution errors lead to dispute?
// only some execution errors?
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pepyakin I'd appreciate your input here!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not being @pepyakin there are a definitely a few variants that I'd expect to trigger a dispute BadParent, PoVHashMismatch,ParaHeadHashMismatch,BadSignature,CodeHashMismatch,InvalidOutputs,CodeDecompressionFailure,PoVDecompressionFailure,BadReturn.

ParamsTooLarge(u64), CodeTooLarge(u64), depends on the limits we set, but these should be checked when submitting i.e. a code update to avoid this upfront.

#[derive(Debug)]
pub enum InvalidCandidate {
    /// Failed to execute.`validate_block`. This includes function panicking.
    ExecutionError(String),
    /// Validation outputs check doesn't pass.
    InvalidOutputs,
    /// Execution timeout.
    Timeout,
    /// Validation input is over the limit.
    ParamsTooLarge(u64),
    /// Code size is over the limit.
    CodeTooLarge(u64),
    /// Code does not decompress correctly.
    CodeDecompressionFailure,
    /// PoV does not decompress correctly.
    PoVDecompressionFailure,
    /// Validation function returned invalid data.
    BadReturn,
    /// Invalid relay chain parent.
    BadParent,
    /// POV hash does not match.
    PoVHashMismatch,
    /// Bad collator signature.
    BadSignature,
    /// Para head hash does not match.
    ParaHeadHashMismatch,
    /// Validation code hash does not match.
    CodeHashMismatch,
}

the bigger question is regarding Timeout as discussed at the retreat which depends on hardware, and other errors, that depend on the particular impl details.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am myself not entirely sure and I would've approached you with the same questions. Here are my thoughts anyway,

In ideal world, internal errors should be extremely rare and only raise due to misconfiguration or hardware malfunction. That implies that such a failure should only happen in isolation. Given that, it would be fair from the protocol perspective to no-show in such a case. We could introduce logic that issues a dispute here. However, a rational even if honest validator will be incentivised to comment out this piece of code. I feel this is the case even if in presence of precedents of refunding such slashes. Nobody wants to snitch on themselves and get potentially slashed.

In the real world there are bugs though. Inclusion of a bad candidate can be really bad under some circumstances. Specifically, we assume that a bad candidate thanks to messaging (XCMP or bridges) taints everything it touches. Reverting a slash seems to be an easier task than reverting arbitrary changes made by a swarm of state machines. Some of those are out of our reach (i.e. bridges to other chains). I am not sure how realistic such an attack is, I don't have a scenario.

It seems that we should pick one of the options here. I am a bit more inclined to make those a slashing event and defer to the governance. Hopefully, the validators will be not so rational and there will be some friction due to the fact that this decision comes from the client devs and this is on by default.

Regarding the timeouts. After we have the PVF prechecking game, then we won't deal with timeouts by compilation. This leaves us only with timeouts by execution. I am not familiar enough with the t_good, t_bad, t_ugly model but I assume now it should solve the issue about difference in hardware and compiler versions. If I understand correctly, they have their own criteria about this and it depends on the exact amount of time spent on validation. This should be easy to get from the validation host.

I am not yet sure what's the best way to deal with this issue in the mean time.

validator_public.clone(),
).ok();

if let (Some(candidate_receipt), Some(dispute_statement))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

/// not the validator public key, the public key must be passed as well,
/// along with the signing context.
///
/// This does signature checks again with the data provided.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you elaborate on the again in this sentence?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose Rob meant that new_checked will perform checking of the signature of the dispute statement that we've created here.

Copy link
Contributor

@drahnr drahnr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Except for the handling of InvalidCandidate variants, LGTM

@pepyakin pepyakin self-requested a review June 23, 2021 16:17
node/core/backing/src/lib.rs Show resolved Hide resolved
/// not the validator public key, the public key must be passed as well,
/// along with the signing context.
///
/// This does signature checks again with the data provided.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose Rob meant that new_checked will perform checking of the signature of the dispute statement that we've created here.

validator_public,
backing_statement.unchecked_signature().clone(),
)
}
Copy link
Member

@eskimor eskimor Jul 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking again of this, I am not sure including the actual public key gains us anything here. Yes we can then verify the signature without a SessionInfo, but for checking that the key was indeed an actual valid validator key we still depend on SessionInfo. And in the end this is what matters - fine we have a valid signature, but everyone can provide a valid signature with a self chosen key.

I would argue that, by checking against the included key, instead of a validator index, we in fact provide less safety. Because previously you had to look up that key and the only sane way to do that was to query the SessionInfo. Now you can check the signature without that and say "This is valid.", when in fact you don't actually know, you only know it has been signed with some key.

So yes it is cumbersome (especially in a pure and completely type safe way) to check those signatures, because they are based on session keys, but we need to deal with it. Maybe we can have some cryptographic proof, that a given key belongs to a validator in a particular session?

Copy link
Member

@eskimor eskimor Jul 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But even if, that would be based on keys again and those would need to be checked again for their authenticity. So yeah, we need to query the chain, it is the source of truth after all.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having said that, I think we should get this PR merged ASAP to avoid further merge conflicts. My concerns don't actually relate to this PR specifically.

Copy link
Contributor Author

@rphmeier rphmeier Jul 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a hlll I will die on, but 99% of the time when you're using a signed statement, you're updating some metadata which is based on an underlying [ValidatorId]. This necessitates validating that statement.public() == validators[index] and is only an obvious check when the ValidatorIndex isn't part of statement already. If the validator index is part of statement it means that the code accepting the statement is going to be naturally less defensive and less reliable, from a self-contained view.

Our use-cases of backing, bitfields, approvals, and disputes all meet this criterion.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point! I still like the other API more though: Ok, the smart constructor is not perfect, but it will catch almost all mistakes one can make (especially the most realistic ones, like just forgetting to check at all). The SignedDisputeStatement API requires us to carry around a tuple (SignedDisputeStatement, ValidatorIndex) which could be totally made up and is also rather inconvenient to use.

Anyhow, let's leave it as is for now. I was just wondering, if we kept a compatible implementation, whether we could skip the redundant signature checks (from_backing_statement(statement.as_unchecked()) calls).

Copy link
Member

@eskimor eskimor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we import all statements in backing, but only our own statements in approval checking?

@rphmeier
Copy link
Contributor Author

rphmeier commented Jul 6, 2021

Oh, that is definitely an oversight.

@eskimor
Copy link
Member

eskimor commented Jul 7, 2021

Another thing I noticed. I don't think the coordinator checks whether there is already a vote from this node, before participating in a dispute. Needs not necessarily be part of this PR, but is certainly related as previously such a check would have been pointless.

@rphmeier rphmeier added A3-in_progress Pull request is in progress. No review needed at this stage. and removed A0-please_review Pull request needs code review. labels Jul 7, 2021
@rphmeier rphmeier added A0-please_review Pull request needs code review. and removed A3-in_progress Pull request is in progress. No review needed at this stage. labels Jul 8, 2021
@rphmeier rphmeier mentioned this pull request Jul 8, 2021
7 tasks
@rphmeier
Copy link
Contributor Author

rphmeier commented Jul 8, 2021

bot merge

@ghost
Copy link

ghost commented Jul 8, 2021

Waiting for commit status.

@ghost
Copy link

ghost commented Jul 8, 2021

Merge aborted: Checks failed for 6776c5a

@rphmeier
Copy link
Contributor Author

rphmeier commented Jul 9, 2021

bot merge

@ghost
Copy link

ghost commented Jul 9, 2021

Waiting for commit status.

@ghost ghost merged commit 567cfb9 into master Jul 9, 2021
@ghost ghost deleted the rh-wire-up-disputes branch July 9, 2021 21:15
This pull request was closed.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A0-please_review Pull request needs code review. B0-silent Changes should not be mentioned in any release notes C1-low PR touches the given topic and has a low impact on builders.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants