Wire up candidate backing, approval-voting to disputes #3348
Conversation
rphmeier commented Jun 22, 2021
- Candidate backing informs the dispute coordinator of every imported backing statement, even those produced locally.
- Approval voting informs the dispute coordinator of every produced approval vote.
- Approval voting informs the dispute coordinator to issue local dispute statements when encountering an invalid candidate, whether that's due to failed recovery or failed validation.
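The wiring described above can be sketched as a small decision table. This is a hypothetical model for illustration (the enum and function names are not the actual subsystem API), showing which events each subsystem forwards to the dispute coordinator:

```rust
// Hypothetical model of the events this PR routes to the dispute coordinator.
// Names are illustrative, not the real subsystem message types.

#[derive(Debug, PartialEq)]
pub enum SubsystemEvent {
    /// Candidate backing imported a backing statement (possibly our own).
    BackingStatementImported { local: bool },
    /// Approval voting produced an approval vote.
    ApprovalVoteProduced,
    /// Approval voting hit an invalid candidate (failed recovery or validation).
    InvalidCandidateEncountered { recovery_failed: bool },
}

#[derive(Debug, PartialEq)]
pub enum CoordinatorAction {
    /// Forward the statement to the dispute coordinator for import.
    ImportStatement,
    /// Ask the coordinator to issue a local dispute statement.
    IssueLocalDisputeStatement,
}

/// Per the PR description: all backing statements (even locally produced ones)
/// and all produced approval votes are imported; invalid candidates trigger a
/// local dispute statement.
pub fn action_for(event: &SubsystemEvent) -> CoordinatorAction {
    match event {
        SubsystemEvent::BackingStatementImported { .. } => CoordinatorAction::ImportStatement,
        SubsystemEvent::ApprovalVoteProduced => CoordinatorAction::ImportStatement,
        SubsystemEvent::InvalidCandidateEncountered { .. } => {
            CoordinatorAction::IssueLocalDisputeStatement
        }
    }
}
```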
node/core/approval-voting/src/lib.rs (Outdated)
// REVIEW: should timeouts and internal execution errors lead to dispute?
// only some execution errors?
@pepyakin I'd appreciate your input here!
Not being @pepyakin, there are definitely a few variants that I'd expect to trigger a dispute: BadParent, PoVHashMismatch, ParaHeadHashMismatch, BadSignature, CodeHashMismatch, InvalidOutputs, CodeDecompressionFailure, PoVDecompressionFailure, BadReturn.
ParamsTooLarge(u64) and CodeTooLarge(u64) depend on the limits we set, but these should be checked upfront when submitting, e.g., a code update, to avoid this.
#[derive(Debug)]
pub enum InvalidCandidate {
/// Failed to execute `validate_block`. This includes function panicking.
ExecutionError(String),
/// Validation outputs check doesn't pass.
InvalidOutputs,
/// Execution timeout.
Timeout,
/// Validation input is over the limit.
ParamsTooLarge(u64),
/// Code size is over the limit.
CodeTooLarge(u64),
/// Code does not decompress correctly.
CodeDecompressionFailure,
/// PoV does not decompress correctly.
PoVDecompressionFailure,
/// Validation function returned invalid data.
BadReturn,
/// Invalid relay chain parent.
BadParent,
/// POV hash does not match.
PoVHashMismatch,
/// Bad collator signature.
BadSignature,
/// Para head hash does not match.
ParaHeadHashMismatch,
/// Validation code hash does not match.
CodeHashMismatch,
}
The bigger question is regarding Timeout, which, as discussed at the retreat, depends on hardware, and other errors that depend on the particular impl details.
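The triage suggested in this thread can be sketched as a function over the enum above. This is a hypothetical illustration of the policy under discussion, not the actual implementation; `None` marks the variants whose handling is still an open question here:

```rust
// The enum quoted in the review, reproduced for self-containment.
#[derive(Debug)]
pub enum InvalidCandidate {
    ExecutionError(String),
    InvalidOutputs,
    Timeout,
    ParamsTooLarge(u64),
    CodeTooLarge(u64),
    CodeDecompressionFailure,
    PoVDecompressionFailure,
    BadReturn,
    BadParent,
    PoVHashMismatch,
    BadSignature,
    ParaHeadHashMismatch,
    CodeHashMismatch,
}

/// Hypothetical policy sketch: deterministic validity failures raise a
/// dispute, size-limit failures should have been rejected upfront, and
/// timeout/internal-error handling is left undecided (`None`).
pub fn triggers_dispute(e: &InvalidCandidate) -> Option<bool> {
    use InvalidCandidate::*;
    match e {
        // Deterministic failures every honest checker reproduces.
        BadParent | PoVHashMismatch | ParaHeadHashMismatch | BadSignature
        | CodeHashMismatch | InvalidOutputs | CodeDecompressionFailure
        | PoVDecompressionFailure | BadReturn => Some(true),
        // Limits should be enforced when submitting, e.g., a code update.
        ParamsTooLarge(_) | CodeTooLarge(_) => Some(false),
        // Hardware- and impl-dependent; the open question in this thread.
        Timeout | ExecutionError(_) => None,
    }
}
```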
I am myself not entirely sure, and I would've approached you with the same questions. Here are my thoughts anyway.
In an ideal world, internal errors should be extremely rare and only arise due to misconfiguration or hardware malfunction. That implies that such a failure should only happen in isolation. Given that, it would be fair from the protocol perspective to no-show in such a case. We could introduce logic that issues a dispute here. However, a rational, even if honest, validator will be incentivised to comment out this piece of code. I feel this is the case even in the presence of precedents of refunding such slashes. Nobody wants to snitch on themselves and potentially get slashed.
In the real world there are bugs, though. Inclusion of a bad candidate can be really bad under some circumstances. Specifically, we assume that a bad candidate, thanks to messaging (XCMP or bridges), taints everything it touches. Reverting a slash seems to be an easier task than reverting arbitrary changes made by a swarm of state machines. Some of those are out of our reach (i.e. bridges to other chains). I am not sure how realistic such an attack is; I don't have a scenario.
It seems that we should pick one of the options here. I am a bit more inclined to make those a slashing event and defer to governance. Hopefully, the validators will not be so rational, and there will be some friction due to the fact that this decision comes from the client devs and is on by default.
Regarding the timeouts: after we have the PVF pre-checking game, we won't deal with timeouts from compilation. That leaves us only with timeouts from execution. I am not familiar enough with the t_good, t_bad, t_ugly model, but I assume it should solve the issue of differences in hardware and compiler versions. If I understand correctly, it has its own criteria about this, and it depends on the exact amount of time spent on validation. This should be easy to get from the validation host.
I am not yet sure what's the best way to deal with this issue in the meantime.
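The time-based classification alluded to above could look roughly like the following. This is a sketch under assumptions: the threshold names (`t_good`, `t_bad`) come from the model mentioned in the comment, but the verdict names and the simple two-threshold rule are illustrative, not the actual design:

```rust
#[derive(Debug, PartialEq)]
pub enum TimeoutVerdict {
    /// Finished comfortably within budget: safe to vote valid.
    Valid,
    /// In the grey zone where hardware/compiler variance could explain the
    /// slowness: abstain rather than risk a wrong dispute.
    Ambiguous,
    /// So far over budget that treating it as invalid seems justified.
    Invalid,
}

/// Classify a measured execution time against two thresholds:
/// `t_good_ms` (clearly fast enough) and `t_bad_ms` (clearly too slow).
/// Assumes t_good_ms < t_bad_ms.
pub fn classify_execution(elapsed_ms: u64, t_good_ms: u64, t_bad_ms: u64) -> TimeoutVerdict {
    if elapsed_ms <= t_good_ms {
        TimeoutVerdict::Valid
    } else if elapsed_ms >= t_bad_ms {
        TimeoutVerdict::Invalid
    } else {
        TimeoutVerdict::Ambiguous
    }
}
```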
node/core/backing/src/lib.rs (Outdated)
validator_public.clone(),
).ok();

if let (Some(candidate_receipt), Some(dispute_statement))
👍
node/primitives/src/lib.rs (Outdated)
/// not the validator public key, the public key must be passed as well,
/// along with the signing context.
///
/// This does signature checks again with the data provided.
Can you elaborate on the "again" in this sentence?
I suppose Rob meant that new_checked will perform checking of the signature of the dispute statement that we've created here.
Except for the handling of the InvalidCandidate variants, LGTM.
node/primitives/src/lib.rs (Outdated)
validator_public,
backing_statement.unchecked_signature().clone(),
)
}
Thinking again about this, I am not sure including the actual public key gains us anything here. Yes, we can then verify the signature without a SessionInfo, but for checking that the key was indeed a valid validator key we still depend on SessionInfo. And in the end this is what matters: fine, we have a valid signature, but anyone can provide a valid signature with a self-chosen key.
I would argue that, by checking against the included key instead of a validator index, we in fact provide less safety. Previously you had to look up that key, and the only sane way to do that was to query the SessionInfo. Now you can check the signature without that and say "This is valid", when in fact you don't actually know; you only know it has been signed with some key.
So yes, it is cumbersome (especially in a pure and completely type-safe way) to check those signatures, because they are based on session keys, but we need to deal with it. Maybe we can have some cryptographic proof that a given key belongs to a validator in a particular session?
But even if we had such a proof, it would be based on keys again, and those would need to be checked for their authenticity in turn. So yeah, we need to query the chain; it is the source of truth after all.
Having said that, I think we should get this PR merged ASAP to avoid further merge conflicts. My concerns don't actually relate to this PR specifically.
Not a hill I will die on, but 99% of the time when you're using a signed statement, you're updating some metadata which is based on an underlying ValidatorId. This necessitates validating that statement.public() == validators[index], and that is only an obvious check when the ValidatorIndex isn't part of statement already. If the validator index is part of statement, it means that the code accepting the statement is going to be naturally less defensive and less reliable, from a self-contained view.
Our use-cases of backing, bitfields, approvals, and disputes all meet this criterion.
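The check being argued for can be modeled in a few lines. This is a toy sketch: keys are stand-in byte arrays and signature verification is stubbed out with a boolean, since the real types (sr25519 session keys, SessionInfo runtime data) are out of scope here. The point it illustrates is that a valid signature alone proves nothing until the claimed key is matched against the session's validator set:

```rust
// Stand-in for a real session key; real code uses sr25519 public keys.
pub type ValidatorId = [u8; 4];

/// Toy stand-in for the on-chain SessionInfo validator set.
pub struct SessionInfo {
    pub validators: Vec<ValidatorId>,
}

/// Toy statement: in real code the signature would be verified
/// cryptographically, not carried as a boolean.
pub struct Statement {
    pub validator_index: usize,
    pub public: ValidatorId,
    pub signature_valid: bool,
}

/// Accept only if the claimed index maps to the claimed key in this session
/// AND the signature checks out. Anyone can produce a valid signature with a
/// self-chosen key, so the index-to-key lookup is the load-bearing check.
pub fn check_statement(info: &SessionInfo, s: &Statement) -> bool {
    match info.validators.get(s.validator_index) {
        Some(expected) if *expected == s.public => s.signature_valid,
        _ => false,
    }
}
```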
That's a good point! I still like the other API more, though. OK, the smart constructor is not perfect, but it will catch almost all mistakes one can make (especially the most realistic ones, like just forgetting to check at all). The SignedDisputeStatement API requires us to carry around a tuple (SignedDisputeStatement, ValidatorIndex), which could be totally made up and is also rather inconvenient to use.
Anyhow, let's leave it as is for now. I was just wondering, if we kept a compatible implementation, whether we could skip the redundant signature checks (the from_backing_statement(statement.as_unchecked()) calls).
Why do we import all statements in backing, but only our own statements in approval checking?
Oh, that is definitely an oversight.
Another thing I noticed: I don't think the coordinator checks whether there is already a vote from this node before participating in a dispute. It need not necessarily be part of this PR, but it is certainly related, as previously such a check would have been pointless.
bot merge
Waiting for commit status.
Merge aborted: Checks failed for 6776c5a
bot merge
Waiting for commit status.