-
Notifications
You must be signed in to change notification settings - Fork 1.6k
inherent disputes: remove per block initializer and disputes timeout event #6937
Conversation
Signed-off-by: Andrei Sandu <andrei-mihail@parity.io>
Signed-off-by: Andrei Sandu <andrei-mihail@parity.io>
{ | ||
Self::deposit_event(Event::DisputeTimedOut(candidate_hash)); | ||
|
||
dispute.concluded_at = Some(now); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes:
- Disputes that hit the runtime should actually never time out - they have been confirmed, so they should conclude.
- We don't have a timeout on the node side. So timing them out in the runtime is kind of pointless and out of sync with what the node would assume about the dispute.
- We drop disputes after 6 sessions anyway.
- We can not and should not slash anybody on a timed out dispute - the people who voted are the good guys (most likely).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We drop disputes after 6 sessions anyway.
Yes, but maybe we should be reverting a block if the dispute hasn't concluded instead of finalizing it after some time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, what we could and probably should do: Revert immediately once f+1 validators voted against a candidate.
{ | ||
Self::deposit_event(Event::DisputeTimedOut(candidate_hash)); | ||
|
||
dispute.concluded_at = Some(now); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes:
- Disputes that hit the runtime should actually never time out - they have been confirmed, so they should conclude.
- We don't have a timeout on the node side. So timing them out in the runtime is kind of pointless and out of sync with what the node would assume about the dispute.
- We drop disputes after 6 sessions anyway.
- We can not and should not slash anybody on a timed out dispute - the people who voted are the good guys (most likely).
Signed-off-by: Andrei Sandu <andrei-mihail@parity.io>
We'd dump all pruned disputes here whenever this happens? Or just the ones where we lost money? (excepting some special config or whatever) |
Most of the disputes are concluded and there is no point in iterating them for everyblock in each session until they are pruned. We only have a limited number of active disputes, because they conclude fast. Iterating them should be fast. |
Signed-off-by: Andrei Sandu <andrei-mihail@parity.io>
Signed-off-by: Andrei Sandu <andrei-mihail@parity.io>
Signed-off-by: Andrei Sandu <andrei-mihail@parity.io>
Signed-off-by: Andrei Sandu <andrei-mihail@parity.io>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need some other changes to the guide, at least: https://paritytech.github.io/polkadot/book/runtime/disputes.html
Other than that, looks good.
Signed-off-by: Andrei Sandu <andrei-mihail@parity.io>
Signed-off-by: Andrei Sandu <andrei-mihail@parity.io>
Signed-off-by: Andrei Sandu <andrei-mihail@parity.io>
…/fix_disputes_initializer
Signed-off-by: Andrei Sandu <andrei-mihail@parity.io>
@@ -148,15 +148,15 @@ pub mod v4 { | |||
} | |||
|
|||
fn on_runtime_upgrade() -> Weight { | |||
if StorageVersion::get::<Pallet<T>>() == 3 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In line 146 above it is checking for version 3.
No idea why the CI is not catching that, with this recent change the pre- and post checks should run. cc @kianenigma
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oops, looks like I missed changing this and indeed CI did not fail ....
* master: Companion: wasm-builder support stable Rust (#6967) Fix feature (#6966) configuration: backport async backing parameters from the feature branch (#6961) Histogram support in runtime metrics (#6935) Bump openssl from 0.10.38 to 0.10.48 (#6955) Proxy for Nomination Pools (#6846) Nomination Pools migration v5: RewardPool fix (#6957) bump timestamp script to v0.2 (#6954) Subsystem channel tweaks (#6905) Companion for #13683 (#6944) inherent disputes: remove per block initializer and disputes timeout event (#6937)
This pull request has been mentioned on Polkadot Forum. There might be relevant details there: https://forum.polkadot.network/t/polkadot-dispute-storm-the-postmortem/2550/1 |
# Goal The goal of this PR is upgrade Polkadot to v0.9.42 Polkadot Release Notes: https://github.com/paritytech/polkadot/releases/tag/v0.9.42 https://github.com/paritytech/polkadot/releases/tag/v0.9.41 https://github.com/paritytech/polkadot/releases/tag/v0.9.40 Polkadot Release Analysis: https://forum.polkadot.network/t/polkadot-release-analysis-v0-9-41-v0-9-42/2828/1 Closes #1472 Closes #1270 Closes #1332 # Discussion - v0.9.40 was not working and there was evidence that changes in v0.9.42 were related, so we decided to jump ahead to v0.9.42 <!-- List discussion items --> # Runtime Migrations Included - [x] paritytech/polkadot#6937 - [x] paritytech/substrate#13715 - [x] paritytech/substrate#13936 - [x] paritytech/polkadot#7114 - [x] Further all migrations from 9.38+ are included: paritytech/polkadot#7162) # Checklist - [x] Chain spec updated - [ ] Custom RPC OR Runtime API added/changed? Updated js/api-augment. - [ ] Design doc(s) updated - [ ] Tests added - [ ] Benchmarks added - [x] Weights updated # Tests Performed - [x] `make ci-local` -- Passing (includes lint, docs, unit-test and integration-test) - [x] `make start-native` -- Successfully attached debugger when creating MSA with Polkadot UI - [x] `make start-relay` -- Docker Containers successfully started, but too slow in emulation on Apple Silicon M2. - [x] `make upgrade-local` -- Successfully started local relay network and upgraded to a newer test runtime. - [x] `make upgrade-local` -- Successfully updated a node running the current version on branch `main` --------- Co-authored-by: Frequency CI [bot] <do-not-reply@users.noreply.github.com> Co-authored-by: Matthew Orris <--help> Co-authored-by: Robert La Ferla <robert@onemsg.co>
Fixes #6862
TODO:
https://github.com/paritytech/polkadot/blob/master/runtime/parachains/src/disputes.rs#L918 is the culprit for the heavy weight blocks. During each block execution, but before paras inherent enter, this piece of code iterates all active/inactive disputes to mark them as timed out if that is the case. We keep disputes for 6 sessions, then we prune them. For 60k disputes as seen on Polkadot, this means 1,200000s (20usec * 60000) of weight not accounted for.
My fix is to remove the timeout event and the code relevant to it, and allow those disputes to be pruned as session advances. As discussed with @eskimor we don't really need the timeout event. However I don't have a good understanding if we would want to slash timed out disputes in the future, or how that should work. cc @ordian . If these are still needed I have already explored some options like:
Reducing the maximum allowed dispute weight to just 70% of the total block weight is to reduce heavy block import times during high load. I would personally go even lower to 50%, but with current value plus backing and bitfield processing weight should be 90% of a block. Initially I have wrongly assumed that without pushing enough disputes on chain it would slow finality, but this is wrong as finality happens offchain and we just need the votes for slashing.I've tested this fix on Versi (without reducing the dispute weight) and metrics below confirm it works. By time index in the chart we start before the fix with disputes ongoing. Then disputes are stopped and you can see the stair pattern (10m sessions) as disputes are pruned on each new session. After the fix is applied, the block times have different distribution as we are no longer paying that cost of iterating every dispute and also get back to normal as soon as disputes are ended.