From 2f27a549113b3a3c83e5743f21bf960de1f0652a Mon Sep 17 00:00:00 2001 From: Overkillus Date: Tue, 21 Nov 2023 21:21:28 +0000 Subject: [PATCH 01/18] init --- .../src/protocol-validator-disabling.md | 9 +++++++++ 1 file changed, 9 insertions(+) create mode 100644 polkadot/roadmap/implementers-guide/src/protocol-validator-disabling.md diff --git a/polkadot/roadmap/implementers-guide/src/protocol-validator-disabling.md b/polkadot/roadmap/implementers-guide/src/protocol-validator-disabling.md new file mode 100644 index 000000000000..01d8cd712fcc --- /dev/null +++ b/polkadot/roadmap/implementers-guide/src/protocol-validator-disabling.md @@ -0,0 +1,9 @@ +# Validator Disabling + +As established in the [approval process](protocol-approval.md) dealing with bad parablocks is a three step process: + +1. Detection +1. Escalation +1. Consequences + +The main system responsible for dispensing consequences for malicious actors is the [dispute system](protocol-disputes.md) which eventually leads to slashes being applied. \ No newline at end of file From d0ed8683365d83b496ab7aa91df36deeda2cef79 Mon Sep 17 00:00:00 2001 From: Overkillus Date: Tue, 21 Nov 2023 21:22:43 +0000 Subject: [PATCH 02/18] init --- polkadot/roadmap/implementers-guide/src/SUMMARY.md | 1 + 1 file changed, 1 insertion(+) diff --git a/polkadot/roadmap/implementers-guide/src/SUMMARY.md b/polkadot/roadmap/implementers-guide/src/SUMMARY.md index bb19390c7af4..41485e5df8ec 100644 --- a/polkadot/roadmap/implementers-guide/src/SUMMARY.md +++ b/polkadot/roadmap/implementers-guide/src/SUMMARY.md @@ -8,6 +8,7 @@ - [Disputes Process](protocol-disputes.md) - [Dispute Flow](disputes-flow.md) - [Chain Selection and Finalization](protocol-chain-selection.md) + - [Validator Disabling](protocol-validator-disabling.md) - [Architecture Overview](architecture.md) - [Messaging Overview](messaging.md) - [PVF Pre-checking](pvf-prechecking.md) From 7d0fde4b255b35cdfc3e501109384b110fbde4eb Mon Sep 17 00:00:00 2001 From: Overkillus Date: Thu, 23 Nov 2023 15:31:50 +0000 Subject: [PATCH 03/18] Background and Risks draft --- .../src/protocol-validator-disabling.md | 54 ++++++++++++++++++- 1 file changed, 53 insertions(+), 1 deletion(-) diff --git a/polkadot/roadmap/implementers-guide/src/protocol-validator-disabling.md b/polkadot/roadmap/implementers-guide/src/protocol-validator-disabling.md index 01d8cd712fcc..6d41a365f3ef 100644 --- a/polkadot/roadmap/implementers-guide/src/protocol-validator-disabling.md +++ b/polkadot/roadmap/implementers-guide/src/protocol-validator-disabling.md @@ -1,9 +1,61 @@ # Validator Disabling +## Background + As established in the [approval process](protocol-approval.md) dealing with bad parablocks is a three step process: 1. Detection 1. Escalation 1. Consequences -The main system responsible for dispensing consequences for malicious actors is the [dispute system](protocol-disputes.md) which eventually leads to slashes being applied. \ No newline at end of file +The main system responsible for dispensing consequences for malicious actors is the [dispute system](protocol-disputes.md) which eventually dispenses slash events which will be applied in the next era. It is important to note the **high latency** of the punishment as it is only effective at the start of the next era (24h in Polkadot) and does not immediately remove the validator from the active validator set. + +There is a need to have a more immediate way to deal with malicious validators. This is where the validator disabling comes in. 
It is focused on dispensing **low latency** consequences for malicious actors. It is important to note that the validator disabling is not a replacement for the dispute system. It is a complementary system that is focused on lighter but immediate consequences usually in the form of restricted validator privileges. + +Validator disabling and getting forced out at the end of an era due to slashes have similar outcomes but there are a few differences: + +- **latency** (next few blocks for validator disabling and 24-48h for getting pushed out organically) +- **pool restriction** (validator disabling can lower the number of active validators if we fully disable) +- **granularity** (validator disabling could remove only a portion of validator privileges instead of all) + +## Risks of NOT having validator disabling + +A simple argument for disabling is that if someone is already slashed 100% and they have nothing to loose they could cause harm to the network and should be silenced. + +What harm could they cause? + +**1. Liveness attacks:** + +- Break sharding (with mass no-shows or mass disputes): It forces everyone to do all the work which affects liveness but doesn't kill it completely. The chain can progress at a slow rate. +- Mass invalid candidate backing: Spawns a lot of worthless work that needs to be done but it is bounded by backing numbers. Honest backers will still back valid candidates and that cannot be stopped. Honest block authors will eventually select valid candidates and even if disputed they will win and progress the chain. + +**2. Security attacks:** + +- The best and possibly only way to affect security is by getting lucky in the approval process. If by chance all approval voters would be malicious, the attackers could get a single invalid candidate through. Their chances would be relatively low but in general this risk has to be taken seriously as it significantly reduces the safety buffer around approval checking. + +> **Note:** +> With 30 approvals needed chance for that a malicious candidate going through is around 4\*10^-15. Assuming attackers can back invalid candidates on 50 cores for 48 hours straight and only those candidates get included it still gives a 7\*10^-9 chance of success which is still relatively small considering the cost (all malicious stake slashed). + +The risk of above attacks can be possibly mitigated with more immediate measures such as validator disabling but the strategy has to be very carefully designed to not introduce new attack vectors. + +## Risks of validator disabling + +The primary risk behind having any sort of disabling is that it is a double-edged sword that in case of any dispute bugs could disable honest nodes or be abused by attackers to specifically silence honest nodes. Disabling honest nodes could tip the scales between honest and dishonest nodes and destabilize the protocol. Honest nodes being pushed out of consensus is primarily a problem for approval voting and disputes where a supermajority is required. + +It is worth noting that is is fundamentally a defense in depth strategy because if we assume disputes are perfect it should not be a real concern. In reality disputes are difficult to get right, and non-determinism and happen so defense in depth is crucial when handling those subsystems. + +> **Note:** +> What about slashes with no validator direct disabling? +> Slashing by itself is less of a problem due to its high latency of getting pushed out of the validator set. 
It still affects the honest slashed node in the short term (lost funds), but if the slash was truly unjustified the governance should refund the tokens after an investigation. So generally in the long term no harm will be done. It gives 24-48 hours to react in those cases which is at least a small buffer further escalate if an attack pushing out honest nodes out of consensus would show up. The pushed out validator will also be swapped out for another random validator which most likely will be honest. + +# =============================================== + +Above can be summarized as follows: + +- Disputes & Slashing are a security requirement. + +- Validator Disabling is **not** a security requirement but a liveness optimization. + +> **Note:** +> - Security = Invalid candidates cannot go through (or are statistically very improbable) +> - Liveness = Valid candidates can go through (at a decent pace) \ No newline at end of file From abb996e27c9f8f927a040d7cec7037df07e01f92 Mon Sep 17 00:00:00 2001 From: Overkillus Date: Fri, 15 Dec 2023 20:41:04 +0000 Subject: [PATCH 04/18] Risks expanded, overview and TODOs --- .../src/protocol-disputes.md | 5 +- .../src/protocol-validator-disabling.md | 120 ++++++++++++++---- 2 files changed, 99 insertions(+), 26 deletions(-) diff --git a/polkadot/roadmap/implementers-guide/src/protocol-disputes.md b/polkadot/roadmap/implementers-guide/src/protocol-disputes.md index 2a4082cc07f9..38c1c0c40d6a 100644 --- a/polkadot/roadmap/implementers-guide/src/protocol-disputes.md +++ b/polkadot/roadmap/implementers-guide/src/protocol-disputes.md @@ -8,9 +8,8 @@ All parachain blocks that end up in the finalized relay chain should be valid. T only backed, but not included. We have two primary components for ensuring that nothing invalid ends up in the finalized relay chain: - * Approval Checking, as described [here](./protocol-approval.md) and implemented according to the [Approval - Voting](node/approval/approval-voting.md) subsystem. This protocol can be shown to prevent invalid parachain blocks - from making their way into the finalized relay chain as long as the amount of attempts are limited. + * Approval Checking, as described [here](./protocol-approval.md) and implemented accordingly in the [Approval + Voting](node/approval/approval-voting.md) subsystem. This protocol can be shown to prevent invalid parachain blocks from making their way into the finalized relay chain as long as the amount of attempts are limited. * Disputes, this protocol, which ensures that each attempt to include something bad is caught, and the offending validators are punished. Disputes differ from backing and approval process (and can not be part of those) in that a dispute is independent of a particular fork, while both backing and approval operate on particular forks. This diff --git a/polkadot/roadmap/implementers-guide/src/protocol-validator-disabling.md b/polkadot/roadmap/implementers-guide/src/protocol-validator-disabling.md index 6d41a365f3ef..57aca1f0ff87 100644 --- a/polkadot/roadmap/implementers-guide/src/protocol-validator-disabling.md +++ b/polkadot/roadmap/implementers-guide/src/protocol-validator-disabling.md @@ -8,15 +8,49 @@ As established in the [approval process](protocol-approval.md) dealing with bad 1. Escalation 1. Consequences -The main system responsible for dispensing consequences for malicious actors is the [dispute system](protocol-disputes.md) which eventually dispenses slash events which will be applied in the next era. 
It is important to note the **high latency** of the punishment as it is only effective at the start of the next era (24h in Polkadot) and does not immediately remove the validator from the active validator set. +The main system responsible for dispensing **consequences** for malicious actors is the [dispute system](protocol-disputes.md) which eventually dispenses slash events. It is important to note the **high latency** of the punishment as it is only effective after 27 eras (27 days in Polkadot) and does not immediately remove the validator from the active validator set. -There is a need to have a more immediate way to deal with malicious validators. This is where the validator disabling comes in. It is focused on dispensing **low latency** consequences for malicious actors. It is important to note that the validator disabling is not a replacement for the dispute system. It is a complementary system that is focused on lighter but immediate consequences usually in the form of restricted validator privileges. +There is a need to have a more immediate way to deal with malicious validators. This is where the validator disabling comes in. It is focused on dispensing **low latency** consequences for malicious actors. It is important to note that the validator disabling is not a replacement for the dispute or slashing systems. It is a complementary system that is focused on lighter but immediate consequences usually in the form of restricted validator privileges. -Validator disabling and getting forced out at the end of an era due to slashes have similar outcomes but there are a few differences: +The primary goals are: +- Eliminate cases where attackers can get free attempts at attacking the network +- Eliminate or minimize the risks of honest nodes being pushed out of consensus by getting unjustly slashed -- **latency** (next few blocks for validator disabling and 24-48h for getting pushed out organically) -- **pool restriction** (validator disabling can lower the number of active validators if we fully disable) -- **granularity** (validator disabling could remove only a portion of validator privileges instead of all) +The above two goals are generally at odds so a careful balance has to be struck between them. We will achieve them by sacrificing some **liveness** in favor of **soundness** when the network is under stress. Maintaining some liveness but absolute soundness ia paramount. + +Side goals are: +- Reduce the damages to honest nodes that had a fault which might cause repeated slashes + +> **Note:** \ +> Liveness = Valid candidates can go through (at a decent pace) \ +> Security = Invalid candidates cannot go through (or are statistically very improbable) + +## System Overview + +High level assumptions and goals of the validator disabling system that will be further discussed in the following sections: + +- If validator gets slashed (even 0%) we disable him in the runtime and on the node side. +- We only disable up to 1/3 of the validators. +- If there are more offenders than 1/3 of the set disable only the highest offenders. (Some will get re-enabled.) +- Disablement lasts for 1 era. +- Disabled validators remain in the active validator set but have some limited permissions +- Disabled validators can no longer back candidates +- Disabled validators can participate in approval checking and their 'valid' votes behave normally. 
'invalid' - votes do not automatically escalate into disputes but they are logged and stored so they will be taken into account if a dispute arises from at least 1 honest non-disabled validator. +- Disabling does not affect GRANDPA at all. +- Disabling affects Block Authoring. (Both ways: block authoring equivocation disables and disabling stops block authoring) + +> **Note:** \ +> Having the above elements allows us to simplify the design: +> - No chilling of validators. +> - No Im-Online slashing. +> - No force new era logic. +> - No slashing spans + +
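As a rough illustration of the bookkeeping implied by the overview above (any slash, even 0%, marks the offender as disabled; the set is capped at 1/3 of the active validators; disablement blocks backing and is cleared on era change), here is a minimal Rust sketch. All names are made up for illustration, this is not the actual staking or session pallet API, and the 'highest offenders first' re-ordering is omitted:

```rust
/// Illustrative sketch only -- not the real staking/session pallet logic.
/// Tracks which validator indices are currently disabled.
#[derive(Default)]
struct DisabledTracker {
    /// Indices of disabled validators in the active set, kept sorted.
    disabled: Vec<u32>,
}

impl DisabledTracker {
    /// Called whenever an offence/slash (even a 0% one) is reported.
    fn on_offence(&mut self, offender: u32, active_set_size: usize) {
        // Never disable more than a third of the active set.
        if self.disabled.len() >= active_set_size / 3 {
            return;
        }
        if let Err(pos) = self.disabled.binary_search(&offender) {
            self.disabled.insert(pos, offender);
        }
    }

    /// Backing statements from disabled validators are filtered out.
    fn is_backing_allowed(&self, validator: u32) -> bool {
        self.disabled.binary_search(&validator).is_err()
    }

    /// Disablement lasts one era, so the set is wiped on era change.
    fn on_new_era(&mut self) {
        self.disabled.clear();
    }
}
```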

+ +# Design + +To better understand the design we will first go through what without validator disabling and what issues it can bring. ## Risks of NOT having validator disabling @@ -26,36 +60,76 @@ What harm could they cause? **1. Liveness attacks:** -- Break sharding (with mass no-shows or mass disputes): It forces everyone to do all the work which affects liveness but doesn't kill it completely. The chain can progress at a slow rate. -- Mass invalid candidate backing: Spawns a lot of worthless work that needs to be done but it is bounded by backing numbers. Honest backers will still back valid candidates and that cannot be stopped. Honest block authors will eventually select valid candidates and even if disputed they will win and progress the chain. +- 1.1. Break sharding (with mass no-shows or mass disputes): It forces everyone to do all the work which affects liveness but doesn't kill it completely. The chain can progress at a slow rate. + +- 1.2. Mass invalid candidate backing: Spawns a lot of worthless work that needs to be done but it is bounded by backing numbers. Honest backers will still back valid candidates and that cannot be stopped. Honest block authors will eventually select valid candidates and even if disputed they will win and progress the chain. + +**2. Soundness attacks:** -**2. Security attacks:** +- 2.1. The best and possibly only way to affect soundness is by getting lucky in the approval process. If by chance all approval voters would be malicious, the attackers could get a single invalid candidate through. Their chances would be relatively low but in general this risk has to be taken seriously as it significantly reduces the safety buffer around approval checking. -- The best and possibly only way to affect security is by getting lucky in the approval process. If by chance all approval voters would be malicious, the attackers could get a single invalid candidate through. Their chances would be relatively low but in general this risk has to be taken seriously as it significantly reduces the safety buffer around approval checking. +> **Note:** +> With 30 approvals needed chance that a malicious candidate going through is around 4\*10^-15. Assuming attackers can back invalid candidates on 50 cores for 48 hours straight and only those candidates get included it still gives a 7\*10^-9 chance of success which is still relatively small considering the cost (all malicious stake slashed). + +Attacks 1.2 and 2.1 should generally be pretty futile as a solo attacker while 1.1 could be possible with mass disputes even from a single attacker. Nevertheless whatever the attack vector within the old system* the attackers would get eventually get slashed and pushed out of the active validator set. > **Note:** -> With 30 approvals needed chance for that a malicious candidate going through is around 4\*10^-15. Assuming attackers can back invalid candidates on 50 cores for 48 hours straight and only those candidates get included it still gives a 7\*10^-9 chance of success which is still relatively small considering the cost (all malicious stake slashed). +> \* In the old design validators were chilled in the era after committing an offense. Chilled validators were excluded from NPoS elections which resulted in them getting pushed out of the validator set within 1-2 eras. This was risky as it could push out honest nodes out of consensus if they were unjustly slashed but gives some time to react through governance or community action. 
+ +## Risks of having validator disabling -The risk of above attacks can be possibly mitigated with more immediate measures such as validator disabling but the strategy has to be very carefully designed to not introduce new attack vectors. +The primary risk behind having any sort of disabling is that it is a double-edged sword that in case of any dispute bugs or sources of PVF non-determinism could disable honest nodes or be abused by attackers to specifically silence honest nodes. Disabling honest nodes could tip the scales between honest and dishonest nodes and destabilize the protocol. Honest nodes being pushed out of consensus is primarily a problem for approval voting and disputes where a supermajority is required. -## Risks of validator disabling +> **Note:** +> It is worth noting that is is fundamentally a defense in depth strategy because if we assume disputes are perfect it should not be a real concern. In reality disputes are difficult to get right, and non-determinism and happen so defense in depth is crucial when handling those subsystems. -The primary risk behind having any sort of disabling is that it is a double-edged sword that in case of any dispute bugs could disable honest nodes or be abused by attackers to specifically silence honest nodes. Disabling honest nodes could tip the scales between honest and dishonest nodes and destabilize the protocol. Honest nodes being pushed out of consensus is primarily a problem for approval voting and disputes where a supermajority is required. +## Addressing the risks TODO -It is worth noting that is is fundamentally a defense in depth strategy because if we assume disputes are perfect it should not be a real concern. In reality disputes are difficult to get right, and non-determinism and happen so defense in depth is crucial when handling those subsystems. +**Risks of having validator disabling:** -> **Note:** -> What about slashes with no validator direct disabling? -> Slashing by itself is less of a problem due to its high latency of getting pushed out of the validator set. It still affects the honest slashed node in the short term (lost funds), but if the slash was truly unjustified the governance should refund the tokens after an investigation. So generally in the long term no harm will be done. It gives 24-48 hours to react in those cases which is at least a small buffer further escalate if an attack pushing out honest nodes out of consensus would show up. The pushed out validator will also be swapped out for another random validator which most likely will be honest. +**Risks of NOT having validator disabling:** + +# =============================================== +# Other things I need to put somewhere # =============================================== +Things to add: +- optional re-enabling (in what cases it helps) +- reasons why we disable for a full era +- confirmation trumping disablement +- reasons for not affecting grandpa +- reasons for affecting BA +- uncertainties around BEEFY +- problems with forcing new eras +- no-showing when disabled and similarity to a security analysis of a DoS attack on approvals +- accumulating slashes vs max slashing and disabling +- example attacks and how we defend from them +--- + Above can be summarized as follows: -- Disputes & Slashing are a security requirement. +- Disputes & Slashing are a soundness requirement. + +- Validator Disabling is **not** a soundness requirement but a liveness optimization. 
+ +--- + +Validator disabling and getting forced ouf of NPoS elections due to slashes have similar outcomes but there are a few differences: + +- **latency** (next few blocks for validator disabling and 27 days for getting pushed out organically) +- **pool restriction** (validator disabling could effectively lower the number of active validators if we fully disable) +- **granularity** (validator disabling could remove only a portion of validator privileges instead of all) + +--- + +Disabling on minor slashes and accumulating slashes should both provide enough security as a deterrent against repeating offences, but disabling for minor offences is more lenient for honest faulty nodes and that's why we prefer it. Ideally we'd have both disabling AND accumulating as attackers can still commit multiple minor offences (for instance invalid on valid disputes) in the same block before they get punished and disabled, but damages done should be minimal so it's not a huge priority. -- Validator Disabling is **not** a security requirement but a liveness optimization. +--- -> **Note:** -> - Security = Invalid candidates cannot go through (or are statistically very improbable) -> - Liveness = Valid candidates can go through (at a decent pace) \ No newline at end of file +(not here but revise rest of guide)\ +**Relevant Slashes: ** +- backing invalid -> 100% +- valid on invalid -> 100%/k +- invalid on valid -> 0% (or very small slash) +- BA equivocation -> ? (w/e it is currently) \ No newline at end of file From 06096b61327e9a0243d5e0ec92fea1922e89b90d Mon Sep 17 00:00:00 2001 From: Overkillus Date: Tue, 16 Jan 2024 22:39:14 +0000 Subject: [PATCH 05/18] mitigation, duration, economics, simplifications, extra types --- .../src/protocol-validator-disabling.md | 240 +++++++++++++----- 1 file changed, 175 insertions(+), 65 deletions(-) diff --git a/polkadot/roadmap/implementers-guide/src/protocol-validator-disabling.md b/polkadot/roadmap/implementers-guide/src/protocol-validator-disabling.md index 57aca1f0ff87..655b84eb9e00 100644 --- a/polkadot/roadmap/implementers-guide/src/protocol-validator-disabling.md +++ b/polkadot/roadmap/implementers-guide/src/protocol-validator-disabling.md @@ -8,7 +8,10 @@ As established in the [approval process](protocol-approval.md) dealing with bad 1. Escalation 1. Consequences -The main system responsible for dispensing **consequences** for malicious actors is the [dispute system](protocol-disputes.md) which eventually dispenses slash events. It is important to note the **high latency** of the punishment as it is only effective after 27 eras (27 days in Polkadot) and does not immediately remove the validator from the active validator set. +The main system responsible for dispensing **consequences** for malicious actors is the [dispute system](protocol-disputes.md) which eventually dispenses slash events. It is important to note the **high latency** of the punishment as it is only effective after 27 eras (27 days in Polkadot). Dispute concluding by itself does not immediately remove the validator from the active validator set. + +> **Note:** \ +> There was an additional mechanism of automatically chilling the validator which removed their intent to participate in the next election, but the removed validator could simply re-register his intent to validate. There is a need to have a more immediate way to deal with malicious validators. This is where the validator disabling comes in. It is focused on dispensing **low latency** consequences for malicious actors. 
It is important to note that the validator disabling is not a replacement for the dispute or slashing systems. It is a complementary system that is focused on lighter but immediate consequences usually in the form of restricted validator privileges. @@ -16,45 +19,50 @@ The primary goals are: - Eliminate cases where attackers can get free attempts at attacking the network - Eliminate or minimize the risks of honest nodes being pushed out of consensus by getting unjustly slashed -The above two goals are generally at odds so a careful balance has to be struck between them. We will achieve them by sacrificing some **liveness** in favor of **soundness** when the network is under stress. Maintaining some liveness but absolute soundness ia paramount. +The above three goals are generally at odds so a careful balance has to be struck between them. We will achieve them by sacrificing some **liveness** in favor of **soundness** when the network is under stress. Maintaining some liveness but absolute soundness ia paramount. + +> **Note:** \ +> Liveness = Valid candidates can go through (at a decent pace) \ +> Security = Invalid candidates cannot go through (or are statistically very improbable) Side goals are: - Reduce the damages to honest nodes that had a fault which might cause repeated slashes +- Reduce liveness impact of individual malicious attackers -> **Note:** \ -> Liveness = Valid candidates can go through (at a decent pace) \ -> Security = Invalid candidates cannot go through (or are statistically very improbable) +
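The overview below repeatedly leans on the byzantine threshold of the active validator set. For reference, a minimal sketch of how that threshold is typically computed, assuming the conventional `f = (n - 1) / 3` definition (the function name is illustrative):

```rust
/// Byzantine threshold `f` for an active set of `n` validators, i.e. the
/// largest number of faulty validators tolerated under the usual
/// `n >= 3f + 1` assumption. Disabling is capped at this number.
const fn byzantine_threshold(n: usize) -> usize {
    n.saturating_sub(1) / 3
}

fn main() {
    // For a Polkadot-sized set of 300 validators at most 99 can be disabled.
    assert_eq!(byzantine_threshold(300), 99);
    assert_eq!(byzantine_threshold(4), 1);
}
```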


## System Overview High level assumptions and goals of the validator disabling system that will be further discussed in the following sections: -- If validator gets slashed (even 0%) we disable him in the runtime and on the node side. -- We only disable up to 1/3 of the validators. -- If there are more offenders than 1/3 of the set disable only the highest offenders. (Some will get re-enabled.) -- Disablement lasts for 1 era. -- Disabled validators remain in the active validator set but have some limited permissions -- Disabled validators can no longer back candidates -- Disabled validators can participate in approval checking and their 'valid' votes behave normally. 'invalid' - votes do not automatically escalate into disputes but they are logged and stored so they will be taken into account if a dispute arises from at least 1 honest non-disabled validator. -- Disabling does not affect GRANDPA at all. -- Disabling affects Block Authoring. (Both ways: block authoring equivocation disables and disabling stops block authoring) +1. If validator gets slashed (even 0%) we mark them as disabled in the runtime and on the node side. +1. We only disable up to byzantine threshold of the validators. +1. If there are more offenders than byzantine threshold disable only the highest offenders. (Some might get re-enabled.) +1. Disablement lasts for 1 era. +1. Disabled validators remain in the active validator set but have some limited permissions. +1. Disabled validators can get re-elected. +1. Disabled validators can no longer back candidates. +1. Disabled validators can participate in approval checking. +1. Disabled validators cannot initiate disputes, but their votes are still counted if a dispute occurs. +1. Disabled validators making dispute statements no-show in approval checking. +1. Disabling does not affect GRANDPA at all. +1. Disabling affects Block Authoring. (Both ways: block authoring equivocation disables and disabling stops block authoring) -> **Note:** \ -> Having the above elements allows us to simplify the design: -> - No chilling of validators. -> - No Im-Online slashing. -> - No force new era logic. -> - No slashing spans -

+Having the above elements allows us to simplify the current staking & slashing design: +- No automatic chilling of validators. +- No force new era logic. +- No slashing spans -# Design +


-To better understand the design we will first go through what without validator disabling and what issues it can bring. +# Risks ## Risks of NOT having validator disabling -A simple argument for disabling is that if someone is already slashed 100% and they have nothing to loose they could cause harm to the network and should be silenced. +Assume that if an offense is committed a slash is deposited but the perpetrator can still act normally. He will be slashed 100% with a long delay. This is akin to the current design. + +A simple argument for disabling is that if someone is already slashed 100% and they have nothing to lose they could cause harm to the network and should be silenced. What harm could they cause? @@ -71,65 +79,167 @@ What harm could they cause? > **Note:** > With 30 approvals needed chance that a malicious candidate going through is around 4\*10^-15. Assuming attackers can back invalid candidates on 50 cores for 48 hours straight and only those candidates get included it still gives a 7\*10^-9 chance of success which is still relatively small considering the cost (all malicious stake slashed). -Attacks 1.2 and 2.1 should generally be pretty futile as a solo attacker while 1.1 could be possible with mass disputes even from a single attacker. Nevertheless whatever the attack vector within the old system* the attackers would get eventually get slashed and pushed out of the active validator set. - -> **Note:** -> \* In the old design validators were chilled in the era after committing an offense. Chilled validators were excluded from NPoS elections which resulted in them getting pushed out of the validator set within 1-2 eras. This was risky as it could push out honest nodes out of consensus if they were unjustly slashed but gives some time to react through governance or community action. +Attacks 1.2 and 2.1 should generally be pretty futile as a solo attacker while 1.1 could be possible with mass disputes even from a single attacker. Nevertheless whatever the attack vector within the old system the attackers would get *eventually* get slashed and pushed out of the active validator set but they had plenty of time to wreck havoc. ## Risks of having validator disabling -The primary risk behind having any sort of disabling is that it is a double-edged sword that in case of any dispute bugs or sources of PVF non-determinism could disable honest nodes or be abused by attackers to specifically silence honest nodes. Disabling honest nodes could tip the scales between honest and dishonest nodes and destabilize the protocol. Honest nodes being pushed out of consensus is primarily a problem for approval voting and disputes where a supermajority is required. +Assume we fully push out validator when they commit offenses. + +The primary risk behind having any sort of disabling is that it is a double-edged sword that in case of any dispute bugs or sources of PVF non-determinism could disable honest nodes or be abused by attackers to specifically silence honest nodes. + +Validators being pushed out of the validator set are an issue because that can greatly skew the numbers game in approval checking (% for 30-ish malicious in a row). + +There are are also censorship or liveness issues if backing is suddenly dominate by malicious nodes but in general even if some honest blocks get backed liveness should be preserved. > **Note:** > It is worth noting that is is fundamentally a defense in depth strategy because if we assume disputes are perfect it should not be a real concern. 
In reality disputes are difficult to get right, and non-determinism can happen, so defense in depth is crucial when handling those subsystems.
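To make the numbers game above concrete, here is a small illustrative calculation of the approval-checking lottery, assuming 30 required approvals per candidate, 6-second blocks and the 50-core / 48-hour scenario from the earlier note (all parameters are assumptions for illustration only):

```rust
fn main() {
    let needed_approvals = 30;
    // Candidates an attacker can push through backing in the earlier scenario:
    // 50 cores times 48 hours of 6-second blocks.
    let attempts = 50.0 * 48.0 * 3600.0 / 6.0; // 1_440_000 candidates

    for malicious_share in [1.0f64 / 3.0, 0.4, 0.5] {
        // Chance that every approval checker of a single candidate is malicious.
        let single = malicious_share.powi(needed_approvals);
        // Chance that at least one of all the attempts slips through.
        let any = 1.0 - (1.0 - single).powf(attempts);
        println!("malicious share {malicious_share:.2}: per candidate {single:.1e}, overall {any:.1e}");
    }
    // A 1/3 malicious share reproduces the ~4*10^-15 and ~7*10^-9 figures quoted
    // earlier; pushing honest validators out (0.4, 0.5) erodes that margin by
    // several orders of magnitude.
}
```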


-**Risks of having validator disabling:** +# Risks Mitigation +## Addressing the risks of having validator disabling: -**Risks of NOT having validator disabling:** +One safety measure is bounding the disabled number to 1/3 ([**Point 2.**](#system-overview)) or to be exact the byzantine threshold. If for any reason more than 1/3 of validators are getting disabled it means that some part of the protocol failed or there is more than 1/3 malicious nodes which breaks the assumptions. -# =============================================== -# Other things I need to put somewhere -# =============================================== +Even in such a dire situation where more than 1/3 got disabled the most likely scenario is a non-determinism bug or sacrifice attack bug. Those attacks generally cause minor slashes to multiple honest nodes. In such a case the situation could be salvaged by prioritizing highest offenders for disabling ([**Point 3.**](#system-overview)). -Things to add: -- optional re-enabling (in what cases it helps) -- reasons why we disable for a full era -- confirmation trumping disablement -- reasons for not affecting grandpa -- reasons for affecting BA -- uncertainties around BEEFY -- problems with forcing new eras -- no-showing when disabled and similarity to a security analysis of a DoS attack on approvals -- accumulating slashes vs max slashing and disabling -- example attacks and how we defend from them ---- +Fully pushing out offending validator out of the validator set it too risky in case of a dispute bug, non-determinism or sacrifice attacks. Main issue lies in skewing the numbers in approval checking so instead of fully fully blocking disabled nodes a different approach can be taken - one were only some functionalities are disabled ([**Point 5.**](#system-overview)). +Once of those functionalities can be approval voting which as pointed above is so crucial that even in a disabled state nodes should be able to participate in it ([**Point 8.**](#system-overview)). -Above can be summarized as follows: +> **Note:** \ +> Approval Checking statement are implicitly valid. Sending a statement for an invalid candidate is a part of the dispute logic which we did not yet discuss. For now we only allow nodes to state that a candidate is valid or remain silent. But this solves the main risk of disabling. -- Disputes & Slashing are a soundness requirement. +Because we capped the number of disabled nodes to 1/3 there will always be at least 1/3 honest nodes to participate in backing so liveness should be preserved. That means that backing **COULD** be safely disabled for disabled nodes ([**Point 7.**](#system-overview)). -- Validator Disabling is **not** a soundness requirement but a liveness optimization. - ---- -Validator disabling and getting forced ouf of NPoS elections due to slashes have similar outcomes but there are a few differences: +## Addressing the risks of NOT having validator disabling: -- **latency** (next few blocks for validator disabling and 27 days for getting pushed out organically) -- **pool restriction** (validator disabling could effectively lower the number of active validators if we fully disable) -- **granularity** (validator disabling could remove only a portion of validator privileges instead of all) +To determine if backing **SHOULD** be disabled the attack vector of 1.2 (Mass invalid candidate backing) and 2.1 (Getting lucky in approval voting) need to be considered. 
In both of those cases having extra backed malicious candidates gives attackers extra chances to get lucky in approval checking. The solution is to not allow backing while disabled. ([**Point 7.**](#system-overview))

The attack vector 1.1 (Break sharding) requires a bit more nuance. If we assume that the attacker is a single entity and that they can get a lot of disputes through, they could break sharding with very little effort. This points in the direction of disallowing dispute initiation during disablement ([**Point 9.**](#system-overview)).

This might seem like an issue because it takes away the escalation privileges of disabled approval checkers, but this is NOT true. By issuing a dispute statement those nodes remain silent in approval checking because they skip their approval statement and thus count as a no-show. This creates a mini escalation for that particular candidate. This means that disabled nodes maintain just enough escalation to protect soundness (the same argument as soundness protection during a DoS attack on approval checking), but they lose the extreme escalation privileges which are only given to flawlessly performing nodes ([**Point 10.**](#system-overview)).

As a defense in depth measure, dispute statements from disabled validators count toward confirming disputes (byzantine threshold needed to confirm). If a dispute is confirmed everyone participates in it. This protects us from situations where, due to a bug, more than the byzantine threshold of validators would be disabled.

> **Note:** \
> The way this behavior is achieved in the implementation is that honest nodes note down dispute statements from disabled validators just like they would for normal nodes, but they do not release their own dispute statements unless the dispute is already confirmed. This simply stops the escalation process of disputes.
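The note above boils down to a simple participation rule on the node side. A minimal sketch of that rule follows; the names and the exact confirmation condition are illustrative assumptions rather than the real dispute-coordinator interface:

```rust
use std::collections::HashSet;

/// Validator index within the active set.
type ValidatorIndex = u32;

/// Decide whether an honest node should cast its own dispute statement.
/// Statements from disabled validators are always recorded, but they only
/// trigger active participation once the dispute is confirmed, i.e. once
/// enough distinct validators (the byzantine threshold) have voted in it.
fn should_participate(
    disabled: &HashSet<ValidatorIndex>,
    participants: &HashSet<ValidatorIndex>,
    byzantine_threshold: usize,
) -> bool {
    let raised_by_non_disabled = participants.iter().any(|v| !disabled.contains(v));
    let confirmed = participants.len() > byzantine_threshold;
    raised_by_non_disabled || confirmed
}

fn main() {
    let disabled: HashSet<ValidatorIndex> = [1, 2, 3].into_iter().collect();
    let only_disabled: HashSet<ValidatorIndex> = [1, 2].into_iter().collect();
    // A dispute raised solely by disabled validators is recorded but not joined.
    assert!(!should_participate(&disabled, &only_disabled, 33));
}
```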

+ +# Disabling Duration + +## Context: + +A crucial point to understand is that as of the time of writing all slashing events as alluded to in the begging are delayed for 27 days before being executed. This is primarily because it gives governance enough time to investigate and potentially intervene. For that duration when the slash is pending the stake is locked and cannot be moved. Time to deposit is 28 days which ensures that the stake will eventually be slashed before being withdrawn. Disabling has to protect us for that whole period in between the offense and the actual execution. + +## Design: + +A few options for the duration of disablement were considered: +- 1 epoch (4h in Polkadot) +- 1 era (24h in Polkadot) +- 2-26 eras +- 27 eras + +1 epoch is a short period and between a few epochs the validator set might be exactly the same. It is also very difficult to fix any local node issues for honest validator in such a short time so the chance for a repeated offense is high. + +1 era gives a bit more time to fix any minor issues. Additionally, it guarantees a validator set change at so many of the currently disabled validator might no longer be present anyway. ([**Point 4.**](#system-overview)) + +Higher values could be considered and the main arguments for those are based around the fact that it reduces the number of repeated attacks that will be allowed before the slash execution. Generally 1 attack per era for 27 eras resulting in 27 attacks at most should not compromise oru safety assumptions. Although this direction could be further explored and might be parametrized for governance to decide. + +


+ +# Economic consequences of Disablement + +Disablement is generally a form of punishment and that will be reflected in the rewards at the end of an era. A disabled validator will not receive any rewards for backing or block authoring. which will reduce it's profits. + +That means that the opportunity cost of being disabled is a punishment by itself and thus it can be used for some cases where a minor punishment is needed. Current implementation was using 0% slashes to mark nodes for chilling and similar approach of 0% slashes can be used to mark validators for disablement. ([**Point 1.**](#system-overview)) + +Anything higher than 0% will of course also lead to a disablement. + +> **Notes:** \ +> Alternative designs incorporating disabling proportional to offenses were explored but they were deemed too complex and not worth the effort. Main issue with those is that proportional disabling would cause back and forth between disabled and enabled which complicated tracking the state of disabled validators and messes with optimistic node optimizations. Main benefits were that minor slashes will be barely disabled which has nice properties against sacrifice attacks. + +


---- +# Simplifications -Disabling on minor slashes and accumulating slashes should both provide enough security as a deterrent against repeating offences, but disabling for minor offences is more lenient for honest faulty nodes and that's why we prefer it. Ideally we'd have both disabling AND accumulating as attackers can still commit multiple minor offences (for instance invalid on valid disputes) in the same block before they get punished and disabled, but damages done should be minimal so it's not a huge priority. +Some systems can be greatly simplified our outright removed thanks to the above changes. This leads to reduced complexity around the systems that were hard to reason about and were sources of multiple bugs. + +## Automatic Chilling + +Chilling is process of a validator dropping theirs intent to validate. This removes them from the upcoming NPoS solutions and effectively pushes them out of the validator set as quickly as of the next era (or 2 era in case of late offenses). All nominators of that validator were also getting unsubscribed from that validator. Validator could re-register their intent to validate at any time. + +Chilling had a myriad of problems. It assumes that validators and nominators remain very active and monitor everything. If a validator got slashed he was getting automatically chilled and his nominators were getting unsubscribed. This was an issue because of minor non-malicious slashes due to node operator mistakes or small bugs. Validators got those bugs fixed quickly and were reimbursed but nominator had to manually re-subscribe to the validator, which they often postponed for very lengthy amounts of time most likely due to simply not checking their stake. + +The biggest issue was that chilling in case of honest node slashes could lead to honest validators being somewhat quickly (next era) pushed out of the next validator set. This retains the validator set size but gives an edge to attackers as they can more easily win slots in the NPoS election. + +Disabling generally makes automatic-chilling after slash events redundant and disabled nodes can be considered for re-election which ensures that we do not push honest validators out of the validator set. ([**Point 6.**](#system-overview)) + +## Forcing New Era + +Previous implementation of disabling had some mechanisms allowing for temporarily fully disabling validators and if too many were disabled forcing a new era. Substrate offered the ability to force a new era but it was also deemed unsafe as it could be abused and compromised the security of the network for instance by weakening the randomness used throughout the protocol. + +## Slashing Spans + +TODO + +


+ +# Other types of slashing + +Above slashes were specifically referring to slashing events coming from disputes against candidates, but in Polkadot other types of offenses exist for example GRANDPA equivocations or block authoring offenses. Question is if the above defined design can handle those offenses. + +## GRANDPA Offenses + +The only GRANDPA offense is an equivocation (as of now). It is not a very serious offense and some nodes committing do not endanger the system and performance is barely affected. If more than byzantine threshold of nodes equivocate it is a catastrophic failure potentially resulting in 2 finalized blocks on the same height. + +Honest nodes generally should not commit those offenses so the goal of protecting them does not apply here. + +> **Note:** \ +> A validator running multiple nodes with the same identity might equivocate, but doing that is highly not advised but it has happened before. + +It's not a game of chance so giving attackers extra chances does not compromise soundness. Also it requires a supermajority of honest nodes to successfully finalize blocks so any disabling of honest nodes from GRANDPA might compromise liveness. + +Best approach is to allow disabled nodes to participate in GRANDPA as normal. ([**Point 11.**](#system-overview)) + +TODO: Verify GRANDPA performance loss if a bit less than 1/3 equivocates. +TODO: GRANDPA equivocation causing disablement? Seems unnecessary. Add reasoning. + +## Block Authoring Offenses + +Even if all honest nodes are disabled in Block Authoring (BA) liveness is generally preserved. At least 50% of blocks produced should still be honest. Soundness wise disabled nodes can create a decent amount of wasted work by creating bad blocks but they only get to do it in bounded amounts. + +Disabling in BA is not a requirement as both liveness and soundness are preserved but it is the current default behavior as well as it offers a bit less wasted work. + +Offenses in BA just like in backing can be caused by faulty PVFs or bugs. They might happen to honest nodes and disabling here while not a requirement can also ensure that this node does not repeat the offense as it might not be trusted with it's PVF anymore. + +Both points above don't present significant risks when disabling so the default behavior is to disable in BA and because of offenses in BA. ([**Point 12.**](#system-overview)) This filters out honest faulty nodes as well as protects from some attackers. + +## BEEFY + +Upcoming feature currently not in scope. It might require a brand new class of disablement with it's own separate rules. + +


+ +# Extra Design Considerations + +## Disabling vs Accumulating Slashes + +Instant disabling generally allows us to remove the need for accumulating slashes. It is a more immediate punishment and it is a more lenient punishment for honest nodes. + +The current architecture of using max slashing can be used and it works around the problems of delaying the slash for a long period. + +An alternative design with immediate slashing and acclimating slashing could relevant to other systems but it goes against the governance auditing mechanisms so it's not be suitable for Polkadot. + +## Disabling vs Getting Pushed Out of NPoS Elections + +Validator disabling and getting forced ouf of NPoS elections (1 era) due to slashes are actually very similar processes in terms of outcomes but there are some differences: + +- **latency** (next few blocks for validator disabling and 27 days for getting pushed out organically) +- **pool restriction** (validator disabling could effectively lower the number of active validators during an era if we fully disable) +- **granularity** (validator disabling could remove only a portion of validator privileges instead of all) ---- +Granularity is particularly crucial in the final design as only a few select functions are disabled while others remain. -(not here but revise rest of guide)\ -**Relevant Slashes: ** -- backing invalid -> 100% -- valid on invalid -> 100%/k -- invalid on valid -> 0% (or very small slash) -- BA equivocation -> ? (w/e it is currently) \ No newline at end of file From 680e7915133eb08fe2174c410547322e5d0e7983 Mon Sep 17 00:00:00 2001 From: Overkillus Date: Wed, 17 Jan 2024 03:05:50 +0000 Subject: [PATCH 06/18] implementation details --- .../src/protocol-validator-disabling.md | 37 ++++++++++++++++++- 1 file changed, 35 insertions(+), 2 deletions(-) diff --git a/polkadot/roadmap/implementers-guide/src/protocol-validator-disabling.md b/polkadot/roadmap/implementers-guide/src/protocol-validator-disabling.md index 655b84eb9e00..237ec6108b27 100644 --- a/polkadot/roadmap/implementers-guide/src/protocol-validator-disabling.md +++ b/polkadot/roadmap/implementers-guide/src/protocol-validator-disabling.md @@ -29,8 +29,6 @@ Side goals are: - Reduce the damages to honest nodes that had a fault which might cause repeated slashes - Reduce liveness impact of individual malicious attackers -


- ## System Overview High level assumptions and goals of the validator disabling system that will be further discussed in the following sections: @@ -243,3 +241,38 @@ Validator disabling and getting forced ouf of NPoS elections (1 era) due to slas Granularity is particularly crucial in the final design as only a few select functions are disabled while others remain. +


+ +# Implementation + +Implementation of the above design covers a few additional areas that allow for node-side optimizations. + +## Core Features + +1. Disabled Validators Tracking (**Runtime**) #2950 + - Add and expose a ``disabled_validators`` map through a Runtime API + - Add new disabled validators when they get slashed +1. Enforce Backing Disabling (**Runtime**) #1592 + - Filter out votes from ``disabled_validators`` in ``BackedCandidates`` in ``process_inherent_data`` +1. Substrate BZT Limit for Disabling #1963 + - Can be parametrized but default to BZT + - Disable only up to 1/3 of validators +1. Set Disabling Duration to 1 Era #1966 + - Clear ``disabled_validators`` on era change +1. Respect Disabling in Backing Statement Distribution (**Node**) #1591 + - This is an optimization as in the end it would get filtered in the runtime anyway + - Filter out backing statements coming from ``disabled_validators`` +1. Respect Disablement in Backing (**Node**) #2951 + - This is an optimization as in the end it would get filtered in the runtime anyway + - Don't start backing new candidates when disabled + - Don't react to backing requests when disabled +1. Stop Automatic Chilling of Offenders #1962 +1. Respect Disabling in Dispute Participation (**Node**) #2225 + - Receive dispute statements from ``disabled_validators`` but do not release own statements + - Ensure dispute confirmation when BZT statements from disabled +1. Defense Against Past-Era Dispute Spam (**Node**) #2225 + - Add a node-side parallel store of ``disabled_validators`` + - Runtime ``disabled_validators`` always have priority over node-side ``disabled_validators`` + - Respect the BZT threshold +1. Re-enable small offender when approaching BZT (**Runtime**) #TODO + From 03abe5d6540a253aa8d56c3cd86a2bc09b5aefe8 Mon Sep 17 00:00:00 2001 From: Overkillus Date: Wed, 17 Jan 2024 03:22:40 +0000 Subject: [PATCH 07/18] minor changes --- .../src/protocol-validator-disabling.md | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/polkadot/roadmap/implementers-guide/src/protocol-validator-disabling.md b/polkadot/roadmap/implementers-guide/src/protocol-validator-disabling.md index 237ec6108b27..c9fa3e94f6ac 100644 --- a/polkadot/roadmap/implementers-guide/src/protocol-validator-disabling.md +++ b/polkadot/roadmap/implementers-guide/src/protocol-validator-disabling.md @@ -153,6 +153,7 @@ Higher values could be considered and the main arguments for those are based aro Disablement is generally a form of punishment and that will be reflected in the rewards at the end of an era. A disabled validator will not receive any rewards for backing or block authoring. which will reduce it's profits. That means that the opportunity cost of being disabled is a punishment by itself and thus it can be used for some cases where a minor punishment is needed. Current implementation was using 0% slashes to mark nodes for chilling and similar approach of 0% slashes can be used to mark validators for disablement. ([**Point 1.**](#system-overview)) +0% slashes could for instance be used to punish approval checkers voting invalid on valid candidates. Anything higher than 0% will of course also lead to a disablement. @@ -161,7 +162,7 @@ Anything higher than 0% will of course also lead to a disablement.


-# Simplifications +# Redundancy Some systems can be greatly simplified our outright removed thanks to the above changes. This leads to reduced complexity around the systems that were hard to reason about and were sources of multiple bugs. @@ -247,7 +248,7 @@ Granularity is particularly crucial in the final design as only a few select fun Implementation of the above design covers a few additional areas that allow for node-side optimizations. -## Core Features +## Core Features (+ #issues) 1. Disabled Validators Tracking (**Runtime**) #2950 - Add and expose a ``disabled_validators`` map through a Runtime API @@ -267,12 +268,18 @@ Implementation of the above design covers a few additional areas that allow for - Don't start backing new candidates when disabled - Don't react to backing requests when disabled 1. Stop Automatic Chilling of Offenders #1962 + - Chilling still persists as a state but is no longer automatic applied on offenses 1. Respect Disabling in Dispute Participation (**Node**) #2225 - Receive dispute statements from ``disabled_validators`` but do not release own statements - Ensure dispute confirmation when BZT statements from disabled 1. Defense Against Past-Era Dispute Spam (**Node**) #2225 + - This is needed because runtime cannot disable validators which it no longer knows about - Add a node-side parallel store of ``disabled_validators`` + - Add new disabled validators to node-side store when they loose a dispute in any leaf in scope - Runtime ``disabled_validators`` always have priority over node-side ``disabled_validators`` - Respect the BZT threshold + > **Note:** \ + > An alternative design here was considered where instead of tracking new incoming leaves a relay parent is used. This would guarantee determinism as different nodes can see different leaves, but this approach was leaving too wide of a window because of Async-Backing. Relay Parent could have been significantly in the past and it would give a lot of time for past session disputes to be spammed. 1. Re-enable small offender when approaching BZT (**Runtime**) #TODO + - When BZT limit is reached and there are more offenders to be disabled re-enable the smallest offenders to disable the biggest ones From 06d24fc3ba40b50b9ea1705316bc4044d86c1636 Mon Sep 17 00:00:00 2001 From: Overkillus Date: Wed, 17 Jan 2024 14:38:28 +0000 Subject: [PATCH 08/18] GRANDPA fixes --- .../src/protocol-validator-disabling.md | 11 ++++------- 1 file changed, 4 insertions(+), 7 deletions(-) diff --git a/polkadot/roadmap/implementers-guide/src/protocol-validator-disabling.md b/polkadot/roadmap/implementers-guide/src/protocol-validator-disabling.md index c9fa3e94f6ac..360b9cd4c811 100644 --- a/polkadot/roadmap/implementers-guide/src/protocol-validator-disabling.md +++ b/polkadot/roadmap/implementers-guide/src/protocol-validator-disabling.md @@ -39,11 +39,11 @@ High level assumptions and goals of the validator disabling system that will be 1. Disablement lasts for 1 era. 1. Disabled validators remain in the active validator set but have some limited permissions. 1. Disabled validators can get re-elected. -1. Disabled validators can no longer back candidates. +1. Disabled validators cannot back candidates. 1. Disabled validators can participate in approval checking. 1. Disabled validators cannot initiate disputes, but their votes are still counted if a dispute occurs. 1. Disabled validators making dispute statements no-show in approval checking. -1. Disabling does not affect GRANDPA at all. +1. 
Disabled validators can participate in GRANDPA, but equivocations cause disablement. 1. Disabling affects Block Authoring. (Both ways: block authoring equivocation disables and disabling stops block authoring) @@ -197,14 +197,11 @@ The only GRANDPA offense is an equivocation (as of now). It is not a very seriou Honest nodes generally should not commit those offenses so the goal of protecting them does not apply here. > **Note:** \ -> A validator running multiple nodes with the same identity might equivocate, but doing that is highly not advised but it has happened before. +> A validator running multiple nodes with the same identity might equivocate. Doing that is highly not advised but it has happened before. It's not a game of chance so giving attackers extra chances does not compromise soundness. Also it requires a supermajority of honest nodes to successfully finalize blocks so any disabling of honest nodes from GRANDPA might compromise liveness. -Best approach is to allow disabled nodes to participate in GRANDPA as normal. ([**Point 11.**](#system-overview)) - -TODO: Verify GRANDPA performance loss if a bit less than 1/3 equivocates. -TODO: GRANDPA equivocation causing disablement? Seems unnecessary. Add reasoning. +Best approach is to allow disabled nodes to participate in GRANDPA as normal and as mentioned before GRANDPA equivocations should not happen to honest nodes so we can safely disable the offenders. ([**Point 11.**](#system-overview)) ## Block Authoring Offenses From e354e212a75add1f97091dcaf7e3ece323b61dbe Mon Sep 17 00:00:00 2001 From: Overkillus Date: Tue, 23 Jan 2024 16:14:44 +0000 Subject: [PATCH 09/18] ordering and nits --- .../src/protocol-validator-disabling.md | 40 +++++++++---------- 1 file changed, 20 insertions(+), 20 deletions(-) diff --git a/polkadot/roadmap/implementers-guide/src/protocol-validator-disabling.md b/polkadot/roadmap/implementers-guide/src/protocol-validator-disabling.md index 360b9cd4c811..19533e0e621f 100644 --- a/polkadot/roadmap/implementers-guide/src/protocol-validator-disabling.md +++ b/polkadot/roadmap/implementers-guide/src/protocol-validator-disabling.md @@ -39,12 +39,12 @@ High level assumptions and goals of the validator disabling system that will be 1. Disablement lasts for 1 era. 1. Disabled validators remain in the active validator set but have some limited permissions. 1. Disabled validators can get re-elected. -1. Disabled validators cannot back candidates. 1. Disabled validators can participate in approval checking. +1. Disabled validators can participate in GRANDPA, but equivocations cause disablement. +1. Disabled validators cannot author blocks. +1. Disabled validators cannot back candidates. 1. Disabled validators cannot initiate disputes, but their votes are still counted if a dispute occurs. 1. Disabled validators making dispute statements no-show in approval checking. -1. Disabled validators can participate in GRANDPA, but equivocations cause disablement. -1. Disabling affects Block Authoring. (Both ways: block authoring equivocation disables and disabling stops block authoring) Having the above elements allows us to simplify the current staking & slashing design: @@ -103,21 +103,21 @@ One safety measure is bounding the disabled number to 1/3 ([**Point 2.**](#syste Even in such a dire situation where more than 1/3 got disabled the most likely scenario is a non-determinism bug or sacrifice attack bug. Those attacks generally cause minor slashes to multiple honest nodes. 
In such a case the situation could be salvaged by prioritizing highest offenders for disabling ([**Point 3.**](#system-overview)). Fully pushing out offending validator out of the validator set it too risky in case of a dispute bug, non-determinism or sacrifice attacks. Main issue lies in skewing the numbers in approval checking so instead of fully fully blocking disabled nodes a different approach can be taken - one were only some functionalities are disabled ([**Point 5.**](#system-overview)). -Once of those functionalities can be approval voting which as pointed above is so crucial that even in a disabled state nodes should be able to participate in it ([**Point 8.**](#system-overview)). +Once of those functionalities can be approval voting which as pointed above is so crucial that even in a disabled state nodes should be able to participate in it ([**Point 7.**](#system-overview)). > **Note:** \ > Approval Checking statement are implicitly valid. Sending a statement for an invalid candidate is a part of the dispute logic which we did not yet discuss. For now we only allow nodes to state that a candidate is valid or remain silent. But this solves the main risk of disabling. -Because we capped the number of disabled nodes to 1/3 there will always be at least 1/3 honest nodes to participate in backing so liveness should be preserved. That means that backing **COULD** be safely disabled for disabled nodes ([**Point 7.**](#system-overview)). +Because we capped the number of disabled nodes to 1/3 there will always be at least 1/3 honest nodes to participate in backing so liveness should be preserved. That means that backing **COULD** be safely disabled for disabled nodes ([**Point 10.**](#system-overview)). ## Addressing the risks of NOT having validator disabling: -To determine if backing **SHOULD** be disabled the attack vector of 1.2 (Mass invalid candidate backing) and 2.1 (Getting lucky in approval voting) need to be considered. In both of those cases having extra backed malicious candidates gives attackers extra chances to get lucky in approval checking. The solution is to not allow for backing in disablement. ([**Point 7.**](#system-overview)) +To determine if backing **SHOULD** be disabled the attack vector of 1.2 (Mass invalid candidate backing) and 2.1 (Getting lucky in approval voting) need to be considered. In both of those cases having extra backed malicious candidates gives attackers extra chances to get lucky in approval checking. The solution is to not allow for backing in disablement. ([**Point 10.**](#system-overview)) -The attack vector 1.1 (Break sharding) requires a bit more nuance. If we assume that the attacker is a single entity and that he can get a lot of disputes through he could potentially incredibly easily break sharding. This generally points into the direction of disallowing that during disablement ([**Point 9.**](#system-overview)). +The attack vector 1.1 (Break sharding) requires a bit more nuance. If we assume that the attacker is a single entity and that he can get a lot of disputes through he could potentially incredibly easily break sharding. This generally points into the direction of disallowing that during disablement ([**Point 11.**](#system-overview)). -This might seem like an issue because it takes away the escalation privileges of disabled approval checkers but this is NOT true. By issuing a dispute statement those nodes remain silent in approval checking because they skip their approval statement and thus will count as a no-show. 
This will create a mini escalation for that particular candidate. This means that disabled nodes maintain just enough escalation that they can protect soundness (same argument as soundness protection during a DoS attack on approval checking) but they lose their extreme escalation privilege which are only given to flawlessly performing nodes ([**Point 10.**](#system-overview)). +This might seem like an issue because it takes away the escalation privileges of disabled approval checkers but this is NOT true. By issuing a dispute statement those nodes remain silent in approval checking because they skip their approval statement and thus will count as a no-show. This will create a mini escalation for that particular candidate. This means that disabled nodes maintain just enough escalation that they can protect soundness (same argument as soundness protection during a DoS attack on approval checking) but they lose their extreme escalation privilege which are only given to flawlessly performing nodes ([**Point 12.**](#system-overview)). As a defense in depth measure dispute statements from disabled validators count toward confirming disputes (byzantine threshold needed to confirm). If a dispute is confirmed everyone participates in it. This protects us from situations where due to a bug more than byzantine threshold of validators would be disabled. @@ -201,7 +201,7 @@ Honest nodes generally should not commit those offenses so the goal of protectin It's not a game of chance so giving attackers extra chances does not compromise soundness. Also it requires a supermajority of honest nodes to successfully finalize blocks so any disabling of honest nodes from GRANDPA might compromise liveness. -Best approach is to allow disabled nodes to participate in GRANDPA as normal and as mentioned before GRANDPA equivocations should not happen to honest nodes so we can safely disable the offenders. ([**Point 11.**](#system-overview)) +Best approach is to allow disabled nodes to participate in GRANDPA as normal and as mentioned before GRANDPA equivocations should not happen to honest nodes so we can safely disable the offenders. ([**Point 8.**](#system-overview)) ## Block Authoring Offenses @@ -211,7 +211,7 @@ Disabling in BA is not a requirement as both liveness and soundness are preserve Offenses in BA just like in backing can be caused by faulty PVFs or bugs. They might happen to honest nodes and disabling here while not a requirement can also ensure that this node does not repeat the offense as it might not be trusted with it's PVF anymore. -Both points above don't present significant risks when disabling so the default behavior is to disable in BA and because of offenses in BA. ([**Point 12.**](#system-overview)) This filters out honest faulty nodes as well as protects from some attackers. +Both points above don't present significant risks when disabling so the default behavior is to disable in BA and because of offenses in BA. ([**Point 9.**](#system-overview)) This filters out honest faulty nodes as well as protects from some attackers. ## BEEFY @@ -245,31 +245,31 @@ Granularity is particularly crucial in the final design as only a few select fun Implementation of the above design covers a few additional areas that allow for node-side optimizations. -## Core Features (+ #issues) +## Core Features -1. Disabled Validators Tracking (**Runtime**) #2950 +1. 
Disabled Validators Tracking (**Runtime**) [#2950](https://github.com/paritytech/polkadot-sdk/issues/2950) - Add and expose a ``disabled_validators`` map through a Runtime API - Add new disabled validators when they get slashed -1. Enforce Backing Disabling (**Runtime**) #1592 +1. Enforce Backing Disabling (**Runtime**) [#1592](https://github.com/paritytech/polkadot-sdk/issues/1592) - Filter out votes from ``disabled_validators`` in ``BackedCandidates`` in ``process_inherent_data`` -1. Substrate BZT Limit for Disabling #1963 +1. Substrate Byzantine Threshold (BZT) as Limit for Disabling [#1963](https://github.com/paritytech/polkadot-sdk/issues/1963) - Can be parametrized but default to BZT - Disable only up to 1/3 of validators -1. Set Disabling Duration to 1 Era #1966 +1. Set Disabling Duration to 1 Era [#1966](https://github.com/paritytech/polkadot-sdk/issues/1966) - Clear ``disabled_validators`` on era change -1. Respect Disabling in Backing Statement Distribution (**Node**) #1591 +1. Respect Disabling in Backing Statement Distribution (**Node**) [#1591](https://github.com/paritytech/polkadot-sdk/issues/1951) - This is an optimization as in the end it would get filtered in the runtime anyway - Filter out backing statements coming from ``disabled_validators`` -1. Respect Disablement in Backing (**Node**) #2951 +1. Respect Disablement in Backing (**Node**) [#2951](https://github.com/paritytech/polkadot-sdk/issues/2951) - This is an optimization as in the end it would get filtered in the runtime anyway - Don't start backing new candidates when disabled - Don't react to backing requests when disabled -1. Stop Automatic Chilling of Offenders #1962 +1. Stop Automatic Chilling of Offenders [#1962](https://github.com/paritytech/polkadot-sdk/issues/1962) - Chilling still persists as a state but is no longer automatic applied on offenses -1. Respect Disabling in Dispute Participation (**Node**) #2225 +1. Respect Disabling in Dispute Participation (**Node**) [#2225](https://github.com/paritytech/polkadot-sdk/issues/2225) - Receive dispute statements from ``disabled_validators`` but do not release own statements - Ensure dispute confirmation when BZT statements from disabled -1. Defense Against Past-Era Dispute Spam (**Node**) #2225 +1. Defense Against Past-Era Dispute Spam (**Node**) [#2225](https://github.com/paritytech/polkadot-sdk/issues/2225) - This is needed because runtime cannot disable validators which it no longer knows about - Add a node-side parallel store of ``disabled_validators`` - Add new disabled validators to node-side store when they loose a dispute in any leaf in scope From 47084040774228d5067bfa91c175929ae9bf42da Mon Sep 17 00:00:00 2001 From: Overkillus Date: Tue, 23 Jan 2024 16:35:38 +0000 Subject: [PATCH 10/18] review nits --- polkadot/roadmap/implementers-guide/src/protocol-disputes.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/polkadot/roadmap/implementers-guide/src/protocol-disputes.md b/polkadot/roadmap/implementers-guide/src/protocol-disputes.md index 38c1c0c40d6a..922cc3c3e2b5 100644 --- a/polkadot/roadmap/implementers-guide/src/protocol-disputes.md +++ b/polkadot/roadmap/implementers-guide/src/protocol-disputes.md @@ -9,7 +9,8 @@ only backed, but not included. We have two primary components for ensuring that nothing invalid ends up in the finalized relay chain: * Approval Checking, as described [here](./protocol-approval.md) and implemented accordingly in the [Approval - Voting](node/approval/approval-voting.md) subsystem. 
This protocol can be shown to prevent invalid parachain blocks from making their way into the finalized relay chain as long as the amount of attempts are limited. +Voting](node/approval/approval-voting.md) subsystem. This protocol can be shown to prevent invalid parachain blocks +from making their way into the finalized relay chain as long as the amount of attempts are limited. * Disputes, this protocol, which ensures that each attempt to include something bad is caught, and the offending validators are punished. Disputes differ from backing and approval process (and can not be part of those) in that a dispute is independent of a particular fork, while both backing and approval operate on particular forks. This From 691d175066b9db5bab5ecf5e525498395821cdba Mon Sep 17 00:00:00 2001 From: Overkillus Date: Tue, 23 Jan 2024 16:35:49 +0000 Subject: [PATCH 11/18] more nits --- .../src/protocol-validator-disabling.md | 17 ++++++++--------- 1 file changed, 8 insertions(+), 9 deletions(-) diff --git a/polkadot/roadmap/implementers-guide/src/protocol-validator-disabling.md b/polkadot/roadmap/implementers-guide/src/protocol-validator-disabling.md index 19533e0e621f..4bc500c23500 100644 --- a/polkadot/roadmap/implementers-guide/src/protocol-validator-disabling.md +++ b/polkadot/roadmap/implementers-guide/src/protocol-validator-disabling.md @@ -19,7 +19,7 @@ The primary goals are: - Eliminate cases where attackers can get free attempts at attacking the network - Eliminate or minimize the risks of honest nodes being pushed out of consensus by getting unjustly slashed -The above three goals are generally at odds so a careful balance has to be struck between them. We will achieve them by sacrificing some **liveness** in favor of **soundness** when the network is under stress. Maintaining some liveness but absolute soundness ia paramount. +The above three goals are generally at odds so a careful balance has to be struck between them. We will achieve them by sacrificing some **liveness** in favor of **soundness** when the network is under stress. Maintaining some liveness but absolute soundness is paramount. > **Note:** \ > Liveness = Valid candidates can go through (at a decent pace) \ @@ -102,6 +102,9 @@ One safety measure is bounding the disabled number to 1/3 ([**Point 2.**](#syste Even in such a dire situation where more than 1/3 got disabled the most likely scenario is a non-determinism bug or sacrifice attack bug. Those attacks generally cause minor slashes to multiple honest nodes. In such a case the situation could be salvaged by prioritizing highest offenders for disabling ([**Point 3.**](#system-overview)). +> **Note:** \ +> System can be launched with re-enabling and will still provide some security improvements. Re-enabling will be launched in an upgrade after the initial deployment. + Fully pushing out offending validator out of the validator set it too risky in case of a dispute bug, non-determinism or sacrifice attacks. Main issue lies in skewing the numbers in approval checking so instead of fully fully blocking disabled nodes a different approach can be taken - one were only some functionalities are disabled ([**Point 5.**](#system-overview)). Once of those functionalities can be approval voting which as pointed above is so crucial that even in a disabled state nodes should be able to participate in it ([**Point 7.**](#system-overview)). 
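
As a rough illustration of the byzantine-threshold cap and the "prioritize the highest offenders" rule discussed above (Points 2 and 3), a minimal sketch is shown below. All names and types are illustrative only and assume a simple in-memory list of offenders; this is not the actual staking-pallet logic.

```rust
/// Minimal sketch of the byzantine-threshold cap with highest-offender
/// priority (Points 2 and 3 above). All names and types are illustrative;
/// this is not the actual staking-pallet logic.
fn select_disabled(
    mut offenders: Vec<(u32, u32)>, // (validator index, slash in parts-per-billion)
    n_validators: usize,
) -> Vec<u32> {
    // Disable at most a third of the validator set (floor((n - 1) / 3) here).
    let max_disabled = n_validators.saturating_sub(1) / 3;

    // Highest offenders first: if the cap is reached, the smallest offenders
    // are the ones that stay enabled (or get re-enabled) to make room.
    offenders.sort_by(|a, b| b.1.cmp(&a.1));
    offenders.truncate(max_disabled);
    offenders.into_iter().map(|(index, _)| index).collect()
}

fn main() {
    // 10 validators, 4 offenders: only the 3 largest offenders end up disabled.
    let disabled = select_disabled(vec![(0, 100), (3, 1_000_000_000), (7, 500), (9, 10)], 10);
    assert_eq!(disabled, vec![3, 7, 0]);
}
```
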
@@ -150,7 +153,7 @@ Higher values could be considered and the main arguments for those are based aro # Economic consequences of Disablement -Disablement is generally a form of punishment and that will be reflected in the rewards at the end of an era. A disabled validator will not receive any rewards for backing or block authoring. which will reduce it's profits. +Disablement is generally a form of punishment and that will be reflected in the rewards at the end of an era. A disabled validator will not receive any rewards for backing or block authoring. which will reduce its profits. That means that the opportunity cost of being disabled is a punishment by itself and thus it can be used for some cases where a minor punishment is needed. Current implementation was using 0% slashes to mark nodes for chilling and similar approach of 0% slashes can be used to mark validators for disablement. ([**Point 1.**](#system-overview)) 0% slashes could for instance be used to punish approval checkers voting invalid on valid candidates. @@ -164,13 +167,13 @@ Anything higher than 0% will of course also lead to a disablement. # Redundancy -Some systems can be greatly simplified our outright removed thanks to the above changes. This leads to reduced complexity around the systems that were hard to reason about and were sources of multiple bugs. +Some systems can be greatly simplified or outright removed thanks to the above changes. This leads to reduced complexity around the systems that were hard to reason about and were sources of multiple bugs. ## Automatic Chilling Chilling is process of a validator dropping theirs intent to validate. This removes them from the upcoming NPoS solutions and effectively pushes them out of the validator set as quickly as of the next era (or 2 era in case of late offenses). All nominators of that validator were also getting unsubscribed from that validator. Validator could re-register their intent to validate at any time. -Chilling had a myriad of problems. It assumes that validators and nominators remain very active and monitor everything. If a validator got slashed he was getting automatically chilled and his nominators were getting unsubscribed. This was an issue because of minor non-malicious slashes due to node operator mistakes or small bugs. Validators got those bugs fixed quickly and were reimbursed but nominator had to manually re-subscribe to the validator, which they often postponed for very lengthy amounts of time most likely due to simply not checking their stake. +Chilling had a myriad of problems. It assumes that validators and nominators remain very active and monitor everything. If a validator got slashed he was getting automatically chilled and his nominators were getting unsubscribed. This was an issue because of minor non-malicious slashes due to node operator mistakes or small bugs. Validators got those bugs fixed quickly and were reimbursed but nominator had to manually re-subscribe to the validator, which they often postponed for very lengthy amounts of time most likely due to simply not checking their stake. This forced unsubscribing of nominators was later removed but it leads back to the original quoted issue of offending validators simply re-registering their interest and continuing to attack the network. The biggest issue was that chilling in case of honest node slashes could lead to honest validators being somewhat quickly (next era) pushed out of the next validator set. 
This retains the validator set size but gives an edge to attackers as they can more easily win slots in the NPoS election. @@ -178,11 +181,7 @@ Disabling generally makes automatic-chilling after slash events redundant and di ## Forcing New Era -Previous implementation of disabling had some mechanisms allowing for temporarily fully disabling validators and if too many were disabled forcing a new era. Substrate offered the ability to force a new era but it was also deemed unsafe as it could be abused and compromised the security of the network for instance by weakening the randomness used throughout the protocol. - -## Slashing Spans - -TODO +Previous implementation of disabling had some mechanisms allowing for temporarily fully disabling validators and if too many were disabled forcing a new era. Frame staking pallet offered the ability to force a new era but it was also deemed unsafe as it could be abused and compromised the security of the network for instance by weakening the randomness used throughout the protocol.
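
Going back to the economic-consequences point above (a 0% slash still marks a validator for disablement), a minimal sketch of that rule could look as follows; the names are illustrative only, not the staking pallet's actual types.

```rust
/// Sketch of the "0% slash as a disablement marker" idea above: the offence
/// itself triggers disablement, the slash amount only scales the economic
/// punishment. Illustrative names only, not the staking pallet's types.
fn on_offence(offender: u32, slash_parts_per_billion: u32, disabled: &mut Vec<u32>) {
    let _ = slash_parts_per_billion; // even a 0% slash still disables
    if !disabled.contains(&offender) {
        disabled.push(offender);
    }
}

fn main() {
    let mut disabled = Vec::new();
    on_offence(42, 0, &mut disabled); // a 0% slash
    assert_eq!(disabled, vec![42]);
}
```
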


From 85bc54626f37fa9d00e438ec8a8f410a8313d579 Mon Sep 17 00:00:00 2001 From: Overkillus Date: Thu, 9 May 2024 13:15:57 +0100 Subject: [PATCH 12/18] Review feedback, approval slashes clarifications, typos, beefy clarifications --- .../src/protocol-validator-disabling.md | 68 ++++++++++--------- 1 file changed, 35 insertions(+), 33 deletions(-) diff --git a/polkadot/roadmap/implementers-guide/src/protocol-validator-disabling.md b/polkadot/roadmap/implementers-guide/src/protocol-validator-disabling.md index 4bc500c23500..ec5426efbebd 100644 --- a/polkadot/roadmap/implementers-guide/src/protocol-validator-disabling.md +++ b/polkadot/roadmap/implementers-guide/src/protocol-validator-disabling.md @@ -8,7 +8,7 @@ As established in the [approval process](protocol-approval.md) dealing with bad 1. Escalation 1. Consequences -The main system responsible for dispensing **consequences** for malicious actors is the [dispute system](protocol-disputes.md) which eventually dispenses slash events. It is important to note the **high latency** of the punishment as it is only effective after 27 eras (27 days in Polkadot). Dispute concluding by itself does not immediately remove the validator from the active validator set. +The main system responsible for dispensing **consequences** for malicious actors is the [dispute system](protocol-disputes.md) which eventually dispenses slash events. The slashes itself can be dispensed quickly (a matter of blocks) but for an extra layer of auditing all slashes are deferred for 27 days (in Polkadot/Kusama) which gives time for Governance to investigate and potentially alter the punishment. Dispute concluding by itself does not immediately remove the validator from the active validator set. > **Note:** \ > There was an additional mechanism of automatically chilling the validator which removed their intent to participate in the next election, but the removed validator could simply re-register his intent to validate. @@ -16,10 +16,10 @@ The main system responsible for dispensing **consequences** for malicious actors There is a need to have a more immediate way to deal with malicious validators. This is where the validator disabling comes in. It is focused on dispensing **low latency** consequences for malicious actors. It is important to note that the validator disabling is not a replacement for the dispute or slashing systems. It is a complementary system that is focused on lighter but immediate consequences usually in the form of restricted validator privileges. The primary goals are: -- Eliminate cases where attackers can get free attempts at attacking the network -- Eliminate or minimize the risks of honest nodes being pushed out of consensus by getting unjustly slashed +- Eliminate or minimize cases where attackers can get free attempts at attacking the network +- Eliminate or minimize the risks of honest nodes being pushed out of consensus when getting unjustly slashed (defense in depth) -The above three goals are generally at odds so a careful balance has to be struck between them. We will achieve them by sacrificing some **liveness** in favor of **soundness** when the network is under stress. Maintaining some liveness but absolute soundness is paramount. +The above two goals are generally at odds so a careful balance has to be struck between them. We will achieve them by sacrificing some **liveness** in favor of **soundness** when the network is under stress. Maintaining some liveness but absolute soundness is paramount. 
> **Note:** \ > Liveness = Valid candidates can go through (at a decent pace) \ @@ -40,25 +40,19 @@ High level assumptions and goals of the validator disabling system that will be 1. Disabled validators remain in the active validator set but have some limited permissions. 1. Disabled validators can get re-elected. 1. Disabled validators can participate in approval checking. -1. Disabled validators can participate in GRANDPA, but equivocations cause disablement. +1. Disabled validators can participate in GRANDPA/BEEFY, but equivocations cause disablement. 1. Disabled validators cannot author blocks. 1. Disabled validators cannot back candidates. 1. Disabled validators cannot initiate disputes, but their votes are still counted if a dispute occurs. 1. Disabled validators making dispute statements no-show in approval checking. - -Having the above elements allows us to simplify the current staking & slashing design: -- No automatic chilling of validators. -- No force new era logic. -- No slashing spans -


# Risks ## Risks of NOT having validator disabling -Assume that if an offense is committed a slash is deposited but the perpetrator can still act normally. He will be slashed 100% with a long delay. This is akin to the current design. +Assume that if an offense is committed a slash is deposited but the perpetrator can still act normally. He will be slashed 100% with a long delay (slash deferral duration which is 27 days). This is akin to the current design. A simple argument for disabling is that if someone is already slashed 100% and they have nothing to lose they could cause harm to the network and should be silenced. @@ -87,10 +81,10 @@ The primary risk behind having any sort of disabling is that it is a double-edge Validators being pushed out of the validator set are an issue because that can greatly skew the numbers game in approval checking (% for 30-ish malicious in a row). -There are are also censorship or liveness issues if backing is suddenly dominate by malicious nodes but in general even if some honest blocks get backed liveness should be preserved. +There are also censorship or liveness issues if backing is suddenly dominated by malicious nodes but in general even if some honest blocks get backed liveness should be preserved. > **Note:** -> It is worth noting that is is fundamentally a defense in depth strategy because if we assume disputes are perfect it should not be a real concern. In reality disputes are difficult to get right, and non-determinism and happen so defense in depth is crucial when handling those subsystems. +> It is worth noting that is is fundamentally a defense in depth strategy because if we assume disputes are perfect it should not be a real concern. In reality disputes and determinism are difficult to get right, and non-determinism and happen so defense in depth is crucial when handling those subsystems.


@@ -105,7 +99,7 @@ Even in such a dire situation where more than 1/3 got disabled the most likely s > **Note:** \ > System can be launched with re-enabling and will still provide some security improvements. Re-enabling will be launched in an upgrade after the initial deployment. -Fully pushing out offending validator out of the validator set it too risky in case of a dispute bug, non-determinism or sacrifice attacks. Main issue lies in skewing the numbers in approval checking so instead of fully fully blocking disabled nodes a different approach can be taken - one were only some functionalities are disabled ([**Point 5.**](#system-overview)). +Fully pushing out offending validator out of the validator set it too risky in case of a dispute bug, non-determinism or sacrifice attacks. Main issue lies in skewing the numbers in approval checking so instead of fully blocking disabled nodes a different approach can be taken - one were only some functionalities are disabled ([**Point 5.**](#system-overview)). Once of those functionalities can be approval voting which as pointed above is so crucial that even in a disabled state nodes should be able to participate in it ([**Point 7.**](#system-overview)). > **Note:** \ @@ -133,7 +127,7 @@ As a defense in depth measure dispute statements from disabled validators count ## Context: -A crucial point to understand is that as of the time of writing all slashing events as alluded to in the begging are delayed for 27 days before being executed. This is primarily because it gives governance enough time to investigate and potentially intervene. For that duration when the slash is pending the stake is locked and cannot be moved. Time to deposit is 28 days which ensures that the stake will eventually be slashed before being withdrawn. Disabling has to protect us for that whole period in between the offense and the actual execution. +A crucial point to understand is that as of the time of writing all slashing events as alluded to in the begging are delayed for 27 days before being executed. This is primarily because it gives governance enough time to investigate and potentially intervene. For that duration when the slash is pending the stake is locked and cannot be moved. Time to unbond you stake is 28 days which ensures that the stake will eventually be slashed before being withdrawn. ## Design: @@ -143,11 +137,11 @@ A few options for the duration of disablement were considered: - 2-26 eras - 27 eras -1 epoch is a short period and between a few epochs the validator set might be exactly the same. It is also very difficult to fix any local node issues for honest validator in such a short time so the chance for a repeated offense is high. +1 epoch is a short period and between a few epochs the validator will most likely be exactly the same. It is also very difficult to fix any local node issues for honest validator in such a short time so the chance for a repeated offense is high. -1 era gives a bit more time to fix any minor issues. Additionally, it guarantees a validator set change at so many of the currently disabled validator might no longer be present anyway. ([**Point 4.**](#system-overview)) +1 era gives a bit more time to fix any minor issues. Additionally, it guarantees a validator set change at so many of the currently disabled validator might no longer be present anyway. It also gives the time for the validator to chill themselves if they have identified a cause and want to spend more time fixing it. 
([**Point 4.**](#system-overview)) -Higher values could be considered and the main arguments for those are based around the fact that it reduces the number of repeated attacks that will be allowed before the slash execution. Generally 1 attack per era for 27 eras resulting in 27 attacks at most should not compromise oru safety assumptions. Although this direction could be further explored and might be parametrized for governance to decide. +Higher values could be considered and the main arguments for those are based around the fact that it reduces the number of repeated attacks that will be allowed before the slash execution. Generally 1 attack per era for 27 eras resulting in 27 attacks at most should not compromise our safety assumptions. Although this direction could be further explored and might be parametrized for governance to decide.
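
A quick sanity check of the timeline above, using the durations quoted in this document (1 era = 24h, slashes deferred 27 days, unbonding takes 28 days). The constants are illustrative and hard-coded here, not read from the runtime:

```rust
// Back-of-the-envelope check of the disabling/slashing timeline, using the
// Polkadot-like numbers quoted in this document (illustrative constants).
const ERA_HOURS: u32 = 24; // 1 era in Polkadot
const SLASH_DEFER_ERAS: u32 = 27; // slashes executed 27 days after the offence
const UNBONDING_ERAS: u32 = 28; // stake can only be withdrawn after 28 days

fn main() {
    // The stake is still bonded when the deferred slash is finally applied.
    assert!(UNBONDING_ERAS > SLASH_DEFER_ERAS);

    // With disablement lasting a single era, a repeat offender gets at most
    // one fresh attempt per era before the deferred slash executes.
    let max_repeat_attempts = SLASH_DEFER_ERAS; // 27
    println!("{} hours per era, at most {} repeated attempts", ERA_HOURS, max_repeat_attempts);
}
```
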


@@ -167,21 +161,24 @@ Anything higher than 0% will of course also lead to a disablement. # Redundancy -Some systems can be greatly simplified or outright removed thanks to the above changes. This leads to reduced complexity around the systems that were hard to reason about and were sources of multiple bugs. +Some systems can be greatly simplified or outright removed thanks to the above changes. This leads to reduced complexity around the systems that were hard to reason about and were sources of potential bugs or new attack vectors. ## Automatic Chilling -Chilling is process of a validator dropping theirs intent to validate. This removes them from the upcoming NPoS solutions and effectively pushes them out of the validator set as quickly as of the next era (or 2 era in case of late offenses). All nominators of that validator were also getting unsubscribed from that validator. Validator could re-register their intent to validate at any time. +Chilling is process of a validator dropping theirs intent to validate. This removes them from the upcoming NPoS elections and effectively pushes them out of the validator set as quickly as of the next era (or 2 era in case of late offenses). All nominators of that validator were also getting unsubscribed from that validator. Validator could re-register their intent to validate at any time. The intent behind this logic was to protect honest stakes from repeated slashes caused by unnoticed bugs. It would give time for validators to fix their issue before continuing as a validator. -Chilling had a myriad of problems. It assumes that validators and nominators remain very active and monitor everything. If a validator got slashed he was getting automatically chilled and his nominators were getting unsubscribed. This was an issue because of minor non-malicious slashes due to node operator mistakes or small bugs. Validators got those bugs fixed quickly and were reimbursed but nominator had to manually re-subscribe to the validator, which they often postponed for very lengthy amounts of time most likely due to simply not checking their stake. This forced unsubscribing of nominators was later removed but it leads back to the original quoted issue of offending validators simply re-registering their interest and continuing to attack the network. +Chilling had a myriad of problems. It assumes that validators and nominators remain very active and monitor everything. If a validator got slashed he was getting automatically chilled and his nominators were getting unsubscribed. This was an issue because of minor non-malicious slashes due to node operator mistakes or small bugs. Validators got those bugs fixed quickly and were reimbursed but nominator had to manually re-subscribe to the validator, which they often postponed for very lengthy amounts of time most likely due to simply not checking their stake. **This forced unsubscribing of nominators was later disabled.** -The biggest issue was that chilling in case of honest node slashes could lead to honest validators being somewhat quickly (next era) pushed out of the next validator set. This retains the validator set size but gives an edge to attackers as they can more easily win slots in the NPoS election. +Automatic chilling was achieving its goals in ideal scenarios (no attackers, no lazy nominators) but it opened new vulnerabilities for attackers. The biggest issue was that chilling in case of honest node slashes could lead to honest validators being quickly pushed out of the next validator set within the next era. 
This retains the validator set size but gives an edge to attackers as they can more easily win slots in the NPoS election.

Disabling allows for punishment that limits the damages malicious actors can cause without having to resort to kicking them out of the validator set. This protects us from the edge case of honest validators getting quickly pushed out of the set by slashes. ([**Point 6.**](#system-overview))

> **Note:** \
> As long as honest slashes absolutely cannot occur, automatic chilling is sensible and desirable. This means it could be re-enabled once PolkaVM introduces deterministic gas metering. Then the best of both worlds could be achieved.

## Forcing New Era

-Previous implementation of disabling had some mechanisms allowing for temporarily fully disabling validators and if too many were disabled forcing a new era. Frame staking pallet offered the ability to force a new era but it was also deemed unsafe as it could be abused and compromised the security of the network for instance by weakening the randomness used throughout the protocol.
+The previous implementation of disabling had some limited mechanisms allowing for validator disablement and, if too many were disabled, forcing a new era (new election). The Frame staking pallet offered the ability to force a new era, but it was also deemed unsafe as it could be abused, compromising the security of the network, for instance by weakening the randomness used throughout the protocol.


@@ -189,9 +186,9 @@ Previous implementation of disabling had some mechanisms allowing for temporaril Above slashes were specifically referring to slashing events coming from disputes against candidates, but in Polkadot other types of offenses exist for example GRANDPA equivocations or block authoring offenses. Question is if the above defined design can handle those offenses. -## GRANDPA Offenses +## GRANDPA/BEEFY Offenses -The only GRANDPA offense is an equivocation (as of now). It is not a very serious offense and some nodes committing do not endanger the system and performance is barely affected. If more than byzantine threshold of nodes equivocate it is a catastrophic failure potentially resulting in 2 finalized blocks on the same height. +The main offences for GRANDPA/BEEFY are equivocations. It is not a very serious offense and some nodes committing do not endanger the system and performance is barely affected. If more than byzantine threshold of nodes equivocate it is a catastrophic failure potentially resulting in 2 finalized blocks on the same height in the case of GRANDPA. Honest nodes generally should not commit those offenses so the goal of protecting them does not apply here. @@ -200,9 +197,9 @@ Honest nodes generally should not commit those offenses so the goal of protectin It's not a game of chance so giving attackers extra chances does not compromise soundness. Also it requires a supermajority of honest nodes to successfully finalize blocks so any disabling of honest nodes from GRANDPA might compromise liveness. -Best approach is to allow disabled nodes to participate in GRANDPA as normal and as mentioned before GRANDPA equivocations should not happen to honest nodes so we can safely disable the offenders. ([**Point 8.**](#system-overview)) +Best approach is to allow disabled nodes to participate in GRANDPA/BEEFY as normal and as mentioned before GRANDPA/BABE/BEEFY equivocations should not happen to honest nodes so we can safely disable the offenders. Additionally the slashes for singular equivocations will be very low so those offenders would easily get re-enabled in the case of more serious offenders showing up. ([**Point 8.**](#system-overview)) -## Block Authoring Offenses +## Block Authoring Offenses (BABE Equivocations) Even if all honest nodes are disabled in Block Authoring (BA) liveness is generally preserved. At least 50% of blocks produced should still be honest. Soundness wise disabled nodes can create a decent amount of wasted work by creating bad blocks but they only get to do it in bounded amounts. @@ -212,10 +209,6 @@ Offenses in BA just like in backing can be caused by faulty PVFs or bugs. They m Both points above don't present significant risks when disabling so the default behavior is to disable in BA and because of offenses in BA. ([**Point 9.**](#system-overview)) This filters out honest faulty nodes as well as protects from some attackers. -## BEEFY - -Upcoming feature currently not in scope. It might require a brand new class of disablement with it's own separate rules. -


# Extra Design Considerations @@ -238,6 +231,15 @@ Validator disabling and getting forced ouf of NPoS elections (1 era) due to slas Granularity is particularly crucial in the final design as only a few select functions are disabled while others remain. +## Enabling Approval Voter Slashes + +The original Polkadot 1.0 design describes that all validators on the loosing side of the dispute are slashed. In the current system only the backers are slashed and any approval voters on the wrong side will not be slashed. This creates some undesirable incentives: + +- Lazy approval checkers (approvals yay`ing everything) +- Spammy approval checkers (approval voters nay`ing everything) + +Initially those slashes were disabled to reduce the complexity and to minimize the risk surface in case the system malfunctioned. This is especially risky in case any nondeterministic bugs are present in the system. Once validator re-enabling is launched approval voter slashes can be re-instated. Numbers need to be further explored but slashes between 0-2% are reasonable. 0% would still disable which with the opportunity cost consideration should be enough. +
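
Putting the punishments discussed so far side by side, a purely illustrative summary could look as follows. Only the 0-2% approval-voter range is taken from the text above; the other percentages are placeholders (marked in the comments) and the final numbers would be for governance to decide.

```rust
/// Purely illustrative summary of the punishments discussed in this document.
/// Only the 0-2% approval-voter range comes from the text above; the other
/// percentages are placeholders and the real values are a governance decision.
enum Offence {
    BackingInvalidCandidate,
    ApprovalVoterOnLosingSide,
    GrandpaOrBeefyEquivocation,
    BabeEquivocation,
}

/// Returns (slash range in percent, disabled for the rest of the era?).
fn consequences(offence: Offence) -> ((u8, u8), bool) {
    match offence {
        // Losing backers of an invalid candidate: max slashing, as above.
        Offence::BackingInvalidCandidate => ((100, 100), true),
        // Approval voters on the losing side: 0-2% once re-enabling is live;
        // a 0% slash still disables.
        Offence::ApprovalVoterOnLosingSide => ((0, 2), true),
        // Equivocations: "very low" slashes (placeholder range), still disable.
        Offence::GrandpaOrBeefyEquivocation => ((0, 1), true),
        Offence::BabeEquivocation => ((0, 1), true),
    }
}

fn main() {
    let ((lo, hi), disabled) = consequences(Offence::ApprovalVoterOnLosingSide);
    assert!(lo == 0 && hi == 2 && disabled);
}
```
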


# Implementation From 2669cbc2c2f92eee1e507d651e3d769c609a9790 Mon Sep 17 00:00:00 2001 From: Overkillus Date: Thu, 9 May 2024 14:56:10 +0100 Subject: [PATCH 13/18] fmt --- .../src/protocol-validator-disabling.md | 30 +++++++++---------- 1 file changed, 15 insertions(+), 15 deletions(-) diff --git a/polkadot/roadmap/implementers-guide/src/protocol-validator-disabling.md b/polkadot/roadmap/implementers-guide/src/protocol-validator-disabling.md index ec5426efbebd..fc886f81e0a0 100644 --- a/polkadot/roadmap/implementers-guide/src/protocol-validator-disabling.md +++ b/polkadot/roadmap/implementers-guide/src/protocol-validator-disabling.md @@ -68,18 +68,18 @@ What harm could they cause? - 2.1. The best and possibly only way to affect soundness is by getting lucky in the approval process. If by chance all approval voters would be malicious, the attackers could get a single invalid candidate through. Their chances would be relatively low but in general this risk has to be taken seriously as it significantly reduces the safety buffer around approval checking. -> **Note:** +> **Note:** > With 30 approvals needed chance that a malicious candidate going through is around 4\*10^-15. Assuming attackers can back invalid candidates on 50 cores for 48 hours straight and only those candidates get included it still gives a 7\*10^-9 chance of success which is still relatively small considering the cost (all malicious stake slashed). Attacks 1.2 and 2.1 should generally be pretty futile as a solo attacker while 1.1 could be possible with mass disputes even from a single attacker. Nevertheless whatever the attack vector within the old system the attackers would get *eventually* get slashed and pushed out of the active validator set but they had plenty of time to wreck havoc. ## Risks of having validator disabling -Assume we fully push out validator when they commit offenses. +Assume we fully push out validator when they commit offenses. -The primary risk behind having any sort of disabling is that it is a double-edged sword that in case of any dispute bugs or sources of PVF non-determinism could disable honest nodes or be abused by attackers to specifically silence honest nodes. +The primary risk behind having any sort of disabling is that it is a double-edged sword that in case of any dispute bugs or sources of PVF non-determinism could disable honest nodes or be abused by attackers to specifically silence honest nodes. -Validators being pushed out of the validator set are an issue because that can greatly skew the numbers game in approval checking (% for 30-ish malicious in a row). +Validators being pushed out of the validator set are an issue because that can greatly skew the numbers game in approval checking (% for 30-ish malicious in a row). There are also censorship or liveness issues if backing is suddenly dominated by malicious nodes but in general even if some honest blocks get backed liveness should be preserved. @@ -90,7 +90,7 @@ There are also censorship or liveness issues if backing is suddenly dominated by # Risks Mitigation -## Addressing the risks of having validator disabling: +## Addressing the risks of having validator disabling One safety measure is bounding the disabled number to 1/3 ([**Point 2.**](#system-overview)) or to be exact the byzantine threshold. If for any reason more than 1/3 of validators are getting disabled it means that some part of the protocol failed or there is more than 1/3 malicious nodes which breaks the assumptions. 
@@ -108,7 +108,7 @@ Once of those functionalities can be approval voting which as pointed above is s Because we capped the number of disabled nodes to 1/3 there will always be at least 1/3 honest nodes to participate in backing so liveness should be preserved. That means that backing **COULD** be safely disabled for disabled nodes ([**Point 10.**](#system-overview)). -## Addressing the risks of NOT having validator disabling: +## Addressing the risks of NOT having validator disabling To determine if backing **SHOULD** be disabled the attack vector of 1.2 (Mass invalid candidate backing) and 2.1 (Getting lucky in approval voting) need to be considered. In both of those cases having extra backed malicious candidates gives attackers extra chances to get lucky in approval checking. The solution is to not allow for backing in disablement. ([**Point 10.**](#system-overview)) @@ -125,11 +125,11 @@ As a defense in depth measure dispute statements from disabled validators count # Disabling Duration -## Context: +## Context A crucial point to understand is that as of the time of writing all slashing events as alluded to in the begging are delayed for 27 days before being executed. This is primarily because it gives governance enough time to investigate and potentially intervene. For that duration when the slash is pending the stake is locked and cannot be moved. Time to unbond you stake is 28 days which ensures that the stake will eventually be slashed before being withdrawn. -## Design: +## Design A few options for the duration of disablement were considered: - 1 epoch (4h in Polkadot) @@ -141,7 +141,7 @@ A few options for the duration of disablement were considered: 1 era gives a bit more time to fix any minor issues. Additionally, it guarantees a validator set change at so many of the currently disabled validator might no longer be present anyway. It also gives the time for the validator to chill themselves if they have identified a cause and want to spend more time fixing it. ([**Point 4.**](#system-overview)) -Higher values could be considered and the main arguments for those are based around the fact that it reduces the number of repeated attacks that will be allowed before the slash execution. Generally 1 attack per era for 27 eras resulting in 27 attacks at most should not compromise our safety assumptions. Although this direction could be further explored and might be parametrized for governance to decide. +Higher values could be considered and the main arguments for those are based around the fact that it reduces the number of repeated attacks that will be allowed before the slash execution. Generally 1 attack per era for 27 eras resulting in 27 attacks at most should not compromise our safety assumptions. Although this direction could be further explored and might be parametrized for governance to decide.


@@ -188,7 +188,7 @@ Above slashes were specifically referring to slashing events coming from dispute ## GRANDPA/BEEFY Offenses -The main offences for GRANDPA/BEEFY are equivocations. It is not a very serious offense and some nodes committing do not endanger the system and performance is barely affected. If more than byzantine threshold of nodes equivocate it is a catastrophic failure potentially resulting in 2 finalized blocks on the same height in the case of GRANDPA. +The main offences for GRANDPA/BEEFY are equivocations. It is not a very serious offense and some nodes committing do not endanger the system and performance is barely affected. If more than byzantine threshold of nodes equivocate it is a catastrophic failure potentially resulting in 2 finalized blocks on the same height in the case of GRANDPA. Honest nodes generally should not commit those offenses so the goal of protecting them does not apply here. @@ -201,11 +201,11 @@ Best approach is to allow disabled nodes to participate in GRANDPA/BEEFY as norm ## Block Authoring Offenses (BABE Equivocations) -Even if all honest nodes are disabled in Block Authoring (BA) liveness is generally preserved. At least 50% of blocks produced should still be honest. Soundness wise disabled nodes can create a decent amount of wasted work by creating bad blocks but they only get to do it in bounded amounts. +Even if all honest nodes are disabled in Block Authoring (BA) liveness is generally preserved. At least 50% of blocks produced should still be honest. Soundness wise disabled nodes can create a decent amount of wasted work by creating bad blocks but they only get to do it in bounded amounts. Disabling in BA is not a requirement as both liveness and soundness are preserved but it is the current default behavior as well as it offers a bit less wasted work. -Offenses in BA just like in backing can be caused by faulty PVFs or bugs. They might happen to honest nodes and disabling here while not a requirement can also ensure that this node does not repeat the offense as it might not be trusted with it's PVF anymore. +Offenses in BA just like in backing can be caused by faulty PVFs or bugs. They might happen to honest nodes and disabling here while not a requirement can also ensure that this node does not repeat the offense as it might not be trusted with it's PVF anymore. Both points above don't present significant risks when disabling so the default behavior is to disable in BA and because of offenses in BA. ([**Point 9.**](#system-overview)) This filters out honest faulty nodes as well as protects from some attackers. @@ -215,7 +215,7 @@ Both points above don't present significant risks when disabling so the default ## Disabling vs Accumulating Slashes -Instant disabling generally allows us to remove the need for accumulating slashes. It is a more immediate punishment and it is a more lenient punishment for honest nodes. +Instant disabling generally allows us to remove the need for accumulating slashes. It is a more immediate punishment and it is a more lenient punishment for honest nodes. The current architecture of using max slashing can be used and it works around the problems of delaying the slash for a long period. 
@@ -227,7 +227,7 @@ Validator disabling and getting forced ouf of NPoS elections (1 era) due to slas - **latency** (next few blocks for validator disabling and 27 days for getting pushed out organically) - **pool restriction** (validator disabling could effectively lower the number of active validators during an era if we fully disable) -- **granularity** (validator disabling could remove only a portion of validator privileges instead of all) +- **granularity** (validator disabling could remove only a portion of validator privileges instead of all) Granularity is particularly crucial in the final design as only a few select functions are disabled while others remain. @@ -244,7 +244,7 @@ Initially those slashes were disabled to reduce the complexity and to minimize t # Implementation -Implementation of the above design covers a few additional areas that allow for node-side optimizations. +Implementation of the above design covers a few additional areas that allow for node-side optimizations. ## Core Features From 28a111079defbe467edfe7d159d8028c6d971a0f Mon Sep 17 00:00:00 2001 From: Overkillus Date: Fri, 10 May 2024 12:36:20 +0100 Subject: [PATCH 14/18] punishment table, fmt core features update --- .../src/protocol-validator-disabling.md | 300 ++++++++--- .../parachains/src/assigner_coretime/mod.rs | 488 ++++++++++++++++++ 2 files changed, 714 insertions(+), 74 deletions(-) create mode 100644 polkadot/runtime/parachains/src/assigner_coretime/mod.rs diff --git a/polkadot/roadmap/implementers-guide/src/protocol-validator-disabling.md b/polkadot/roadmap/implementers-guide/src/protocol-validator-disabling.md index fc886f81e0a0..85b8ded42a13 100644 --- a/polkadot/roadmap/implementers-guide/src/protocol-validator-disabling.md +++ b/polkadot/roadmap/implementers-guide/src/protocol-validator-disabling.md @@ -8,18 +8,29 @@ As established in the [approval process](protocol-approval.md) dealing with bad 1. Escalation 1. Consequences -The main system responsible for dispensing **consequences** for malicious actors is the [dispute system](protocol-disputes.md) which eventually dispenses slash events. The slashes itself can be dispensed quickly (a matter of blocks) but for an extra layer of auditing all slashes are deferred for 27 days (in Polkadot/Kusama) which gives time for Governance to investigate and potentially alter the punishment. Dispute concluding by itself does not immediately remove the validator from the active validator set. +The main system responsible for dispensing **consequences** for malicious actors is the [dispute +system](protocol-disputes.md) which eventually dispenses slash events. The slashes itself can be dispensed quickly (a +matter of blocks) but for an extra layer of auditing all slashes are deferred for 27 days (in Polkadot/Kusama) which +gives time for Governance to investigate and potentially alter the punishment. Dispute concluding by itself does not +immediately remove the validator from the active validator set. > **Note:** \ -> There was an additional mechanism of automatically chilling the validator which removed their intent to participate in the next election, but the removed validator could simply re-register his intent to validate. +> There was an additional mechanism of automatically chilling the validator which removed their intent to participate in +> the next election, but the removed validator could simply re-register his intent to validate. -There is a need to have a more immediate way to deal with malicious validators. 
This is where the validator disabling comes in. It is focused on dispensing **low latency** consequences for malicious actors. It is important to note that the validator disabling is not a replacement for the dispute or slashing systems. It is a complementary system that is focused on lighter but immediate consequences usually in the form of restricted validator privileges. +There is a need to have a more immediate way to deal with malicious validators. This is where the validator disabling +comes in. It is focused on dispensing **low latency** consequences for malicious actors. It is important to note that +the validator disabling is not a replacement for the dispute or slashing systems. It is a complementary system that is +focused on lighter but immediate consequences usually in the form of restricted validator privileges. The primary goals are: - Eliminate or minimize cases where attackers can get free attempts at attacking the network -- Eliminate or minimize the risks of honest nodes being pushed out of consensus when getting unjustly slashed (defense in depth) +- Eliminate or minimize the risks of honest nodes being pushed out of consensus when getting unjustly slashed (defense + in depth) -The above two goals are generally at odds so a careful balance has to be struck between them. We will achieve them by sacrificing some **liveness** in favor of **soundness** when the network is under stress. Maintaining some liveness but absolute soundness is paramount. +The above two goals are generally at odds so a careful balance has to be struck between them. We will achieve them by +sacrificing some **liveness** in favor of **soundness** when the network is under stress. Maintaining some liveness but +absolute soundness is paramount. > **Note:** \ > Liveness = Valid candidates can go through (at a decent pace) \ @@ -31,7 +42,8 @@ Side goals are: ## System Overview -High level assumptions and goals of the validator disabling system that will be further discussed in the following sections: +High level assumptions and goals of the validator disabling system that will be further discussed in the following +sections: 1. If validator gets slashed (even 0%) we mark them as disabled in the runtime and on the node side. 1. We only disable up to byzantine threshold of the validators. @@ -52,39 +64,56 @@ High level assumptions and goals of the validator disabling system that will be ## Risks of NOT having validator disabling -Assume that if an offense is committed a slash is deposited but the perpetrator can still act normally. He will be slashed 100% with a long delay (slash deferral duration which is 27 days). This is akin to the current design. +Assume that if an offense is committed a slash is deposited but the perpetrator can still act normally. He will be +slashed 100% with a long delay (slash deferral duration which is 27 days). This is akin to the current design. -A simple argument for disabling is that if someone is already slashed 100% and they have nothing to lose they could cause harm to the network and should be silenced. +A simple argument for disabling is that if someone is already slashed 100% and they have nothing to lose they could +cause harm to the network and should be silenced. What harm could they cause? **1. Liveness attacks:** -- 1.1. Break sharding (with mass no-shows or mass disputes): It forces everyone to do all the work which affects liveness but doesn't kill it completely. The chain can progress at a slow rate. +- 1.1. 
Break sharding (with mass no-shows or mass disputes): It forces everyone to do all the work which affects + liveness but doesn't kill it completely. The chain can progress at a slow rate. -- 1.2. Mass invalid candidate backing: Spawns a lot of worthless work that needs to be done but it is bounded by backing numbers. Honest backers will still back valid candidates and that cannot be stopped. Honest block authors will eventually select valid candidates and even if disputed they will win and progress the chain. +- 1.2. Mass invalid candidate backing: Spawns a lot of worthless work that needs to be done but it is bounded by backing + numbers. Honest backers will still back valid candidates and that cannot be stopped. Honest block authors will + eventually select valid candidates and even if disputed they will win and progress the chain. **2. Soundness attacks:** -- 2.1. The best and possibly only way to affect soundness is by getting lucky in the approval process. If by chance all approval voters would be malicious, the attackers could get a single invalid candidate through. Their chances would be relatively low but in general this risk has to be taken seriously as it significantly reduces the safety buffer around approval checking. +- 2.1. The best and possibly only way to affect soundness is by getting lucky in the approval process. If by chance all + approval voters would be malicious, the attackers could get a single invalid candidate through. Their chances would be + relatively low but in general this risk has to be taken seriously as it significantly reduces the safety buffer around + approval checking. -> **Note:** -> With 30 approvals needed chance that a malicious candidate going through is around 4\*10^-15. Assuming attackers can back invalid candidates on 50 cores for 48 hours straight and only those candidates get included it still gives a 7\*10^-9 chance of success which is still relatively small considering the cost (all malicious stake slashed). +> **Note:** With 30 approvals needed chance that a malicious candidate going through is around 4\*10^-15. Assuming +> attackers can back invalid candidates on 50 cores for 48 hours straight and only those candidates get included it +> still gives a 7\*10^-9 chance of success which is still relatively small considering the cost (all malicious stake +> slashed). -Attacks 1.2 and 2.1 should generally be pretty futile as a solo attacker while 1.1 could be possible with mass disputes even from a single attacker. Nevertheless whatever the attack vector within the old system the attackers would get *eventually* get slashed and pushed out of the active validator set but they had plenty of time to wreck havoc. +Attacks 1.2 and 2.1 should generally be pretty futile as a solo attacker while 1.1 could be possible with mass disputes +even from a single attacker. Nevertheless whatever the attack vector within the old system the attackers would get +*eventually* get slashed and pushed out of the active validator set but they had plenty of time to wreck havoc. ## Risks of having validator disabling Assume we fully push out validator when they commit offenses. -The primary risk behind having any sort of disabling is that it is a double-edged sword that in case of any dispute bugs or sources of PVF non-determinism could disable honest nodes or be abused by attackers to specifically silence honest nodes. 
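
The figures in the note above can be reproduced with a quick back-of-the-envelope calculation. Assuming a malicious fraction of at most 1/3, 6-second relay-chain blocks and one included candidate per core per block (the latter two assumptions are not stated above), we get:

```latex
% Probability that all 30 approval checkers of one candidate are malicious,
% with a malicious fraction of at most 1/3:
P_{\text{single}} \approx \left(\tfrac{1}{3}\right)^{30} \approx 4.9 \times 10^{-15}

% Number of attempts: 50 cores, 48 hours, one included candidate per core per
% 6-second relay-chain block (block time is an assumption, see above):
N \approx 50 \times \frac{48 \times 3600}{6} = 1.44 \times 10^{6}

% Chance of at least one success over all attempts:
P_{\text{any}} \approx 1 - (1 - P_{\text{single}})^{N} \approx N \cdot P_{\text{single}} \approx 7 \times 10^{-9}
```
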
+The primary risk behind having any sort of disabling is that it is a double-edged sword that in case of any dispute bugs +or sources of PVF non-determinism could disable honest nodes or be abused by attackers to specifically silence honest +nodes. -Validators being pushed out of the validator set are an issue because that can greatly skew the numbers game in approval checking (% for 30-ish malicious in a row). +Validators being pushed out of the validator set are an issue because that can greatly skew the numbers game in approval +checking (% for 30-ish malicious in a row). -There are also censorship or liveness issues if backing is suddenly dominated by malicious nodes but in general even if some honest blocks get backed liveness should be preserved. +There are also censorship or liveness issues if backing is suddenly dominated by malicious nodes but in general even if +some honest blocks get backed liveness should be preserved. -> **Note:** -> It is worth noting that is is fundamentally a defense in depth strategy because if we assume disputes are perfect it should not be a real concern. In reality disputes and determinism are difficult to get right, and non-determinism and happen so defense in depth is crucial when handling those subsystems. +> **Note:** It is worth noting that is is fundamentally a defense in depth strategy because if we assume disputes are +> perfect it should not be a real concern. In reality disputes and determinism are difficult to get right, and +> non-determinism and happen so defense in depth is crucial when handling those subsystems.
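
As a rough sanity check of the approval-checking odds quoted above (about 4\*10^-15 per candidate and roughly 7\*10^-9 for a sustained 48-hour campaign), the arithmetic can be reproduced in a few lines of Rust. This is purely an illustrative, editor-added sketch: it assumes roughly one third of the validators are malicious, 30 approval checkers per candidate and 6-second relay-chain blocks, none of which is fixed by this document.

```rust
/// Back-of-the-envelope reproduction of the probabilities quoted in the note above.
/// Assumptions (not taken from the guide itself): ~1/3 malicious validators,
/// 30 approval checkers sampled per candidate, one relay-chain block every 6 seconds.
fn main() {
    let p_malicious: f64 = 1.0 / 3.0;
    let approvals_needed = 30;

    // Chance that every sampled approval checker for a single candidate is malicious.
    let p_single = p_malicious.powi(approvals_needed); // ~4.8e-15

    // Number of attempts: 50 cores, 48 hours of 6-second blocks.
    let attempts: f64 = 50.0 * 48.0 * 3600.0 / 6.0; // 1_440_000

    // Chance that at least one of those attempts slips through.
    let p_any = 1.0 - (1.0 - p_single).powf(attempts); // ~7e-9

    println!("single candidate: {p_single:e}, 48h campaign: {p_any:e}");
}
```

Running this prints values in the same ballpark as the note, which is the only point of the exercise: the safety buffer shrinks, but the attack stays expensive relative to its success probability.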


@@ -92,34 +121,60 @@ There are also censorship or liveness issues if backing is suddenly dominated by ## Addressing the risks of having validator disabling -One safety measure is bounding the disabled number to 1/3 ([**Point 2.**](#system-overview)) or to be exact the byzantine threshold. If for any reason more than 1/3 of validators are getting disabled it means that some part of the protocol failed or there is more than 1/3 malicious nodes which breaks the assumptions. +One safety measure is bounding the disabled number to 1/3 ([**Point 2.**](#system-overview)) or to be exact the +byzantine threshold. If for any reason more than 1/3 of validators are getting disabled it means that some part of the +protocol failed or there is more than 1/3 malicious nodes which breaks the assumptions. -Even in such a dire situation where more than 1/3 got disabled the most likely scenario is a non-determinism bug or sacrifice attack bug. Those attacks generally cause minor slashes to multiple honest nodes. In such a case the situation could be salvaged by prioritizing highest offenders for disabling ([**Point 3.**](#system-overview)). +Even in such a dire situation where more than 1/3 got disabled the most likely scenario is a non-determinism bug or +sacrifice attack bug. Those attacks generally cause minor slashes to multiple honest nodes. In such a case the situation +could be salvaged by prioritizing highest offenders for disabling ([**Point 3.**](#system-overview)). > **Note:** \ -> System can be launched with re-enabling and will still provide some security improvements. Re-enabling will be launched in an upgrade after the initial deployment. +> System can be launched with re-enabling and will still provide some security improvements. Re-enabling will be +> launched in an upgrade after the initial deployment. -Fully pushing out offending validator out of the validator set it too risky in case of a dispute bug, non-determinism or sacrifice attacks. Main issue lies in skewing the numbers in approval checking so instead of fully blocking disabled nodes a different approach can be taken - one were only some functionalities are disabled ([**Point 5.**](#system-overview)). -Once of those functionalities can be approval voting which as pointed above is so crucial that even in a disabled state nodes should be able to participate in it ([**Point 7.**](#system-overview)). +Fully pushing out offending validator out of the validator set it too risky in case of a dispute bug, non-determinism or +sacrifice attacks. Main issue lies in skewing the numbers in approval checking so instead of fully blocking disabled +nodes a different approach can be taken - one were only some functionalities are disabled ([**Point +5.**](#system-overview)). Once of those functionalities can be approval voting which as pointed above is so crucial that +even in a disabled state nodes should be able to participate in it ([**Point 7.**](#system-overview)). > **Note:** \ -> Approval Checking statement are implicitly valid. Sending a statement for an invalid candidate is a part of the dispute logic which we did not yet discuss. For now we only allow nodes to state that a candidate is valid or remain silent. But this solves the main risk of disabling. +> Approval Checking statement are implicitly valid. Sending a statement for an invalid candidate is a part of the +> dispute logic which we did not yet discuss. For now we only allow nodes to state that a candidate is valid or remain +> silent. But this solves the main risk of disabling. 
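
A minimal sketch of the capping rule from Points 2 and 3 could look as follows. The types and names are hypothetical (this is not the staking or disabling pallet code); it only illustrates keeping at most the byzantine threshold of validators disabled while preferring the largest offenders.

```rust
/// Hypothetical illustration of "disable at most 1/3, biggest offenders first".
/// `offenders` pairs a validator index with its slash fraction in parts per billion.
fn select_disabled(offenders: &[(u32, u64)], active_validators: usize) -> Vec<u32> {
    // Byzantine threshold: the largest f such that 3f + 1 <= n.
    let byzantine_threshold = active_validators.saturating_sub(1) / 3;

    let mut by_severity = offenders.to_vec();
    // Highest slash first; ties broken by validator index for determinism.
    by_severity.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(&b.0)));

    by_severity
        .into_iter()
        .take(byzantine_threshold)
        .map(|(idx, _)| idx)
        .collect()
}

fn main() {
    // 10 validators => threshold 3; the three largest offenders stay disabled,
    // the smallest offender is (re-)enabled to make room for them.
    let offenders = [(7, 1_000_000_000), (2, 20_000_000), (5, 0), (9, 500_000_000)];
    assert_eq!(select_disabled(&offenders, 10), vec![7, 9, 2]);
}
```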
-Because we capped the number of disabled nodes to 1/3 there will always be at least 1/3 honest nodes to participate in backing so liveness should be preserved. That means that backing **COULD** be safely disabled for disabled nodes ([**Point 10.**](#system-overview)). +Because we capped the number of disabled nodes to 1/3 there will always be at least 1/3 honest nodes to participate in +backing so liveness should be preserved. That means that backing **COULD** be safely disabled for disabled nodes +([**Point 10.**](#system-overview)). ## Addressing the risks of NOT having validator disabling -To determine if backing **SHOULD** be disabled the attack vector of 1.2 (Mass invalid candidate backing) and 2.1 (Getting lucky in approval voting) need to be considered. In both of those cases having extra backed malicious candidates gives attackers extra chances to get lucky in approval checking. The solution is to not allow for backing in disablement. ([**Point 10.**](#system-overview)) +To determine if backing **SHOULD** be disabled the attack vector of 1.2 (Mass invalid candidate backing) and 2.1 +(Getting lucky in approval voting) need to be considered. In both of those cases having extra backed malicious +candidates gives attackers extra chances to get lucky in approval checking. The solution is to not allow for backing in +disablement. ([**Point 10.**](#system-overview)) -The attack vector 1.1 (Break sharding) requires a bit more nuance. If we assume that the attacker is a single entity and that he can get a lot of disputes through he could potentially incredibly easily break sharding. This generally points into the direction of disallowing that during disablement ([**Point 11.**](#system-overview)). +The attack vector 1.1 (Break sharding) requires a bit more nuance. If we assume that the attacker is a single entity and +that he can get a lot of disputes through he could potentially incredibly easily break sharding. This generally points +into the direction of disallowing that during disablement ([**Point 11.**](#system-overview)). -This might seem like an issue because it takes away the escalation privileges of disabled approval checkers but this is NOT true. By issuing a dispute statement those nodes remain silent in approval checking because they skip their approval statement and thus will count as a no-show. This will create a mini escalation for that particular candidate. This means that disabled nodes maintain just enough escalation that they can protect soundness (same argument as soundness protection during a DoS attack on approval checking) but they lose their extreme escalation privilege which are only given to flawlessly performing nodes ([**Point 12.**](#system-overview)). +This might seem like an issue because it takes away the escalation privileges of disabled approval checkers but this is +NOT true. By issuing a dispute statement those nodes remain silent in approval checking because they skip their approval +statement and thus will count as a no-show. This will create a mini escalation for that particular candidate. This means +that disabled nodes maintain just enough escalation that they can protect soundness (same argument as soundness +protection during a DoS attack on approval checking) but they lose their extreme escalation privilege which are only +given to flawlessly performing nodes ([**Point 12.**](#system-overview)). -As a defense in depth measure dispute statements from disabled validators count toward confirming disputes (byzantine threshold needed to confirm). 
If a dispute is confirmed everyone participates in it. This protects us from situations where due to a bug more than byzantine threshold of validators would be disabled. +As a defense in depth measure dispute statements from disabled validators count toward confirming disputes (byzantine +threshold needed to confirm). If a dispute is confirmed everyone participates in it. This protects us from situations +where due to a bug more than byzantine threshold of validators would be disabled. > **Note:** \ -> The way this behavior is achieved easily in implementation is that honest nodes note down dispute statements from disabled validators just like they would for normal nodes, but they do not release their own dispute statements unless the dispute is confirmed already. This simply stops the escalation process of disputes. +> The way this behavior is achieved easily in implementation is that honest nodes note down dispute statements from +> disabled validators just like they would for normal nodes, but they do not release their own dispute statements unless +> the dispute is confirmed already. This simply stops the escalation process of disputes.
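
A hedged sketch of that node-side rule, using made-up types rather than the real dispute-coordinator structures: statements from disabled validators are recorded and count towards confirmation, but they only trigger our own participation once the dispute is confirmed (or once a non-disabled validator has raised it).

```rust
use std::collections::BTreeSet;

/// Hypothetical per-candidate dispute bookkeeping for the rule described above.
struct DisputeState {
    /// Validators that have already cast a dispute vote for this candidate.
    voters: BTreeSet<u32>,
    confirmed: bool,
}

/// Records a statement and returns whether we should release our own statement.
fn on_dispute_statement(
    state: &mut DisputeState,
    from: u32,
    from_is_disabled: bool,
    byzantine_threshold: usize,
) -> bool {
    // Statements from disabled validators are still noted down: they count towards
    // confirmation, they just cannot on their own drag everyone into participation.
    state.voters.insert(from);
    if state.voters.len() > byzantine_threshold {
        state.confirmed = true;
    }

    // Participate only if the dispute is confirmed or was raised by an enabled node.
    state.confirmed || !from_is_disabled
}

fn main() {
    let mut state = DisputeState { voters: BTreeSet::new(), confirmed: false };
    // Lone statements from disabled validators do not escalate the dispute...
    assert!(!on_dispute_statement(&mut state, 7, true, 2));
    assert!(!on_dispute_statement(&mut state, 8, true, 2));
    // ...but once more than the byzantine threshold have voted, it is confirmed
    // and everyone participates, disabled initiators or not.
    assert!(on_dispute_statement(&mut state, 9, true, 2));
}
```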

@@ -127,7 +182,10 @@ As a defense in depth measure dispute statements from disabled validators count ## Context -A crucial point to understand is that as of the time of writing all slashing events as alluded to in the begging are delayed for 27 days before being executed. This is primarily because it gives governance enough time to investigate and potentially intervene. For that duration when the slash is pending the stake is locked and cannot be moved. Time to unbond you stake is 28 days which ensures that the stake will eventually be slashed before being withdrawn. +A crucial point to understand is that as of the time of writing all slashing events as alluded to in the begging are +delayed for 27 days before being executed. This is primarily because it gives governance enough time to investigate and +potentially intervene. For that duration when the slash is pending the stake is locked and cannot be moved. Time to +unbond you stake is 28 days which ensures that the stake will eventually be slashed before being withdrawn. ## Design @@ -137,77 +195,128 @@ A few options for the duration of disablement were considered: - 2-26 eras - 27 eras -1 epoch is a short period and between a few epochs the validator will most likely be exactly the same. It is also very difficult to fix any local node issues for honest validator in such a short time so the chance for a repeated offense is high. +1 epoch is a short period and between a few epochs the validator will most likely be exactly the same. It is also very +difficult to fix any local node issues for honest validator in such a short time so the chance for a repeated offense is +high. -1 era gives a bit more time to fix any minor issues. Additionally, it guarantees a validator set change at so many of the currently disabled validator might no longer be present anyway. It also gives the time for the validator to chill themselves if they have identified a cause and want to spend more time fixing it. ([**Point 4.**](#system-overview)) +1 era gives a bit more time to fix any minor issues. Additionally, it guarantees a validator set change at so many of +the currently disabled validator might no longer be present anyway. It also gives the time for the validator to chill +themselves if they have identified a cause and want to spend more time fixing it. ([**Point 4.**](#system-overview)) -Higher values could be considered and the main arguments for those are based around the fact that it reduces the number of repeated attacks that will be allowed before the slash execution. Generally 1 attack per era for 27 eras resulting in 27 attacks at most should not compromise our safety assumptions. Although this direction could be further explored and might be parametrized for governance to decide. +Higher values could be considered and the main arguments for those are based around the fact that it reduces the number +of repeated attacks that will be allowed before the slash execution. Generally 1 attack per era for 27 eras resulting in +27 attacks at most should not compromise our safety assumptions. Although this direction could be further explored and +might be parametrized for governance to decide.
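
As an illustration of the 1 era option, the bookkeeping can be as simple as rebuilding the disabled set whenever the active era changes. This is a hypothetical sketch, not the actual staking-pallet implementation.

```rust
/// Hypothetical sketch of "disablement lasts one era": the disabled list is
/// simply cleared whenever a new era (and therefore a new validator set) begins.
#[derive(Default)]
struct DisablingState {
    active_era: u32,
    disabled: Vec<u32>,
}

impl DisablingState {
    fn note_era(&mut self, era: u32) {
        if era != self.active_era {
            // New era, new validator set: previous disablements expire here.
            self.active_era = era;
            self.disabled.clear();
        }
    }

    fn disable(&mut self, era: u32, validator: u32) {
        self.note_era(era);
        if !self.disabled.contains(&validator) {
            self.disabled.push(validator);
        }
    }
}

fn main() {
    let mut state = DisablingState::default();
    state.disable(1, 42);
    assert_eq!(state.disabled, vec![42]);
    state.note_era(2); // era change: validator 42 is implicitly re-enabled
    assert!(state.disabled.is_empty());
}
```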


# Economic consequences of Disablement -Disablement is generally a form of punishment and that will be reflected in the rewards at the end of an era. A disabled validator will not receive any rewards for backing or block authoring. which will reduce its profits. +Disablement is generally a form of punishment and that will be reflected in the rewards at the end of an era. A disabled +validator will not receive any rewards for backing or block authoring. which will reduce its profits. -That means that the opportunity cost of being disabled is a punishment by itself and thus it can be used for some cases where a minor punishment is needed. Current implementation was using 0% slashes to mark nodes for chilling and similar approach of 0% slashes can be used to mark validators for disablement. ([**Point 1.**](#system-overview)) -0% slashes could for instance be used to punish approval checkers voting invalid on valid candidates. +That means that the opportunity cost of being disabled is a punishment by itself and thus it can be used for some cases +where a minor punishment is needed. Current implementation was using 0% slashes to mark nodes for chilling and similar +approach of 0% slashes can be used to mark validators for disablement. ([**Point 1.**](#system-overview)) 0% slashes +could for instance be used to punish approval checkers voting invalid on valid candidates. Anything higher than 0% will of course also lead to a disablement. > **Notes:** \ -> Alternative designs incorporating disabling proportional to offenses were explored but they were deemed too complex and not worth the effort. Main issue with those is that proportional disabling would cause back and forth between disabled and enabled which complicated tracking the state of disabled validators and messes with optimistic node optimizations. Main benefits were that minor slashes will be barely disabled which has nice properties against sacrifice attacks. +> Alternative designs incorporating disabling proportional to offenses were explored but they were deemed too complex +> and not worth the effort. Main issue with those is that proportional disabling would cause back and forth between +> disabled and enabled which complicated tracking the state of disabled validators and messes with optimistic node +> optimizations. Main benefits were that minor slashes will be barely disabled which has nice properties against +> sacrifice attacks.
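
A small illustrative sketch of this rule, with invented types rather than the real runtime: any offence, including a 0% slash, marks the validator as disabled, and a disabled validator simply stops accruing era points, which is the opportunity-cost punishment described above.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical per-era state used only to illustrate the 0%-slash marker.
#[derive(Default)]
struct EraState {
    disabled: BTreeSet<u32>,
    era_points: BTreeMap<u32, u32>,
}

impl EraState {
    fn on_offence(&mut self, offender: u32, slash_parts_per_billion: u64) {
        // The slash amount only matters for prioritisation elsewhere; for the
        // disabling marker itself a 0% slash is enough.
        let _ = slash_parts_per_billion;
        self.disabled.insert(offender);
    }

    fn reward_backing(&mut self, validator: u32, points: u32) {
        // Disabled validators stop accruing era points for backing or authoring,
        // which reduces their end-of-era payout.
        if !self.disabled.contains(&validator) {
            *self.era_points.entry(validator).or_default() += points;
        }
    }
}

fn main() {
    let mut era = EraState::default();
    era.on_offence(3, 0); // a 0% slash still disables
    era.reward_backing(3, 20);
    era.reward_backing(4, 20);
    assert_eq!(era.era_points.get(&3), None);
    assert_eq!(era.era_points.get(&4), Some(&20));
}
```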


# Redundancy -Some systems can be greatly simplified or outright removed thanks to the above changes. This leads to reduced complexity around the systems that were hard to reason about and were sources of potential bugs or new attack vectors. +Some systems can be greatly simplified or outright removed thanks to the above changes. This leads to reduced complexity +around the systems that were hard to reason about and were sources of potential bugs or new attack vectors. ## Automatic Chilling -Chilling is process of a validator dropping theirs intent to validate. This removes them from the upcoming NPoS elections and effectively pushes them out of the validator set as quickly as of the next era (or 2 era in case of late offenses). All nominators of that validator were also getting unsubscribed from that validator. Validator could re-register their intent to validate at any time. The intent behind this logic was to protect honest stakes from repeated slashes caused by unnoticed bugs. It would give time for validators to fix their issue before continuing as a validator. - -Chilling had a myriad of problems. It assumes that validators and nominators remain very active and monitor everything. If a validator got slashed he was getting automatically chilled and his nominators were getting unsubscribed. This was an issue because of minor non-malicious slashes due to node operator mistakes or small bugs. Validators got those bugs fixed quickly and were reimbursed but nominator had to manually re-subscribe to the validator, which they often postponed for very lengthy amounts of time most likely due to simply not checking their stake. **This forced unsubscribing of nominators was later disabled.** - -Automatic chilling was achieving its goals in ideal scenarios (no attackers, no lazy nominators) but it opened new vulnerabilities for attackers. The biggest issue was that chilling in case of honest node slashes could lead to honest validators being quickly pushed out of the next validator set within the next era. This retains the validator set size but gives an edge to attackers as they can more easily win slots in the NPoS election. - -Disabling allows for punishment that limits the damages malicious actors can cause without having to resort to kicking them out of the validator set. This protects us from the edge case of honest validators getting quickly pushed out of the set by slashes. ([**Point 6.**](#system-overview)) +Chilling is process of a validator dropping theirs intent to validate. This removes them from the upcoming NPoS +elections and effectively pushes them out of the validator set as quickly as of the next era (or 2 era in case of late +offenses). All nominators of that validator were also getting unsubscribed from that validator. Validator could +re-register their intent to validate at any time. The intent behind this logic was to protect honest stakes from +repeated slashes caused by unnoticed bugs. It would give time for validators to fix their issue before continuing as a +validator. + +Chilling had a myriad of problems. It assumes that validators and nominators remain very active and monitor everything. +If a validator got slashed he was getting automatically chilled and his nominators were getting unsubscribed. This was +an issue because of minor non-malicious slashes due to node operator mistakes or small bugs. 
Validators got those bugs +fixed quickly and were reimbursed but nominator had to manually re-subscribe to the validator, which they often +postponed for very lengthy amounts of time most likely due to simply not checking their stake. **This forced +unsubscribing of nominators was later disabled.** + +Automatic chilling was achieving its goals in ideal scenarios (no attackers, no lazy nominators) but it opened new +vulnerabilities for attackers. The biggest issue was that chilling in case of honest node slashes could lead to honest +validators being quickly pushed out of the next validator set within the next era. This retains the validator set size +but gives an edge to attackers as they can more easily win slots in the NPoS election. + +Disabling allows for punishment that limits the damages malicious actors can cause without having to resort to kicking +them out of the validator set. This protects us from the edge case of honest validators getting quickly pushed out of +the set by slashes. ([**Point 6.**](#system-overview)) > **Notes:** \ -> As long as honest slashes absolutely cannot occur automatic chilling is a sensible and desirable. This means it could be re-enabled once PolkaVM introduces deterministic gas metering. Then best of both worlds could be achieved. +> As long as honest slashes absolutely cannot occur automatic chilling is a sensible and desirable. This means it could +> be re-enabled once PolkaVM introduces deterministic gas metering. Then best of both worlds could be achieved. ## Forcing New Era -Previous implementation of disabling had some limited mechanisms allowing for validators disablement and if too many were disabled forcing a new era (new election). Frame staking pallet offered the ability to force a new era but it was also deemed unsafe as it could be abused and compromised the security of the network for instance by weakening the randomness used throughout the protocol. +Previous implementation of disabling had some limited mechanisms allowing for validators disablement and if too many +were disabled forcing a new era (new election). Frame staking pallet offered the ability to force a new era but it was +also deemed unsafe as it could be abused and compromised the security of the network for instance by weakening the +randomness used throughout the protocol.


# Other types of slashing -Above slashes were specifically referring to slashing events coming from disputes against candidates, but in Polkadot other types of offenses exist for example GRANDPA equivocations or block authoring offenses. Question is if the above defined design can handle those offenses. +Above slashes were specifically referring to slashing events coming from disputes against candidates, but in Polkadot +other types of offenses exist for example GRANDPA equivocations or block authoring offenses. Question is if the above +defined design can handle those offenses. ## GRANDPA/BEEFY Offenses -The main offences for GRANDPA/BEEFY are equivocations. It is not a very serious offense and some nodes committing do not endanger the system and performance is barely affected. If more than byzantine threshold of nodes equivocate it is a catastrophic failure potentially resulting in 2 finalized blocks on the same height in the case of GRANDPA. +The main offences for GRANDPA/BEEFY are equivocations. It is not a very serious offense and some nodes committing do not +endanger the system and performance is barely affected. If more than byzantine threshold of nodes equivocate it is a +catastrophic failure potentially resulting in 2 finalized blocks on the same height in the case of GRANDPA. Honest nodes generally should not commit those offenses so the goal of protecting them does not apply here. > **Note:** \ -> A validator running multiple nodes with the same identity might equivocate. Doing that is highly not advised but it has happened before. +> A validator running multiple nodes with the same identity might equivocate. Doing that is highly not advised but it +> has happened before. -It's not a game of chance so giving attackers extra chances does not compromise soundness. Also it requires a supermajority of honest nodes to successfully finalize blocks so any disabling of honest nodes from GRANDPA might compromise liveness. +It's not a game of chance so giving attackers extra chances does not compromise soundness. Also it requires a +supermajority of honest nodes to successfully finalize blocks so any disabling of honest nodes from GRANDPA might +compromise liveness. -Best approach is to allow disabled nodes to participate in GRANDPA/BEEFY as normal and as mentioned before GRANDPA/BABE/BEEFY equivocations should not happen to honest nodes so we can safely disable the offenders. Additionally the slashes for singular equivocations will be very low so those offenders would easily get re-enabled in the case of more serious offenders showing up. ([**Point 8.**](#system-overview)) +Best approach is to allow disabled nodes to participate in GRANDPA/BEEFY as normal and as mentioned before +GRANDPA/BABE/BEEFY equivocations should not happen to honest nodes so we can safely disable the offenders. Additionally +the slashes for singular equivocations will be very low so those offenders would easily get re-enabled in the case of +more serious offenders showing up. ([**Point 8.**](#system-overview)) ## Block Authoring Offenses (BABE Equivocations) -Even if all honest nodes are disabled in Block Authoring (BA) liveness is generally preserved. At least 50% of blocks produced should still be honest. Soundness wise disabled nodes can create a decent amount of wasted work by creating bad blocks but they only get to do it in bounded amounts. +Even if all honest nodes are disabled in Block Authoring (BA) liveness is generally preserved. At least 50% of blocks +produced should still be honest. 
Soundness wise disabled nodes can create a decent amount of wasted work by creating bad +blocks but they only get to do it in bounded amounts. -Disabling in BA is not a requirement as both liveness and soundness are preserved but it is the current default behavior as well as it offers a bit less wasted work. +Disabling in BA is not a requirement as both liveness and soundness are preserved but it is the current default behavior +as well as it offers a bit less wasted work. -Offenses in BA just like in backing can be caused by faulty PVFs or bugs. They might happen to honest nodes and disabling here while not a requirement can also ensure that this node does not repeat the offense as it might not be trusted with it's PVF anymore. +Offenses in BA just like in backing can be caused by faulty PVFs or bugs. They might happen to honest nodes and +disabling here while not a requirement can also ensure that this node does not repeat the offense as it might not be +trusted with it's PVF anymore. -Both points above don't present significant risks when disabling so the default behavior is to disable in BA and because of offenses in BA. ([**Point 9.**](#system-overview)) This filters out honest faulty nodes as well as protects from some attackers. +Both points above don't present significant risks when disabling so the default behavior is to disable in BA and because +of offenses in BA. ([**Point 9.**](#system-overview)) This filters out honest faulty nodes as well as protects from some +attackers.
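
Pulling the last few sections together, the duty-by-duty effect of disablement can be summarised in a short sketch. The enum and function below are hypothetical and merely restate Points 7-12 of the system overview together with the GRANDPA/BEEFY and block-authoring discussion above; they are not node or runtime code.

```rust
/// Hypothetical summary of which duties a disabled validator keeps.
enum Duty {
    ApprovalVoting,
    GrandpaAndBeefy,
    Backing,
    DisputeInitiation,
    BlockAuthoring,
}

fn keeps_duty_while_disabled(duty: Duty) -> bool {
    match duty {
        // Soundness depends on approval voting, so even disabled nodes keep voting.
        Duty::ApprovalVoting => true,
        // Finality needs a supermajority of honest voters, so GRANDPA/BEEFY stay on.
        Duty::GrandpaAndBeefy => true,
        // Backing, dispute escalation and block authoring are withheld; dispute
        // statements from disabled nodes still count towards confirmation, but that
        // is handled separately (see the dispute participation rules above).
        Duty::Backing | Duty::DisputeInitiation | Duty::BlockAuthoring => false,
    }
}

fn main() {
    assert!(keeps_duty_while_disabled(Duty::ApprovalVoting));
    assert!(keeps_duty_while_disabled(Duty::GrandpaAndBeefy));
    assert!(!keeps_duty_while_disabled(Duty::Backing));
    assert!(!keeps_duty_while_disabled(Duty::DisputeInitiation));
    assert!(!keeps_duty_while_disabled(Duty::BlockAuthoring));
}
```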


@@ -215,30 +324,64 @@ Both points above don't present significant risks when disabling so the default ## Disabling vs Accumulating Slashes -Instant disabling generally allows us to remove the need for accumulating slashes. It is a more immediate punishment and it is a more lenient punishment for honest nodes. +Instant disabling generally allows us to remove the need for accumulating slashes. It is a more immediate punishment and +it is a more lenient punishment for honest nodes. -The current architecture of using max slashing can be used and it works around the problems of delaying the slash for a long period. +The current architecture of using max slashing can be used and it works around the problems of delaying the slash for a +long period. -An alternative design with immediate slashing and acclimating slashing could relevant to other systems but it goes against the governance auditing mechanisms so it's not be suitable for Polkadot. +An alternative design with immediate slashing and acclimating slashing could relevant to other systems but it goes +against the governance auditing mechanisms so it's not be suitable for Polkadot. ## Disabling vs Getting Pushed Out of NPoS Elections -Validator disabling and getting forced ouf of NPoS elections (1 era) due to slashes are actually very similar processes in terms of outcomes but there are some differences: +Validator disabling and getting forced ouf of NPoS elections (1 era) due to slashes are actually very similar processes +in terms of outcomes but there are some differences: - **latency** (next few blocks for validator disabling and 27 days for getting pushed out organically) -- **pool restriction** (validator disabling could effectively lower the number of active validators during an era if we fully disable) +- **pool restriction** (validator disabling could effectively lower the number of active validators during an era if we + fully disable) - **granularity** (validator disabling could remove only a portion of validator privileges instead of all) Granularity is particularly crucial in the final design as only a few select functions are disabled while others remain. ## Enabling Approval Voter Slashes -The original Polkadot 1.0 design describes that all validators on the loosing side of the dispute are slashed. In the current system only the backers are slashed and any approval voters on the wrong side will not be slashed. This creates some undesirable incentives: +The original Polkadot 1.0 design describes that all validators on the loosing side of the dispute are slashed. In the +current system only the backers are slashed and any approval voters on the wrong side will not be slashed. This creates +some undesirable incentives: - Lazy approval checkers (approvals yay`ing everything) - Spammy approval checkers (approval voters nay`ing everything) -Initially those slashes were disabled to reduce the complexity and to minimize the risk surface in case the system malfunctioned. This is especially risky in case any nondeterministic bugs are present in the system. Once validator re-enabling is launched approval voter slashes can be re-instated. Numbers need to be further explored but slashes between 0-2% are reasonable. 0% would still disable which with the opportunity cost consideration should be enough. +Initially those slashes were disabled to reduce the complexity and to minimize the risk surface in case the system +malfunctioned. This is especially risky in case any nondeterministic bugs are present in the system. 
Once validator +re-enabling is launched approval voter slashes can be re-instated. Numbers need to be further explored but slashes +between 0-2% are reasonable. 0% would still disable which with the opportunity cost consideration should be enough. + + > **Note:** \ +> Spammy approval checkers are in fact not a big issue as a side effect of the offchain-disabling introduced by the Defense Against Past-Era Dispute Spam (**Node**) [#2225](https://github.com/paritytech/polkadot-sdk/issues/2225). It makes it so all validators loosing a dispute are locally disabled and ignored for dispute initiation so it effectively silences spammers. They can still no-show but the damage is minimized. + + +## Interaction with all types of misbehaviors + +With re-enabling in place and potentially approval voter slashes enabled the overall misbehaviour-punishment system can be as highlighted in the table below: + +| Misbehaviour | Slash % | Onchain Disabling | Offchain Disabling | Chilling | Reputation Costs | +| ----------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------ | ------------------ | --------------------------------------------------------------------------------------------------------------------------------------------- | --- | +| Backing Invalid | 100% | Yes (High Prio) | Yes (High Prio) | No | No | +| ForInvalid Vote | 2% | Yes (Mid Prio) | Yes (Mid Prio) | No | No | +| AgainstValid Vote | 0% | Yes (Low Prio) | Yes (Low Prio) | No | No | +| GRANDPA / BABE / BEEFY Equivocations | 0.01-100% | Yes (Varying Prio) | No | No | No | +| Seconded + Valid Equivocation | - | No | No | No | No | +| Double Seconded Equivocation | - | No | No | No | Yes | + + +*Ignoring AURA offences. + +**There are some other misbehaviour types handled in rep only (DoS prevention etc) but they are not relevant to this strategy. + +*** BEEFY will soon introduce new slash types so this strategy table will need to be revised but no major changes are expected.
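
For the dispute-related rows, the table can be read as a simple mapping from misbehaviour to slash and disabling priority. The following is an editor-added illustration with invented types; the percentages and priorities are taken from the table above, not from the runtime.

```rust
/// Hypothetical mirror of the dispute-related rows of the strategy table above.
enum DisputeOffence {
    BackingInvalid,
    ForInvalidVote,
    AgainstValidVote,
}

/// Returns (slash in percent, priority with which the offender stays disabled
/// when the byzantine-threshold budget runs out; higher means disabled first).
fn strategy(offence: DisputeOffence) -> (u8, u8) {
    match offence {
        DisputeOffence::BackingInvalid => (100, 3),  // high priority
        DisputeOffence::ForInvalidVote => (2, 2),    // mid priority
        DisputeOffence::AgainstValidVote => (0, 1),  // low priority: 0% slash still disables
    }
}

fn main() {
    assert_eq!(strategy(DisputeOffence::BackingInvalid), (100, 3));
    assert_eq!(strategy(DisputeOffence::ForInvalidVote), (2, 2));
    assert_eq!(strategy(DisputeOffence::AgainstValidVote), (0, 1));
}
```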


@@ -249,16 +392,15 @@ Implementation of the above design covers a few additional areas that allow for ## Core Features 1. Disabled Validators Tracking (**Runtime**) [#2950](https://github.com/paritytech/polkadot-sdk/issues/2950) - - Add and expose a ``disabled_validators`` map through a Runtime API - - Add new disabled validators when they get slashed + - Expose a ``disabled_validators`` map through a Runtime API 1. Enforce Backing Disabling (**Runtime**) [#1592](https://github.com/paritytech/polkadot-sdk/issues/1592) - Filter out votes from ``disabled_validators`` in ``BackedCandidates`` in ``process_inherent_data`` -1. Substrate Byzantine Threshold (BZT) as Limit for Disabling [#1963](https://github.com/paritytech/polkadot-sdk/issues/1963) +1. Substrate Byzantine Threshold (BZT) as Limit for Disabling + [#1963](https://github.com/paritytech/polkadot-sdk/issues/1963) - Can be parametrized but default to BZT - Disable only up to 1/3 of validators -1. Set Disabling Duration to 1 Era [#1966](https://github.com/paritytech/polkadot-sdk/issues/1966) - - Clear ``disabled_validators`` on era change -1. Respect Disabling in Backing Statement Distribution (**Node**) [#1591](https://github.com/paritytech/polkadot-sdk/issues/1951) +1. Respect Disabling in Backing Statement Distribution (**Node**) + [#1591](https://github.com/paritytech/polkadot-sdk/issues/1951) - This is an optimization as in the end it would get filtered in the runtime anyway - Filter out backing statements coming from ``disabled_validators`` 1. Respect Disablement in Backing (**Node**) [#2951](https://github.com/paritytech/polkadot-sdk/issues/2951) @@ -266,10 +408,13 @@ Implementation of the above design covers a few additional areas that allow for - Don't start backing new candidates when disabled - Don't react to backing requests when disabled 1. Stop Automatic Chilling of Offenders [#1962](https://github.com/paritytech/polkadot-sdk/issues/1962) - - Chilling still persists as a state but is no longer automatic applied on offenses + - Chilling still persists as a state but is no longer automatically applied on offenses 1. Respect Disabling in Dispute Participation (**Node**) [#2225](https://github.com/paritytech/polkadot-sdk/issues/2225) - Receive dispute statements from ``disabled_validators`` but do not release own statements - Ensure dispute confirmation when BZT statements from disabled +1. Remove Liveness Slashes [#1964](https://github.com/paritytech/polkadot-sdk/issues/1964) + - Remove liveness slashes from the system + - The are other incentives to be online and they could be abused to attack the system 1. Defense Against Past-Era Dispute Spam (**Node**) [#2225](https://github.com/paritytech/polkadot-sdk/issues/2225) - This is needed because runtime cannot disable validators which it no longer knows about - Add a node-side parallel store of ``disabled_validators`` @@ -277,7 +422,14 @@ Implementation of the above design covers a few additional areas that allow for - Runtime ``disabled_validators`` always have priority over node-side ``disabled_validators`` - Respect the BZT threshold > **Note:** \ - > An alternative design here was considered where instead of tracking new incoming leaves a relay parent is used. This would guarantee determinism as different nodes can see different leaves, but this approach was leaving too wide of a window because of Async-Backing. Relay Parent could have been significantly in the past and it would give a lot of time for past session disputes to be spammed. 
+ > An alternative design here was considered where instead of tracking new incoming leaves a relay parent is used. + > This would guarantee determinism as different nodes can see different leaves, but this approach was leaving too + > wide of a window because of Async-Backing. Relay Parent could have been significantly in the past and it would + > give a lot of time for past session disputes to be spammed. +1. Do not block finality for "disabled" disputes [#3358](https://github.com/paritytech/polkadot-sdk/pull/3358) + - Emergency fix to not block finality for disputes initiated only by disabled validators 1. Re-enable small offender when approaching BZT (**Runtime**) #TODO - - When BZT limit is reached and there are more offenders to be disabled re-enable the smallest offenders to disable the biggest ones + - When BZT limit is reached and there are more offenders to be disabled re-enable the smallest offenders to disable + the biggest ones + diff --git a/polkadot/runtime/parachains/src/assigner_coretime/mod.rs b/polkadot/runtime/parachains/src/assigner_coretime/mod.rs new file mode 100644 index 000000000000..15701a783354 --- /dev/null +++ b/polkadot/runtime/parachains/src/assigner_coretime/mod.rs @@ -0,0 +1,488 @@ +// Copyright (C) Parity Technologies (UK) Ltd. +// This file is part of Polkadot. + +// Polkadot is free software: you can redistribute it and/or modify +// it under the terms of the GNU General Public License as published by +// the Free Software Foundation, either version 3 of the License, or +// (at your option) any later version. + +// Polkadot is distributed in the hope that it will be useful, +// but WITHOUT ANY WARRANTY; without even the implied warranty of +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +// GNU General Public License for more details. + +// You should have received a copy of the GNU General Public License +// along with Polkadot. If not, see . + +//! The parachain coretime assignment module. +//! +//! Handles scheduling of assignments coming from the coretime/broker chain. For on-demand +//! assignments it relies on the separate on-demand assignment provider, where it forwards requests +//! to. +//! +//! `CoreDescriptor` contains pointers to the beginning and the end of a list of schedules, together +//! with the currently active assignments. + +mod mock_helpers; +#[cfg(test)] +mod tests; + +use crate::{ + assigner_on_demand, configuration, + paras::AssignCoretime, + scheduler::common::{Assignment, AssignmentProvider}, + ParaId, +}; + +use frame_support::{defensive, pallet_prelude::*}; +use frame_system::pallet_prelude::*; +use pallet_broker::CoreAssignment; +use primitives::CoreIndex; +use sp_runtime::traits::{One, Saturating}; + +use sp_std::prelude::*; + +pub use pallet::*; + +/// Fraction expressed as a numerator with an assumed denominator of 57,600. 
+#[derive(RuntimeDebug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Encode, Decode, TypeInfo)] +pub struct PartsOf57600(u16); + +impl PartsOf57600 { + pub const ZERO: Self = Self(0); + pub const FULL: Self = Self(57600); + + pub fn new_saturating(v: u16) -> Self { + Self::ZERO.saturating_add(Self(v)) + } + + pub fn is_full(&self) -> bool { + *self == Self::FULL + } + + pub fn saturating_add(self, rhs: Self) -> Self { + let inner = self.0.saturating_add(rhs.0); + if inner > 57600 { + Self(57600) + } else { + Self(inner) + } + } + + pub fn saturating_sub(self, rhs: Self) -> Self { + Self(self.0.saturating_sub(rhs.0)) + } + + pub fn checked_add(self, rhs: Self) -> Option { + let inner = self.0.saturating_add(rhs.0); + if inner > 57600 { + None + } else { + Some(Self(inner)) + } + } +} + +/// Assignments as they are scheduled by block number +/// +/// for a particular core. +#[derive(Encode, Decode, TypeInfo)] +#[cfg_attr(test, derive(PartialEq, RuntimeDebug))] +struct Schedule { + // Original assignments + assignments: Vec<(CoreAssignment, PartsOf57600)>, + /// When do our assignments become invalid, if at all? + /// + /// If this is `Some`, then this `CoreState` will be dropped at that block number. If this is + /// `None`, then we will keep serving our core assignments in a circle until a new set of + /// assignments is scheduled. + end_hint: Option, + + /// The next queued schedule for this core. + /// + /// Schedules are forming a queue. + next_schedule: Option, +} + +/// Descriptor for a core. +/// +/// Contains pointers to first and last schedule into `CoreSchedules` for that core and keeps track +/// of the currently active work as well. +#[derive(Encode, Decode, TypeInfo, Default)] +#[cfg_attr(test, derive(PartialEq, RuntimeDebug, Clone))] +struct CoreDescriptor { + /// Meta data about the queued schedules for this core. + queue: Option>, + /// Currently performed work. + current_work: Option>, +} + +/// Pointers into `CoreSchedules` for a particular core. +/// +/// Schedules in `CoreSchedules` form a queue. `Schedule::next_schedule` always pointing to the next +/// item. +#[derive(Encode, Decode, TypeInfo, Copy, Clone)] +#[cfg_attr(test, derive(PartialEq, RuntimeDebug))] +struct QueueDescriptor { + /// First scheduled item, that is not yet active. + first: N, + /// Last scheduled item. + last: N, +} + +#[derive(Encode, Decode, TypeInfo)] +#[cfg_attr(test, derive(PartialEq, RuntimeDebug, Clone))] +struct WorkState { + /// Assignments with current state. + /// + /// Assignments and book keeping on how much has been served already. We keep track of serviced + /// assignments in order to adhere to the specified ratios. + assignments: Vec<(CoreAssignment, AssignmentState)>, + /// When do our assignments become invalid if at all? + /// + /// If this is `Some`, then this `CoreState` will be dropped at that block number. If this is + /// `None`, then we will keep serving our core assignments in a circle until a new set of + /// assignments is scheduled. + end_hint: Option, + /// Position in the assignments we are currently in. + /// + /// Aka which core assignment will be popped next on + /// `AssignmentProvider::pop_assignment_for_core`. + pos: u16, + /// Step width + /// + /// How much we subtract from `AssignmentState::remaining` for a core served. + step: PartsOf57600, +} + +#[derive(Encode, Decode, TypeInfo)] +#[cfg_attr(test, derive(PartialEq, RuntimeDebug, Clone, Copy))] +struct AssignmentState { + /// Ratio of the core this assignment has. 
+ /// + /// As initially received via `assign_core`. + ratio: PartsOf57600, + /// How many parts are remaining in this round? + /// + /// At the end of each round (in preparation for the next), ratio will be added to remaining. + /// Then every time we get scheduled we subtract a core worth of points. Once we reach 0 or a + /// number lower than what a core is worth (`CoreState::step` size), we move on to the next + /// item in the `Vec`. + /// + /// The first round starts with remaining = ratio. + remaining: PartsOf57600, +} + +impl From> for WorkState { + fn from(schedule: Schedule) -> Self { + let Schedule { assignments, end_hint, next_schedule: _ } = schedule; + let step = + if let Some(min_step_assignment) = assignments.iter().min_by(|a, b| a.1.cmp(&b.1)) { + min_step_assignment.1 + } else { + // Assignments empty, should not exist. In any case step size does not matter here: + log::debug!("assignments of a `Schedule` should never be empty."); + PartsOf57600(1) + }; + let assignments = assignments + .into_iter() + .map(|(a, ratio)| (a, AssignmentState { ratio, remaining: ratio })) + .collect(); + + Self { assignments, end_hint, pos: 0, step } + } +} + +#[frame_support::pallet] +pub mod pallet { + use super::*; + + #[pallet::pallet] + #[pallet::without_storage_info] + pub struct Pallet(_); + + #[pallet::config] + pub trait Config: + frame_system::Config + configuration::Config + assigner_on_demand::Config + { + } + + /// Scheduled assignment sets. + /// + /// Assignments as of the given block number. They will go into state once the block number is + /// reached (and replace whatever was in there before). + #[pallet::storage] + pub(super) type CoreSchedules = StorageMap< + _, + Twox256, + (BlockNumberFor, CoreIndex), + Schedule>, + OptionQuery, + >; + + /// Assignments which are currently active. + /// + /// They will be picked from `PendingAssignments` once we reach the scheduled block number in + /// `PendingAssignments`. + #[pallet::storage] + pub(super) type CoreDescriptors = StorageMap< + _, + Twox256, + CoreIndex, + CoreDescriptor>, + ValueQuery, + GetDefault, + >; + + #[pallet::hooks] + impl Hooks> for Pallet {} + + #[pallet::error] + pub enum Error { + AssignmentsEmpty, + /// Assignments together exceeded 57600. + OverScheduled, + /// Assignments together less than 57600 + UnderScheduled, + /// assign_core is only allowed to append new assignments at the end of already existing + /// ones. + DisallowedInsert, + /// Tried to insert a schedule for the same core and block number as an existing schedule + DuplicateInsert, + /// Tried to add an unsorted set of assignments + AssignmentsNotSorted, + } +} + +impl AssignmentProvider> for Pallet { + fn pop_assignment_for_core(core_idx: CoreIndex) -> Option { + let now = frame_system::Pallet::::block_number(); + + CoreDescriptors::::mutate(core_idx, |core_state| { + Self::ensure_workload(now, core_idx, core_state); + + let work_state = core_state.current_work.as_mut()?; + + // Wrap around: + work_state.pos = work_state.pos % work_state.assignments.len() as u16; + let (a_type, a_state) = &mut work_state + .assignments + .get_mut(work_state.pos as usize) + .expect("We limited pos to the size of the vec one line above. qed"); + + // advance for next pop: + a_state.remaining = a_state.remaining.saturating_sub(work_state.step); + if a_state.remaining < work_state.step { + // Assignment exhausted, need to move to the next and credit remaining for + // next round. 
+ work_state.pos += 1; + // Reset to ratio + still remaining "credits": + a_state.remaining = a_state.remaining.saturating_add(a_state.ratio); + } + + match a_type { + CoreAssignment::Idle => None, + CoreAssignment::Pool => + assigner_on_demand::Pallet::::pop_assignment_for_core(core_idx), + CoreAssignment::Task(para_id) => Some(Assignment::Bulk((*para_id).into())), + } + }) + } + + fn report_processed(assignment: Assignment) { + match assignment { + Assignment::Pool { para_id, core_index } => + assigner_on_demand::Pallet::::report_processed(para_id, core_index), + Assignment::Bulk(_) => {}, + } + } + + /// Push an assignment back to the front of the queue. + /// + /// The assignment has not been processed yet. Typically used on session boundaries. + /// Parameters: + /// - `assignment`: The on demand assignment. + fn push_back_assignment(assignment: Assignment) { + match assignment { + Assignment::Pool { para_id, core_index } => + assigner_on_demand::Pallet::::push_back_assignment(para_id, core_index), + Assignment::Bulk(_) => { + // Session changes are rough. We just drop assignments that did not make it on a + // session boundary. This seems sensible as bulk is region based. Meaning, even if + // we made the effort catching up on those dropped assignments, this would very + // likely lead to other assignments not getting served at the "end" (when our + // assignment set gets replaced). + }, + } + } + + #[cfg(any(feature = "runtime-benchmarks", test))] + fn get_mock_assignment(_: CoreIndex, para_id: primitives::Id) -> Assignment { + // Given that we are not tracking anything in `Bulk` assignments, it is safe to always + // return a bulk assignment. + Assignment::Bulk(para_id) + } + + fn session_core_count() -> u32 { + let config = configuration::ActiveConfig::::get(); + config.scheduler_params.num_cores + } +} + +impl Pallet { + /// Ensure given workload for core is up to date. + fn ensure_workload( + now: BlockNumberFor, + core_idx: CoreIndex, + descriptor: &mut CoreDescriptor>, + ) { + // Workload expired? + if descriptor + .current_work + .as_ref() + .and_then(|w| w.end_hint) + .map_or(false, |e| e <= now) + { + descriptor.current_work = None; + } + + let Some(queue) = descriptor.queue else { + // No queue. + return + }; + + let mut next_scheduled = queue.first; + + if next_scheduled > now { + // Not yet ready. + return + } + + // Update is needed: + let update = loop { + let Some(update) = CoreSchedules::::take((next_scheduled, core_idx)) else { + break None + }; + // Still good? + if update.end_hint.map_or(true, |e| e > now) { + break Some(update) + } + // Move on if possible: + if let Some(n) = update.next_schedule { + next_scheduled = n; + } else { + break None + } + }; + + let new_first = update.as_ref().and_then(|u| u.next_schedule); + descriptor.current_work = update.map(Into::into); + + descriptor.queue = new_first.map(|new_first| { + QueueDescriptor { + first: new_first, + // `last` stays unaffected, if not empty: + last: queue.last, + } + }); + } + + /// Append another assignment for a core. + /// + /// Important only appending is allowed. Meaning, all already existing assignments must have a + /// begin smaller than the one passed here. This restriction exists, because it makes the + /// insertion O(1) and the author could not think of a reason, why this restriction should be + /// causing any problems. Inserting arbitrarily causes a `DispatchError::DisallowedInsert` + /// error. 
This restriction could easily be lifted if need be and in fact an implementation is + /// available + /// [here](https://github.com/paritytech/polkadot-sdk/pull/1694/commits/c0c23b01fd2830910cde92c11960dad12cdff398#diff-0c85a46e448de79a5452395829986ee8747e17a857c27ab624304987d2dde8baR386). + /// The problem is that insertion complexity then depends on the size of the existing queue, + /// which makes determining weights hard and could lead to issues like overweight blocks (at + /// least in theory). + pub fn assign_core( + core_idx: CoreIndex, + begin: BlockNumberFor, + assignments: Vec<(CoreAssignment, PartsOf57600)>, + end_hint: Option>, + ) -> Result<(), DispatchError> { + // There should be at least one assignment. + ensure!(!assignments.is_empty(), Error::::AssignmentsEmpty); + + // Checking for sort and unique manually, since we don't have access to iterator tools. + // This way of checking uniqueness only works since we also check sortedness. + assignments.iter().map(|x| &x.0).try_fold(None, |prev, cur| { + if prev.map_or(false, |p| p >= cur) { + Err(Error::::AssignmentsNotSorted) + } else { + Ok(Some(cur)) + } + })?; + + // Check that the total parts between all assignments are equal to 57600 + let parts_sum = assignments + .iter() + .map(|assignment| assignment.1) + .try_fold(PartsOf57600::ZERO, |sum, parts| { + sum.checked_add(parts).ok_or(Error::::OverScheduled) + })?; + ensure!(parts_sum.is_full(), Error::::UnderScheduled); + + CoreDescriptors::::mutate(core_idx, |core_descriptor| { + let new_queue = match core_descriptor.queue { + Some(queue) => { + ensure!(begin > queue.last, Error::::DisallowedInsert); + + CoreSchedules::::try_mutate((queue.last, core_idx), |schedule| { + if let Some(schedule) = schedule.as_mut() { + debug_assert!(schedule.next_schedule.is_none(), "queue.end was supposed to be the end, so the next item must be `None`!"); + schedule.next_schedule = Some(begin); + } else { + defensive!("Queue end entry does not exist?"); + } + CoreSchedules::::try_mutate((begin, core_idx), |schedule| { + // It should already be impossible to overwrite an existing schedule due + // to strictly increasing block number. But we check here for safety and + // in case the design changes. + ensure!(schedule.is_none(), Error::::DuplicateInsert); + *schedule = + Some(Schedule { assignments, end_hint, next_schedule: None }); + Ok::<(), DispatchError>(()) + })?; + Ok::<(), DispatchError>(()) + })?; + + QueueDescriptor { first: queue.first, last: begin } + }, + None => { + // Queue empty, just insert: + CoreSchedules::::insert( + (begin, core_idx), + Schedule { assignments, end_hint, next_schedule: None }, + ); + QueueDescriptor { first: begin, last: begin } + }, + }; + core_descriptor.queue = Some(new_queue); + Ok(()) + }) + } +} + +impl AssignCoretime for Pallet { + fn assign_coretime(id: ParaId) -> DispatchResult { + let current_block = frame_system::Pallet::::block_number(); + + // Add a new core and assign the para to it. + let mut config = configuration::ActiveConfig::::get(); + let core = config.scheduler_params.num_cores; + config.scheduler_params.num_cores.saturating_inc(); + + // `assign_coretime` is only called at genesis or by root, so setting the active + // config here is fine. 
+ configuration::Pallet::::force_set_active_config(config); + + let begin = current_block + One::one(); + let assignment = vec![(pallet_broker::CoreAssignment::Task(id.into()), PartsOf57600::FULL)]; + Pallet::::assign_core(CoreIndex(core), begin, assignment, None) + } +} From 82b48c441dfe9f0a95947530363fe9aca6a1f865 Mon Sep 17 00:00:00 2001 From: Overkillus Date: Fri, 10 May 2024 12:43:47 +0100 Subject: [PATCH 15/18] reverting commit mistake (accidental code change) --- .../parachains/src/assigner_on_demand/mod.rs | 613 ------------------ 1 file changed, 613 deletions(-) delete mode 100644 polkadot/runtime/parachains/src/assigner_on_demand/mod.rs diff --git a/polkadot/runtime/parachains/src/assigner_on_demand/mod.rs b/polkadot/runtime/parachains/src/assigner_on_demand/mod.rs deleted file mode 100644 index 75c29bd6fbe4..000000000000 --- a/polkadot/runtime/parachains/src/assigner_on_demand/mod.rs +++ /dev/null @@ -1,613 +0,0 @@ -// Copyright (C) Parity Technologies (UK) Ltd. -// This file is part of Polkadot. - -// Polkadot is free software: you can redistribute it and/or modify -// it under the terms of the GNU General Public License as published by -// the Free Software Foundation, either version 3 of the License, or -// (at your option) any later version. - -// Polkadot is distributed in the hope that it will be useful, -// but WITHOUT ANY WARRANTY; without even the implied warranty of -// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the -// GNU General Public License for more details. - -// You should have received a copy of the GNU General Public License -// along with Polkadot. If not, see . - -//! The parachain on demand assignment module. -//! -//! Implements a mechanism for taking in orders for pay as you go (PAYG) or on demand -//! parachain (previously parathreads) assignments. This module is not handled by the -//! initializer but is instead instantiated in the `construct_runtime` macro. -//! -//! The module currently limits parallel execution of blocks from the same `ParaId` via -//! a core affinity mechanism. As long as there exists an affinity for a `CoreIndex` for -//! a specific `ParaId`, orders for blockspace for that `ParaId` will only be assigned to -//! that `CoreIndex`. This affinity mechanism can be removed if it can be shown that parallel -//! execution is valid. - -mod benchmarking; -mod mock_helpers; - -#[cfg(test)] -mod tests; - -use crate::{ - configuration, paras, - scheduler::common::{Assignment, AssignmentProvider, AssignmentProviderConfig}, -}; - -use frame_support::{ - pallet_prelude::*, - traits::{ - Currency, - ExistenceRequirement::{self, AllowDeath, KeepAlive}, - WithdrawReasons, - }, -}; -use frame_system::pallet_prelude::*; -use primitives::{CoreIndex, Id as ParaId}; -use sp_runtime::{ - traits::{One, SaturatedConversion}, - FixedPointNumber, FixedPointOperand, FixedU128, Perbill, Saturating, -}; - -use sp_std::{collections::vec_deque::VecDeque, prelude::*}; - -const LOG_TARGET: &str = "runtime::parachains::assigner-on-demand"; - -pub use pallet::*; - -pub trait WeightInfo { - fn place_order_allow_death(s: u32) -> Weight; - fn place_order_keep_alive(s: u32) -> Weight; -} - -/// A weight info that is only suitable for testing. 
-pub struct TestWeightInfo; - -impl WeightInfo for TestWeightInfo { - fn place_order_allow_death(_: u32) -> Weight { - Weight::MAX - } - - fn place_order_keep_alive(_: u32) -> Weight { - Weight::MAX - } -} - -/// Keeps track of how many assignments a scheduler currently has at a specific `CoreIndex` for a -/// specific `ParaId`. -#[derive(Encode, Decode, Default, Clone, Copy, TypeInfo)] -#[cfg_attr(test, derive(PartialEq, Debug))] -pub struct CoreAffinityCount { - core_idx: CoreIndex, - count: u32, -} - -/// An indicator as to which end of the `OnDemandQueue` an assignment will be placed. -pub enum QueuePushDirection { - Back, - Front, -} - -/// Shorthand for the Balance type the runtime is using. -type BalanceOf = - <::Currency as Currency<::AccountId>>::Balance; - -/// Errors that can happen during spot traffic calculation. -#[derive(PartialEq)] -#[cfg_attr(feature = "std", derive(Debug))] -pub enum SpotTrafficCalculationErr { - /// The order queue capacity is at 0. - QueueCapacityIsZero, - /// The queue size is larger than the queue capacity. - QueueSizeLargerThanCapacity, - /// Arithmetic error during division, either division by 0 or over/underflow. - Division, -} - -#[frame_support::pallet] -pub mod pallet { - - use super::*; - - #[pallet::pallet] - #[pallet::without_storage_info] - pub struct Pallet(_); - - #[pallet::config] - pub trait Config: frame_system::Config + configuration::Config + paras::Config { - /// The runtime's definition of an event. - type RuntimeEvent: From> + IsType<::RuntimeEvent>; - - /// The runtime's definition of a Currency. - type Currency: Currency; - - /// Something that provides the weight of this pallet. - type WeightInfo: WeightInfo; - - /// The default value for the spot traffic multiplier. - #[pallet::constant] - type TrafficDefaultValue: Get; - } - - /// Creates an empty spot traffic value if one isn't present in storage already. - #[pallet::type_value] - pub fn SpotTrafficOnEmpty() -> FixedU128 { - T::TrafficDefaultValue::get() - } - - /// Creates an empty on demand queue if one isn't present in storage already. - #[pallet::type_value] - pub fn OnDemandQueueOnEmpty() -> VecDeque { - VecDeque::new() - } - - /// Keeps track of the multiplier used to calculate the current spot price for the on demand - /// assigner. - #[pallet::storage] - pub(super) type SpotTraffic = - StorageValue<_, FixedU128, ValueQuery, SpotTrafficOnEmpty>; - - /// The order storage entry. Uses a VecDeque to be able to push to the front of the - /// queue from the scheduler on session boundaries. - #[pallet::storage] - pub type OnDemandQueue = - StorageValue<_, VecDeque, ValueQuery, OnDemandQueueOnEmpty>; - - /// Maps a `ParaId` to `CoreIndex` and keeps track of how many assignments the scheduler has in - /// it's lookahead. Keeping track of this affinity prevents parallel execution of the same - /// `ParaId` on two or more `CoreIndex`es. - #[pallet::storage] - pub(super) type ParaIdAffinity = - StorageMap<_, Twox256, ParaId, CoreAffinityCount, OptionQuery>; - - #[pallet::event] - #[pallet::generate_deposit(pub(super) fn deposit_event)] - pub enum Event { - /// An order was placed at some spot price amount. - OnDemandOrderPlaced { para_id: ParaId, spot_price: BalanceOf }, - /// The value of the spot traffic multiplier changed. - SpotTrafficSet { traffic: FixedU128 }, - } - - #[pallet::error] - pub enum Error { - /// The `ParaId` supplied to the `place_order` call is not a valid `ParaThread`, making the - /// call is invalid. 
- InvalidParaId, - /// The order queue is full, `place_order` will not continue. - QueueFull, - /// The current spot price is higher than the max amount specified in the `place_order` - /// call, making it invalid. - SpotPriceHigherThanMaxAmount, - /// There are no on demand cores available. `place_order` will not add anything to the - /// queue. - NoOnDemandCores, - } - - #[pallet::hooks] - impl Hooks> for Pallet { - fn on_initialize(_now: BlockNumberFor) -> Weight { - let config = >::config(); - // Calculate spot price multiplier and store it. - let old_traffic = SpotTraffic::::get(); - match Self::calculate_spot_traffic( - old_traffic, - config.on_demand_queue_max_size, - Self::queue_size(), - config.on_demand_target_queue_utilization, - config.on_demand_fee_variability, - ) { - Ok(new_traffic) => { - // Only update storage on change - if new_traffic != old_traffic { - SpotTraffic::::set(new_traffic); - Pallet::::deposit_event(Event::::SpotTrafficSet { - traffic: new_traffic, - }); - return T::DbWeight::get().reads_writes(2, 1) - } - }, - Err(SpotTrafficCalculationErr::QueueCapacityIsZero) => { - log::debug!( - target: LOG_TARGET, - "Error calculating spot traffic: The order queue capacity is at 0." - ); - }, - Err(SpotTrafficCalculationErr::QueueSizeLargerThanCapacity) => { - log::debug!( - target: LOG_TARGET, - "Error calculating spot traffic: The queue size is larger than the queue capacity." - ); - }, - Err(SpotTrafficCalculationErr::Division) => { - log::debug!( - target: LOG_TARGET, - "Error calculating spot traffic: Arithmetic error during division, either division by 0 or over/underflow." - ); - }, - }; - T::DbWeight::get().reads_writes(2, 0) - } - } - - #[pallet::call] - impl Pallet { - /// Create a single on demand core order. - /// Will use the spot price for the current block and will reap the account if needed. - /// - /// Parameters: - /// - `origin`: The sender of the call, funds will be withdrawn from this account. - /// - `max_amount`: The maximum balance to withdraw from the origin to place an order. - /// - `para_id`: A `ParaId` the origin wants to provide blockspace for. - /// - /// Errors: - /// - `InsufficientBalance`: from the Currency implementation - /// - `InvalidParaId` - /// - `QueueFull` - /// - `SpotPriceHigherThanMaxAmount` - /// - `NoOnDemandCores` - /// - /// Events: - /// - `SpotOrderPlaced` - #[pallet::call_index(0)] - #[pallet::weight(::WeightInfo::place_order_allow_death(OnDemandQueue::::get().len() as u32))] - pub fn place_order_allow_death( - origin: OriginFor, - max_amount: BalanceOf, - para_id: ParaId, - ) -> DispatchResult { - let sender = ensure_signed(origin)?; - Pallet::::do_place_order(sender, max_amount, para_id, AllowDeath) - } - - /// Same as the [`place_order_allow_death`](Self::place_order_allow_death) call , but with a - /// check that placing the order will not reap the account. - /// - /// Parameters: - /// - `origin`: The sender of the call, funds will be withdrawn from this account. - /// - `max_amount`: The maximum balance to withdraw from the origin to place an order. - /// - `para_id`: A `ParaId` the origin wants to provide blockspace for. 
- /// - /// Errors: - /// - `InsufficientBalance`: from the Currency implementation - /// - `InvalidParaId` - /// - `QueueFull` - /// - `SpotPriceHigherThanMaxAmount` - /// - `NoOnDemandCores` - /// - /// Events: - /// - `SpotOrderPlaced` - #[pallet::call_index(1)] - #[pallet::weight(::WeightInfo::place_order_keep_alive(OnDemandQueue::::get().len() as u32))] - pub fn place_order_keep_alive( - origin: OriginFor, - max_amount: BalanceOf, - para_id: ParaId, - ) -> DispatchResult { - let sender = ensure_signed(origin)?; - Pallet::::do_place_order(sender, max_amount, para_id, KeepAlive) - } - } -} - -impl Pallet -where - BalanceOf: FixedPointOperand, -{ - /// Helper function for `place_order_*` calls. Used to differentiate between placing orders - /// with a keep alive check or to allow the account to be reaped. - /// - /// Parameters: - /// - `sender`: The sender of the call, funds will be withdrawn from this account. - /// - `max_amount`: The maximum balance to withdraw from the origin to place an order. - /// - `para_id`: A `ParaId` the origin wants to provide blockspace for. - /// - `existence_requirement`: Whether or not to ensure that the account will not be reaped. - /// - /// Errors: - /// - `InsufficientBalance`: from the Currency implementation - /// - `InvalidParaId` - /// - `QueueFull` - /// - `SpotPriceHigherThanMaxAmount` - /// - `NoOnDemandCores` - /// - /// Events: - /// - `SpotOrderPlaced` - fn do_place_order( - sender: ::AccountId, - max_amount: BalanceOf, - para_id: ParaId, - existence_requirement: ExistenceRequirement, - ) -> DispatchResult { - let config = >::config(); - - // Are there any schedulable cores in this session - ensure!(config.on_demand_cores > 0, Error::::NoOnDemandCores); - - // Traffic always falls back to 1.0 - let traffic = SpotTraffic::::get(); - - // Calculate spot price - let spot_price: BalanceOf = - traffic.saturating_mul_int(config.on_demand_base_fee.saturated_into::>()); - - // Is the current price higher than `max_amount` - ensure!(spot_price.le(&max_amount), Error::::SpotPriceHigherThanMaxAmount); - - // Charge the sending account the spot price - T::Currency::withdraw(&sender, spot_price, WithdrawReasons::FEE, existence_requirement)?; - - let assignment = Assignment::new(para_id); - - let res = Pallet::::add_on_demand_assignment(assignment, QueuePushDirection::Back); - - match res { - Ok(_) => { - Pallet::::deposit_event(Event::::OnDemandOrderPlaced { para_id, spot_price }); - return Ok(()) - }, - Err(err) => return Err(err), - } - } - - /// The spot price multiplier. This is based on the transaction fee calculations defined in: - /// https://research.web3.foundation/Polkadot/overview/token-economics#setting-transaction-fees - /// - /// Parameters: - /// - `traffic`: The previously calculated multiplier, can never go below 1.0. - /// - `queue_capacity`: The max size of the order book. - /// - `queue_size`: How many orders are currently in the order book. - /// - `target_queue_utilisation`: How much of the queue_capacity should be ideally occupied, - /// expressed in percentages(perbill). - /// - `variability`: A variability factor, i.e. how quickly the spot price adjusts. This number - /// can be chosen by p/(k*(1-s)) where p is the desired ratio increase in spot price over k - /// number of blocks. s is the target_queue_utilisation. A concrete example: v = - /// 0.05/(20*(1-0.25)) = 0.0033. - /// - /// Returns: - /// - A `FixedU128` in the range of `Config::TrafficDefaultValue` - `FixedU128::MAX` on - /// success. 
- /// - /// Errors: - /// - `SpotTrafficCalculationErr::QueueCapacityIsZero` - /// - `SpotTrafficCalculationErr::QueueSizeLargerThanCapacity` - /// - `SpotTrafficCalculationErr::Division` - pub(crate) fn calculate_spot_traffic( - traffic: FixedU128, - queue_capacity: u32, - queue_size: u32, - target_queue_utilisation: Perbill, - variability: Perbill, - ) -> Result { - // Return early if queue has no capacity. - if queue_capacity == 0 { - return Err(SpotTrafficCalculationErr::QueueCapacityIsZero) - } - - // Return early if queue size is greater than capacity. - if queue_size > queue_capacity { - return Err(SpotTrafficCalculationErr::QueueSizeLargerThanCapacity) - } - - // (queue_size / queue_capacity) - target_queue_utilisation - let queue_util_ratio = FixedU128::from_rational(queue_size.into(), queue_capacity.into()); - let positive = queue_util_ratio >= target_queue_utilisation.into(); - let queue_util_diff = queue_util_ratio.max(target_queue_utilisation.into()) - - queue_util_ratio.min(target_queue_utilisation.into()); - - // variability * queue_util_diff - let var_times_qud = queue_util_diff.saturating_mul(variability.into()); - - // variability^2 * queue_util_diff^2 - let var_times_qud_pow = var_times_qud.saturating_mul(var_times_qud); - - // (variability^2 * queue_util_diff^2)/2 - let div_by_two: FixedU128; - match var_times_qud_pow.const_checked_div(2.into()) { - Some(dbt) => div_by_two = dbt, - None => return Err(SpotTrafficCalculationErr::Division), - } - - // traffic * (1 + queue_util_diff) + div_by_two - if positive { - let new_traffic = queue_util_diff - .saturating_add(div_by_two) - .saturating_add(One::one()) - .saturating_mul(traffic); - Ok(new_traffic.max(::TrafficDefaultValue::get())) - } else { - let new_traffic = queue_util_diff.saturating_sub(div_by_two).saturating_mul(traffic); - Ok(new_traffic.max(::TrafficDefaultValue::get())) - } - } - - /// Adds an assignment to the on demand queue. - /// - /// Paramenters: - /// - `assignment`: The on demand assignment to add to the queue. - /// - `location`: Whether to push this entry to the back or the front of the queue. Pushing an - /// entry to the front of the queue is only used when the scheduler wants to push back an - /// entry it has already popped. - /// Returns: - /// - The unit type on success. - /// - /// Errors: - /// - `InvalidParaId` - /// - `QueueFull` - pub fn add_on_demand_assignment( - assignment: Assignment, - location: QueuePushDirection, - ) -> Result<(), DispatchError> { - // Only parathreads are valid paraids for on the go parachains. - ensure!(>::is_parathread(assignment.para_id), Error::::InvalidParaId); - - let config = >::config(); - - OnDemandQueue::::try_mutate(|queue| { - // Abort transaction if queue is too large - ensure!(Self::queue_size() < config.on_demand_queue_max_size, Error::::QueueFull); - match location { - QueuePushDirection::Back => queue.push_back(assignment), - QueuePushDirection::Front => queue.push_front(assignment), - }; - Ok(()) - }) - } - - /// Get the size of the on demand queue. - /// - /// Returns: - /// - The size of the on demand queue. - fn queue_size() -> u32 { - let config = >::config(); - match OnDemandQueue::::get().len().try_into() { - Ok(size) => return size, - Err(_) => { - log::debug!( - target: LOG_TARGET, - "Failed to fetch the on demand queue size, returning the max size." - ); - return config.on_demand_queue_max_size - }, - } - } - - /// Getter for the order queue. 
- pub fn get_queue() -> VecDeque { - OnDemandQueue::::get() - } - - /// Getter for the affinity tracker. - pub fn get_affinity_map(para_id: ParaId) -> Option { - ParaIdAffinity::::get(para_id) - } - - /// Decreases the affinity of a `ParaId` to a specified `CoreIndex`. - /// Subtracts from the count of the `CoreAffinityCount` if an entry is found and the core_idx - /// matches. When the count reaches 0, the entry is removed. - /// A non-existant entry is a no-op. - fn decrease_affinity(para_id: ParaId, core_idx: CoreIndex) { - ParaIdAffinity::::mutate(para_id, |maybe_affinity| { - if let Some(affinity) = maybe_affinity { - if affinity.core_idx == core_idx { - let new_count = affinity.count.saturating_sub(1); - if new_count > 0 { - *maybe_affinity = Some(CoreAffinityCount { core_idx, count: new_count }); - } else { - *maybe_affinity = None; - } - } - } - }); - } - - /// Increases the affinity of a `ParaId` to a specified `CoreIndex`. - /// Adds to the count of the `CoreAffinityCount` if an entry is found and the core_idx matches. - /// A non-existant entry will be initialized with a count of 1 and uses the supplied - /// `CoreIndex`. - fn increase_affinity(para_id: ParaId, core_idx: CoreIndex) { - ParaIdAffinity::::mutate(para_id, |maybe_affinity| match maybe_affinity { - Some(affinity) => - if affinity.core_idx == core_idx { - *maybe_affinity = Some(CoreAffinityCount { - core_idx, - count: affinity.count.saturating_add(1), - }); - }, - None => { - *maybe_affinity = Some(CoreAffinityCount { core_idx, count: 1 }); - }, - }) - } -} - -impl AssignmentProvider> for Pallet { - fn session_core_count() -> u32 { - let config = >::config(); - config.on_demand_cores - } - - /// Take the next queued entry that is available for a given core index. - /// Invalidates and removes orders with a `para_id` that is not `ParaLifecycle::Parathread` - /// but only in [0..P] range slice of the order queue, where P is the element that is - /// removed from the order queue. - /// - /// Parameters: - /// - `core_idx`: The core index - /// - `previous_paraid`: Which paraid was previously processed on the requested core. Is None if - /// nothing was processed on the core. - fn pop_assignment_for_core( - core_idx: CoreIndex, - previous_para: Option, - ) -> Option { - // Only decrease the affinity of the previous para if it exists. - // A nonexistant `ParaId` indicates that the scheduler has not processed any - // `ParaId` this session. - if let Some(previous_para_id) = previous_para { - Pallet::::decrease_affinity(previous_para_id, core_idx) - } - - let mut queue: VecDeque = OnDemandQueue::::get(); - - let mut invalidated_para_id_indexes: Vec = vec![]; - - // Get the position of the next `ParaId`. Select either a valid `ParaId` that has an - // affinity to the same `CoreIndex` as the scheduler asks for or a valid `ParaId` with no - // affinity at all. - let pos = queue.iter().enumerate().position(|(index, assignment)| { - if >::is_parathread(assignment.para_id) { - match ParaIdAffinity::::get(&assignment.para_id) { - Some(affinity) => return affinity.core_idx == core_idx, - None => return true, - } - } - // Record no longer valid para_ids. - invalidated_para_id_indexes.push(index); - return false - }); - - // Collect the popped value. - let popped = pos.and_then(|p: usize| { - if let Some(assignment) = queue.remove(p) { - Pallet::::increase_affinity(assignment.para_id, core_idx); - return Some(assignment) - }; - None - }); - - // Only remove the invalid indexes *after* using the index. 
- // Removed in reverse order so that the indexes don't shift. - invalidated_para_id_indexes.iter().rev().for_each(|idx| { - queue.remove(*idx); - }); - - // Write changes to storage. - OnDemandQueue::::set(queue); - - popped - } - - /// Push an assignment back to the queue. - /// Typically used on session boundaries. - /// Parameters: - /// - `core_idx`: The core index - /// - `assignment`: The on demand assignment. - fn push_assignment_for_core(core_idx: CoreIndex, assignment: Assignment) { - Pallet::::decrease_affinity(assignment.para_id, core_idx); - // Skip the queue on push backs from scheduler - match Pallet::::add_on_demand_assignment(assignment, QueuePushDirection::Front) { - Ok(_) => {}, - Err(_) => {}, - } - } - - fn get_provider_config(_core_idx: CoreIndex) -> AssignmentProviderConfig> { - let config = >::config(); - AssignmentProviderConfig { - max_availability_timeouts: config.on_demand_retries, - ttl: config.on_demand_ttl, - } - } -} From 6bf9d00241bbd0f82cc8835be477858c237e0439 Mon Sep 17 00:00:00 2001 From: Overkillus Date: Fri, 10 May 2024 12:50:10 +0100 Subject: [PATCH 16/18] Revert "reverting commit mistake (accidental code change)" This reverts commit 82b48c441dfe9f0a95947530363fe9aca6a1f865. --- .../parachains/src/assigner_on_demand/mod.rs | 613 ++++++++++++++++++ 1 file changed, 613 insertions(+) create mode 100644 polkadot/runtime/parachains/src/assigner_on_demand/mod.rs diff --git a/polkadot/runtime/parachains/src/assigner_on_demand/mod.rs b/polkadot/runtime/parachains/src/assigner_on_demand/mod.rs new file mode 100644 index 000000000000..75c29bd6fbe4 --- /dev/null +++ b/polkadot/runtime/parachains/src/assigner_on_demand/mod.rs @@ -0,0 +1,613 @@ +// Copyright (C) Parity Technologies (UK) Ltd. +// This file is part of Polkadot. + +// Polkadot is free software: you can redistribute it and/or modify +// it under the terms of the GNU General Public License as published by +// the Free Software Foundation, either version 3 of the License, or +// (at your option) any later version. + +// Polkadot is distributed in the hope that it will be useful, +// but WITHOUT ANY WARRANTY; without even the implied warranty of +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +// GNU General Public License for more details. + +// You should have received a copy of the GNU General Public License +// along with Polkadot. If not, see . + +//! The parachain on demand assignment module. +//! +//! Implements a mechanism for taking in orders for pay as you go (PAYG) or on demand +//! parachain (previously parathreads) assignments. This module is not handled by the +//! initializer but is instead instantiated in the `construct_runtime` macro. +//! +//! The module currently limits parallel execution of blocks from the same `ParaId` via +//! a core affinity mechanism. As long as there exists an affinity for a `CoreIndex` for +//! a specific `ParaId`, orders for blockspace for that `ParaId` will only be assigned to +//! that `CoreIndex`. This affinity mechanism can be removed if it can be shown that parallel +//! execution is valid. 
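// Illustrative sketch, not part of the original file: a short trace of the core affinity
// mechanism described above, in terms of the `ParaIdAffinity` map and `CoreAffinityCount`
// struct defined further down in this module.
//
// * `pop_assignment_for_core(CoreIndex(0), None)` pops an order for some para, say 2000,
//   and bumps its affinity, leaving
//   `ParaIdAffinity[2000] == CoreAffinityCount { core_idx: CoreIndex(0), count: 1 }`.
// * While that entry exists, `pop_assignment_for_core(CoreIndex(1), ..)` skips queued
//   orders for para 2000: the selection closure only accepts paras whose affinity matches
//   the requested core, or paras with no affinity at all.
// * On a later `pop_assignment_for_core(CoreIndex(0), Some(2000.into()))` the previously
//   processed para is passed back in, `decrease_affinity` drops the count to 0, the entry
//   is removed, and orders for para 2000 may again be served by any core.
//
// A hypothetical helper expressing the same rule (not present in this pallet):
//
// fn core_may_serve<T: Config>(para_id: ParaId, core_idx: CoreIndex) -> bool {
//     match ParaIdAffinity::<T>::get(para_id) {
//         Some(affinity) => affinity.core_idx == core_idx,
//         None => true,
//     }
// }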
+ +mod benchmarking; +mod mock_helpers; + +#[cfg(test)] +mod tests; + +use crate::{ + configuration, paras, + scheduler::common::{Assignment, AssignmentProvider, AssignmentProviderConfig}, +}; + +use frame_support::{ + pallet_prelude::*, + traits::{ + Currency, + ExistenceRequirement::{self, AllowDeath, KeepAlive}, + WithdrawReasons, + }, +}; +use frame_system::pallet_prelude::*; +use primitives::{CoreIndex, Id as ParaId}; +use sp_runtime::{ + traits::{One, SaturatedConversion}, + FixedPointNumber, FixedPointOperand, FixedU128, Perbill, Saturating, +}; + +use sp_std::{collections::vec_deque::VecDeque, prelude::*}; + +const LOG_TARGET: &str = "runtime::parachains::assigner-on-demand"; + +pub use pallet::*; + +pub trait WeightInfo { + fn place_order_allow_death(s: u32) -> Weight; + fn place_order_keep_alive(s: u32) -> Weight; +} + +/// A weight info that is only suitable for testing. +pub struct TestWeightInfo; + +impl WeightInfo for TestWeightInfo { + fn place_order_allow_death(_: u32) -> Weight { + Weight::MAX + } + + fn place_order_keep_alive(_: u32) -> Weight { + Weight::MAX + } +} + +/// Keeps track of how many assignments a scheduler currently has at a specific `CoreIndex` for a +/// specific `ParaId`. +#[derive(Encode, Decode, Default, Clone, Copy, TypeInfo)] +#[cfg_attr(test, derive(PartialEq, Debug))] +pub struct CoreAffinityCount { + core_idx: CoreIndex, + count: u32, +} + +/// An indicator as to which end of the `OnDemandQueue` an assignment will be placed. +pub enum QueuePushDirection { + Back, + Front, +} + +/// Shorthand for the Balance type the runtime is using. +type BalanceOf = + <::Currency as Currency<::AccountId>>::Balance; + +/// Errors that can happen during spot traffic calculation. +#[derive(PartialEq)] +#[cfg_attr(feature = "std", derive(Debug))] +pub enum SpotTrafficCalculationErr { + /// The order queue capacity is at 0. + QueueCapacityIsZero, + /// The queue size is larger than the queue capacity. + QueueSizeLargerThanCapacity, + /// Arithmetic error during division, either division by 0 or over/underflow. + Division, +} + +#[frame_support::pallet] +pub mod pallet { + + use super::*; + + #[pallet::pallet] + #[pallet::without_storage_info] + pub struct Pallet(_); + + #[pallet::config] + pub trait Config: frame_system::Config + configuration::Config + paras::Config { + /// The runtime's definition of an event. + type RuntimeEvent: From> + IsType<::RuntimeEvent>; + + /// The runtime's definition of a Currency. + type Currency: Currency; + + /// Something that provides the weight of this pallet. + type WeightInfo: WeightInfo; + + /// The default value for the spot traffic multiplier. + #[pallet::constant] + type TrafficDefaultValue: Get; + } + + /// Creates an empty spot traffic value if one isn't present in storage already. + #[pallet::type_value] + pub fn SpotTrafficOnEmpty() -> FixedU128 { + T::TrafficDefaultValue::get() + } + + /// Creates an empty on demand queue if one isn't present in storage already. + #[pallet::type_value] + pub fn OnDemandQueueOnEmpty() -> VecDeque { + VecDeque::new() + } + + /// Keeps track of the multiplier used to calculate the current spot price for the on demand + /// assigner. + #[pallet::storage] + pub(super) type SpotTraffic = + StorageValue<_, FixedU128, ValueQuery, SpotTrafficOnEmpty>; + + /// The order storage entry. Uses a VecDeque to be able to push to the front of the + /// queue from the scheduler on session boundaries. 
+ #[pallet::storage] + pub type OnDemandQueue = + StorageValue<_, VecDeque, ValueQuery, OnDemandQueueOnEmpty>; + + /// Maps a `ParaId` to `CoreIndex` and keeps track of how many assignments the scheduler has in + /// it's lookahead. Keeping track of this affinity prevents parallel execution of the same + /// `ParaId` on two or more `CoreIndex`es. + #[pallet::storage] + pub(super) type ParaIdAffinity = + StorageMap<_, Twox256, ParaId, CoreAffinityCount, OptionQuery>; + + #[pallet::event] + #[pallet::generate_deposit(pub(super) fn deposit_event)] + pub enum Event { + /// An order was placed at some spot price amount. + OnDemandOrderPlaced { para_id: ParaId, spot_price: BalanceOf }, + /// The value of the spot traffic multiplier changed. + SpotTrafficSet { traffic: FixedU128 }, + } + + #[pallet::error] + pub enum Error { + /// The `ParaId` supplied to the `place_order` call is not a valid `ParaThread`, making the + /// call is invalid. + InvalidParaId, + /// The order queue is full, `place_order` will not continue. + QueueFull, + /// The current spot price is higher than the max amount specified in the `place_order` + /// call, making it invalid. + SpotPriceHigherThanMaxAmount, + /// There are no on demand cores available. `place_order` will not add anything to the + /// queue. + NoOnDemandCores, + } + + #[pallet::hooks] + impl Hooks> for Pallet { + fn on_initialize(_now: BlockNumberFor) -> Weight { + let config = >::config(); + // Calculate spot price multiplier and store it. + let old_traffic = SpotTraffic::::get(); + match Self::calculate_spot_traffic( + old_traffic, + config.on_demand_queue_max_size, + Self::queue_size(), + config.on_demand_target_queue_utilization, + config.on_demand_fee_variability, + ) { + Ok(new_traffic) => { + // Only update storage on change + if new_traffic != old_traffic { + SpotTraffic::::set(new_traffic); + Pallet::::deposit_event(Event::::SpotTrafficSet { + traffic: new_traffic, + }); + return T::DbWeight::get().reads_writes(2, 1) + } + }, + Err(SpotTrafficCalculationErr::QueueCapacityIsZero) => { + log::debug!( + target: LOG_TARGET, + "Error calculating spot traffic: The order queue capacity is at 0." + ); + }, + Err(SpotTrafficCalculationErr::QueueSizeLargerThanCapacity) => { + log::debug!( + target: LOG_TARGET, + "Error calculating spot traffic: The queue size is larger than the queue capacity." + ); + }, + Err(SpotTrafficCalculationErr::Division) => { + log::debug!( + target: LOG_TARGET, + "Error calculating spot traffic: Arithmetic error during division, either division by 0 or over/underflow." + ); + }, + }; + T::DbWeight::get().reads_writes(2, 0) + } + } + + #[pallet::call] + impl Pallet { + /// Create a single on demand core order. + /// Will use the spot price for the current block and will reap the account if needed. + /// + /// Parameters: + /// - `origin`: The sender of the call, funds will be withdrawn from this account. + /// - `max_amount`: The maximum balance to withdraw from the origin to place an order. + /// - `para_id`: A `ParaId` the origin wants to provide blockspace for. 
+ /// + /// Errors: + /// - `InsufficientBalance`: from the Currency implementation + /// - `InvalidParaId` + /// - `QueueFull` + /// - `SpotPriceHigherThanMaxAmount` + /// - `NoOnDemandCores` + /// + /// Events: + /// - `SpotOrderPlaced` + #[pallet::call_index(0)] + #[pallet::weight(::WeightInfo::place_order_allow_death(OnDemandQueue::::get().len() as u32))] + pub fn place_order_allow_death( + origin: OriginFor, + max_amount: BalanceOf, + para_id: ParaId, + ) -> DispatchResult { + let sender = ensure_signed(origin)?; + Pallet::::do_place_order(sender, max_amount, para_id, AllowDeath) + } + + /// Same as the [`place_order_allow_death`](Self::place_order_allow_death) call , but with a + /// check that placing the order will not reap the account. + /// + /// Parameters: + /// - `origin`: The sender of the call, funds will be withdrawn from this account. + /// - `max_amount`: The maximum balance to withdraw from the origin to place an order. + /// - `para_id`: A `ParaId` the origin wants to provide blockspace for. + /// + /// Errors: + /// - `InsufficientBalance`: from the Currency implementation + /// - `InvalidParaId` + /// - `QueueFull` + /// - `SpotPriceHigherThanMaxAmount` + /// - `NoOnDemandCores` + /// + /// Events: + /// - `SpotOrderPlaced` + #[pallet::call_index(1)] + #[pallet::weight(::WeightInfo::place_order_keep_alive(OnDemandQueue::::get().len() as u32))] + pub fn place_order_keep_alive( + origin: OriginFor, + max_amount: BalanceOf, + para_id: ParaId, + ) -> DispatchResult { + let sender = ensure_signed(origin)?; + Pallet::::do_place_order(sender, max_amount, para_id, KeepAlive) + } + } +} + +impl Pallet +where + BalanceOf: FixedPointOperand, +{ + /// Helper function for `place_order_*` calls. Used to differentiate between placing orders + /// with a keep alive check or to allow the account to be reaped. + /// + /// Parameters: + /// - `sender`: The sender of the call, funds will be withdrawn from this account. + /// - `max_amount`: The maximum balance to withdraw from the origin to place an order. + /// - `para_id`: A `ParaId` the origin wants to provide blockspace for. + /// - `existence_requirement`: Whether or not to ensure that the account will not be reaped. 
+ /// + /// Errors: + /// - `InsufficientBalance`: from the Currency implementation + /// - `InvalidParaId` + /// - `QueueFull` + /// - `SpotPriceHigherThanMaxAmount` + /// - `NoOnDemandCores` + /// + /// Events: + /// - `SpotOrderPlaced` + fn do_place_order( + sender: ::AccountId, + max_amount: BalanceOf, + para_id: ParaId, + existence_requirement: ExistenceRequirement, + ) -> DispatchResult { + let config = >::config(); + + // Are there any schedulable cores in this session + ensure!(config.on_demand_cores > 0, Error::::NoOnDemandCores); + + // Traffic always falls back to 1.0 + let traffic = SpotTraffic::::get(); + + // Calculate spot price + let spot_price: BalanceOf = + traffic.saturating_mul_int(config.on_demand_base_fee.saturated_into::>()); + + // Is the current price higher than `max_amount` + ensure!(spot_price.le(&max_amount), Error::::SpotPriceHigherThanMaxAmount); + + // Charge the sending account the spot price + T::Currency::withdraw(&sender, spot_price, WithdrawReasons::FEE, existence_requirement)?; + + let assignment = Assignment::new(para_id); + + let res = Pallet::::add_on_demand_assignment(assignment, QueuePushDirection::Back); + + match res { + Ok(_) => { + Pallet::::deposit_event(Event::::OnDemandOrderPlaced { para_id, spot_price }); + return Ok(()) + }, + Err(err) => return Err(err), + } + } + + /// The spot price multiplier. This is based on the transaction fee calculations defined in: + /// https://research.web3.foundation/Polkadot/overview/token-economics#setting-transaction-fees + /// + /// Parameters: + /// - `traffic`: The previously calculated multiplier, can never go below 1.0. + /// - `queue_capacity`: The max size of the order book. + /// - `queue_size`: How many orders are currently in the order book. + /// - `target_queue_utilisation`: How much of the queue_capacity should be ideally occupied, + /// expressed in percentages(perbill). + /// - `variability`: A variability factor, i.e. how quickly the spot price adjusts. This number + /// can be chosen by p/(k*(1-s)) where p is the desired ratio increase in spot price over k + /// number of blocks. s is the target_queue_utilisation. A concrete example: v = + /// 0.05/(20*(1-0.25)) = 0.0033. + /// + /// Returns: + /// - A `FixedU128` in the range of `Config::TrafficDefaultValue` - `FixedU128::MAX` on + /// success. + /// + /// Errors: + /// - `SpotTrafficCalculationErr::QueueCapacityIsZero` + /// - `SpotTrafficCalculationErr::QueueSizeLargerThanCapacity` + /// - `SpotTrafficCalculationErr::Division` + pub(crate) fn calculate_spot_traffic( + traffic: FixedU128, + queue_capacity: u32, + queue_size: u32, + target_queue_utilisation: Perbill, + variability: Perbill, + ) -> Result { + // Return early if queue has no capacity. + if queue_capacity == 0 { + return Err(SpotTrafficCalculationErr::QueueCapacityIsZero) + } + + // Return early if queue size is greater than capacity. 
+ if queue_size > queue_capacity { + return Err(SpotTrafficCalculationErr::QueueSizeLargerThanCapacity) + } + + // (queue_size / queue_capacity) - target_queue_utilisation + let queue_util_ratio = FixedU128::from_rational(queue_size.into(), queue_capacity.into()); + let positive = queue_util_ratio >= target_queue_utilisation.into(); + let queue_util_diff = queue_util_ratio.max(target_queue_utilisation.into()) - + queue_util_ratio.min(target_queue_utilisation.into()); + + // variability * queue_util_diff + let var_times_qud = queue_util_diff.saturating_mul(variability.into()); + + // variability^2 * queue_util_diff^2 + let var_times_qud_pow = var_times_qud.saturating_mul(var_times_qud); + + // (variability^2 * queue_util_diff^2)/2 + let div_by_two: FixedU128; + match var_times_qud_pow.const_checked_div(2.into()) { + Some(dbt) => div_by_two = dbt, + None => return Err(SpotTrafficCalculationErr::Division), + } + + // traffic * (1 + queue_util_diff) + div_by_two + if positive { + let new_traffic = queue_util_diff + .saturating_add(div_by_two) + .saturating_add(One::one()) + .saturating_mul(traffic); + Ok(new_traffic.max(::TrafficDefaultValue::get())) + } else { + let new_traffic = queue_util_diff.saturating_sub(div_by_two).saturating_mul(traffic); + Ok(new_traffic.max(::TrafficDefaultValue::get())) + } + } + + /// Adds an assignment to the on demand queue. + /// + /// Paramenters: + /// - `assignment`: The on demand assignment to add to the queue. + /// - `location`: Whether to push this entry to the back or the front of the queue. Pushing an + /// entry to the front of the queue is only used when the scheduler wants to push back an + /// entry it has already popped. + /// Returns: + /// - The unit type on success. + /// + /// Errors: + /// - `InvalidParaId` + /// - `QueueFull` + pub fn add_on_demand_assignment( + assignment: Assignment, + location: QueuePushDirection, + ) -> Result<(), DispatchError> { + // Only parathreads are valid paraids for on the go parachains. + ensure!(>::is_parathread(assignment.para_id), Error::::InvalidParaId); + + let config = >::config(); + + OnDemandQueue::::try_mutate(|queue| { + // Abort transaction if queue is too large + ensure!(Self::queue_size() < config.on_demand_queue_max_size, Error::::QueueFull); + match location { + QueuePushDirection::Back => queue.push_back(assignment), + QueuePushDirection::Front => queue.push_front(assignment), + }; + Ok(()) + }) + } + + /// Get the size of the on demand queue. + /// + /// Returns: + /// - The size of the on demand queue. + fn queue_size() -> u32 { + let config = >::config(); + match OnDemandQueue::::get().len().try_into() { + Ok(size) => return size, + Err(_) => { + log::debug!( + target: LOG_TARGET, + "Failed to fetch the on demand queue size, returning the max size." + ); + return config.on_demand_queue_max_size + }, + } + } + + /// Getter for the order queue. + pub fn get_queue() -> VecDeque { + OnDemandQueue::::get() + } + + /// Getter for the affinity tracker. + pub fn get_affinity_map(para_id: ParaId) -> Option { + ParaIdAffinity::::get(para_id) + } + + /// Decreases the affinity of a `ParaId` to a specified `CoreIndex`. + /// Subtracts from the count of the `CoreAffinityCount` if an entry is found and the core_idx + /// matches. When the count reaches 0, the entry is removed. + /// A non-existant entry is a no-op. 
+ fn decrease_affinity(para_id: ParaId, core_idx: CoreIndex) { + ParaIdAffinity::::mutate(para_id, |maybe_affinity| { + if let Some(affinity) = maybe_affinity { + if affinity.core_idx == core_idx { + let new_count = affinity.count.saturating_sub(1); + if new_count > 0 { + *maybe_affinity = Some(CoreAffinityCount { core_idx, count: new_count }); + } else { + *maybe_affinity = None; + } + } + } + }); + } + + /// Increases the affinity of a `ParaId` to a specified `CoreIndex`. + /// Adds to the count of the `CoreAffinityCount` if an entry is found and the core_idx matches. + /// A non-existant entry will be initialized with a count of 1 and uses the supplied + /// `CoreIndex`. + fn increase_affinity(para_id: ParaId, core_idx: CoreIndex) { + ParaIdAffinity::::mutate(para_id, |maybe_affinity| match maybe_affinity { + Some(affinity) => + if affinity.core_idx == core_idx { + *maybe_affinity = Some(CoreAffinityCount { + core_idx, + count: affinity.count.saturating_add(1), + }); + }, + None => { + *maybe_affinity = Some(CoreAffinityCount { core_idx, count: 1 }); + }, + }) + } +} + +impl AssignmentProvider> for Pallet { + fn session_core_count() -> u32 { + let config = >::config(); + config.on_demand_cores + } + + /// Take the next queued entry that is available for a given core index. + /// Invalidates and removes orders with a `para_id` that is not `ParaLifecycle::Parathread` + /// but only in [0..P] range slice of the order queue, where P is the element that is + /// removed from the order queue. + /// + /// Parameters: + /// - `core_idx`: The core index + /// - `previous_paraid`: Which paraid was previously processed on the requested core. Is None if + /// nothing was processed on the core. + fn pop_assignment_for_core( + core_idx: CoreIndex, + previous_para: Option, + ) -> Option { + // Only decrease the affinity of the previous para if it exists. + // A nonexistant `ParaId` indicates that the scheduler has not processed any + // `ParaId` this session. + if let Some(previous_para_id) = previous_para { + Pallet::::decrease_affinity(previous_para_id, core_idx) + } + + let mut queue: VecDeque = OnDemandQueue::::get(); + + let mut invalidated_para_id_indexes: Vec = vec![]; + + // Get the position of the next `ParaId`. Select either a valid `ParaId` that has an + // affinity to the same `CoreIndex` as the scheduler asks for or a valid `ParaId` with no + // affinity at all. + let pos = queue.iter().enumerate().position(|(index, assignment)| { + if >::is_parathread(assignment.para_id) { + match ParaIdAffinity::::get(&assignment.para_id) { + Some(affinity) => return affinity.core_idx == core_idx, + None => return true, + } + } + // Record no longer valid para_ids. + invalidated_para_id_indexes.push(index); + return false + }); + + // Collect the popped value. + let popped = pos.and_then(|p: usize| { + if let Some(assignment) = queue.remove(p) { + Pallet::::increase_affinity(assignment.para_id, core_idx); + return Some(assignment) + }; + None + }); + + // Only remove the invalid indexes *after* using the index. + // Removed in reverse order so that the indexes don't shift. + invalidated_para_id_indexes.iter().rev().for_each(|idx| { + queue.remove(*idx); + }); + + // Write changes to storage. + OnDemandQueue::::set(queue); + + popped + } + + /// Push an assignment back to the queue. + /// Typically used on session boundaries. + /// Parameters: + /// - `core_idx`: The core index + /// - `assignment`: The on demand assignment. 
+ fn push_assignment_for_core(core_idx: CoreIndex, assignment: Assignment) { + Pallet::::decrease_affinity(assignment.para_id, core_idx); + // Skip the queue on push backs from scheduler + match Pallet::::add_on_demand_assignment(assignment, QueuePushDirection::Front) { + Ok(_) => {}, + Err(_) => {}, + } + } + + fn get_provider_config(_core_idx: CoreIndex) -> AssignmentProviderConfig> { + let config = >::config(); + AssignmentProviderConfig { + max_availability_timeouts: config.on_demand_retries, + ttl: config.on_demand_ttl, + } + } +} From e89abe9f6437a8a18c591d3674d8aca8a62d779d Mon Sep 17 00:00:00 2001 From: Overkillus Date: Fri, 10 May 2024 12:51:27 +0100 Subject: [PATCH 17/18] erroneous file removal --- .../parachains/src/assigner_coretime/mod.rs | 488 ------------------ 1 file changed, 488 deletions(-) delete mode 100644 polkadot/runtime/parachains/src/assigner_coretime/mod.rs diff --git a/polkadot/runtime/parachains/src/assigner_coretime/mod.rs b/polkadot/runtime/parachains/src/assigner_coretime/mod.rs deleted file mode 100644 index 15701a783354..000000000000 --- a/polkadot/runtime/parachains/src/assigner_coretime/mod.rs +++ /dev/null @@ -1,488 +0,0 @@ -// Copyright (C) Parity Technologies (UK) Ltd. -// This file is part of Polkadot. - -// Polkadot is free software: you can redistribute it and/or modify -// it under the terms of the GNU General Public License as published by -// the Free Software Foundation, either version 3 of the License, or -// (at your option) any later version. - -// Polkadot is distributed in the hope that it will be useful, -// but WITHOUT ANY WARRANTY; without even the implied warranty of -// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the -// GNU General Public License for more details. - -// You should have received a copy of the GNU General Public License -// along with Polkadot. If not, see . - -//! The parachain coretime assignment module. -//! -//! Handles scheduling of assignments coming from the coretime/broker chain. For on-demand -//! assignments it relies on the separate on-demand assignment provider, where it forwards requests -//! to. -//! -//! `CoreDescriptor` contains pointers to the beginning and the end of a list of schedules, together -//! with the currently active assignments. - -mod mock_helpers; -#[cfg(test)] -mod tests; - -use crate::{ - assigner_on_demand, configuration, - paras::AssignCoretime, - scheduler::common::{Assignment, AssignmentProvider}, - ParaId, -}; - -use frame_support::{defensive, pallet_prelude::*}; -use frame_system::pallet_prelude::*; -use pallet_broker::CoreAssignment; -use primitives::CoreIndex; -use sp_runtime::traits::{One, Saturating}; - -use sp_std::prelude::*; - -pub use pallet::*; - -/// Fraction expressed as a numerator with an assumed denominator of 57,600. 
-#[derive(RuntimeDebug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Encode, Decode, TypeInfo)] -pub struct PartsOf57600(u16); - -impl PartsOf57600 { - pub const ZERO: Self = Self(0); - pub const FULL: Self = Self(57600); - - pub fn new_saturating(v: u16) -> Self { - Self::ZERO.saturating_add(Self(v)) - } - - pub fn is_full(&self) -> bool { - *self == Self::FULL - } - - pub fn saturating_add(self, rhs: Self) -> Self { - let inner = self.0.saturating_add(rhs.0); - if inner > 57600 { - Self(57600) - } else { - Self(inner) - } - } - - pub fn saturating_sub(self, rhs: Self) -> Self { - Self(self.0.saturating_sub(rhs.0)) - } - - pub fn checked_add(self, rhs: Self) -> Option { - let inner = self.0.saturating_add(rhs.0); - if inner > 57600 { - None - } else { - Some(Self(inner)) - } - } -} - -/// Assignments as they are scheduled by block number -/// -/// for a particular core. -#[derive(Encode, Decode, TypeInfo)] -#[cfg_attr(test, derive(PartialEq, RuntimeDebug))] -struct Schedule { - // Original assignments - assignments: Vec<(CoreAssignment, PartsOf57600)>, - /// When do our assignments become invalid, if at all? - /// - /// If this is `Some`, then this `CoreState` will be dropped at that block number. If this is - /// `None`, then we will keep serving our core assignments in a circle until a new set of - /// assignments is scheduled. - end_hint: Option, - - /// The next queued schedule for this core. - /// - /// Schedules are forming a queue. - next_schedule: Option, -} - -/// Descriptor for a core. -/// -/// Contains pointers to first and last schedule into `CoreSchedules` for that core and keeps track -/// of the currently active work as well. -#[derive(Encode, Decode, TypeInfo, Default)] -#[cfg_attr(test, derive(PartialEq, RuntimeDebug, Clone))] -struct CoreDescriptor { - /// Meta data about the queued schedules for this core. - queue: Option>, - /// Currently performed work. - current_work: Option>, -} - -/// Pointers into `CoreSchedules` for a particular core. -/// -/// Schedules in `CoreSchedules` form a queue. `Schedule::next_schedule` always pointing to the next -/// item. -#[derive(Encode, Decode, TypeInfo, Copy, Clone)] -#[cfg_attr(test, derive(PartialEq, RuntimeDebug))] -struct QueueDescriptor { - /// First scheduled item, that is not yet active. - first: N, - /// Last scheduled item. - last: N, -} - -#[derive(Encode, Decode, TypeInfo)] -#[cfg_attr(test, derive(PartialEq, RuntimeDebug, Clone))] -struct WorkState { - /// Assignments with current state. - /// - /// Assignments and book keeping on how much has been served already. We keep track of serviced - /// assignments in order to adhere to the specified ratios. - assignments: Vec<(CoreAssignment, AssignmentState)>, - /// When do our assignments become invalid if at all? - /// - /// If this is `Some`, then this `CoreState` will be dropped at that block number. If this is - /// `None`, then we will keep serving our core assignments in a circle until a new set of - /// assignments is scheduled. - end_hint: Option, - /// Position in the assignments we are currently in. - /// - /// Aka which core assignment will be popped next on - /// `AssignmentProvider::pop_assignment_for_core`. - pos: u16, - /// Step width - /// - /// How much we subtract from `AssignmentState::remaining` for a core served. - step: PartsOf57600, -} - -#[derive(Encode, Decode, TypeInfo)] -#[cfg_attr(test, derive(PartialEq, RuntimeDebug, Clone, Copy))] -struct AssignmentState { - /// Ratio of the core this assignment has. 
- /// - /// As initially received via `assign_core`. - ratio: PartsOf57600, - /// How many parts are remaining in this round? - /// - /// At the end of each round (in preparation for the next), ratio will be added to remaining. - /// Then every time we get scheduled we subtract a core worth of points. Once we reach 0 or a - /// number lower than what a core is worth (`CoreState::step` size), we move on to the next - /// item in the `Vec`. - /// - /// The first round starts with remaining = ratio. - remaining: PartsOf57600, -} - -impl From> for WorkState { - fn from(schedule: Schedule) -> Self { - let Schedule { assignments, end_hint, next_schedule: _ } = schedule; - let step = - if let Some(min_step_assignment) = assignments.iter().min_by(|a, b| a.1.cmp(&b.1)) { - min_step_assignment.1 - } else { - // Assignments empty, should not exist. In any case step size does not matter here: - log::debug!("assignments of a `Schedule` should never be empty."); - PartsOf57600(1) - }; - let assignments = assignments - .into_iter() - .map(|(a, ratio)| (a, AssignmentState { ratio, remaining: ratio })) - .collect(); - - Self { assignments, end_hint, pos: 0, step } - } -} - -#[frame_support::pallet] -pub mod pallet { - use super::*; - - #[pallet::pallet] - #[pallet::without_storage_info] - pub struct Pallet(_); - - #[pallet::config] - pub trait Config: - frame_system::Config + configuration::Config + assigner_on_demand::Config - { - } - - /// Scheduled assignment sets. - /// - /// Assignments as of the given block number. They will go into state once the block number is - /// reached (and replace whatever was in there before). - #[pallet::storage] - pub(super) type CoreSchedules = StorageMap< - _, - Twox256, - (BlockNumberFor, CoreIndex), - Schedule>, - OptionQuery, - >; - - /// Assignments which are currently active. - /// - /// They will be picked from `PendingAssignments` once we reach the scheduled block number in - /// `PendingAssignments`. - #[pallet::storage] - pub(super) type CoreDescriptors = StorageMap< - _, - Twox256, - CoreIndex, - CoreDescriptor>, - ValueQuery, - GetDefault, - >; - - #[pallet::hooks] - impl Hooks> for Pallet {} - - #[pallet::error] - pub enum Error { - AssignmentsEmpty, - /// Assignments together exceeded 57600. - OverScheduled, - /// Assignments together less than 57600 - UnderScheduled, - /// assign_core is only allowed to append new assignments at the end of already existing - /// ones. - DisallowedInsert, - /// Tried to insert a schedule for the same core and block number as an existing schedule - DuplicateInsert, - /// Tried to add an unsorted set of assignments - AssignmentsNotSorted, - } -} - -impl AssignmentProvider> for Pallet { - fn pop_assignment_for_core(core_idx: CoreIndex) -> Option { - let now = frame_system::Pallet::::block_number(); - - CoreDescriptors::::mutate(core_idx, |core_state| { - Self::ensure_workload(now, core_idx, core_state); - - let work_state = core_state.current_work.as_mut()?; - - // Wrap around: - work_state.pos = work_state.pos % work_state.assignments.len() as u16; - let (a_type, a_state) = &mut work_state - .assignments - .get_mut(work_state.pos as usize) - .expect("We limited pos to the size of the vec one line above. qed"); - - // advance for next pop: - a_state.remaining = a_state.remaining.saturating_sub(work_state.step); - if a_state.remaining < work_state.step { - // Assignment exhausted, need to move to the next and credit remaining for - // next round. 
- work_state.pos += 1; - // Reset to ratio + still remaining "credits": - a_state.remaining = a_state.remaining.saturating_add(a_state.ratio); - } - - match a_type { - CoreAssignment::Idle => None, - CoreAssignment::Pool => - assigner_on_demand::Pallet::::pop_assignment_for_core(core_idx), - CoreAssignment::Task(para_id) => Some(Assignment::Bulk((*para_id).into())), - } - }) - } - - fn report_processed(assignment: Assignment) { - match assignment { - Assignment::Pool { para_id, core_index } => - assigner_on_demand::Pallet::::report_processed(para_id, core_index), - Assignment::Bulk(_) => {}, - } - } - - /// Push an assignment back to the front of the queue. - /// - /// The assignment has not been processed yet. Typically used on session boundaries. - /// Parameters: - /// - `assignment`: The on demand assignment. - fn push_back_assignment(assignment: Assignment) { - match assignment { - Assignment::Pool { para_id, core_index } => - assigner_on_demand::Pallet::::push_back_assignment(para_id, core_index), - Assignment::Bulk(_) => { - // Session changes are rough. We just drop assignments that did not make it on a - // session boundary. This seems sensible as bulk is region based. Meaning, even if - // we made the effort catching up on those dropped assignments, this would very - // likely lead to other assignments not getting served at the "end" (when our - // assignment set gets replaced). - }, - } - } - - #[cfg(any(feature = "runtime-benchmarks", test))] - fn get_mock_assignment(_: CoreIndex, para_id: primitives::Id) -> Assignment { - // Given that we are not tracking anything in `Bulk` assignments, it is safe to always - // return a bulk assignment. - Assignment::Bulk(para_id) - } - - fn session_core_count() -> u32 { - let config = configuration::ActiveConfig::::get(); - config.scheduler_params.num_cores - } -} - -impl Pallet { - /// Ensure given workload for core is up to date. - fn ensure_workload( - now: BlockNumberFor, - core_idx: CoreIndex, - descriptor: &mut CoreDescriptor>, - ) { - // Workload expired? - if descriptor - .current_work - .as_ref() - .and_then(|w| w.end_hint) - .map_or(false, |e| e <= now) - { - descriptor.current_work = None; - } - - let Some(queue) = descriptor.queue else { - // No queue. - return - }; - - let mut next_scheduled = queue.first; - - if next_scheduled > now { - // Not yet ready. - return - } - - // Update is needed: - let update = loop { - let Some(update) = CoreSchedules::::take((next_scheduled, core_idx)) else { - break None - }; - // Still good? - if update.end_hint.map_or(true, |e| e > now) { - break Some(update) - } - // Move on if possible: - if let Some(n) = update.next_schedule { - next_scheduled = n; - } else { - break None - } - }; - - let new_first = update.as_ref().and_then(|u| u.next_schedule); - descriptor.current_work = update.map(Into::into); - - descriptor.queue = new_first.map(|new_first| { - QueueDescriptor { - first: new_first, - // `last` stays unaffected, if not empty: - last: queue.last, - } - }); - } - - /// Append another assignment for a core. - /// - /// Important only appending is allowed. Meaning, all already existing assignments must have a - /// begin smaller than the one passed here. This restriction exists, because it makes the - /// insertion O(1) and the author could not think of a reason, why this restriction should be - /// causing any problems. Inserting arbitrarily causes a `DispatchError::DisallowedInsert` - /// error. 
This restriction could easily be lifted if need be and in fact an implementation is - /// available - /// [here](https://github.com/paritytech/polkadot-sdk/pull/1694/commits/c0c23b01fd2830910cde92c11960dad12cdff398#diff-0c85a46e448de79a5452395829986ee8747e17a857c27ab624304987d2dde8baR386). - /// The problem is that insertion complexity then depends on the size of the existing queue, - /// which makes determining weights hard and could lead to issues like overweight blocks (at - /// least in theory). - pub fn assign_core( - core_idx: CoreIndex, - begin: BlockNumberFor, - assignments: Vec<(CoreAssignment, PartsOf57600)>, - end_hint: Option>, - ) -> Result<(), DispatchError> { - // There should be at least one assignment. - ensure!(!assignments.is_empty(), Error::::AssignmentsEmpty); - - // Checking for sort and unique manually, since we don't have access to iterator tools. - // This way of checking uniqueness only works since we also check sortedness. - assignments.iter().map(|x| &x.0).try_fold(None, |prev, cur| { - if prev.map_or(false, |p| p >= cur) { - Err(Error::::AssignmentsNotSorted) - } else { - Ok(Some(cur)) - } - })?; - - // Check that the total parts between all assignments are equal to 57600 - let parts_sum = assignments - .iter() - .map(|assignment| assignment.1) - .try_fold(PartsOf57600::ZERO, |sum, parts| { - sum.checked_add(parts).ok_or(Error::::OverScheduled) - })?; - ensure!(parts_sum.is_full(), Error::::UnderScheduled); - - CoreDescriptors::::mutate(core_idx, |core_descriptor| { - let new_queue = match core_descriptor.queue { - Some(queue) => { - ensure!(begin > queue.last, Error::::DisallowedInsert); - - CoreSchedules::::try_mutate((queue.last, core_idx), |schedule| { - if let Some(schedule) = schedule.as_mut() { - debug_assert!(schedule.next_schedule.is_none(), "queue.end was supposed to be the end, so the next item must be `None`!"); - schedule.next_schedule = Some(begin); - } else { - defensive!("Queue end entry does not exist?"); - } - CoreSchedules::::try_mutate((begin, core_idx), |schedule| { - // It should already be impossible to overwrite an existing schedule due - // to strictly increasing block number. But we check here for safety and - // in case the design changes. - ensure!(schedule.is_none(), Error::::DuplicateInsert); - *schedule = - Some(Schedule { assignments, end_hint, next_schedule: None }); - Ok::<(), DispatchError>(()) - })?; - Ok::<(), DispatchError>(()) - })?; - - QueueDescriptor { first: queue.first, last: begin } - }, - None => { - // Queue empty, just insert: - CoreSchedules::::insert( - (begin, core_idx), - Schedule { assignments, end_hint, next_schedule: None }, - ); - QueueDescriptor { first: begin, last: begin } - }, - }; - core_descriptor.queue = Some(new_queue); - Ok(()) - }) - } -} - -impl AssignCoretime for Pallet { - fn assign_coretime(id: ParaId) -> DispatchResult { - let current_block = frame_system::Pallet::::block_number(); - - // Add a new core and assign the para to it. - let mut config = configuration::ActiveConfig::::get(); - let core = config.scheduler_params.num_cores; - config.scheduler_params.num_cores.saturating_inc(); - - // `assign_coretime` is only called at genesis or by root, so setting the active - // config here is fine. 
- configuration::Pallet::::force_set_active_config(config); - - let begin = current_block + One::one(); - let assignment = vec![(pallet_broker::CoreAssignment::Task(id.into()), PartsOf57600::FULL)]; - Pallet::::assign_core(CoreIndex(core), begin, assignment, None) - } -} From 879e8d41c3cdd32823b37000dc5361346c129610 Mon Sep 17 00:00:00 2001 From: Overkillus Date: Fri, 10 May 2024 13:06:18 +0100 Subject: [PATCH 18/18] table fmt --- .../src/protocol-validator-disabling.md | 26 ++++++++++--------- 1 file changed, 14 insertions(+), 12 deletions(-) diff --git a/polkadot/roadmap/implementers-guide/src/protocol-validator-disabling.md b/polkadot/roadmap/implementers-guide/src/protocol-validator-disabling.md index 85b8ded42a13..9fd44c00fa0a 100644 --- a/polkadot/roadmap/implementers-guide/src/protocol-validator-disabling.md +++ b/polkadot/roadmap/implementers-guide/src/protocol-validator-disabling.md @@ -360,21 +360,25 @@ re-enabling is launched approval voter slashes can be re-instated. Numbers need between 0-2% are reasonable. 0% would still disable which with the opportunity cost consideration should be enough. > **Note:** \ -> Spammy approval checkers are in fact not a big issue as a side effect of the offchain-disabling introduced by the Defense Against Past-Era Dispute Spam (**Node**) [#2225](https://github.com/paritytech/polkadot-sdk/issues/2225). It makes it so all validators loosing a dispute are locally disabled and ignored for dispute initiation so it effectively silences spammers. They can still no-show but the damage is minimized. +> Spammy approval checkers are in fact not a big issue as a side effect of the offchain-disabling introduced by the +> Defense Against Past-Era Dispute Spam (**Node**) [#2225](https://github.com/paritytech/polkadot-sdk/issues/2225). It +> makes it so all validators loosing a dispute are locally disabled and ignored for dispute initiation so it effectively +> silences spammers. They can still no-show but the damage is minimized. 
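The local (off-chain) disabling mentioned in the note can be pictured with a minimal sketch; the `LocalDisabling` type
and its methods below are illustrative simplifications for this guide, not the actual node-side implementation:

```rust
use std::collections::HashSet;

/// Index of a validator in the active set (simplified stand-in type).
type ValidatorIndex = u32;

/// Locally tracked dispute losers; this set never leaves the node.
#[derive(Default)]
struct LocalDisabling {
    lost_dispute: HashSet<ValidatorIndex>,
}

impl LocalDisabling {
    /// Record a validator that just lost a dispute.
    fn note_lost_dispute(&mut self, validator: ValidatorIndex) {
        self.lost_dispute.insert(validator);
    }

    /// Statements from locally disabled validators may not open fresh disputes,
    /// which is what silences a spammer even though it can still no-show.
    fn may_initiate_dispute(&self, sender: ValidatorIndex) -> bool {
        !self.lost_dispute.contains(&sender)
    }
}
```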
## Interaction with all types of misbehaviors -With re-enabling in place and potentially approval voter slashes enabled the overall misbehaviour-punishment system can be as highlighted in the table below: +With re-enabling in place and potentially approval voter slashes enabled the overall misbehaviour-punishment system can +be as highlighted in the table below: -| Misbehaviour | Slash % | Onchain Disabling | Offchain Disabling | Chilling | Reputation Costs | -| ----------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------ | ------------------ | --------------------------------------------------------------------------------------------------------------------------------------------- | --- | -| Backing Invalid | 100% | Yes (High Prio) | Yes (High Prio) | No | No | -| ForInvalid Vote | 2% | Yes (Mid Prio) | Yes (Mid Prio) | No | No | -| AgainstValid Vote | 0% | Yes (Low Prio) | Yes (Low Prio) | No | No | -| GRANDPA / BABE / BEEFY Equivocations | 0.01-100% | Yes (Varying Prio) | No | No | No | -| Seconded + Valid Equivocation | - | No | No | No | No | -| Double Seconded Equivocation | - | No | No | No | Yes | +|Misbehaviour |Slash % |Onchain Disabling |Offchain Disabling |Chilling |Reputation Costs | +|------------ |------- |----------------- |------------------ |-------- |----------------- | +|Backing Invalid |100% |Yes (High Prio) |Yes (High Prio) |No |No | +|ForInvalid Vote |2% |Yes (Mid Prio) |Yes (Mid Prio) |No |No | +|AgainstValid Vote |0% |Yes (Low Prio) |Yes (Low Prio) |No |No | +|GRANDPA / BABE / BEEFY Equivocations |0.01-100% |Yes (Varying Prio) |No |No |No | +|Seconded + Valid Equivocation |- |No |No |No |No | +|Double Seconded Equivocation |- |No |No |No |Yes | *Ignoring AURA offences. @@ -431,5 +435,3 @@ Implementation of the above design covers a few additional areas that allow for 1. Re-enable small offender when approaching BZT (**Runtime**) #TODO - When BZT limit is reached and there are more offenders to be disabled re-enable the smallest offenders to disable the biggest ones - -
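As a rough illustration of the last point above, the re-enabling rule can be sketched as follows; the types and helper
names are assumptions made for the example, not the runtime's actual API, and the limit is taken to be the byzantine
threshold `floor((n - 1) / 3)`:

```rust
/// A disabled validator together with the slash fraction that caused the disabling,
/// expressed in parts per billion (mirroring `Perbill`).
#[derive(Clone, Copy)]
struct DisabledOffender {
    validator_index: u32,
    slash_ppb: u32,
}

/// Byzantine threshold: at most a third (rounded down) of the active set may be disabled.
fn byzantine_threshold(active_validators: usize) -> usize {
    active_validators.saturating_sub(1) / 3
}

/// Handle a new offender when the disabled set is bounded by the byzantine threshold:
/// if the set is already full, re-enable the smallest offender so the bigger one can be
/// disabled in its place; otherwise just disable the new offender.
fn disable_with_limit(
    disabled: &mut Vec<DisabledOffender>,
    new: DisabledOffender,
    active_validators: usize,
) {
    let limit = byzantine_threshold(active_validators);
    if disabled.len() < limit {
        disabled.push(new);
        return;
    }
    // Set is full: locate the currently smallest offender.
    if let Some(pos) = (0..disabled.len()).min_by_key(|&i| disabled[i].slash_ppb) {
        if disabled[pos].slash_ppb < new.slash_ppb {
            // Re-enable the smallest offender and disable the bigger one instead.
            disabled[pos] = new;
        }
        // Otherwise the new offender is smaller than everyone already disabled and is
        // left enabled (it is still slashed), keeping the disabled set within the limit.
    }
}
```

Replacing the smallest offender first keeps the disabled set within the byzantine bound while ensuring the largest
offenders are always the ones silenced.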