Skip to content
This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

pvf-precheck: Integrate PVF pre-checking into paras module #4457

Merged
merged 6 commits into from
Dec 16, 2021

Conversation

pepyakin
Copy link
Contributor

@pepyakin pepyakin commented Dec 3, 2021

Closes #4009

This is the most of the runtime-side change needed for #3211.

Here is how it works.

                                           ┌──────────┐
                       supermajority       │          │
                   ┌────────for───────────▶│ accepted │
       vote────┐   │                       │          │
        │      │   │                       └──────────┘
        │      │   │
        │  ┌───────┐
        │  │       │
        └─▶│ init  │────supermajority      ┌──────────┐
           │       │       against         │          │
           └───────┘           └──────────▶│ rejected │
            ▲  │                           │          │
            │  │ session                   └──────────┘
            │  └──change
            │     │
            │     ▼
            ┌─────┐
start──────▶│reset│
            └─────┘

The PVF pre-checking can be triggered either by an upgrade or by
onboarding (i.e. calling schedule_para_initialize). The PVF
pre-checking process is identified by the PVF code hash that is being
voted on. If there is already PVF pre-checking process running, then no
new PVF pre-checking process will be started. Instead, we just subscribe
to the existing one.

If there is no PVF pre-checking process running but the PVF code hash
was already saved in the storage, that necessarily means (I invite the
reviewers to double-check this invariant) that the PVF already passed
pre-checking. This is equivalent to instant approving of the PVF.

The pre-checking process can be concluded either by obtaining a
supermajority or if it expires.

Each validator checks the list of PVFs available for voting. The vote is
binary, i.e. accept or reject a given PVF. As soon as the supermajority
of votes are collected for one of the sides of the vote, the voting is
concluded in that direction and the effects of the voting are enacted.

Only validators from the active set can participate in the vote. The set
of active validators can change each session. That's why we reset the
votes each session. A voting that observed a certain number of sessions
will be rejected.

The effects of the PVF accepting depend on the operations requested it:

  1. All onboardings subscribed to the approved PVF pre-checking process will
    get scheduled and after passing 2 session boundaries they will be onboarded.
  2. All upgrades subscribed to the approved PVF pre-checking process will
    get scheduled very similarly to the existing process. Upgrades with
    pre-checking are really the same process that is just delayed by the
    time required for pre-checking voting. In case of instant approval the
    mechanism is exactly the same. This is important from parachains
    compatibility standpoint since following the delayed upgrade requires
    the parachain to implement
    Look at the upgrade go-ahead and restriction signals cumulus#517.

In case, PVF pre-checking process was concluded with rejection, then all
the requesting operations get cancelled. For onboarding it means it gets
without movement: the lifecycle of such parachain is terminated on the
Onboarding state and after rejection the lifecycle is none. That in
turn means that the caller can attempt registering the parachain once
more. For upgrading it means that the upgrade process is aborted: that
flashes go-ahead signal with Abort flag.

Rejection leads to removing the allegedly bad validation code from the
chain storage. Among other things, this implies that the operation can
be re-requested. That allows for retrying an operation in case there was
some bug. At the same time it does not look as a DoS vector due to the
caching performed by the nodes.

PVF pre-checking can be enabled and disabled. Initially, according to
the changes in #4420, this mechanism is disabled. Triggering the PVF
pre-checking when it is disabled just means that we insta approve the
requesting operation. This should lead to the behavior being unchanged.

TODO:

  • if UpgradeGoAheadSignal is set in schedule_code_upgrade it will be immediately reset in note_new_head
  • run benchmarks

Follow-ups:

  • expose runtime APIs
  • implementers' guide changes
  • ensure this module can be controlled by governance (e.g. via force_* functions)
  • deposit events for PVF pre-checking related events
  • weights for include_pvf_check_statement

skip check-dependent-cumulus

@pepyakin
Copy link
Contributor Author

pepyakin commented Dec 3, 2021

Current dependencies on/for this PR:

This comment was auto-generated by Graphite.

@github-actions github-actions bot added the A0-please_review Pull request needs code review. label Dec 3, 2021
@pepyakin pepyakin added B7-runtimenoteworthy C1-low PR touches the given topic and has a low impact on builders. D5-nicetohaveaudit ⚠️ PR contains trivial changes to logic that should be properly reviewed. labels Dec 3, 2021
@pepyakin
Copy link
Contributor Author

pepyakin commented Dec 3, 2021

Oh sh~, TIL: "votings" is not a thing.

@pepyakin pepyakin mentioned this pull request Dec 7, 2021
Base automatically changed from pep-pvf-configuration to master December 8, 2021 14:50
@pepyakin pepyakin requested a review from drahnr December 8, 2021 14:51
@@ -73,6 +73,9 @@ pub struct HostConfiguration<BlockNumber> {
/// This parameter affects the upper bound of size of `CandidateCommitments`.
pub hrmp_max_message_num_per_candidate: u32,
/// The minimum period, in blocks, between which parachains can update their validation code.
///
/// If PVF pre-checking is enabled this should be greater than the maximum number of blocks
/// PVF pre-checking can take.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean as in a timeout?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant to be vague that the pre-checking should finish first and only then a new upgrade will be allowed. It just so happens that the pre-checking timeout is typically takes the most time. However, it is also possible that the last decisive vote comes in on the last block of the session and thus ends the PVF voting, so it is not always timeout.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exhaustiveness is imho better than trying to avoid it. Helps reviewing the code and on-boarding new people.

Copy link
Contributor

@drahnr drahnr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looks good, I have to review a few sections more carefully:

  • the "hack"
  • ref counting consistency

All my comments are just nits, code looks pretty nice (minus the existence of "hack")

runtime/parachains/src/paras.rs Outdated Show resolved Hide resolved
runtime/parachains/src/paras.rs Show resolved Hide resolved
runtime/parachains/src/paras.rs Show resolved Hide resolved
Error::<T>::PvfCheckInvalidSignature,
);

let mut active_vote = PvfActiveVoteMap::<T>::get(&stmt.subject)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Is there a particular reason we don't try_mutate this in place? It feels a bit error prone having the vote tracking the vote at the very end in an else case.

Copy link
Contributor Author

@pepyakin pepyakin Dec 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah it was also bothering me, but I think in this case cure is the same or worse than the disease.

In the case of achieving the quorum, we should remove the vote from the list and the map. Probably we can switch the entire storage value to be OptionQuery, but then we will have to deal with this everywhere as well. Then, I am trying to put mutations of the map closer to the list, but in this case the mutation will happen inside of try_mutate but the list mutation will be left alone.

We could perhaps refactor the code so that we return an enum which then says what to do with the vote: update or remove, but that's additional code only to make sure we do not accidentally miss this update, so it feels to me not worth it

runtime/parachains/src/paras.rs Show resolved Hide resolved
runtime/parachains/src/paras.rs Outdated Show resolved Hide resolved
runtime/parachains/src/paras.rs Show resolved Hide resolved
@drahnr
Copy link
Contributor

drahnr commented Dec 8, 2021

Uh, also: could you include the ascii graph in the comment :)

@pepyakin
Copy link
Contributor Author

pepyakin commented Dec 8, 2021

ah, right, I can add it in the next PR in the stack: #4459

@pepyakin pepyakin force-pushed the pep-pvf-paras branch 2 times, most recently from 4bc08c4 to 63b82d7 Compare December 10, 2021 16:13
drahnr pushed a commit that referenced this pull request Dec 16, 2021
* pvf-precheck: Integrate PVF pre-checking into paras module

Closes #4009

This is the most of the runtime-side change needed for #3211.

Here is how it works.

The PVF pre-checking can be triggered either by an upgrade or by
onboarding (i.e. calling `schedule_para_initialize`). The PVF
pre-checking process is identified by the PVF code hash that is being
voted on. If there is already PVF pre-checking process running, then no
new PVF pre-checking process will be started. Instead, we just subscribe
to the existing one.

If there is no PVF pre-checking process running but the PVF code hash
was already saved in the storage, that necessarily means (I invite the
reviewers to double-check this invariant) that the PVF already passed
pre-checking. This is equivalent to instant approving of the PVF.

The pre-checking process can be concluded either by obtaining a
supermajority or if it expires.

Each validator checks the list of PVFs available for voting. The vote is
binary, i.e. accept or reject a given PVF. As soon as the supermajority
of votes are collected for one of the sides of the vote, the voting is
concluded in that direction and the effects of the voting are enacted.

Only validators from the active set can participate in the vote. The set
of active validators can change each session. That's why we reset the
votes each session. A voting that observed a certain number of sessions
will be rejected.

The effects of the PVF accepting depend on the operations requested it:

1. All onboardings subscribed to the approved PVF pre-checking process will
get scheduled and after passing 2 session boundaries they will be onboarded.
2. All upgrades subscribed to the approved PVF pre-checking process will
get scheduled very similarly to the existing process. Upgrades with
pre-checking are really the same process that is just delayed by the
time required for pre-checking voting. In case of instant approval the
mechanism is exactly the same. This is important from parachains
compatibility standpoint since following the delayed upgrade requires
the parachain to implement
paritytech/cumulus#517.

In case, PVF pre-checking process was concluded with rejection, then all
the requesting operations get cancelled. For onboarding it means it gets
without movement: the lifecycle of such parachain is terminated on the
`Onboarding` state and after rejection the lifecycle is none. That in
turn means that the caller can attempt registering the parachain once
more. For upgrading it means that the upgrade process is aborted: that
flashes go-ahead signal with `Abort` flag.

Rejection leads to removing the allegedly bad validation code from the
chain storage. Among other things, this implies that the operation can
be re-requested. That allows for retrying an operation in case there was
some bug. At the same time it does not look as a DoS vector due to the
caching performed by the nodes.

PVF pre-checking can be enabled and disabled. Initially, according to
the changes in #4420, this mechanism is disabled. Triggering the PVF
pre-checking when it is disabled just means that we insta approve the
requesting operation. This should lead to the behavior being unchanged.

Follow-ups:

- expose runtime APIs

* cargo run --quiet --release --features=runtime-benchmarks -- benchmark --chain=polkadot-dev --steps=50 --repeat=20 --pallet=runtime_parachains::paras --extrinsic=* --execution=wasm --wasm-execution=compiled --heap-pages=4096 --header=./file_header.txt --output=./runtime/polkadot/src/weights/runtime_parachains_paras.rs

* cargo run --quiet --release --features=runtime-benchmarks -- benchmark --chain=westend-dev --steps=50 --repeat=20 --pallet=runtime_parachains::paras --extrinsic=* --execution=wasm --wasm-execution=compiled --heap-pages=4096 --header=./file_header.txt --output=./runtime/westend/src/weights/runtime_parachains_paras.rs

* cargo run --quiet --release --features=runtime-benchmarks -- benchmark --chain=kusama-dev --steps=50 --repeat=20 --pallet=runtime_parachains::paras --extrinsic=* --execution=wasm --wasm-execution=compiled --heap-pages=4096 --header=./file_header.txt --output=./runtime/kusama/src/weights/runtime_parachains_paras.rs

* cargo run --quiet --release --features runtime-benchmarks -- benchmark --chain=rococo-dev --steps=50 --repeat=20 --pallet=runtime_parachains::paras --extrinsic=* --execution=wasm --wasm-execution=compiled --heap-pages=4096 --header=./file_header.txt --output=./runtime/rococo/src/weights/runtime_parachains_paras.rs

* Review fixes

Co-authored-by: Parity Bot <admin@parity.io>
pepyakin added a commit that referenced this pull request Dec 17, 2021
This commit hooks up the API provided by #4457 to the runtime API
subsystem. In a following PR this API will be consumed by the PVF
pre-checking subsystem.

Co-authored-by: Chris Sosnin <chris125_@live.com>
pepyakin added a commit that referenced this pull request Dec 20, 2021
This commit hooks up the API provided by #4457 to the runtime API
subsystem. In a following PR this API will be consumed by the PVF
pre-checking subsystem.

Co-authored-by: Chris Sosnin <chris125_@live.com>
eskimor pushed a commit that referenced this pull request Dec 21, 2021
This commit hooks up the API provided by #4457 to the runtime API
subsystem. In a following PR this API will be consumed by the PVF
pre-checking subsystem.

Co-authored-by: Chris Sosnin <chris125_@live.com>

Co-authored-by: Chris Sosnin <chris125_@live.com>
pepyakin added a commit that referenced this pull request Dec 24, 2021
Closes #3971

Read the linked issue.

Apart from that, this addresses the concern raised in this [comment] by
just adding a test. I couldn't find a clean way to reconcile a block
number delay with a PVF voting TTL, so I just resorted to rely on the
test. Should be fine for now.

[comment]: #4457 (comment)
pepyakin added a commit that referenced this pull request Dec 27, 2021
Closes #3971

Read the linked issue.

Apart from that, this addresses the concern raised in this [comment] by
just adding a test. I couldn't find a clean way to reconcile a block
number delay with a PVF voting TTL, so I just resorted to rely on the
test. Should be fine for now.

[comment]: #4457 (comment)
pepyakin added a commit that referenced this pull request Dec 27, 2021
This commit incorporates the changes made to the runtime in the
following PRs:

- #4408
- #4457
- #4540
- #4542
- #4581

Note that this PR does not include the description of the PVF
pre-checker subsystem. This should be addressed within
#4611
pepyakin added a commit that referenced this pull request Dec 28, 2021
This commit incorporates the changes made to the runtime in the
following PRs:

- #4408
- #4457
- #4540
- #4542
- #4581

Note that this PR does not include the description of the PVF
pre-checker subsystem. This should be addressed within
#4611

Co-authored-by: sandreim <54316454+sandreim@users.noreply.github.com>
pepyakin added a commit that referenced this pull request Dec 28, 2021
This commit incorporates the changes made to the runtime in the
following PRs:

- #4408
- #4457
- #4540
- #4542
- #4581

Note that this PR does not include the description of the PVF
pre-checker subsystem. This should be addressed within
#4611

Co-authored-by: sandreim <54316454+sandreim@users.noreply.github.com>
pepyakin added a commit that referenced this pull request Dec 28, 2021
Closes #3971

Read the linked issue.

Apart from that, this addresses the concern raised in this [comment] by
just adding a test. I couldn't find a clean way to reconcile a block
number delay with a PVF voting TTL, so I just resorted to rely on the
test. Should be fine for now.

[comment]: #4457 (comment)
pepyakin added a commit that referenced this pull request Dec 28, 2021
Closes #3971

Read the linked issue.

Apart from that, this addresses the concern raised in this [comment] by
just adding a test. I couldn't find a clean way to reconcile a block
number delay with a PVF voting TTL, so I just resorted to rely on the
test. Should be fine for now.

[comment]: #4457 (comment)
pepyakin added a commit that referenced this pull request Dec 29, 2021
This commit incorporates the changes made to the runtime in the
following PRs:

- #4408
- #4457
- #4540
- #4542
- #4581

Note that this PR does not include the description of the PVF
pre-checker subsystem. This should be addressed within
#4611

Co-authored-by: sandreim <54316454+sandreim@users.noreply.github.com>
pepyakin added a commit that referenced this pull request Dec 30, 2021
Closes #3971

Read the linked issue.

Apart from that, this addresses the concern raised in this [comment] by
just adding a test. I couldn't find a clean way to reconcile a block
number delay with a PVF voting TTL, so I just resorted to rely on the
test. Should be fine for now.

[comment]: #4457 (comment)
pepyakin added a commit that referenced this pull request Dec 30, 2021
Closes #3971

Read the linked issue.

Apart from that, this addresses the concern raised in this [comment] by
just adding a test. I couldn't find a clean way to reconcile a block
number delay with a PVF voting TTL, so I just resorted to rely on the
test. Should be fine for now.

[comment]: #4457 (comment)
paritytech-processbot bot pushed a commit that referenced this pull request Dec 31, 2021
Closes #3971

Read the linked issue.

Apart from that, this addresses the concern raised in this [comment] by
just adding a test. I couldn't find a clean way to reconcile a block
number delay with a PVF voting TTL, so I just resorted to rely on the
test. Should be fine for now.

[comment]: #4457 (comment)
drahnr pushed a commit that referenced this pull request Jan 4, 2022
* pvf-precheck: Integrate PVF pre-checking into paras module

Closes #4009

This is the most of the runtime-side change needed for #3211.

Here is how it works.

The PVF pre-checking can be triggered either by an upgrade or by
onboarding (i.e. calling `schedule_para_initialize`). The PVF
pre-checking process is identified by the PVF code hash that is being
voted on. If there is already PVF pre-checking process running, then no
new PVF pre-checking process will be started. Instead, we just subscribe
to the existing one.

If there is no PVF pre-checking process running but the PVF code hash
was already saved in the storage, that necessarily means (I invite the
reviewers to double-check this invariant) that the PVF already passed
pre-checking. This is equivalent to instant approving of the PVF.

The pre-checking process can be concluded either by obtaining a
supermajority or if it expires.

Each validator checks the list of PVFs available for voting. The vote is
binary, i.e. accept or reject a given PVF. As soon as the supermajority
of votes are collected for one of the sides of the vote, the voting is
concluded in that direction and the effects of the voting are enacted.

Only validators from the active set can participate in the vote. The set
of active validators can change each session. That's why we reset the
votes each session. A voting that observed a certain number of sessions
will be rejected.

The effects of the PVF accepting depend on the operations requested it:

1. All onboardings subscribed to the approved PVF pre-checking process will
get scheduled and after passing 2 session boundaries they will be onboarded.
2. All upgrades subscribed to the approved PVF pre-checking process will
get scheduled very similarly to the existing process. Upgrades with
pre-checking are really the same process that is just delayed by the
time required for pre-checking voting. In case of instant approval the
mechanism is exactly the same. This is important from parachains
compatibility standpoint since following the delayed upgrade requires
the parachain to implement
paritytech/cumulus#517.

In case, PVF pre-checking process was concluded with rejection, then all
the requesting operations get cancelled. For onboarding it means it gets
without movement: the lifecycle of such parachain is terminated on the
`Onboarding` state and after rejection the lifecycle is none. That in
turn means that the caller can attempt registering the parachain once
more. For upgrading it means that the upgrade process is aborted: that
flashes go-ahead signal with `Abort` flag.

Rejection leads to removing the allegedly bad validation code from the
chain storage. Among other things, this implies that the operation can
be re-requested. That allows for retrying an operation in case there was
some bug. At the same time it does not look as a DoS vector due to the
caching performed by the nodes.

PVF pre-checking can be enabled and disabled. Initially, according to
the changes in #4420, this mechanism is disabled. Triggering the PVF
pre-checking when it is disabled just means that we insta approve the
requesting operation. This should lead to the behavior being unchanged.

Follow-ups:

- expose runtime APIs

* cargo run --quiet --release --features=runtime-benchmarks -- benchmark --chain=polkadot-dev --steps=50 --repeat=20 --pallet=runtime_parachains::paras --extrinsic=* --execution=wasm --wasm-execution=compiled --heap-pages=4096 --header=./file_header.txt --output=./runtime/polkadot/src/weights/runtime_parachains_paras.rs

* cargo run --quiet --release --features=runtime-benchmarks -- benchmark --chain=westend-dev --steps=50 --repeat=20 --pallet=runtime_parachains::paras --extrinsic=* --execution=wasm --wasm-execution=compiled --heap-pages=4096 --header=./file_header.txt --output=./runtime/westend/src/weights/runtime_parachains_paras.rs

* cargo run --quiet --release --features=runtime-benchmarks -- benchmark --chain=kusama-dev --steps=50 --repeat=20 --pallet=runtime_parachains::paras --extrinsic=* --execution=wasm --wasm-execution=compiled --heap-pages=4096 --header=./file_header.txt --output=./runtime/kusama/src/weights/runtime_parachains_paras.rs

* cargo run --quiet --release --features runtime-benchmarks -- benchmark --chain=rococo-dev --steps=50 --repeat=20 --pallet=runtime_parachains::paras --extrinsic=* --execution=wasm --wasm-execution=compiled --heap-pages=4096 --header=./file_header.txt --output=./runtime/rococo/src/weights/runtime_parachains_paras.rs

* Review fixes

Co-authored-by: Parity Bot <admin@parity.io>
drahnr pushed a commit that referenced this pull request Jan 4, 2022
This commit hooks up the API provided by #4457 to the runtime API
subsystem. In a following PR this API will be consumed by the PVF
pre-checking subsystem.

Co-authored-by: Chris Sosnin <chris125_@live.com>

Co-authored-by: Chris Sosnin <chris125_@live.com>
drahnr pushed a commit that referenced this pull request Jan 4, 2022
This commit incorporates the changes made to the runtime in the
following PRs:

- #4408
- #4457
- #4540
- #4542
- #4581

Note that this PR does not include the description of the PVF
pre-checker subsystem. This should be addressed within
#4611

Co-authored-by: sandreim <54316454+sandreim@users.noreply.github.com>
drahnr pushed a commit that referenced this pull request Jan 4, 2022
Closes #3971

Read the linked issue.

Apart from that, this addresses the concern raised in this [comment] by
just adding a test. I couldn't find a clean way to reconcile a block
number delay with a PVF voting TTL, so I just resorted to rely on the
test. Should be fine for now.

[comment]: #4457 (comment)
@jakoblell jakoblell added D1-audited 👍 PR contains changes to critical logic that has been properly reviewed and externally audited. and removed D5-nicetohaveaudit ⚠️ PR contains trivial changes to logic that should be properly reviewed. labels Jan 18, 2022
Wizdave97 pushed a commit to ComposableFi/polkadot that referenced this pull request Feb 3, 2022
This commit hooks up the API provided by paritytech#4457 to the runtime API
subsystem. In a following PR this API will be consumed by the PVF
pre-checking subsystem.

Co-authored-by: Chris Sosnin <chris125_@live.com>

Co-authored-by: Chris Sosnin <chris125_@live.com>
Wizdave97 pushed a commit to ComposableFi/polkadot that referenced this pull request Feb 3, 2022
This commit incorporates the changes made to the runtime in the
following PRs:

- paritytech#4408
- paritytech#4457
- paritytech#4540
- paritytech#4542
- paritytech#4581

Note that this PR does not include the description of the PVF
pre-checker subsystem. This should be addressed within
paritytech#4611

Co-authored-by: sandreim <54316454+sandreim@users.noreply.github.com>
Wizdave97 pushed a commit to ComposableFi/polkadot that referenced this pull request Feb 3, 2022
Closes paritytech#3971

Read the linked issue.

Apart from that, this addresses the concern raised in this [comment] by
just adding a test. I couldn't find a clean way to reconcile a block
number delay with a PVF voting TTL, so I just resorted to rely on the
test. Should be fine for now.

[comment]: paritytech#4457 (comment)
ggwpez pushed a commit to ggwpez/runtimes that referenced this pull request Mar 10, 2023
Closes #3971

Read the linked issue.

Apart from that, this addresses the concern raised in this [comment] by
just adding a test. I couldn't find a clean way to reconcile a block
number delay with a PVF voting TTL, so I just resorted to rely on the
test. Should be fine for now.

[comment]: paritytech/polkadot#4457 (comment)
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A0-please_review Pull request needs code review. C1-low PR touches the given topic and has a low impact on builders. D1-audited 👍 PR contains changes to critical logic that has been properly reviewed and externally audited.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement pre-checking changes in the paras module
6 participants