
Start counting expected_at from the inclusion #4601

Closed
pepyakin opened this issue Dec 24, 2021 · 10 comments · Fixed by #7486

Comments

@pepyakin
Contributor

pepyakin commented Dec 24, 2021

Right now, the first relay-chain block at which a parachain can upgrade its code (dubbed expected_at) is counted from the relay-parent [1]. The delay is specified by validation_upgrade_delay. This delay acts as a safeguard: the relay-chain can be reverted by up to this number of blocks after the upgrade, and the upgraded code can still be found on the relay-chain and retrieved via the validation_code_by_hash runtime API.
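
A minimal sketch of how the countdown works today (the names are illustrative, not the actual `paras` pallet code), assuming PVF pre-checking completes immediately:

```rust
/// Illustrative only: today the upgrade becomes applicable at
/// `relay_parent + validation_upgrade_delay`, i.e. the countdown starts at
/// the relay-parent of the candidate that signalled the upgrade, not at the
/// block where the new code actually reaches relay-chain storage.
fn expected_at_current(relay_parent: u32, validation_upgrade_delay: u32) -> u32 {
    relay_parent + validation_upgrade_delay
}
```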

However, right now this property is broken by a couple of blocks: the problem is that if the relay-chain is reverted by validation_upgrade_delay blocks we will return to the relay-parent, while the validation code is only stored at the time the candidate is enacted. More specifically, two factors are at play (a rough sketch of the arithmetic follows the list):

  1. The relay-parent can currently be one block before the block in which the candidate is backed. This may be increased by a couple of blocks with the Asynchronous Backing Spec & Tracking Issue #3779.
  2. The maximum delay until enactment is defined by chain_availability_period or thread_availability_period, which is also a couple of blocks.
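
Taken together (a hypothetical back-of-the-envelope sketch, not real pallet code), these two factors shave a few blocks off the revert window that validation_upgrade_delay is supposed to guarantee:

```rust
/// Hypothetical sketch: the new code only reaches storage at enactment,
/// roughly `relay_parent + backing_delay + availability_period`, while the
/// `validation_upgrade_delay` countdown already started at `relay_parent`.
/// The revert depth that is actually safe is therefore smaller.
fn effective_revert_window(
    validation_upgrade_delay: u32,
    backing_delay: u32,       // ~1 block today, more with asynchronous backing
    availability_period: u32, // chain_availability_period / thread_availability_period
) -> u32 {
    validation_upgrade_delay.saturating_sub(backing_delay + availability_period)
}
```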

The solution to this problem would be to start counting the delay from the enactment of the candidate that scheduled the upgrade, not from the relay-parent of that candidate.
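
Sketched with the same illustrative names as above, the proposed change would be:

```rust
/// Proposed: count the delay from the block in which the candidate that
/// scheduled the upgrade was enacted (included), i.e. from the moment the
/// new code is actually stored on the relay-chain.
fn expected_at_proposed(inclusion_block: u32, validation_upgrade_delay: u32) -> u32 {
    inclusion_block + validation_upgrade_delay
}
```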

This should be changed/enabled only when every parachain follows the go-ahead signal, i.e. has paritytech/cumulus#517 on board. This was released as part of Cumulus polkadot-v0.9.12 [2].

Footnotes

  [1] Historically, I think we started counting from the relay-parent because otherwise the parachain does not know at which relay-chain block expected_at will be. This was important when the upgrades were coordinated based on block numbers. With https://github.com/paritytech/polkadot/pull/3371 this is no longer important, since the relay-chain is in control of when the upgrade should happen.

  [2] In fact, https://github.com/paritytech/polkadot/pull/4457 initially came with this fix. However, it was then dropped, since we discovered that there are parachains that do not support the go-ahead signal and would thus be broken by this change.

@bkchr
Member

bkchr commented Dec 24, 2021

The solution to this problem would be to start counting the delay from the upgrade enactment and not from the relay-parent of that candidate.

What is the upgrade enactment? Or better, what do you mean by this?

@pepyakin
Contributor Author

Sorry, it is indeed confusing.

I meant that we should count the delay not from the relay-parent, but from the block number at which the candidate that signalled the upgrade was enacted.

@bkchr
Member

bkchr commented Dec 25, 2021

Ahh okay. So this should also solve this problem: #4248?

@pepyakin
Contributor Author

Not quite!

The main target of this issue is to fix the violation of the guarantee that, at the first usage of the new PVF, the chain can revert validation_upgrade_delay [1] blocks back and still see the code. Right now, it's only possible to revert by about validation_upgrade_delay - X blocks, where X is ≈10, right after the upgrade.

Footnotes

  [1] At least this number, because the first block using the new code can also come later than expected_at.

@bkchr
Member

bkchr commented Jan 1, 2022

the problem is that if the relay-chain is reverted by validation_upgrade_delay blocks we will return to the relay-parent, while the validation code is only stored at the time the candidate is enacted.

Can you again describe the problem? I don't really understand it 🙈

@pepyakin
Contributor Author

pepyakin commented Jan 1, 2022

Ok, let's try with an example of how it works now:

Let's assume that validation_upgrade_delay is 600 blocks and chain_availability_period is 10. There is a parachain candidate X created with relay-parent 1337. This candidate is backed on the relay-chain at block 1338. At block 1347 the candidate gains availability, which leads to enacting it. As part of enacting it, we schedule the upgrade, saving the new code into storage. Assuming that PVF pre-checking happens immediately (or, equivalently, is disabled), expected_at will be equal to 1937. At relay-chain block 1939, a candidate from the same parachain is produced with relay-parent 1937, thus applying the upgrade. The next candidate Y for the same parachain, enacted at block 1941, turns out to be invalid. Then, due to unrelated reasons, the relay-chain is reverted 600 blocks back, to relay-chain block 1341. Because the candidate with relay-parent 1939 is invalid, nodes will dispute it; however, they won't be able to, since the new code is not available.

The purpose of validation_upgrade_delay is to allow the relay-chain to revert by that many blocks and yet still have the new code in storage to prevent exactly this. In our case though, at block 1341 candidate X has not yet gained availability, so the new code is not available and candidate Y cannot be validated.

This can be fixed if we start counting down validation_upgrade_delay not from the relay-parent of the candidate that signals the upgrade, but from the moment that candidate is enacted and the code thus hits storage.
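
For concreteness, here is the same timeline as a small sketch (illustrative code, numbers taken from the example above):

```rust
fn main() {
    let validation_upgrade_delay = 600u32;

    // Candidate X (relay-parent 1337) signals the upgrade; it gains
    // availability and is enacted at block 1347, which is when the new
    // code actually enters relay-chain storage.
    let relay_parent_of_x = 1337u32;
    let x_enacted_at = 1347u32;

    // Today: the countdown starts at the relay-parent.
    let expected_at = relay_parent_of_x + validation_upgrade_delay; // 1937

    // The invalid candidate Y is enacted at 1941; the chain then reverts
    // 600 blocks to 1341, which is *before* the code entered storage at
    // 1347, so Y can no longer be checked against the new PVF.
    let y_enacted_at = 1941u32;
    let reverted_to = y_enacted_at - validation_upgrade_delay; // 1341
    assert!(reverted_to < x_enacted_at);

    // Proposed: start the countdown at the enactment (inclusion) of X, so a
    // revert of up to `validation_upgrade_delay` blocks from any block that
    // uses the new code never goes past the block where the code was stored.
    let expected_at_fixed = x_enacted_at + validation_upgrade_delay; // 1947
    assert!(expected_at_fixed - validation_upgrade_delay >= x_enacted_at);

    println!("expected_at today: {expected_at}, with the fix: {expected_at_fixed}");
}
```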

@bkchr
Member

bkchr commented Jan 2, 2022

Because the candidate with relay-parent 1939 is invalid, nodes will dispute it; however, they won't be able to, since the new code is not available.

Why will we dispute this one? If the relay chain reverted 600 blocks, anything on the old chain can be discarded? We don't even need to validate it?

And is validation_upgrade_delay the maximum the relay chain can be reverted? If not, we could end in the same situation again?

@pepyakin
Contributor Author

pepyakin commented Jan 3, 2022

Why will we dispute this one? If the relay chain reverted 600 blocks, anything on the old chain can be discarded? We don't even need to validate it?

That's correct. However, IIUC, we want to punish each time, otherwise it's a free try.

And is validation_upgrade_delay the maximum the relay chain can be reverted? If not, we could end in the same situation again?

Yeah, theoretically reverts can go past that. The idea of validation_upgrade_delay is to give good safety padding, not to provide an iron guarantee. The value is set very high though, so it seems really unlikely we'd get there.

@bkchr
Member

bkchr commented Jan 3, 2022

That's correct. However, IIUC, we want to punish each time, otherwise it's a free try.

That means by "invalid" you mean that someone from the primary backers issued an incorrect attestation?

Yeah, theoretically reverts can go past that.

Okay, so after this issue you will have the full validation_upgrade_delay instead of validation_upgrade_delay - 10 or 20. Sounds like we spent more time discussing here than the issue is worth :D (sorry). But this also doesn't sound like a big problem, nor does it solve that much (as we can theoretically still run into this problem).

Nevertheless, counting from inclusion sounds more reasonable.

@pepyakin
Contributor Author

pepyakin commented Jan 3, 2022

That means by "invalid" you mean that someone from the primary backers issued an incorrect attestation?

Yeah.

Okay, so after this issue you will have the full validation_upgrade_delay instead of validation_upgrade_delay - 10 or 20. Sounds like we spent more time discussing here than the issue is worth :D (sorry). But this also doesn't sound like a big problem, nor does it solve that much (as we can theoretically still run into this problem).

Yeah exactly! It's not a big issue.
