Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove im-online pallet from Kusama and Polkadot runtimes #178

Merged
merged 26 commits into from
Feb 29, 2024

Conversation

s0me0ne-unkn0wn
Copy link
Contributor

Summary

This PR aims to remove im-online pallet, its session keys, and its on-chain storage from both Kusama and Polkadot relay chain runtimes, thus giving up liveness slashing.

Motivation

  • Missing out on rewards because of being offline is enough disincentive for validators. Slashing them for being offline is redundant.
  • Disabling liveness slashing is a prerequisite for validator disabling.

See also

paritytech/polkadot-sdk#1964
paritytech/polkadot-sdk#784

@s0me0ne-unkn0wn s0me0ne-unkn0wn marked this pull request as ready for review February 21, 2024 21:04
@s0me0ne-unkn0wn
Copy link
Contributor Author

@bkchr @ordian I think it's ready for review now. try-runtime is still failing in CI, but that'll be fixed with paritytech/try-runtime-cli#79. I've tested locally using a state from a block not containing any im-online events, and it passes OK.

@bkchr WDYT about adding offchain storage cleanup into the same PR? Seems unsound to me, as some previous offchain workers may still be there, but I'm not sure if there's any synchronization of their access to the offchain db. Could it make sense?

relay/polkadot/src/lib.rs Outdated Show resolved Hide resolved
relay/polkadot/src/lib.rs Outdated Show resolved Hide resolved
relay/polkadot/src/lib.rs Show resolved Hide resolved
@bkchr
Copy link
Contributor

bkchr commented Feb 21, 2024

@bkchr WDYT about adding offchain storage cleanup into the same PR? Seems unsound to me, as some previous offchain workers may still be there, but I'm not sure if there's any synchronization of their access to the offchain db. Could it make sense?

Yeah good question, probably better to do this the next release.

relay/polkadot/src/lib.rs Outdated Show resolved Hide resolved
relay/polkadot/src/lib.rs Outdated Show resolved Hide resolved
/// Upgrade Session keys to exclude `ImOnline` key.
/// When this is removed, should also remove `OldSessionKeys`.
pub struct UpgradeSessionKeys;
const UPGRADE_SESSION_KEYS_FROM_SPEC: u32 = 1001002;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's hope we include it in the next release 🤞

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll keep an eye on it. If it's not in the next release somehow, I'll update the version.

@s0me0ne-unkn0wn
Copy link
Contributor Author

@bkchr okay, I bumped the Kusama spec version in this PR to let it pass the idempotent try-runtime check, and now the only thing failing is the encointer runtime migrations that IIUC have nothing to do with this PR. So it's good to go, but I'm not sure how to sort out those encointer problems, and that CI job is required 🤷‍♂️

@bkontur
Copy link
Contributor

bkontur commented Feb 29, 2024

@bkchr okay, I bumped the Kusama spec version in this PR to let it pass the idempotent try-runtime check, and now the only thing failing is the encointer runtime migrations that IIUC have nothing to do with this PR. So it's good to go, but I'm not sure how to sort out those encointer problems, and that CI job is required 🤷‍♂️

@s0me0ne-unkn0wn Encointer is known issue, please check comment: #159 (comment)

@bkchr I think we should temporary skip check-migrations (Encointer) as required, because until it is enacted, you will need to use superpower for every PR, wdyt?

relay/kusama/src/lib.rs Outdated Show resolved Hide resolved
@bkchr
Copy link
Contributor

bkchr commented Feb 29, 2024

Yes, I also already thought of disabling the try runtime check for encointer. Can you please make this happen?

@bkchr
Copy link
Contributor

bkchr commented Feb 29, 2024

/merge

@fellowship-merge-bot
Copy link
Contributor

Enabled auto-merge in Pull Request

Available commands
  • /merge: Enables auto-merge for Pull Request
  • /merge cancel: Cancels auto-merge for Pull Request
  • /merge help: Shows this menu

For more information see the documentation

@bkchr
Copy link
Contributor

bkchr commented Feb 29, 2024

/merge cancel

@bkchr bkchr merged commit f1fb358 into polkadot-fellows:main Feb 29, 2024
29 of 30 checks passed
@fellowship-merge-bot
Copy link
Contributor

There was a problem running the action.

❌😵❌

Please find more information in the logs.

@bkchr
Copy link
Contributor

bkchr commented Mar 4, 2024

@s0me0ne-unkn0wn I have thought again about this pr, in the context of: paritytech/polkadot-sdk#64

I think we need to revert this PR for now or do some other manual work. The problem being that when we call candidate_events it will iterate over the events. Now, when we remove the im-online, they would be undecodable in the block that enacted the runtime upgrade. im-online is sending these events in on-initialize (aka before the parachain inherent is executed).

So, the problem could be that we miss backed candidates (not such a big issue I think). However when we miss the candidate included events we could run into a finality stall until the training wheels force approve the candidates. Given this, we should fix this.

A possible fix could be that we filter out these broken events in candidate_events before we do the actual call to the implementation.

@s0me0ne-unkn0wn
Copy link
Contributor Author

@bkchr Hmmm 🤔
Any ideas why didn't we hit it on Rococo and Westend?
Anyway, I'll research that and will try to push on the fix to make it into the current release.

@bkchr
Copy link
Contributor

bkchr commented Mar 4, 2024

You need to be "lucky". The problem only appears when the block we enact the runtime upgrade, also contains one of these events. As they don't appear that often, triggering this bug is quite unlikely, but it can happen. At least from my understanding. I didn't looked that deep into the approval voting and if this is maybe also triggered in a different way than using the candidate events it should be fine as is.

@s0me0ne-unkn0wn
Copy link
Contributor Author

im-online is sending these events in on-initialize (aka before the parachain inherent is executed).

Looking at that closer, the code you're pointing to is executed in on_before_session_ending, that is, not in every on_initialize but at the session boundary only. As we're scheduling a new runtime enactment at a specific block, IIUC, it wouldn't be a big problem for us to schedule it in such a way that it's guaranteed to miss the session boundary. Does that make sense?

@bkchr
Copy link
Contributor

bkchr commented Mar 6, 2024

We don't schedule the upgrade on a specific block. It depends on OpenGov voting etc when the upgrade is enacted.

@s0me0ne-unkn0wn
Copy link
Contributor Author

We don't schedule the upgrade on a specific block

Then I probably don't fully understand what's going on here: https://polkadot.polkassembly.io/referenda/458

@bkchr
Copy link
Contributor

bkchr commented Mar 11, 2024

You got me :P I also didn't know of this :D

However, if a block number is a session change or not is still something that is not 100% predictable. Sessions end based on time. Thus, skipped blocks let a session end at an earlier block number.

fellowship-merge-bot bot pushed a commit that referenced this pull request Mar 15, 2024
This is a follow-up to #178, which aims to keep the `im-online` pallet
events decodable after the pallet itself is removed.

As
[discussed](#178 (comment)),
the inability to decode events may result in missing a candidate in
approval voting and a parachain finality stall. Although the
circumstances of such an event are unlikely, it's better to safeguard
them.

The transient code introduced in this PR should be removed after the
upgrade is enacted.

- [x] Does not require a CHANGELOG entry
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants