Skip to content

Commit

Permalink
Fix kusama validators getting 0 backing rewards the first session the…
Browse files Browse the repository at this point in the history
…y enter the active set (paritytech#3722)

There is a problem in the way we update `authorithy-discovery` next keys
and because of that nodes that enter the active set would be noticed at
the start of the session they become active, instead of the start of the
previous session as it was intended. This is problematic because:

1. The node itself advertises its addresses on the DHT only when it
notices it should become active on around ~10m loop, so in this case it
would notice after it becomes active.
2. The other nodes won't be able to detect the new nodes addresses at
the beginning of the session, so it won't added them to the reserved
set.

With 1 + 2, we end-up in a situation where the the new node won't be
able to properly connect to its peers because it won't be in its peers
reserved set. Now, the nodes accept by default`MIN_GOSSIP_PEERS: usize =
25` connections to nodes that are not in the reserved set, but given
Kusama size(> 1000 nodes) you could easily have more than`25` new nodes
entering the active set or simply the nodes don't have slots anymore
because, they already have connections to peers not in the active set.

In the end what the node would notice is 0 backing rewards because it
wasn't directly connected to the peers in its backing group.

## Root-cause

The flow is like this:
1. At BAD_SESSION - 1, in `rotate_session` new nodes are added to
QueuedKeys
https://github.com/paritytech/polkadot-sdk/blob/02e1a7f476d7d7c67153e975ab9a1bdc02ffea12/substrate/frame/session/src/lib.rs#L609
```
 <QueuedKeys<T>>::put(queued_amalgamated.clone());
<QueuedChanged<T>>::put(next_changed);
```
2. AuthorityDiscovery::on_new_session is called with `changed` being the
value of `<QueuedChanged<T>>:` at BAD_SESSION - **2** because it was
saved before being updated
https://github.com/paritytech/polkadot-sdk/blob/02e1a7f476d7d7c67153e975ab9a1bdc02ffea12/substrate/frame/session/src/lib.rs#L613
3. At BAD_SESSION - 1, `AuthorityDiscovery::on_new_session` doesn't
updated its next_keys because `changed` was false.
4. For the entire durations of `BAD_SESSION - 1` everyone calling
runtime api `authorities`(should return past, present and future
authorities) won't discover the nodes that should become active .
5. At the beginning of BAD_SESSION, all nodes discover the new nodes are
authorities, but it is already too late because reserved_nodes are
updated only at the beginning of the session by the `gossip-support`.
See above why this bad.

## Fix
Update next keys with the queued_validators at every session, not matter
the value of `changed` this is the same way babe pallet correctly does
it.
https://github.com/paritytech/polkadot-sdk/blob/02e1a7f476d7d7c67153e975ab9a1bdc02ffea12/substrate/frame/babe/src/lib.rs#L655

## Notes

- The issue doesn't reproduce with proof-authorities changes like
`versi` because `changed` would always be true and `AuthorityDiscovery`
correctly updates its next_keys every time.
- Confirmed at session `37651` on kusama that this is exactly what it
happens by looking at blocks with polkadot.js.

## TODO
- [ ] Move versi on proof of stake and properly test before and after
fix to confirm there is no other issue.

---------

Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
Co-authored-by: Bastian Köcher <git@kchr.de>
  • Loading branch information
2 people authored and dharjeezy committed Mar 24, 2024
1 parent 933d857 commit 1d64fe1
Show file tree
Hide file tree
Showing 3 changed files with 62 additions and 23 deletions.
13 changes: 13 additions & 0 deletions prdoc/pr_3722.prdoc
Original file line number Diff line number Diff line change
@@ -0,0 +1,13 @@
# Schema: Polkadot SDK PRDoc Schema (prdoc) v1.0.0
# See doc at https://raw.githubusercontent.com/paritytech/polkadot-sdk/master/prdoc/schema_user.json

title: Fix kusama 0 backing rewards when entering active set

doc:
- audience: Runtime Dev
description: |
This PR fixes getting 0 backing rewards the first session when
a node enters the active set.

crates:
- name: pallet-authority-discovery
66 changes: 44 additions & 22 deletions substrate/frame/authority-discovery/src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -144,19 +144,21 @@ impl<T: Config> OneSessionHandler<T::AccountId> for Pallet<T> {
);

Keys::<T>::put(bounded_keys);
}

let next_keys = queued_validators.map(|x| x.1).collect::<Vec<_>>();
// `changed` represents if queued_validators changed in the previous session not in the
// current one.
let next_keys = queued_validators.map(|x| x.1).collect::<Vec<_>>();

let next_bounded_keys = WeakBoundedVec::<_, T::MaxAuthorities>::force_from(
next_keys,
Some(
"Warning: The session has more queued validators than expected. \
A runtime configuration adjustment may be needed.",
),
);
let next_bounded_keys = WeakBoundedVec::<_, T::MaxAuthorities>::force_from(
next_keys,
Some(
"Warning: The session has more queued validators than expected. \
A runtime configuration adjustment may be needed.",
),
);

NextKeys::<T>::put(next_bounded_keys);
}
NextKeys::<T>::put(next_bounded_keys);
}

fn on_disabled(_i: u32) {
Expand Down Expand Up @@ -270,7 +272,7 @@ mod tests {
.map(|id| (&account_id, id))
.collect::<Vec<(&AuthorityId, AuthorityId)>>();

let mut third_authorities: Vec<AuthorityId> = vec![4, 5]
let third_authorities: Vec<AuthorityId> = vec![4, 5]
.into_iter()
.map(|i| AuthorityPair::from_seed_slice(vec![i; 32].as_ref()).unwrap().public())
.map(AuthorityId::from)
Expand All @@ -282,6 +284,18 @@ mod tests {
.map(|id| (&account_id, id))
.collect::<Vec<(&AuthorityId, AuthorityId)>>();

let mut fourth_authorities: Vec<AuthorityId> = vec![6, 7]
.into_iter()
.map(|i| AuthorityPair::from_seed_slice(vec![i; 32].as_ref()).unwrap().public())
.map(AuthorityId::from)
.collect();
// Needed for `pallet_session::OneSessionHandler::on_new_session`.
let fourth_authorities_and_account_ids = fourth_authorities
.clone()
.into_iter()
.map(|id| (&account_id, id))
.collect::<Vec<(&AuthorityId, AuthorityId)>>();

// Build genesis.
let mut t = frame_system::GenesisConfig::<Test>::default().build_storage().unwrap();

Expand Down Expand Up @@ -310,25 +324,33 @@ mod tests {
third_authorities_and_account_ids.clone().into_iter(),
);
let authorities_returned = AuthorityDiscovery::authorities();
let mut first_and_third_authorities = first_authorities
.iter()
.chain(third_authorities.iter())
.cloned()
.collect::<Vec<AuthorityId>>();
first_and_third_authorities.sort();

assert_eq!(
first_authorities, authorities_returned,
first_and_third_authorities, authorities_returned,
"Expected authority set not to change as `changed` was set to false.",
);

// When `changed` set to true, the authority set should be updated.
AuthorityDiscovery::on_new_session(
true,
second_authorities_and_account_ids.into_iter(),
third_authorities_and_account_ids.clone().into_iter(),
third_authorities_and_account_ids.into_iter(),
fourth_authorities_and_account_ids.clone().into_iter(),
);
let mut second_and_third_authorities = second_authorities

let mut third_and_fourth_authorities = third_authorities
.iter()
.chain(third_authorities.iter())
.chain(fourth_authorities.iter())
.cloned()
.collect::<Vec<AuthorityId>>();
second_and_third_authorities.sort();
third_and_fourth_authorities.sort();
assert_eq!(
second_and_third_authorities,
third_and_fourth_authorities,
AuthorityDiscovery::authorities(),
"Expected authority set to contain both the authorities of the new as well as the \
next session."
Expand All @@ -337,12 +359,12 @@ mod tests {
// With overlapping authority sets, `authorities()` should return a deduplicated set.
AuthorityDiscovery::on_new_session(
true,
third_authorities_and_account_ids.clone().into_iter(),
third_authorities_and_account_ids.clone().into_iter(),
fourth_authorities_and_account_ids.clone().into_iter(),
fourth_authorities_and_account_ids.clone().into_iter(),
);
third_authorities.sort();
fourth_authorities.sort();
assert_eq!(
third_authorities,
fourth_authorities,
AuthorityDiscovery::authorities(),
"Expected authority set to be deduplicated."
);
Expand Down
6 changes: 5 additions & 1 deletion substrate/frame/session/src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -285,7 +285,11 @@ pub trait SessionHandler<ValidatorId> {
/// before initialization of your pallet.
///
/// `changed` is true whenever any of the session keys or underlying economic
/// identities or weightings behind those keys has changed.
/// identities or weightings behind `validators` keys has changed. `queued_validators`
/// could change without `validators` changing. Example of possible sequent calls:
/// Session N: on_new_session(false, unchanged_validators, unchanged_queued_validators)
/// Session N + 1: on_new_session(false, unchanged_validators, new_queued_validators)
/// Session N + 2: on_new_session(true, new_queued_validators, new_queued_validators)
fn on_new_session<Ks: OpaqueKeys>(
changed: bool,
validators: &[(ValidatorId, Ks)],
Expand Down

0 comments on commit 1d64fe1

Please sign in to comment.