Skip to content
This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Rework dispute-coordinator to use RuntimeInfo for obtaining session information instead of RollingSessionWindow #6968

Merged
merged 42 commits into from
Apr 24, 2023

Conversation

tdimitrov
Copy link
Contributor

@tdimitrov tdimitrov commented Mar 28, 2023

Part of #6812

The PR contains two notable changes:

  1. Remove RollingSessionInfo usage from dispute-coordinator
  2. Simplify disputes spam slots pruning logic - now they are cleared after a 6 session window passes. The window is no longer extended up to the session whereh the last finalized block was produced.

To use RuntimeInfo instead of RollingSessionWindow two problems has to be solved:

  1. Each RuntimeInfo call needs a sender and a parent_hash for querying the runtime. The first one seems to be always available. The situation with the second one is more complicated.
  2. RollingSessionWindow provides two methods - earliest_sesson and latest_session. These are the first and the last session cached. RuntimeInfo hasn't got such concept so it needs to be emulated (or dispute-coordinator to be reworked) somehow.

Problem 1 is mainly caused by the way dispute-coordinator is initialized. It gets the first active leaf as an Option and two Vecs with scraped onchain votes and participations. On further investigation I realised that these parameters are used only on startup and then they are empty. The Vecs are drain-ed and the Option is takeen. So if the subsytem crashes and gets restarted - it starts up with empty data.

For this reason I introduce struct InitialData containing all three of them:

pub struct InitialData {
	pub participations: Vec<(ParticipationPriority, ParticipationRequest)>,
	pub votes: Vec<ScrapedOnChainVotes>,
	pub leaf: ActivatedLeaf,
}

It is passed to the dispute-coordinator via Option. So if there is some initialization data (which requires runtime calls) we have got the leaf used to fetch it and we can use it. If there is no initialization data - we are good.

Problem 2 is more straightforward. dispute-coordinator will keep track of the sessions by itself. I made one simplification here which I'm not sure is correct (see notable change 2).

A few words about caching, which I have misunderstood initially.
RuntimeInfo has its own cache for SessionInfo (and doesn't solely rely on the ruintime-api subsystem cache for runtime calls as I wrongly assumed). This has got a nice side effect. For example in this call:

self.get_session_info_by_index(sender, relay_parent, session_index).await

If the session with index session_index is in cache the relay_parent parameter doesn't matter as no runtime call is made.
This is an improvement over the runtime-api cache where the results for different relay parents are treated as separate keys.

TODOs:

  • Fix tests
  • Burn-in on versi

Copy link
Member

@eskimor eskimor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall. We can maximize robustness (left comments) by using different relay_parents in pre-filling and actual use where possible. This way if at least one of them works, we are good, which means maximum robustness against errors. (Pruning, migration problems, ..)

node/core/dispute-coordinator/src/import.rs Outdated Show resolved Hide resolved
node/core/dispute-coordinator/src/initialized.rs Outdated Show resolved Hide resolved
node/core/dispute-coordinator/src/initialized.rs Outdated Show resolved Hide resolved
node/core/dispute-coordinator/src/initialized.rs Outdated Show resolved Hide resolved
node/core/dispute-coordinator/src/lib.rs Outdated Show resolved Hide resolved
node/core/dispute-coordinator/src/lib.rs Outdated Show resolved Hide resolved
node/core/dispute-coordinator/src/lib.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@BradleyOlson64 BradleyOlson64 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Getting rid of the rolling session window demystifies session logic a lot! Much appreciated.

@eskimor
Copy link
Member

eskimor commented Mar 30, 2023

This is an improvement over the runtime-api cache where the results for different relay parents are treated as separate keys.

This is not true. runtime-api cache caches session info also only by session index. So you have a valid point, that making the LRU size 6 is not strictly necessary. I think it is a good idea regardless, as we have better control and more importantly a guarantee*) that the last 6 sessions are indeed cached and not already pruned.

*) In the absence of errors.

@tdimitrov
Copy link
Contributor Author

This is not true. runtime-api cache caches session info also only by session index.

I haven't noticed that SessionInfo is handled more specially than the rest of the calls in runtime-api subsystem. Lesson learned: Runtime calls are treated differently and making assumptions is a mistake.

think it is a good idea regardless, as we have better control and more importantly a guarantee*) that the last 6 sessions are indeed cached and not already pruned.

I agree, I've modified it already.

@ordian ordian self-requested a review April 12, 2023 09:20
@ordian
Copy link
Member

ordian commented Apr 12, 2023

This is an improvement over the runtime-api cache where the results for different relay parents are treated as separate keys.

If you look at runtime-api cache for session info, it actually ignores relay parent:

SessionInfo(_relay_parent, session_index, info) =>

@tdimitrov
Copy link
Contributor Author

This is an improvement over the runtime-api cache where the results for different relay parents are treated as separate keys.

If you look at runtime-api cache for session info, it actually ignores relay parent:

SessionInfo(_relay_parent, session_index, info) =>

That's true. I think @eskimor mentioned that somewhere but I had forgotten to edit the description.

Comment on lines 312 to 318
match session_idx {
Ok(session_idx)
if self.last_consecutive_cached_session.is_none() ||
session_idx >
self.last_consecutive_cached_session.expect(
"The first clause explicitly handles `None` case. qed.",
) =>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
match session_idx {
Ok(session_idx)
if self.last_consecutive_cached_session.is_none() ||
session_idx >
self.last_consecutive_cached_session.expect(
"The first clause explicitly handles `None` case. qed.",
) =>
let should_cache_session = |session_idx: &SessionIndex| {
self.last_consecutive_cached_session.is_none() ||
session_idx >
&self
.last_consecutive_cached_session
.expect("The first clause explicitly handles `None` case. qed.")
};
match session_idx {
Ok(session_idx) if should_cache_session(&session_idx) => {

Is it more readable to extract the check as a closure?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, no :)

@tdimitrov tdimitrov requested a review from ordian April 12, 2023 11:20
node/core/dispute-coordinator/src/lib.rs Show resolved Hide resolved
node/core/dispute-coordinator/src/initialized.rs Outdated Show resolved Hide resolved
Comment on lines 312 to 318
match session_idx {
Ok(session_idx)
if self.last_consecutive_cached_session.is_none() ||
session_idx >
self.last_consecutive_cached_session.expect(
"The first clause explicitly handles `None` case. qed.",
) =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, no :)

node/core/dispute-coordinator/src/initialized.rs Outdated Show resolved Hide resolved
gap_in_cache = true;
}

if !gap_in_cache {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does last_consecutive_cached_session stand for exactly ? For example if we have this this situation: [ fail ok, ok, ok, ok] ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We want the last DISPUTE_WINDOW sessions cached. IF we have to cache sessions 1..5 and for some reason 3 fails, on next ActiveLeaves update we want to retry fetching it and start caching from 3. So in this case last_consecutive_cached_session will be set to 2.
Regarding your question: [ fail ok, ok, ok, ok]. If the failed session is with index X, then last_consecutive_cached_session will be set to x-1.

node/core/dispute-coordinator/src/initialized.rs Outdated Show resolved Hide resolved
@tdimitrov
Copy link
Contributor Author

I'll do one final burn in next week and merge.

@eskimor
Copy link
Member

eskimor commented Apr 24, 2023

@tdimitrov can we merge? I will likely have to touch some dispute-coordinator code soon, would be good to have this merged first.

@tdimitrov
Copy link
Contributor Author

I wanted to do one more burnin but the new changes are small - I'd say it's safe to merge.
I'll fix the pipeline and we can merge.

@eskimor
Copy link
Member

eskimor commented Apr 24, 2023

bot merge

@paritytech-processbot paritytech-processbot bot merged commit c9f89da into master Apr 24, 2023
@paritytech-processbot paritytech-processbot bot deleted the tsv-rolling-session-dispute-coord branch April 24, 2023 12:27
ordian added a commit that referenced this pull request Apr 26, 2023
* master:
  malus: dont panic on missing validation data (#6952)
  Offences Migration v1: Removes `ReportsByKindIndex` (#7114)
  Fix stalling dispute coordinator. (#7125)
  Fix rolling session window (#7126)
  [ci] Update buildah command and version (#7128)
  Bump assigned_slots params (#6991)
  XCM: Remote account converter (#6662)
  Rework `dispute-coordinator` to use `RuntimeInfo` for obtaining session information instead of `RollingSessionWindow` (#6968)
  Revert default proof size back to 64 KB (#7115)
ordian added a commit that referenced this pull request Apr 26, 2023
* master: (39 commits)
  malus: dont panic on missing validation data (#6952)
  Offences Migration v1: Removes `ReportsByKindIndex` (#7114)
  Fix stalling dispute coordinator. (#7125)
  Fix rolling session window (#7126)
  [ci] Update buildah command and version (#7128)
  Bump assigned_slots params (#6991)
  XCM: Remote account converter (#6662)
  Rework `dispute-coordinator` to use `RuntimeInfo` for obtaining session information instead of `RollingSessionWindow` (#6968)
  Revert default proof size back to 64 KB (#7115)
  update rocksdb to 0.20.1 (#7113)
  Reduce base proof size weight component to zero (#7081)
  PVF: Move PVF workers into separate crate (#7101)
  Companion for #13923 (#7111)
  update safe call filter (#7080)
  PVF: Don't dispute on missing artifact (#7011)
  XCM: Properly set the pricing for the DMP router (#6843)
  pvf: Update docs for PVF artifacts (#6551)
  Bump syn from 2.0.14 to 2.0.15 (#7093)
  Companion for substrate#13771 (#6983)
  Added Dwellir Nigeria bootnodes. (#7097)
  ...
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A0-please_review Pull request needs code review. B0-silent Changes should not be mentioned in any release notes C1-low PR touches the given topic and has a low impact on builders. T5-parachains_protocol This PR/Issue is related to Parachains features and protocol changes.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants