prospective-parachains rework: take II #4937

Merged

merged 64 commits into master from alindima/prospective-parachains-allow-forks on Aug 12, 2024

Conversation

@alindima (Contributor) commented Jul 3, 2024

Resolves #4800

Problem

In #4035, we removed support for parachain forks and cycles and added support for backing unconnected candidates (candidates for which we don't yet know the full path to the latest included block), which is useful for elastic scaling (parachains using multiple cores).

Removing support for backing forks turned out to be a bad idea, as there are legitimate cases for a parachain to fork (for example, if it uses another consensus mechanism, like BABE or PoW). This leads to validators getting lower backing rewards (depending on whether they back the winning fork or not) and higher pressure on only half of the backing group (during availability-distribution, for example). Since we don't yet have approval voting rewards, backing rewards are a pretty big deal (which may change in the future).

Description

A backing group is now allowed to back forks. Once a candidate becomes backed (has the minimum number of backing votes), we don't accept new forks unless they adhere to the new fork selection rule (have a lower candidate hash).
This keeps the implementation simpler, since forks only need to be taken into account for candidates which are not yet backed (only seconded).
The fork selection rule also reduces the work backing validators need to do: they share a deterministic way of picking the winning fork, so once they see a candidate backed, they can all settle on it and stop accepting new forks.
They do still accept new forks during the seconding phase (until the backing quorum is reached).
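
For illustration, here is a minimal sketch of such a deterministic fork-selection rule; `CandidateHash` is the type from `polkadot_primitives`, but the exact function shape in the subsystem may differ:

```rust
use std::cmp::Ordering;

use polkadot_primitives::CandidateHash;

// Deterministic tie-breaker between two forks with the same parent: prefer
// the candidate with the lower hash. Every validator evaluates this the same
// way, so the whole backing group converges on one winning fork without any
// extra coordination messages.
fn fork_selection_rule(hash1: &CandidateHash, hash2: &CandidateHash) -> Ordering {
	hash1.cmp(hash2)
}
```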

Therefore, a block author who is not part of the backing group will likely not even see the forks (only the winning one).

Just as before, a parachain producing forks will still not be able to leverage elastic scaling but will still work with a single core. Also, cycles are still not accepted.

Some implementation details

`CandidateStorage` is no longer a subsystem-wide construct. It previously held candidates from all relay chain forks and complicated the code. Each fragment chain now holds its own candidate chain and its own potential candidates. This should not increase storage consumption, since the heavy candidate data is already wrapped in an `Arc` and shared. It does, however, allow for great simplifications and increases readability.

`FragmentChain`s now only build a chain of backed candidates, using the fork selection rule. As said before, `FragmentChain`s are now also responsible for maintaining their own potential candidate storage.
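
To illustrate the two points above, a rough sketch of the resulting data layout; field and type names are simplified for illustration and do not match the subsystem's actual definitions exactly:

```rust
use std::{collections::HashMap, sync::Arc};

use polkadot_primitives::{CandidateHash, CommittedCandidateReceipt, PersistedValidationData};

// The heavy candidate data sits behind an `Arc`, so the same candidate can be
// held by several fragment chains (one per relay chain fork) at the cost of a
// reference-count bump rather than a deep copy.
#[derive(Clone)]
struct CandidateEntry {
	candidate_hash: CandidateHash,
	candidate: Arc<(CommittedCandidateReceipt, PersistedValidationData)>,
}

// Candidate storage is owned by each fragment chain, instead of being one
// subsystem-wide map as before.
#[derive(Clone, Default)]
struct CandidateStorage {
	by_candidate_hash: HashMap<CandidateHash, CandidateEntry>,
}

struct FragmentChain {
	// The best chain of backed candidates, in parent-to-child order, built
	// according to the fork selection rule.
	best_chain: Vec<CandidateHash>,
	// Potential candidates (seconded or unconnected) owned by this chain.
	unconnected: CandidateStorage,
}
```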

Since we no longer have the subsystem-wide `CandidateStorage`, when getting a new leaf update we use the storage of our latest ancestor, which may contain seconded/backed candidates that are still in scope.

When a candidate is backed, the fragment chains which hold it are recreated (due to the fork selection rule, this can trigger a "reorg" of the fragment chain).
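
Putting the last two points together, a hypothetical sketch of both flows, building on the struct sketch above; `View`, `populate`, `contains`, `mark_backed`, and `rebuild` are illustrative names, not the subsystem's real API:

```rust
use std::collections::HashMap;

use polkadot_primitives::{CandidateHash, Hash};

// Hypothetical per-subsystem view, keyed by active leaf.
struct View {
	fragment_chains: HashMap<Hash, FragmentChain>,
}

impl View {
	// On a new active leaf, seed the new fragment chain from the storage of
	// the most recent known ancestor, since it may hold seconded/backed
	// candidates that are still in scope.
	fn on_new_leaf(&mut self, leaf: Hash, ancestors_newest_first: &[Hash]) {
		let prev_storage = ancestors_newest_first
			.iter()
			.find_map(|a| self.fragment_chains.get(a))
			// Cheap clone: the entries share their candidate data via `Arc`.
			.map(|chain| chain.unconnected.clone())
			.unwrap_or_default();
		self.fragment_chains.insert(leaf, FragmentChain::populate(leaf, prev_storage));
	}

	// When a candidate reaches the minimum backing votes, rebuild every
	// fragment chain holding it: the fork selection rule may now prune a
	// competing fork, effectively "reorg"-ing the chain.
	fn candidate_backed(&mut self, candidate_hash: &CandidateHash) {
		for chain in self.fragment_chains.values_mut() {
			if chain.contains(candidate_hash) {
				chain.mark_backed(candidate_hash);
				chain.rebuild();
			}
		}
	}
}
```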

I generally tried to simplify the subsystem and not introduce unnecessary optimisations that would otherwise complicate the code without gaining us much (fragment chains wouldn't realistically ever hold many candidates).

TODO:

- [x] update metrics
- [x] update docs and comments
- [x] fix and add unit tests
- [x] tested with fork-producing parachain
- [x] tested with cycle-producing parachain
- [x] versi test
- [x] prdoc

@alindima added the R0-silent (Changes should not be mentioned in any release notes) and T8-polkadot (This PR/Issue is related to/affects the Polkadot network.) labels Jul 3, 2024
@alindima alindima requested a review from a team as a code owner July 3, 2024 14:42
@alindima alindima marked this pull request as draft July 3, 2024 14:42
@alindima (Contributor, Author) commented Jul 9, 2024

This is ready for an initial review. CI zombienet tests pass and all of my local zombienet tests work well.
I'm working on adding unit tests and after that I'll test on versi.

In the meantime, it would be good to get some in-depth reviews. I'd very much appreciate reviews that focus on the protocol changes and the design rather than nitpicks about function names and structure (those can come at a later stage).

CC: @sandreim @eskimor @alexggh @tdimitrov

@tdimitrov (Contributor) commented:

Great work, Alin! I did one pass focusing mainly on the logic and the PR looks good. I'll do another full pass when you are ready.

@sandreim (Contributor) left a comment:

Took a first pass on the changes. Looks good in general; I could not find any logic errors w.r.t. the introduced changes in behaviour. I also appreciate the changes that go deeper into the original prospective parachains subsystem architecture. Things are now much easier to understand.


```rust
for elem in &self.best_chain.chain[base_pos..actual_end_index] {
	if /* … */ pred(&elem.candidate_hash)
	{
		res.push(elem.candidate_hash);
```
Contributor:

I would move this logic into `find_ancestor_path` and just return the remaining chain from there.

```rust
	candidate_hash: CandidateHash,
	candidate: CommittedCandidateReceipt,
	persisted_validation_data: PersistedValidationData,
) -> Result<Self, CandidateEntryError> {
```
Contributor:

The doc comment should mention that it fails on a zero-length cycle (`ZeroLengthCycle`).

Contributor Author:

There are multiple cases where this function could fail. I think it's good practice to have the user look at the possible Error variants, rather than mentioning them in doc comments (which can easily become outdated).

```rust
}

res
```
Contributor:

Doesn't seem true; it will be dropped in `handle_introduce_seconded_candidate`.

Contributor Author:

What are you referring to?

@eskimor (Member) left a comment:

Will have a closer look tomorrow. From a quick pass it seems that `advance_scope` does not actually advance the scope; instead, the advancing logic still remains in `handle_active_leaves_update`. You mentioned that, and there may be good reasons, but at the very least the function name seems off then.

@eskimor (Member) left a comment:

Nice! Approving modulo the todo.

```rust
self.can_add_candidate_as_potential(candidate)?;

// This clone is cheap, as it uses an Arc for the expensive stuff.
// We can't consume the candidate because other fragment chains may use it also.
```
Member:

Best to leave that decision to the caller. If we need a `CandidateEntry` and not a `&CandidateEntry`, we should just say so and leave it to the caller whether any cloning is required or not.

Contributor Author:

The idea was to not clone unless we have to. There's no need to clone if `self.can_add_candidate_as_potential(candidate)?` returns an error.

```rust
// form a chain with the latest included head.
fragment_chain.populate_chain(&mut candidates_pending_availability);

// TODO: return error if not all candidates were introduced successfully.
```
Member:

Reminder.

Contributor Author:

Done. I chose to log a message instead.

@sandreim (Contributor) left a comment:

Excellent work!

@alindima alindima added this pull request to the merge queue Aug 12, 2024
Merged via the queue into master with commit 0b52a2c Aug 12, 2024
169 of 172 checks passed
@alindima alindima deleted the alindima/prospective-parachains-allow-forks branch August 12, 2024 08:35
alindima added a commit that referenced this pull request Aug 13, 2024
EgorPopelyaev pushed a commit that referenced this pull request Aug 13, 2024
Backport #4937 on the stable release
Labels
T8-polkadot (This PR/Issue is related to/affects the Polkadot network.)
Projects
Status: Audited
Development

Successfully merging this pull request may close these issues.

Poor PV performance (missed votes) with v1.12.0 and v1.13.0
4 participants