Skip to content
This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

NetworkBridge: validator (authorities) discovery api #1699

Merged
merged 74 commits into from
Oct 6, 2020

Conversation

ordian
Copy link
Member

@ordian ordian commented Sep 10, 2020

Part of #1461 (only current session for now).

My suggestion is to limit NetworkBridge to AuthorityIds only and to not track the ValidatorId -> AuthorityId mapping (already implemented as runtime API call validator_discovery), and ask a caller to do that.
This logic could be extracted somewhere to be reused by different subsystems. This separates concerns better IMHO especially when dealing with multiple sessions.

Unresolved:

Todo:

  • Update collation protocol to use ConnectToAuthorities
  • Handle multiple in-flight requests
  • API for revoking discovery requests
  • Tests

@ordian ordian added A3-in_progress Pull request is in progress. No review needed at this stage. B0-silent Changes should not be mentioned in any release notes C1-low PR touches the given topic and has a low impact on builders. labels Sep 10, 2020
node/network/bridge/src/lib.rs Outdated Show resolved Hide resolved
node/subsystem/src/messages.rs Outdated Show resolved Hide resolved
Co-authored-by: Peter Goodspeed-Niklaus <coriolinus@users.noreply.github.com>
where
T: initializer::Trait + pallet_authority_discovery::Trait,
{
// FIXME: the mapping might be invalid if a session change happens in between the calls
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm adding a SessionInfo module that will alleviate this in #1691

montekki and others added 4 commits September 10, 2020 16:42
* WIP

* The initial implementation of the collator side.

* Improve comments

* Multiple collation requests

* Add more tests and comments to validator side

* Add comments, remove dead code

* Apply suggestions from code review

Co-authored-by: Peter Goodspeed-Niklaus <coriolinus@users.noreply.github.com>

* Fix build after suggested changes

* Also connect to the next validator group

* Remove a Future impl and move TimeoutExt to util

* Minor nits

* Fix build

* Change FetchCollations back to FetchCollation

* Try this

* Final fixes

* Fix build

Co-authored-by: Peter Goodspeed-Niklaus <coriolinus@users.noreply.github.com>
* master:
  Allow the watermark to always land on the relay parent (#1689)
  Limit the maximum size of a downward message (#1690)
  Add deb and RPM repository config and documentation (#1676)
  Collator protocol subsystem (#1659)
@ordian
Copy link
Member Author

ordian commented Sep 11, 2020

I implemented multiple in-flight requests in b80e050, but handling leakage is tricky. If we have another message for revoking, since validator sets of different requests may overlap (or is it not true?), what happens is one subsystem revokes an id another is waiting on?

Another unresolved question is what to do if a peer we are waiting for was connected but then disconnected. Should we add it back to pending list or just return a list w/o this peer? Currently the latter is implemented.

Co-authored-by: Bernhard Schuster <bernhard@ahoi.io>
use polkadot_node_network_protocol::PeerId;
use polkadot_primitives::v1::{AuthorityDiscoveryId, Block, Hash};

const PRIORITY_GROUP: &'static str = "parachain_validators";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should each protocol (collation, validation) have its own priority group?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's a good question, I'm not sure about this.
From the NetworkBridge perspective there is no difference. Or rather, what would be the benefit of separating the two?

@@ -389,6 +403,11 @@ pub enum RuntimeApiRequest {
/// Get all events concerning candidates (backing, inclusion, time-out) in the parent of
/// the block in whose state this request is executed.
CandidateEvents(RuntimeApiSender<Vec<CandidateEvent>>),
/// Get the `AuthorityDiscoveryId`s corresponding to the given `ValidatorId`s.
/// Currently this request is limited to validators in the current session.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, then let's ignore the session index elsewhere for now as is currently done.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, what I'm worried about though is breaking changes. It's not clear how to make the API future-proof though.

Copy link
Contributor

@rphmeier rphmeier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General approach looks good. Are there associated guide changes?

primitives/src/v1.rs Outdated Show resolved Hide resolved
@ordian
Copy link
Member Author

ordian commented Sep 29, 2020

Not sure if we want to merge it now or wait till 0.8.25 branches out.
Also does it need burnin?

@rphmeier
Copy link
Contributor

I don't think this needs any burn-in. I'm not too worried about breaking changes before any point where we deploy this on Kusama or Polkadot.

@ordian
Copy link
Member Author

ordian commented Sep 30, 2020

To clarify my worry:
if we merge the PR now, it will get deployed on Kusama and Polkadot in 0.8.25 (if I understand correctly)
this PR introduces a runtime API call validator_discovery that doesn't take SessionIndex into account, so it will be changed sometime in the future anyway
if we don't merge it now, there is a slight chance that we will fix the API before the next release.

@rphmeier
Copy link
Contributor

rphmeier commented Oct 1, 2020

if we merge the PR now, it will get deployed on Kusama and Polkadot in 0.8.25 (if I understand correctly)
this PR introduces a runtime API call validator_discovery that doesn't take SessionIndex into account, so it will be changed sometime in the future anyway

There is a dummy implementation of the v1 runtime APIs including this new set of calls on Polkadot and Kusama. I figure that's OK as long as the node-side is resilient to runtime API errors, so it can author blocks even if a runtime API call fails. Runtime API call failure is really the only type of failure we have to worry about, as long as the dummy API is OK for authoring under the overseer. It would happen because we'd update the runtime API definition to be different than what the runtime actually exposes.


// this method takes `network_service` and `authority_discovery_service` by value
// and returns them as a workaround for the Future: Send requirement imposed by async fn impl.
pub async fn on_request(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some docs what this is doing?

* master:
  Make collation an optional return (#1787)
  XCM: Land xcm-handler and xcm-executor (#1771)
  v0.8.25 (#1785)
  add two node local net script (#1781)
  Adjust max nominators down to 128 (from 256) (#1782)
  Companion for substrate/pull/7215 (#1768)
  Remove Stale Upgrades (#1780)
  Update Polkadot Weights for Substrate 2.0 (#1761)
  Parachains v1 registrar module. (#1559)
  Derive `From` for `AllMessages` and simplify `send_msg` (#1774)
  implement remaining subsystem metrics (#1770)
  Companion for paritytech/substrate#7236 (#1773)
  WIP: remove deprecated only/except clauses, build is now manual on PRs (#1769)
  Increase Westend `spec_version` (#1766)
  move Metrics to utils (#1765)
@ordian ordian merged commit b84f3c0 into master Oct 6, 2020
@ordian ordian deleted the ao-validator-discovery-api branch October 6, 2020 11:34
ordian added a commit that referenced this pull request Oct 6, 2020
* master:
  NetworkBridge: validator (authorities) discovery api (#1699)
  Registrar v1 follow-ups (#1786)
  Make collation an optional return (#1787)
  XCM: Land xcm-handler and xcm-executor (#1771)
  v0.8.25 (#1785)
  add two node local net script (#1781)
  Adjust max nominators down to 128 (from 256) (#1782)
  Companion for substrate/pull/7215 (#1768)
  Remove Stale Upgrades (#1780)
  Update Polkadot Weights for Substrate 2.0 (#1761)
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A0-please_review Pull request needs code review. B0-silent Changes should not be mentioned in any release notes C1-low PR touches the given topic and has a low impact on builders.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants