Skip to content
This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Add a Statement Gossip Subsystem section to the implementors guide #1151

Closed
wants to merge 3 commits into from

Conversation

expenses
Copy link
Contributor

@expenses expenses commented May 26, 2020

Migrating from w3f/research#89.

Fixes #1142

@expenses expenses requested a review from rphmeier May 26, 2020 15:55
@expenses
Copy link
Contributor Author

expenses commented May 26, 2020

Can you add sections with information on:

  • Message types

  • Under which circumstances we will accept a message

  • Under which circumstances we will send a message to a peer

  • A note on how many chain heads are allowed to be sent (as this should be bounded)

Also, and this is beyond the scope of the section, but we should provide some context and describe the components that make up a gossip protocol. We're going to have a few different sections that each define gossip protocols, so having some common reference point for what we mean is important for readability.
https://github.com/w3f/research/blob/master/docs/polkadot/networking/0-overview.md is one link that you could reference, specifically the bounded-gossip section. It looks better in HackMD. The propagation pool section describes substrate gossip in an abstract way that we can reference or paraphrase here. I recommend doing that as part of this PR

@rphmeier What sort of content should go in the 'Gossip Protocols' section? I don't want to rewrite what's in the research documents so I think I'll just mention and link that we're using 'Bounded Gossip Protocols' and give a brief overview of a propagation pool.

@expenses
Copy link
Contributor Author

  • A note on how many chain heads are allowed to be sent (as this should be bounded)

I think you might have mentioned this on the call on friday, but what is this again?

@rphmeier
Copy link
Contributor

rphmeier commented May 26, 2020

I don't want to rewrite what's in the research documents so I think I'll just mention and link that we're using 'Bounded Gossip Protocols' and give a brief overview of a propagation pool

OK, that seems reasonable. I'd recommend summarizing topics and the components of gossip protocols for a) validating messages and b) deciding when to propagate messages to other peers, and then describe all gossip protocols in terms of the messages they are in charge of, their neighbor packet description, and their implementations of those functionalities. More or less detail is fine, up to your judgement, as long as the guide doesn't leave ambiguity.

For chain heads, we've currently used 5 as a maximum, although in practice it will not be more than 2 except in bizarre circumstances.

@rphmeier
Copy link
Contributor

cc @infinity0

What's your plan with these old networking documents? Can we expect the links to remain stable?

@rphmeier rphmeier added the A0-please_review Pull request needs code review. label May 26, 2020
@infinity0
Copy link
Contributor

@rphmeier I recently (~2 weeks ago) restructured some of the pages, so any links from before that would already have been broken. I'm not planning another restructure for a while, but I do intend to revamp the content at some point.

@@ -1030,6 +1031,7 @@ The following conditions need to be met for an incoming statement to be accepted

* The sender of the statement has to be in the validator set for the relay block.
* If the statement is `Valid` or `Invalid`, then the candidate that the hash refers to must be known.
* As only `MAX_CHAIN_HEADS` (currently set to 5) competing parachain heads are accepted at a time, if the statement is `Seconded`, then there must be less than `MAX_CHAIN_HEADS` other candidate receipts for the parachain block.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then there must be less than MAX_CHAIN_HEADS other candidate receipts for the parachain block.

This is not accurate - max chain heads is about how many relay chain heads a particular peer is allowed to say they are interested in, not about how many candidates there are for any specific relay chain head.


When the node wants to gossip a message:
* The message is put into the propagation pool and stays there until cleaned up.
* The message is sent out to the nodes that it is currently connected to.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, this is not accurate and does not capture all of the aspects of bounded gossip protocols. We only send out messages to peers that pass the filtration criterion for that peer. And then later on we may "unlock" those messages once we are aware of the filtration criterion for that peer changing.

* The message is sent out to the nodes that it is currently connected to.

When a new node connects to our node:
* All messages in the propagation pool are sent to it, and we receive all of its propagation pool messages.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here - we only send messages to a peer that pass the filtration-criterion allowed_k. At this level of detail, we can paraphrase that as a function is_message_allowed(peer, msg) which gossip protocols need to implement


#### Message Types

This subsystem communicates with the overseer via [two message types](#Candidate-Backing-Subsystem-Messages), one for overseer -> subsystem and another for subsystem -> overseer. Currently this just differentiates between statements that the subsystem has been instructed to gossip out, and statements that the subsystem has received.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
This subsystem communicates with the overseer via [two message types](#Candidate-Backing-Subsystem-Messages), one for overseer -> subsystem and another for subsystem -> overseer. Currently this just differentiates between statements that the subsystem has been instructed to gossip out, and statements that the subsystem has received.
This subsystem communicates with the overseer via [two message types](#candidate-backing-subsystem-message), one for overseer -> subsystem and another for subsystem -> overseer. Currently this just differentiates between statements that the subsystem has been instructed to gossip out, and statements that the subsystem has received.

Did you meant to reference the singular?


#### Functionality

The subsystem needs to contain a handle to a gossiping engine to gossip and recieve messages. This will be cloned into each job that is spawned. Apart from that, it also contains the general structures that all subsystems contain, e.g. channels to communicate with the overseer and handles to spawned jobs in order to shut them down.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be cloned into each job that is spawned.

Why clone the GossipEngine for each job? GossipEngine does not implement Clone by design, thus you would need to wrap it in an Arc<Mutex<>>. The statement gossip subsystem will run as a future, correct? If so wrapping a GossipEngine in a Mutex will prevent you from using async/await as (1) it is very difficult to make sure the lock is only held for short amount of time and definitely not while being preempted / sleeping and (2) a MutexGuard is not Send which would be required for async/await.

Does the above make sense? I am happy to expand further on this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regardless, it seems reasonable for the gossip engine to have a single owner (the subsystem). A job for this subsystem would only entail stringing together gossip_messages_for(specific_parent_hash_topic) and sending a message to the other subsystem.


On `OverseerSignal::StartWork` it should:
* Spawn a job and pass in `relay_parent`, a clone of the gossiping engine and a handle to a message validator.
* Send out a neighbour packet with the `relay_parent` added to the list of accepted chain heads.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case you agree with me that the neighbor packet should be sent via the GossipEngine and not via the Validator, would you mind adding a note here?

If there is no agreement I can expand on why this would help the architecture.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing we need to be aware of is that the validator needs to be aware of the most recent neighbor packets we have sent to peers. I suppose we just need to be careful when the GossipEngine is polled when doing that update.

On `StatementGossipSubsystemMessageIn::StatementToGossip` it should:
* Send the signed statement to the job running for the `relay_parent`.

The statement gossip job needs to poll two seperate futures (as well as the exit signal):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not make the statement gossip job a pure state machine that gets events as input and outputs commands to either be forwarded to the GossipEngine or to the Overseer?

Thus business logic (Polkadot specific) would stay pure within a single state machine and I/O logic would stay within the Subsystem.

* As only `MAX_CHAIN_HEADS` (currently set to 5) competing parachain heads are accepted at a time, if the statement is `Seconded`, then there must be less than `MAX_CHAIN_HEADS` other candidate receipts for the parachain block.
* The statement signature must be valid.

I have a basic implementation of this code on the [`ashley-test-statement-gossip-subsystem`](https://github.com/paritytech/polkadot/tree/ashley-test-statement-gossip-subsystem) branch.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case you would like to have input on this, would you mind opening up a pull request? That would make discussions a lot easier.

@rphmeier rphmeier closed this Jun 6, 2020
@expenses expenses deleted the ashley-statement-gossip branch July 10, 2020 12:33
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A0-please_review Pull request needs code review.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Guide: Statement Gossip Subystem
4 participants