Skip to content
This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Approval Distribution Subsystem #1951

Merged
merged 19 commits into from
Nov 25, 2020
Merged

Approval Distribution Subsystem #1951

merged 19 commits into from
Nov 25, 2020

Conversation

rphmeier
Copy link
Contributor

Closes #1603

@rphmeier rphmeier added A3-in_progress Pull request is in progress. No review needed at this stage. B0-silent Changes should not be mentioned in any release notes C1-low PR touches the given topic and has a low impact on builders. labels Nov 13, 2020
@rphmeier rphmeier added A0-please_review Pull request needs code review. and removed A3-in_progress Pull request is in progress. No review needed at this stage. labels Nov 18, 2020
@rphmeier rphmeier marked this pull request as ready for review November 18, 2020 02:08
1. Is a particular assignment relevant under a given `View`?
2. Is a particular approval relevant to any assignment in a set?

These two queries need not be perfect, but they must never yield false positives. For our own local view, they must not yield false negatives. When applied to our peers' views, it is acceptable for them to yield false negatives. The reason for that is that our peers' views may be beyond ours, and we are not capable of fully evaluating them. Once we have caught up, we can check again for false negatives to continue distributing.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"it is acceptable for our conclusions to yield false negatives"

* If the message fingerprint appears under the `BlockEntry`'s `Knowledge`, give the peer a small positive reputation boost and return. Note that we must do this after checking for out-of-view to avoid being spammed. If we did this check earlier, a peer could provide data out-of-view repeatedly and be rewarded for it.
* Dispatch `ApprovalVotingMessage::CheckAndImportAssignment(assignment)` and wait for the response.
* If the result is `AssignmentCheckResult::Accepted(candidate_index)` or `AssignmentCheckResult::AcceptedDuplicate(candidate_index)`
* If `candidate_index` does not match `claimed_candidate_index`, punish the peer's reputation, recompute the fingerprint, and re-do our knowledge checks. The goal here is to accept messages which peers send us that are labeled wrongly, but punish them for it as they've made us do extra work.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"labeled wrongly"? Are these indexes the vacated cores number? If so, we could always include the index inside the assignment, even when doing a modulo VRF criteria for tranche zero, so it could be read before proceeding, which makes the assignment outright invalid if the index is wrong. Is that helpful?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I'm unsure where claimed_candidate_index originates. I suppose the peer sends us data it believes important, but why does it need to tell us why the data matters? We either agree or not, yes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think including that data in the VRF would make this point redundant. At the moment, the vacated core index isn't part of the VRF, and we have peers send us the vacated core index alongside the assignment which makes it easier for us to do some politeness checks.

So the conundrum this is solving is what we should do if a peer sends us something labeled with the wrong core index but that is still a valid assignment for some other core. And my answer was to re-label it, punish the peer, and distribute.

Your general advice so far has been to keep as much data as possible out of the VRF, so it didn't cross my mind that was an option.

Copy link
Contributor

@burdges burdges Nov 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our VRF signatures have an input that corresponds to the output, which yes we minimize, but the same VRF also signs an extra auxiliary message, that influences only it proof, not its output.

We can pack anything we like into this extra auxiliary message, like whatever claims simplify the code, or even the entire gossip message containing the VRF (IndirectAssignmentCert?). In fact, including the whole gossip message saves a signature that costs at least 45ms and 64 bytes per assignment message, so I slanted the draft code I wrote in this direction.

Copy link
Contributor Author

@rphmeier rphmeier Nov 25, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. So it is possible to include the core index under the proof, even if that core index was unknown prior to the output being generated? That does indeed save us a ton of trouble. What's the schnorrkel API for that?

* Dispatch a `ApprovalDistributionV1Message::Approval(approval)` to all peers in the `BlockEntry`'s `known_by` set, excluding the peer in the `source`, if `source` has kind `MessageSource::Peer`. Add the fingerprint of the assignment to the knowledge of each peer. Note that this obeys the politeness conditions:
* We guarantee elsewhere that all peers within `known_by` are aware of all assignments relative to the block.
* We've checked that this specific approval has a corresponding assignment within the `BlockEntry`.
* Thus, all peers are aware of the assignment or have a message to them in-flight which will make them so.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All this sounds fine, but it's quadratic so another reason candidates should be rejected long before "escalation by DoS".

It breaks gossip assumptions if one compares fingerprints between peers using publicly keyed bloom filters, but if each pair of peers computes their own shared secret bloom filter key, then they could compare using bloom filters. Yet another optimization I guess.

* Broadcast on network with an `ApprovalNetworkingMessage::DistributeAssignment`.
* Kick off approval work with `launch_approval`
* Broadcast on network with an `ApprovalDistributionMessage::DistributeAssignment`.
* Kick off approval work with `launch_approval`. Note that if the candidate appears in multiple current blocks, we will launch approval for each block it appears in. It may make sense to shortcut around this with caching either at this level or on the level of the other subsystems invoked by that function.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, these approval checkers being different is an unfortunate consequence of using the relay VRF modulo criteria for tranche zero.

If the relay chain largely avoids forks then the relay VRF modulo criteria ensures all validators get the same number of tranche zero assignments, which improves tranche zero assignments in several respects:

  • We avoid a choice between either overworking nodes so much or else giving them more arbitrary choices about what they check.
  • We estimate the number of checkers with a binimial distribution, which gives a variance n p q which looks less than the variance obtained from the more complex distributions resulting from other strategies.
  • We could support validators of different computing capacities if we force validators to take assignments with sample numbers 1,..,k where k cannot exceed by too much their previous average number of samples completed on the same inclusion batch. In this way, we pay more for validators with more CPU but permit others to participate. This might not be a good thing, but nice option.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All that sounds fine, but the point here is actually about the fact that we literally re-execute the candidate even if we've just validated it and we should have some LRU cache somewhere so that being "re-assigned" to the same candidate across multiple forks is ultra cheap.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I missed interpreted and read this in terms of across validators, thanks!

@burdges
Copy link
Contributor

burdges commented Nov 22, 2020

LGTM :)

@@ -153,7 +153,7 @@ On receiving an `OverseerSignal::ActiveLeavesUpdate(update)`:
* Ensure that the `CandidateEntry` contains a `block_assignments` entry for the block, with the correct backing group set.
* If a validator in this session, compute and assign `our_assignment` for the `block_assignments`
* Only if not a member of the backing group.
* Run `RelayVRFModulo` and `RelayVRFDelay` according to the [the approvals protocol section](../../protocol-approval.md#assignment-criteria)
* Run `RelayVRFModulo` and `RelayVRFDelay` according to the [the approvals protocol section](../../protocol-approval.md#assignment-criteria). Ensure that the assigned core derived from the output is covered by the auxiliary signature aggregated in the `VRFPRoof`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems fine. The core is always either and input or an output already. If you want it to always be an input then we can just include it in the message and check it from the output.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we feasibly include the core as an input to the VRF, given that it's derived from the output? That seems cyclical, but maybe through the power of cryptography...

@@ -166,6 +166,7 @@ On receiving a `ApprovalVotingMessage::CheckAndImportAssignment` message, we che
* Check the assignment cert
* If the cert kind is `RelayVRFModulo`, then the certificate is valid as long as `sample < session_info.relay_vrf_samples` and the VRF is valid for the validator's key with the input `block_entry.relay_vrf_story ++ sample.encode()` as described with [the approvals protocol section](../../protocol-approval.md#assignment-criteria). We set `core_index = vrf.make_bytes().to_u32() % session_info.n_cores`. If the `BlockEntry` causes inclusion of a candidate at `core_index`, then this is a valid assignment for the candidate at `core_index` and has delay tranche 0. Otherwise, it can be ignored.
* If the cert kind is `RelayVRFDelay`, then we check if the VRF is valid for the validator's key with the input `block_entry.relay_vrf_story ++ cert.core_index.encode()` as described in [the approvals protocol section](../../protocol-approval.md#assignment-criteria). The cert can be ignored if the block did not cause inclusion of a candidate on that core index. Otherwise, this is a valid assignment for the included candidate. The delay tranche for the assignment is determined by reducing `(vrf.make_bytes().to_u64() % (session_info.n_delay_tranches + session_info.zeroth_delay_tranche_width)).saturating_sub(session_info.zeroth_delay_tranche_width)`.
* We also check that the core index derived by the output is covered by the `VRFProof` by means of an auxiliary signature.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The auxiliary message checked by the proof is just like a signature, so it gives us the ability to include stuff that is not in the VRF input. The core is either an input or output of the given criteria already, but we can include it in the auxiliary message and check that it is correct if that simplifies some code somewhere.

@rphmeier rphmeier merged commit 029c8a2 into master Nov 25, 2020
@rphmeier rphmeier deleted the rh-approval-networking branch November 25, 2020 04:40
ordian added a commit that referenced this pull request Nov 25, 2020
* master:
  Some code cleanup in overseer (#2008)
  PoV Distribution optimization (#1990)
  Approval Distribution Subsystem (#1951)
  Session management for approval voting (#1973)
  Do not send messages twice in bitfield distribution (#2005)
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A0-please_review Pull request needs code review. B0-silent Changes should not be mentioned in any release notes C1-low PR touches the given topic and has a low impact on builders.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Guide: Approval Voting Distribution Subsystem
3 participants