Skip to content
This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Offline fallback for GRANDPA #1619

Merged
merged 65 commits into from
Mar 5, 2019
Merged

Offline fallback for GRANDPA #1619

merged 65 commits into from
Mar 5, 2019

Conversation

rphmeier
Copy link
Contributor

@rphmeier rphmeier commented Jan 30, 2019

Depends on #1808

Motivation

GRANDPA voting is completely off-chain, so detecting offline validators and changing sets is difficult. Furthermore, changing validator sets is usually done in conjunction with finality -- when finality is stalled, how are we supposed to switch to a new set?

The answer is to fall back onto the block production system: GRANDPA is an asynchronously safe finality gadget, while the block production system is typically safe under synchrony. So if the block authors believe that nothing has been finalized for a sufficiently long time, we can trigger a "forced" set change that allows full nodes to begin to follow the new best set.

By nature, this forks off light clients (including warp sync), who won't be able to function again until aware of a trusted checkpoint after the forced change.


Abstract

Alistair described a scheme for figuring out when finality is stalled on-chain in the GRANDPA paper: https://github.com/w3f/consensus/blob/master/pdf/grandpa.pdf

To paraphrase:

Every block producer puts the highest block number that they see as finalised in their block.

Then any participant sees that if there is an n such that

  1. their best chain is at least length n + 100
  2. the indicators of the last finalised block height of blocks n-100 to n in their best chain have median at most n-1050 and
  3. n is the minimum that satisfied 1 and 2.

Implementation notes

We implement this by adding an srml-finality-tracker which adds an inherent for block authors to report the finalized block, and records a rolling window of reports, tracking the median.

Whenever the median is sufficiently low (step 2 of above), srml-finality-tracker signals the listeners.

srml-grandpa is one such listener: when it is notified of finality lagging, it issues a new forced authority set-change event.

forced set changes are applied on block import, unlike regular set changes which are applied on finality.

Forced set changes are also applied after a delay, but upon witnessing the signal the old voters should stop voting, if they were at all. This PR also implements "pausing" of voters.

We introduce an aux_schema module for handling persistent data in a consolidated way.

TODO:

  • Add tests for finality-tracker
  • Add tests for srml-grandpa
  • Add tests for finality-grandpa AuthoritySet struct
  • Add tests for finality-grandpa's handling of forced changes
  • Determine if this PR is breaksconsensus OR breaksapi -- I have been careful to keep storage and APIs backwards compatible but @gavofyork or @bkchr a careful review of the runtime to determine this would be appreciated.

@rphmeier rphmeier added A3-in_progress Pull request is in progress. No review needed at this stage. B0-patchthis labels Jan 30, 2019
Copy link
Member

@bkchr bkchr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you need to increase the version of the runtime,

core/finality-grandpa/src/authorities.rs Outdated Show resolved Hide resolved
core/finality-grandpa/src/authorities.rs Outdated Show resolved Hide resolved
core/finality-grandpa/src/authorities.rs Outdated Show resolved Hide resolved
core/finality-grandpa/src/lib.rs Outdated Show resolved Hide resolved
core/finality-grandpa/src/lib.rs Outdated Show resolved Hide resolved
@andresilva andresilva added A0-please_review Pull request needs code review. and removed A4-gotissues labels Feb 26, 2019
@gavofyork
Copy link
Member

@rphmeier @andresilva how is this looking?

@gavofyork gavofyork dismissed their stale review February 27, 2019 17:56

much changed

@andresilva
Copy link
Contributor

I think it's ready for another review. I didn't approve the PR because most of the changes since the last review have been made by me and I'd like a sign off from @rphmeier.

@rphmeier
Copy link
Contributor Author

I'll give it another review but not sure I can get to it this week. I'll try on the weekend or Monday.

// This can happen after a forced change (triggered by the finality tracker when finality is stalled), since
// the voter will be restarted at the median last finalized block, which can be lower than the local best
// finalized block.
warn!(target: "afg", "Safety violation detected, tried to finalize {:?} while the current best finalized is {:?}",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure it's a safety violation to re-finalize blocks on the canonical chain -- perhaps we can only warn when it does that?


new_set = Some((median_last_finalized, AuthoritySet {
current_authorities: change.next_authorities.clone(),
set_id: self.set_id + 1,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm yet in-progress of trying to understand this PR in details - so probably I'll discover the answer for my question later. But here is the question: is this possible that there will be two adjacent forced-change periods where set_id has increased by 2 without any justification (i.e. set A#1 has changed to set B#2 AND set B#2 was also unable to finalize something AND it has been changed to set C#3 AND only C#3 has generated new justification with set_id=3).

I'm currently relying on the fact that set_id is always accompanied by justification in the #1669 (in this check to be specific). It isn't critical for that code - just wanted to clarify if this is still a rule.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I believe the situation you described can happen.

Copy link
Contributor Author

@rphmeier rphmeier Mar 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@svyatonik worth noting that the offline change boots out light clients, so you can still rely on that assumption as long as you handle offline changes by shutting down the light client.

A checkpointing system is the only way to get light clients back onto the network.

@@ -292,31 +436,37 @@ mod tests {

#[test]
fn changes_iterated_in_pre_order() {
// TODO: include forced change and make sure it's iterated last
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Guess this TODO is already solved (below in the same test).

@andresilva
Copy link
Contributor

andresilva commented Mar 1, 2019

When this PR is approved please don't merge it right away since I'm still doing some local testing.

@rphmeier
Copy link
Contributor Author

rphmeier commented Mar 1, 2019

Contesting the patch label -- this is probably a bit too unstable to backport to earlier testnets safely.

@andresilva andresilva merged commit 4399e57 into master Mar 5, 2019
@andresilva
Copy link
Contributor

I removed the patch label but we should still backport this and disable the finality tracker module (otherwise it will be harder to backport new fixes for grandpa).

@andresilva andresilva deleted the rh-grandpa-offline-fallback branch March 18, 2019 23:12
MTDK1 pushed a commit to bdevux/substrate that referenced this pull request Apr 12, 2019
Co-authored-by: André Silva <andre.beat@gmail.com>

* skeleton for finality tracker

* dispatch events when nothing finalized for a long time

* begin integrating finality tracker into grandpa

* add delay field to pending change

* add has_api_with function to sr_version for querying APIs

* partially integrate new force changes into grandpa

* implement forced changes

* get srml-grandpa compiling

* Update core/finality-grandpa/src/authorities.rs

Co-Authored-By: rphmeier <rphmeier@gmail.com>

* Update core/finality-grandpa/src/authorities.rs

Co-Authored-By: rphmeier <rphmeier@gmail.com>

* Update core/finality-grandpa/src/authorities.rs

Co-Authored-By: rphmeier <rphmeier@gmail.com>

* remove explicit dependence on CoreApi

* increase node runtime version

* integrate grandpa forced changes into node runtime

* add some tests to finality-tracker

* integrate finality tracking into node-runtime

* test forced-change logic

* test forced changes in the authority-set handler

* kill some unneeded bounds in client

* test forced-changes in finality-grandpa and fix logic

* build wasm and finality-tracker is no-std

* restart voter on forced change

* allow returning custom error type from lock_import_and_run

* extract out most DB logic to aux_schema and use atomic client ops

* unify authority set writing

* implement set pausing

* bump runtime version

* note on DB when we pause.

* core: grandpa: integrate forced changes with multiple pending standard changes

* core: grandpa: fix AuthoritySet tests

* runtime: bump impl_version

* core: clear pending justification requests after forced change import

* srml: finality-tracker: use FinalizedInherentData

* core: log requests for clearing justification requests

* core, node: update runtimes

* core: grandpa: fix tests

* core: grandpa: remove todos and add comments

* core: grandpa: use has_api_with from ApiExt

* core: fix tests

* core: grandpa: remove unnecessary mut modifier

* core: replace PostImportActions bitflags with struct

* core: grandpa: restrict genesis on forced authority set change

* core: grandpa: add more docs

* core: grandpa: prevent safety violations in Environment::finalize_block

* core: grandpa: register finality tracker inherent data provider

* core: grandpa: fix tests

* node: update runtime blobs

* core: grandpa: remove outdated todo

* core: aura: fix typo in log message

* core: grandpa: check re-finalization is on canonical chain

* srml: finality-tracker: fix initialization

* node: update runtime wasm

* srml: finality-tracker: don't re-initialize config keys
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A0-please_review Pull request needs code review.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants