
consensus/clique: replace static 1/2 difficulties with dynamic 1-n scale #166

Merged 4 commits, May 11, 2018

Conversation

@jmank88 jmank88 commented May 8, 2018

DO NOT MERGE - REQUIRES RESET

This PR proposes replacing the static clique difficulties with dynamic, scaled values derived from the last signed block.

The existing clique consensus protocol uses two static difficulties, 2 for the 'in-turn' signer, and 1 for 'out-of-turn' signers. This prioritizes in-turn signing over out-of-turn. However, it does not distinguish between 'out-of-turn' signers, and 'in-turn' is based only on the block number, with no consideration of recent history. This leads to a few problems:

  1. When two or more nodes try to sign 'out-of-turn', there is no clear priority since they all have difficulty 1. This ambiguity seems to occasionally lead to a kind of split-decision logical deadlock in our testnet.
  2. When fewer than all n nodes are signing (x are down), a smooth n-x round-robin is likely not possible: when a node fills in out-of-turn, it may make itself ineligible (too recent) for its own next in-turn block, requiring another out-of-turn signature. This effect can cascade or repeat, depending on chance and the least common multiple of n and n-x.
  3. When a signer is added, the period of the 'in-turn' schedule changes, making it possible for nodes to be 'in-turn' for two consecutive blocks (or for two blocks too near each other).

These problems can be avoided by using a distinct, dynamic, scaled difficulty, based on the last block signed by each signer. From CalcDifficulty:

// Difficulty for ineligible signers (too recent) is always 0. For eligible signers, difficulty is defined as 1 plus the
// number of lower priority signers, with more recent signers having lower priority. If multiple signers have not yet
// signed (0), then addresses which sort lexicographically later have lower priority.

The most recent n/2 signers are ineligible to sign, so this produces difficulties from n/2+1 to n, inclusive, with the 'in-turn' signer always having difficulty n. This has several benefits which solve or reduce the aforementioned problems:

  1. Each signer always has a distinct difficulty, falling back to lexicographical sort when no blocks have been signed.
  2. Because the 'in-turn' signer is the node which signed least recently, when less than all n nodes are signing, a graceful n-x round-robin schedule will still be prioritized, with random out-of-turn signatures only shifting or reordering the schedule.
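The scheme above can be sketched as follows. This is a simplified model, not the actual go-ethereum code: string addresses stand in for common.Address, and lastSigned maps each authorized signer to the block it most recently signed (0 = never signed).

```go
package main

import "fmt"

// calcDifficulty sketches the proposed scheme. Ineligible signers (those who
// signed within the most recent n/2 blocks) get difficulty 0. Each eligible
// signer gets 1 plus the number of lower-priority signers, where having signed
// more recently means lower priority; among signers that have never signed,
// addresses that sort later lexicographically have lower priority.
func calcDifficulty(lastSigned map[string]uint64, signer string, number uint64) uint64 {
	signed := lastSigned[signer]
	limit := uint64(len(lastSigned)/2 + 1)
	if signed > 0 && number < signed+limit {
		return 0 // signed too recently: ineligible
	}
	diff := uint64(1)
	for addr, s := range lastSigned {
		if addr == signer {
			continue
		}
		if s > signed || (s == signed && addr > signer) {
			diff++ // addr has lower priority than signer
		}
	}
	return diff
}

func main() {
	// Four signers; A signed block 4, B block 1, C block 2, D block 3.
	last := map[string]uint64{"A": 4, "B": 1, "C": 2, "D": 3}
	for _, s := range []string{"A", "B", "C", "D"} {
		fmt.Printf("%s: %d\n", s, calcDifficulty(last, s, 5))
	}
	// A: 0 (too recent), B: 4 (in-turn), C: 3, D: 0 (too recent)
}
```

With all blocks signed, eligible difficulties for block 5 fall in [n/2+1, n] = [3, 4], with the least recent signer (B) in-turn at difficulty n.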

@jmank88 jmank88 force-pushed the scaled-difficulty branch from 9623df8 to 26eae68 Compare May 8, 2018 17:31
@@ -51,29 +52,27 @@ type Snapshot struct {

 	Number  uint64      `json:"number"` // Block number where the snapshot was created
 	Hash    common.Hash `json:"hash"`   // Block hash where the snapshot was created
-	Signers map[common.Address]struct{} `json:"signers"` // Set of authorized signers at this moment
+	Signers map[common.Address]uint64   `json:"signers"` // Each authorized signer at this moment and their most recently signed block
jmank88 (author):

This actually becomes simpler, since we drop Recents, but it may also be incompatible and require a chain reset.

	{signer: "A", voted: "C", auth: false},
	{signer: "B", voted: "C", auth: false},
	{signer: "A", voted: "B", auth: false},
	{signer: "A", voted: "D", auth: true},
jmank88 (author):

This test was identical to the previous. This is my best guess at the original intention, based on the comment.

 		Tally:   make(map[common.Address]Tally),
 	}
 	for _, signer := range signers {
-		snap.Signers[signer] = struct{}{}
+		snap.Signers[signer] = 0
jmank88 (author):

0 means 'no blocks signed' - we have to be sure to handle this specially and not interpret it as 'signed the genesis block', so that the initial n/2 blocks can be signed. I'm still not certain if it's important to assign distinct difficulties for those initial blocks, but it would be trivial to use the old algorithm.

A contributor replied:

Do we have a unit test for this case?

 )

 const (
 	checkpointInterval = 1024 // Number of blocks after which to save the vote snapshot to the database
 	inmemorySnapshots  = 128  // Number of recent vote snapshots to keep in memory
 	inmemorySignatures = 4096 // Number of recent block signatures to keep in memory

-	wiggleTime        = 500 * time.Millisecond // Random delay (per signer) to allow concurrent signers
+	recentSignerDelay = 1 * time.Second        // Full delay for most recent eligible signer.
jmank88 (author), May 8, 2018:

Raised since we are doing fractions of this value, instead of multiples (though perhaps this is unwise).

	// A difficulty <= limit would be too recent; limit+1 is the most recent eligible signer.
	// So by subtracting limit, limit+1 becomes 1, which is a full delay.
	fraction := diff - limit
	delay = recentSignerDelay / time.Duration(fraction)
jmank88 (author), May 8, 2018:

This looks off-by-one and I need to revisit it, but the general idea is to order these delays by difficulty, since they were random before. This fractional solution handles the scaled difficulties nicely, since the delay asymptotically approaches 0. However, I'm thinking that the maximum delay should still be based on the number of signers (like before), so the delays don't get so crammed together as we scale up. I'm not sure how important that is, though, since the distinct difficulties should resolve conflicts immediately.

jmank88 (author), May 8, 2018:

Duh, these only apply to diff < n, which is then shifted, so the scale isn't really relevant. I will rework this. We can distribute at most n/2 linearly into the range used before or something like that.

jmank88 (author), May 9, 2018:

Reworked into the much simpler: delay = time.Duration(n-diff) * wiggleTime, which is a maximum delay of n/10 seconds for the most recent eligible signer (with the current wiggleTime of 200ms).

@jmank88 jmank88 force-pushed the scaled-difficulty branch from 006fd8f to 9646f57 Compare May 9, 2018 13:15
 )

 const (
 	checkpointInterval = 1024 // Number of blocks after which to save the vote snapshot to the database
 	inmemorySnapshots  = 128  // Number of recent vote snapshots to keep in memory
 	inmemorySignatures = 4096 // Number of recent block signatures to keep in memory

-	wiggleTime = 500 * time.Millisecond // Random delay (per signer) to allow concurrent signers
+	wiggleTime = 200 * time.Millisecond // Delay step for out-of-turn signers.
jmank88 (author):

Restored to a multiplier, but with a reduced value since we have faster blocks.

@benbjohnson (Contributor) left a comment:

Overall this lgtm. I added mostly stylistic comments.

-	return errUnauthorized
+	signed, authorized := snap.Signers[signer]
+	if !authorized {
+		return fmt.Errorf("%s not authorized to sign", signer.Hex())
+	}
benbjohnson:

Can you change signed to something more clear? e.g. lastSignedBlockNumber. Right now signed seems like it would be a bool.

-	return nil, errUnauthorized
+	signed, authorized := snap.Signers[signer]
+	if !authorized {
+		return nil, fmt.Errorf("%s not authorized to sign", signer.Hex())
+	}
benbjohnson:

Update name of signed variable here too.

	if signed > 0 {
		limit := uint64(len(snap.Signers)/2 + 1)
		if next := limit + signed; number < next {
			return nil, fmt.Errorf("%s not authorized to sign %d: signed %d, next eligible signature %d", signer.Hex(), number, signed, next)
		}
benbjohnson:

Can we move the next calculation to snap so it's not duplicated and clearer? e.g. NextSignableBlockNumber(lastSignedBlockNumber uint64) uint64
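The suggested helper might look like the following sketch. It is hypothetical, with string addresses standing in for common.Address; the real Snapshot lives in consensus/clique.

```go
package main

import "fmt"

// Snapshot is a simplified stand-in: Signers maps each authorized signer to
// the block number it most recently signed (0 = never signed).
type Snapshot struct {
	Signers map[string]uint64
}

// NextSignableBlockNumber returns the first block number the signer is
// eligible to sign after lastSignedBlockNumber. The n/2 blocks following a
// signer's own signature are off limits; a signer that has never signed
// (lastSignedBlockNumber == 0) may sign any block.
func (s *Snapshot) NextSignableBlockNumber(lastSignedBlockNumber uint64) uint64 {
	if lastSignedBlockNumber == 0 {
		return 1
	}
	return lastSignedBlockNumber + uint64(len(s.Signers)/2+1)
}

func main() {
	snap := &Snapshot{Signers: map[string]uint64{"A": 0, "B": 0, "C": 0, "D": 0}}
	// After signing block 2 among 4 signers, blocks 3 and 4 are too recent.
	fmt.Println(snap.NextSignableBlockNumber(2)) // prints 5
}
```

Centralizing the calculation here would keep the eligibility check in both call sites (sealing and verification) in agreement.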

	for name, tt := range tests {
		t.Run(name, tt.run)
	}
}
benbjohnson:

Converting tests from a slice to a map is going to make the tests run in a different order every time. Not the biggest deal, but an alternative would be to put a name string field in the test struct definition.

jmank88 commented May 9, 2018

I'm beginning to think that letting the difficulty scale up unbounded could be problematic, but also that we can work around it and cap difficulty at 2n, essentially by mapping any value greater than n to n plus the node's index. We would still favor the least recent signer in the normal case and during small hiccups; in more irregular cases we'd be choosing from a logical set of 'least recent' signers, but I don't think that's a problem. This may also pair well with the policy for the initial n blocks after genesis.

jmank88 commented May 10, 2018

I've been resisting assigning simple difficulties from [1,n] since I thought it would require sorting (and possibly fetching) all the other signers, and the simplicity of diff = current - last was appealing and adequate. However, it turns out fetching and sorting aren't necessary (we already have all the signers, and can just iterate once and count), and the 'simpler' calculation has messy edge cases anyway. Instead, we can just assign difficulties from the range [1,n], corresponding to a sort order based on recency (and lexicographical order, for signers that have not signed yet). With this model, we keep a narrow range of sequential difficulties capped at n, and there are never any ambiguous, equal values.

I will update the OP.

@jmank88 jmank88 force-pushed the scaled-difficulty branch from 0415bcd to c9e765f Compare May 10, 2018 15:36
@jmank88 jmank88 changed the title WIP: consensus/clique: replace static 1/2 difficulties with distance from last signed consensus/clique: replace static 1/2 difficulties with distance from last signed May 10, 2018
@jmank88 jmank88 changed the title consensus/clique: replace static 1/2 difficulties with distance from last signed consensus/clique: replace static 1/2 difficulties with dynamic 1-n scale May 10, 2018
benbjohnson commented:

lgtm 👍

rlegene commented Jan 18, 2019

I am in favour of this code, though I need a way to fork my existing network into accepting a new block validation algorithm.

jmank88 commented Jan 21, 2019

@rlegene You can try this (arguably a bug in the client - background here) and this (less significant, just makes a random choice a little more deterministic) - they only adjust how to handle same-difficulty blocks, and don't modify the protocol at all, so existing clients can upgrade without a fork.
