diff --git a/ouroboros-consensus/docs/ChainDB.md b/ouroboros-consensus/docs/ChainDB.md
deleted file mode 100644
index e11a7fee5dc..00000000000
--- a/ouroboros-consensus/docs/ChainDB.md
+++ /dev/null
@@ -1,538 +0,0 @@
# The Chain Database

The immutable database records a linear prefix of our current chain; the
volatile DB records a (possibly fragmented) tree of extensions:

```
                              /---
                             /
                            /
|---------------------------|
                             \
                              \
                               \---
  IMMUTABLE DB                 VOLATILE DB
```

When we start up the system we must find the best possible path through the
volatile DB and adopt that as our current chain; then every time a new block is
added to the volatile DB we have to recompute what the best possible path is
now. In other words, we maintain the invariant that

**Invariant.** The current chain is the best possible path through the volatile DB.

In an ideal world this would mean we have some kind of specialized data
structure supporting

* Efficient insertion of new blocks
* Efficient computation of the best chain

It's however not at all clear what such a data structure would look like if we
don't want to hard-code the specific chain selection rule. Instead we take a
simpler approach.

## Preliminary: consensus protocol chain selection

A choice of consensus protocol includes a choice of chain selection algorithm,
a binary relation `(⊑)` between chains indicating which chains are "preferred"
over which other chains. In the simplest case we just prefer longer chains
over shorter ones:

```
  C ⊑ C' iff length C ≤ length C'
```

More realistic protocols might involve checking things such as delegation
certificate issue numbers (Permissive BFT) or chain density (Ouroboros Genesis).
However, one property that all chain selection algorithms we are interested in
share is the following:

**Property "Always Extend".**

```
  ∀ C, B ∙ C ⊏ (C :> B)
```

In other words, if we can extend a chain, we should.

**Definitions.**

1. Given a set of blocks `V`, let

   ```
     candidates(I, V)
   ```

   be the set of chain fragments anchored at `I` using blocks picked from
   `V`.[^forwardIndex] This set has some properties:

   a. It is prefix closed

      ```
        ∀ F, B ∙ if (F :> B) ∈ candidates(I, V) then F ∈ candidates(I, V)
      ```

   b. Conversely, we have

      ```
        ∀ F, B ∙ if F ∈ candidates(I, V) then (F :> B) ∈ candidates(I, V ∪ {B})
      ```

      provided that `F :> B` is a valid chain.

   c. Adding blocks doesn't remove any candidates

      ```
        candidates(I, V) ⊆ candidates(I, V ∪ {B})
      ```

   d. The only new candidates in `candidates(I, V ∪ {B})` must involve `B`; i.e.

      ```
        ∀ F ∈ candidates(I, V ∪ {B}) ∙ F ∈ candidates(I, V) or F = (... :> B :> ...)
      ```

2. We overload the notation `(⊑)` to mean that a fragment `F` is preferred over
   all fragments in a set of candidates:

   ```
     C ⊑ F iff F ∈ C and ∀ F' ∈ C ∙ F' ⊑ F
   ```

**Lemma "Prefer Empty".**

If `C ⊑ ε` (for empty fragment `ε`) then `ε` must be the _only_ candidate in `C`.

_Proof (sketch)._

Suppose we have another candidate[^anchored] `(B :> ...)` in `C`. Then we'd have

```
  ε ⊏ (B :> ...)
```

by "Always Extend", violating the assumption `C ⊑ ε`. ∎

[^anchored]: All candidates in `C` must have the same anchor.

[^forwardIndex]: In order to compute `candidates` efficiently the volatile
DB must support a "forward chain index", able to efficiently answer
the question "which blocks succeed this one?".
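
To make the forward index concrete, the following is a minimal sketch (with
hypothetical types; the real volatile DB works with hashes and chain fragments
rather than lists) of how `candidates` could be computed from a successor map:

```haskell
import           Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map

-- Hypothetical forward index: for each block, the blocks in V that
-- directly succeed it.
type ForwardIndex blk = Map blk [blk]

-- All chain fragments anchored at @i@, represented as lists of blocks.
-- The empty fragment is always a candidate (prefix closure); longer
-- candidates extend it through each known successor.
candidates :: Ord blk => ForwardIndex blk -> blk -> [[blk]]
candidates idx i =
    [] : [ b : ext
         | b   <- Map.findWithDefault [] i idx
         , ext <- candidates idx b
         ]
```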

**Lemma "Local Chain Selection".**

If

```
  candidates(I, V) ⊑ F
```

then for all blocks `B` such that `F :> B` is a valid chain,
there exists an `F_new` extending `F :> B` such that

```
  candidates(I, V ∪ {B}) ⊑ F_new
```

_Proof (sketch)._

Let's first consider the case where `F` is non-empty, i.e., `F = F' :> B_pred`,
with `B_pred` the predecessor of `B` (i.e., our current tip).

1. `B_pred` cannot be the tip of any other candidate in `candidates(I, V)`
   (because two candidates with the same tip must _be_ the same candidate).

2. Since `candidates(I, V) ⊑ F`, we know that there cannot be any extension of
   `F` in `candidates(I, V)` and hence there cannot be any other candidate that
   contains `B_pred`.

3. Since the new candidates in `candidates(I, V ∪ {B})` must involve `B`
   (definition 1.d, above), this therefore means they can only be `(F :> B)` or
   further extensions thereof. We can compute all possible such extensions

   ```
     candidates(B, V ∪ {B})
   ```

   then compare them using `(⊑)`, and use that to pick a preferred candidate
   `F_new`. Since this candidate is preferred over all extensions `(F :> B :>
   ..)`, which in turn are preferred over `F` (because they are extensions),
   which was preferred over all existing candidates, we must indeed have

   ```
     candidates(I, V ∪ {B}) ⊑ F_new
   ```

   as required.

The case where `F = ε` is simpler because in this case the empty fragment must
be the _only_ candidate (lemma "Prefer Empty", above), and so the reasoning in
step (3) applies immediately. ∎

## Invariant

Given the tip of the immutable database `I` and volatile database `V`, the
chain DB maintains a current fragment `F` such that

```
  candidates(I, V) ⊑ F
```

Technically speaking the type of `I` is `Maybe Block`, not `Block`, since the
immutable database may be empty. If that is the case, the predecessor of the
first block of `F` (if any) must be the genesis block.

## Initialization

The initialization of the chain DB proceeds as follows.

1. Initialize the immutable DB, determine its tip `I`, and ask the
   ledger DB for the corresponding ledger state `L`.

2. Compute

   ```
     candidates(I, V)
   ```

   ignoring known-to-be-invalid blocks (if any) and blocks from the future
   (i.e., `blockSlot B > currentSlot`), and order them using (`⊑`) so that we
   process the preferred candidate first[^selectThenValidate]. We also ignore
   any candidates that are prefixes of other candidates (justified by the
   "Always Extend" property).

3. Not all of these candidates may be valid, because the volatile DB stores
   blocks whose _headers_ have been validated, but whose _bodies_ are still
   unverified (other than to check that they correspond to the headers). We
   therefore validate each candidate chain fragment, starting with `L` each
   time[^ledgerState]. As soon as we find a candidate that is valid, we adopt
   it as our current chain. If we find a candidate that is _invalid_, we mark
   the invalid block[^invalidSuccessors], and go back[^whyGoBack] to step
   (2).

[^ledgerState]: We make no attempt to share ledger states between candidates,
even if they share a common prefix, trading runtime performance for lower
memory pressure.

[^whyGoBack]: We recompute the set of candidates after marking some block(s) as
invalid because (1) those blocks may also exist in other candidates and (2) we
do not know how the valid prefixes of those candidates should now be ordered.

[^invalidSuccessors]: We do not need to also mark the successors of the
invalid block as invalid. The chain sync client will use this information to
terminate connections to nodes with a chain that contains an invalid block.
Say the node has the following chain:
```
A -> I -> C
```
where `I` is an invalid block. It is impossible for there to be a candidate
chain containing `C`, but not `I`, which means that it is not necessary to
also mark `C` (and any other successors) as invalid. Proof: every chain sync
candidate fragment is anchored at a point on _our_ chain, and since `I` is
invalid, we will never adopt `I`. So if a candidate fragment contains `C` and
is anchored on our chain, it must also contain `I`.

[^selectThenValidate]: Technically speaking we should _first_ validate all
chains, and then apply selection only to the valid chains. We run chain
selection first, because that is much cheaper. It does not matter, because
```
  sortBy f . filter p = filter p . sortBy f
```
as `sortBy` is stable.

## Adding a block

When a new block `B` arrives, we need to add it to the volatile DB and
recompute our current chain. We distinguish between the following different
cases.

### Ignore

We can just ignore the block if either of the following is true.

* the block was already in the volatile DB

  ```
    B ∈ V
  ```

* the block is already in the immutable DB, _or_ it belongs to a branch
  which forks more than `k` blocks away from our tip

  ```
    blockNo B <= blockNo I
  ```

  We could distinguish between the block being on our chain or on a
  distant fork by doing a single query on the immutable DB, but it does not
  matter: either way we do not care about this block.

  We don't expect the chain sync client to feed us such blocks under normal
  circumstances, though it's not impossible (by the time a block is downloaded
  it's conceivable, albeit unlikely, that the block is now older than `k`).
  We may wish to issue a warning when this happens.

### Store but don't change the current chain

We store the block, but do nothing else as we are missing one of the
(transitive) predecessors of the block.

We can check this by following back pointers until we reach a block `B'` such
that `B' ∉ V` and `B' ≠ I`. The cost of this is bounded by the length of the
longest fragment in the volatile DB, and will typically be low; moreover, the
chain fragment we are constructing this way will be used in the switch-to-fork
case.[^firstCheckTip]

At this point we _could_ do a single query on the immutable DB to check if
`B'` is in the immutable DB or not. If it is, then this block is on a distant
branch that we will never switch to, and so we can ignore it. If it is not, we
may or may not need this block later and we must store it; if it turns out we
will never need it, it will eventually be garbage collected.[^gc]

Alternatively, and more simply, we can omit the check on the immutable DB,
assume we might need the block, and rely on GC to eventually remove it if we
don't.

[^firstCheckTip]: It might make sense to check the "Add to current chain"
case before doing the missing predecessor check (provided that the block is
not in the future).

[^gc]: Blocks on chains that are never selected, or indeed blocks whose
predecessor we never learn, will eventually be garbage collected when their
block number is more than `k` away from the tip of the selected chain.
The chain DB (more specifically, the volatile DB) can still grow without bound
if we allow upstream nodes to rapidly switch between forks; this should be
addressed at the network layer (for instance, by introducing rate limiting for
rollback in the chain sync client).

### Store but schedule chain selection

When the block belongs to a future slot:

```
  blockSlot B > currentSlot
```

We write the block to the volatile DB and then schedule a chain selection for
`B` at `blockSlot B`.

We have the following bound on the number of blocks that can arrive from the
future:

```
  nbPeers * maxClockSkew * chainSyncRateLimit
```

The _max clock skew_ is a parameter of the chain sync client, which accepts
the header of a block `B` from the future, provided that:

```
  blockSlot B < currentSlot + maxClockSkew
```

The headers of such blocks will be included in candidate chains, which are
advertised to the block fetch client, which can decide to download the
corresponding blocks and add them to the chain database.

The `chainSyncRateLimit` is the rate limit on the number of headers that will
be processed from a particular peer.

### Add to current chain

If `B` fits onto the end of our current chain `F`, i.e.

* `F = ε` and `B_pred = I`, or

* `exists F' ∙ F = F' :> B_pred`

we take advantage of the Local Chain Selection lemma and run chain selection on

```
  candidates(B, V ∪ {B})
```

Apart from the starting point, chain selection will work in the same way as
described in Initialization. Note that this takes care of the common case
where we just add a block to our chain, as well as the case where we stay
with the same branch but receive some blocks out of order. Moreover, we can use
the _current_ ledger state as the starting point for validation.

### Switch to a fork

If none of the cases above apply, we have a block `B` such that

a. `B ∉ V`

b. `blockNo B > blockNo I` (and hence `B` cannot be in the immutable DB)

c. For all transitive predecessors `B'` of `B` we have `B' ∈ V` or `B' = I`

   In other words, we must have a fragment `F_prefix = I :> ... :> B` in
   `candidates(I, V ∪ {B})`.

d. `blockSlot B <= currentSlot`

e. (Either `F = ε` and `B_pred ≠ I`, or) `exists F', B' ∙ F = F' :> B'` where `B' ≠ B_pred`

Some observations:

* Point (c) rules out the first option in (e): if `B_pred ≠ I` then we must
  have `B_pred ∈ V` and moreover this must form some kind of chain back to
  `I`; this means that the preferred candidate cannot be empty.

* By (1.d) above, the new candidates in `candidates(I, V ∪ {B})` must involve
  `B`; in other words, they must all be extensions of `F_prefix`; we can
  compute these candidates using `candidates(B, V ∪ {B})`.

* We can then use chain selection on all of these candidates _and_ the current
  chain[^preferCandidate]; let the resulting preferred candidate be `F_new`.
  By definition we must have that `F_new` is preferred over the current chain
  and the new candidates; moreover, since the current chain is preferred over
  all _existing_ candidates, we must have by transitivity that `F_new` is
  preferred over all candidates in `candidates(B, V ∪ {B})`, and so we can
  adopt it as our new chain (this argument is a variation on the Local Chain
  Selection argument, above).

It is worth pointing out that we do _not_ rely on `F_prefix` being longer than
the current chain.
Indeed, it may not be: if two leaders are selected for the
same slot, and we _receive_ a block for the current slot before we can
_produce_ one, our current chain will contain the block from the other leader;
when we then produce our own block, we end up in the switch-to-fork case. Here
it is important that `preferCandidate` would prefer a candidate chain (the
chain that contains our own block) over our current chain, even if they are of
the same length, if the candidate ends in a block that we produced (and the
current chain does not); however, the `ChainDB` itself does not need to worry
about this special case.

[^preferCandidate]: Chain selection may treat the current chain special, so we
have to be careful to use `preferCandidate` rather than `compareCandidates` as
appropriate.

### Concurrency

Multiple blocks might be added concurrently, and since this operation is not
atomic, as it involves writing a block to disk and reading headers from disk,
we explore the possible interleavings.

The three main steps are:

1. Add a block.
2. Compute candidates and perform chain selection, which might result in a
   candidate that is preferred over the current chain.
3. Try to install the candidate as the new chain.

We want all possible interleavings to result in installing the most preferable
candidate as the new chain. We will reason that this is the case (for two
concurrent threads).

If either of the two computations (step 2) is done with
knowledge of both blocks (after step 1), then the computation with knowledge
of only one block can't possibly construct a candidate that is preferred over
the candidate produced by the other computation.

For this not to be true, both computations would have to be done with only
knowledge of their own block (step 1). This is impossible, as the execution of
step 1 is serialised, so at least one thread must see both blocks.

## Short volatile fragment

Nothing above relies in any way on the length of the current fragment, but the
maximum rollback we can support is bounded by the length of that current
fragment. This will be less than `k` only if

* we are near genesis and the immutable DB is empty, or
* the volatile DB lost some blocks due to data corruption.

Only the latter case is cause for concern: we are in a state where
conceptually we _could_ roll back up to `k` blocks, but due to how we chose to
organize the data on disk (immutable/volatile split) we cannot.
One option here would be to move blocks _back_ from the immutable DB to the
volatile DB under these circumstances, and indeed, if there were other parts
of the system where rollback might be instigated, that would be the right
thing to do: those other parts of the system should not be aware of
particulars of the disk layout.

However, since the `ChainDB` is _solely_ in charge of switching to forks, all
the logic can be isolated to the `ChainDB`. So, when we have a short volatile
fragment, we will simply not roll back more than the length of that fragment.
This can also be justified conceptually: the fact that `I` is the tip of the
immutable DB means that _at some point_ it was in our chain at least `k`
blocks back, and so we considered it to be immutable: the fact that some data
loss occurred does not really change that[^intersection]. We may still roll
back more than `k` blocks when disk corruption occurs in the immutable DB, of
course.
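
A sketch of the resulting rule, with illustrative names (the actual `ChainDB`
works with anchored fragments rather than lists):

```haskell
-- The maximum rollback we can support is bounded by the length of the
-- current fragment; this equals k in the steady state, but may be
-- smaller near genesis or after data corruption in the volatile DB.
maxRollback :: Word -> [header] -> Word
maxRollback k currentFragment =
    min k (fromIntegral (length currentFragment))
```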

[^intersection]: When the chain sync client looks for an intersection between
our chain and the chain of the upstream peer, it sends points from our chain
fragment. If the volatile fragment is shorter than `k` due to data corruption,
the client would have fewer points to send to the upstream node. However, this
is the correct behaviour: it would mean we cannot connect to upstream nodes
whose chain forks more than `k` blocks before what _used to be_ our tip before
the data corruption, even if that's not where our tip is anymore. In the
extreme case, if the volatile DB gets entirely erased, only a single point is
available (the tip of the immutable DB, `I`) and hence we can only connect to
upstream nodes that have `I` on their chain -- which is precisely stating that
we can only sync with upstream nodes that have a chain that extends our
immutable chain.

## Clock changes

When the system clock of a node is moved _forward_, we should run chain
selection again because some blocks that we stored because they were in the
future may now become valid. Since this could be any number of blocks, on any
fork, it is probably easiest to just do a full chain selection cycle (starting
from `I`).

When the clock is moved _backwards_, we may have accepted blocks that we
should not have. Put another way, an attacker might have taken advantage of
the fact that the clock was wrong to get the node to accept blocks in the
future. In this case we therefore really should roll back -- but this is a
weird kind of rollback, one that might result in a strictly smaller current
chain. We can only do this by re-initializing the chain DB from scratch (the
ledger DB does not support such rollback directly). Worse still, we may have
decided that some blocks were immutable which really weren't.

Unlike the data corruption case, here we should really endeavour to get to a
state in which it was as if the clock was never "wrong" in the first place;
this may mean we might have to move some blocks back from the immutable DB to
the volatile DB, depending on exactly how far the clock was moved back and how
big the overlap between the immutable DB and volatile DB is.

It is therefore good to keep in mind that the overlap between the immutable DB
and volatile DB does make it a bit easier to deal with relatively small clock
changes; it may be worth ensuring that, say, the overlap is at least a few
days so that we can deal with people turning back their clock a day or two
without having to truncate the immutable database. Indeed, in a first
implementation, this may be the _only_ thing we support, though we will
eventually have to lift that restriction.

## Garbage collection

For performance reasons neither the immutable DB nor the volatile DB ever
makes explicit `fsync` calls to flush data to disk. This means that when the
node crashes, recently added blocks may be lost. When this happens in the
volatile DB it's not a huge deal: when the node starts back up and the
`ChainDB` is initialized we just run chain selection on whatever blocks still
remain; in typical cases we just end up with a slightly shorter chain.

However, when this happens in the immutable DB the impact may be larger. In
particular, if we delete blocks from the volatile DB as soon as we add them to
the immutable DB, then data loss in the immutable DB would result in a gap
between the volatile DB and the immutable DB, making _all_ blocks in the
volatile DB useless.
We _can_ recover from this, but it would result in a large
rollback (in particular, one larger than `k`).

To avoid this we should introduce a delay between adding blocks to the
immutable DB and removing them from the volatile DB (garbage collection). The
delay should be configurable, but should be set in such a way that the
possibility that the block has not yet been written to disk at the time of
garbage collection is minimized. A relatively short delay should suffice (60
minutes, say, should be more than enough), though there are other reasons for
preferring a longer delay:

* Clock changes can more easily be accommodated with more overlap (see above).
* The time delay also determines the worst-case validity of iterators
  (see detailed discussion in the `ChainDB` API).

A consequence of this delay is that there will be overlap between the
immutable DB and the volatile DB. The exact length of this overlap depends on
the garbage collection delay and the slot length; a delay of 60 minutes and a
block produced every 20 seconds would result in an overlap of at least 180
blocks. This is a lower bound; typically the overlap will be larger because
blocks are not removed from the volatile DB on a per-block basis, but rather
in groups. However, this overlap should be an internal detail to the `ChainDB`
and not visible to its clients.
diff --git a/ouroboros-consensus/docs/HardFork.md b/ouroboros-consensus/docs/HardFork.md
deleted file mode 100644
index 11157003634..00000000000
--- a/ouroboros-consensus/docs/HardFork.md
+++ /dev/null
@@ -1,216 +0,0 @@
# Details of the hard fork transition

This document attempts to describe the details of the hard fork transition
from Byron to Shelley, and from Shelley to future versions of the ledger.

## Byron

The Byron specification can be found at
https://hydra.iohk.io/job/Cardano/cardano-ledger-specs/byronLedgerSpec/latest/download-by-type/doc-pdf/ledger-spec .

### Moment of hard fork

The Byron ledger state provides the current protocol version in

```haskell
adoptedProtocolVersion :: ProtocolVersion
```

in the `State` type from `Cardano.Chain.Update.Validation.Interface`.

This protocol version is a three-tuple `major`, `minor`, `alt`. The Byron
specification does not provide any semantic interpretation of these
components. By convention (outside of the purview of the Byron specification),
the hard fork is initiated the moment that the `major` component of
`adoptedProtocolVersion` reaches a predefined, hardcoded, value.

### The update mechanism for the `ProtocolVersion`

Updates to the `ProtocolVersion` in Byron are part of the general
infrastructure for changing protocol parameters (parameters such as the
maximum block size), except that in the case of a hard fork, we care only
about changing the `ProtocolVersion`, and not any of the parameters
themselves.

The general mechanism for updating protocol parameters in Byron is as follows:

1. A protocol update _proposal_ transaction is created. It proposes new values
   for some protocol parameters and a greater _protocol version_ number as an
   identifier. There cannot be two proposals with the same version number.

2. Genesis key delegates can add _vote_ transactions that refer to such a
   proposal (by its hash). They don't have to wait; a node could add a
   proposal and a vote for it to its mempool simultaneously.
   There are only positive votes, and a proposal has a time-to-live (see
   `ppUpdateProposalTTL`) during which to gather sufficient votes. While
   gathering votes, a proposal is called _active_.

   Note that neither Byron nor Shelley support full decentralization
   (everybody can vote); this is what the Voltaire ledger is intended to
   accomplish.

3. Once the number of voters satisfies a threshold (currently determined by
   the `srMinThd` field of the `ppSoftforkRule` protocol parameter), the
   proposal becomes _confirmed_.

4. Once the threshold-satisfying vote becomes stable (i.e. its containing
   block is `>= 2k` slots old), a block whose header's protocol version number
   (`CC.Block.headerProtocolVersion`) is that of the proposal is interpreted
   as an _endorsement_ of the stably-confirmed proposal by the block's issuer
   (specifically by the Verification Key of its delegation certificate).
   Endorsements -- i.e. _any block_, since they all contain that header
   field -- also trigger the system to discard proposals that were not
   confirmed within their TTL.

   https://github.com/input-output-hk/cardano-ledger/blob/172b49ff1b6456851f10ae18f920fbfa733be0b0/cardano-ledger/src/Cardano/Chain/Block/Validation.hs#L439-L444

   Notably, endorsements for proposals that are not yet stably-confirmed (or
   do not even exist) are not invalid but rather silently ignored. In other
   words, no validation applies to the `headerProtocolVersion` field.

5. Once the number of endorsers satisfies a threshold (same as for voting),
   the confirmed proposal becomes a _candidate_ proposal.

6. _At the beginning of an epoch_, the candidate proposal with the greatest
   protocol version number among those candidates whose threshold-satisfying
   endorsement is stable (i.e. the block is `>= 2k` slots old) is _adopted_:
   the new protocol parameter values have now been changed.

   If there was no stably-candidated proposal, then nothing happens.
   Everything is retained; in particular, a candidate proposal whose
   threshold-satisfying endorsement was not yet stable will be adopted at the
   subsequent epoch unless it is surpassed in the meantime.

   When a candidate is adopted, all record of other
   proposals/votes/endorsements -- regardless of their state -- is discarded.
   The explanation for this is that such proposals would now be interpreted as
   an update to the newly adopted parameter values, whereas they were
   validated as an update to the previously adopted parameter values.

In summary, the following diagram tracks the progress of a proposal that's
eventually adopted. For other proposals, the path short circuits to a
"rejected/discarded" status at some point.

```
active proposal
  --> (sufficient votes)
confirmed proposal
  --> (2k slots later)
stably-confirmed proposal
  --> (sufficient endorsements)
candidate proposal
  --> (2k slots later)
stably-candidated proposal
  --> (epoch transition)
adopted proposal
```

### Initiating the hard fork

Proposals to initiate the hard fork can be submitted and voted on before all
core nodes are ready. After all, once a proposal is "stably-confirmed", it
will effectively remain so indefinitely until nodes endorse it (or it gets
superseded by another proposal). This means that nodes can vote to initiate
the hard fork, _then_ wait for everybody to update their software, and once
updated, the proposal is endorsed and eventually the hard fork is initiated.

Endorsement is somewhat implicit.
The node operator does not submit an explicit
"endorsement transaction", but instead restarts the node (probably after a
software update that makes the node ready to support the hard fork) with a new
protocol version (as part of a config file or command line parameter), which
then gets included in the blocks that the node produces (this value is part of
the static `ByronConfig`: `byronProtocolVersion`).

(Note that a node restart is necessary for _any_ change to a protocol
parameter, even though most parameters do not require any change to the
software at all.)

### Software version (in block headers)

The Byron header also records a software version (`headerSoftwareVersion`).
This is a legacy concern only, and is present in but ignored by the current
Byron implementation, and entirely absent from the Byron specification.

## Shelley

### Moment of the hard fork

Similar to the Byron ledger, the Shelley ledger provides a "current protocol
version", but it is a two-tuple (not a three-tuple), containing only a
`hard fork` component and `soft fork` component:

```haskell
_protocolVersion :: (Natural, Natural)
```

in `PParams` (currently, module `PParams` in
`chain-and-ledger/executable-spec/src/PParams.hs`).

The hard fork from Shelley to its successor (Goguen?) will be initiated
once the hard fork component of this version gets incremented.

### The update mechanism for the protocol version

The update mechanism in Shelley is simpler than it is in Byron. There is no
distinction between votes and proposals: to "vote" for a proposal one merely
submits the exact same proposal. There is also no separate endorsement step
(though see "Initiating the hard fork", below).

The procedure is as follows:

1. As in Byron, a proposal is a partial map from parameters to their values.
2. During each epoch, a genesis key can submit (via its delegates) zero, one,
   or many proposals; each submission overrides the previous one.
3. "Voting" (submitting of proposals) ends `6k/f` slots before the end of the
   epoch (i.e., twice the stability period, called `stabilityWindow` in the
   Shelley ledger implementation).
4. At the end of an epoch, if the majority of nodes (as determined by the
   `Quorum` specification constant, which must be greater than half the nodes)
   have most recently submitted the same exact proposal, then it is adopted.
5. The next epoch is always started with a clean slate: proposals from the
   previous epoch that didn't make it are discarded.

The protocol version itself is also considered to be merely another parameter,
and parameters can change _without_ changing the protocol version, although
a convention _could_ be established that the protocol version must change if
any of the parameters do; but the specification itself does not mandate this.

### Initiating the hard fork

The timing of the hard fork in Shelley is different to the one in Byron: in
Byron, we _first_ vote and then wait for people to get ready; in Shelley it is
the other way around.

Core node operators will want to know that a significant majority of the
core nodes is ready (supports the hard fork) before initiating it. To make
this visible, Shelley blocks contain a protocol version. This is not related
to the current protocol version as reported by the ledger state
(`_protocolVersion` as discussed in the previous section), but it is the
_maximum_ protocol version that the node which produced that block can
support.

Once we see blocks from all or nearly all core nodes with the `hard fork`
component of their protocol version equal to the post-hard-fork value, nodes
will submit their proposals with the required major version change to initiate
the hard fork.

Note that this also means that in Shelley there is no need to restart the node
merely to support a particular parameter change (such as a maximum block
size).

## Byron _or_ Shelley: Publication of software versions

Both the Byron and the Shelley ledgers additionally record the latest version
of the software on the chain, in order to facilitate software discovering new
versions and subsequently updating itself. This would normally precede all of
the above, but as far as `ouroboros-consensus` is concerned, this is entirely
orthogonal. It does not in any way interact with either the decision to hard
fork or with the moment of the hard fork. If we did forgo it, the discussion
above would still be entirely correct.

## Invalid states

In a way, it is somewhat strange to have the hard fork mechanism be part of
the Byron or Shelley ledger itself, rather than some overarching ledger on
top. For Byron, a Byron ledger state where the `major` version is the
(predetermined) moment of the hard fork is basically an invalid state, used
only once to translate to a Shelley ledger. Similarly, the `hard fork` part of
the Shelley protocol version _will never increase_ during Shelley's lifetime;
the moment it _does_ increase, that Shelley state will be translated to the
(initial) Goguen state.
diff --git a/ouroboros-consensus/docs/report/.gitignore b/ouroboros-consensus/docs/report/.gitignore
new file mode 100644
index 00000000000..71de13902ba
--- /dev/null
+++ b/ouroboros-consensus/docs/report/.gitignore
@@ -0,0 +1,9 @@
*.aux
*.log
*.out
*.pdf
*.toc
*.bbl
*.blg
*.nav
*.snm
diff --git a/ouroboros-consensus/docs/report/chapters/appendix/byron.tex b/ouroboros-consensus/docs/report/chapters/appendix/byron.tex
new file mode 100644
index 00000000000..71440381e75
--- /dev/null
+++ b/ouroboros-consensus/docs/report/chapters/appendix/byron.tex
@@ -0,0 +1,155 @@
\chapter{Byron}

This appendix covers some details specific to the Byron ledger. EBBs are
already discussed at length in \cref{ebbs}.

The Byron specification can be found at \url{https://github.com/input-output-hk/cardano-ledger-specs}.

\section{Update proposals}
\label{byron:hardfork}

\subsection{Moment of hard fork}
\label{byron:hardfork:moment}

The Byron ledger state provides the current protocol version in
%
\begin{lstlisting}
adoptedProtocolVersion :: ProtocolVersion
\end{lstlisting}
%
in the \lstinline!State! type from
\lstinline!Cardano.Chain.Update.Validation.Interface!.
This protocol version is a three-tuple \emph{major}, \emph{minor}, \emph{alt}.
The Byron specification does not provide any semantic interpretation of these
components. By convention (outside of the purview of the Byron specification),
the hard fork is initiated the moment that the \emph{major} component of
\lstinline!adoptedProtocolVersion! reaches a predefined, hardcoded, value.

\subsection{The update mechanism for the \lstinline!ProtocolVersion!}

Updates to the \lstinline!ProtocolVersion! in Byron are part of the general
infrastructure for changing protocol parameters (parameters such as the
maximum block size), except that in the case of a hard fork, we care only
about changing the \lstinline!ProtocolVersion!, and not any of the parameters
themselves.
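
To illustrate the convention described in \cref{byron:hardfork:moment}, the
trigger check might look as follows (a sketch only;
\lstinline!triggerMajorVersion! is a hypothetical hardcoded constant, not part
of the Byron API):
%
\begin{lstlisting}
-- Sketch: the hard fork is initiated once the major component of the
-- adopted protocol version reaches a predefined, hardcoded value.
shouldTransition :: ProtocolVersion -> Bool
shouldTransition adopted = pvMajor adopted >= triggerMajorVersion
\end{lstlisting}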

The general mechanism for updating protocol parameters in Byron is as follows:

\begin{enumerate}

\item
A protocol update \emph{proposal} transaction is created. It proposes new
values for some protocol parameters and a greater \emph{protocol version}
number as an identifier. There cannot be two proposals with the same version
number.

\item
Genesis key delegates can add \emph{vote} transactions that refer to such a
proposal (by its hash). They don't have to wait; a node could add a proposal
and a vote for it to its mempool simultaneously. There are only positive
votes, and a proposal has a time-to-live (see
\lstinline!ppUpdateProposalTTL!) during which to gather sufficient votes.
While gathering votes, a proposal is called \emph{active}.

Note that neither Byron nor Shelley support full decentralisation (everybody
can vote); this is what the Voltaire ledger is intended to accomplish.

\item
Once the number of voters satisfies a threshold (currently determined by the
\lstinline!srMinThd! field of the \lstinline!ppSoftforkRule! protocol
parameter), the proposal becomes \emph{confirmed}.

\item
Once the threshold-satisfying vote becomes stable (i.e.\ its containing block
is at least $2k$ slots deep), a block whose header's protocol version number
(\lstinline!CC.Block.headerProtocolVersion!) is that of the proposal is
interpreted as an \emph{endorsement} of the stably-confirmed proposal by the
block's issuer (specifically by the Verification Key of its delegation
certificate). Endorsements---i.e.\ \emph{any block}, since they all contain
that header field---also trigger the system to discard proposals that were
not confirmed within their TTL.

Notably, endorsements for proposals that are not yet stably-confirmed (or do
not even exist) are not invalid but rather silently ignored. In other words,
no validation applies to the \lstinline!headerProtocolVersion! field.

\item
Once the number of endorsers satisfies a threshold (same as for voting), the
confirmed proposal becomes a \emph{candidate} proposal.

\item
\emph{At the beginning of an epoch}, the candidate proposal with the greatest
protocol version number among those candidates whose threshold-satisfying
endorsement is stable (i.e.\ the block is at least $2k$ slots deep) is
\emph{adopted}: the new protocol parameter values have now been changed.

If there was no stable candidate proposal, then nothing happens. Everything
is retained; in particular, a candidate proposal whose threshold-satisfying
endorsement was not yet stable will be adopted at the subsequent epoch unless
it is surpassed in the meantime.

When a candidate is adopted, all record of other
proposals/votes/endorsements---regardless of their state---is discarded. The
explanation for this is that such proposals would now be interpreted as an
update to the newly adopted parameter values, whereas they were validated as
an update to the previously adopted parameter values.

\end{enumerate}

The diagram shown in \cref{byron:update-process} summarises the progress of a
proposal that's eventually adopted. For other proposals, the path short
circuits to a ``rejected/discarded'' status at some point.
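
The same progression can also be captured as a simple state type (an
illustrative sketch; these names do not appear in the ledger implementation):
%
\begin{lstlisting}
-- Stages of a successful update proposal, mirroring the diagram in
-- the figure below.
data ProposalState =
    Active          -- gathering votes, within its TTL
  | Confirmed       -- vote threshold reached
  | StablyConfirmed -- confirming block at least 2k slots deep
  | Candidate       -- endorsement threshold reached
  | StableCandidate -- endorsing block at least 2k slots deep
  | Adopted         -- picked up at an epoch transition
\end{lstlisting}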

\begin{figure}
\hrule
\begin{center}
\begin{tikzpicture}
\node (act) {active} ;
\node (con) [below=of act] {confirmed} ;
\node (sta) [below=of con] {stably confirmed} ;
\node (can) [below=of sta] {candidate} ;
\node (sca) [below=of can] {stable candidate} ;
\node (ado) [below=of sca] {adopted} ;
\draw[->] (act.south) -- (con.north) node[pos=0.5, right] {sufficient votes};
\draw[->] (con.south) -- (sta.north) node[pos=0.5, right] {$2k$ slots later};
\draw[->] (sta.south) -- (can.north) node[pos=0.5, right] {sufficient endorsements};
\draw[->] (can.south) -- (sca.north) node[pos=0.5, right] {$2k$ slots later};
\draw[->] (sca.south) -- (ado.north) node[pos=0.5, right] {epoch transition};
\end{tikzpicture}
\end{center}
\hrule
\caption{\label{byron:update-process}Byron update proposal process}
\end{figure}

\subsection{Initiating the hard fork}
\label{byron:hardfork:initiating}

Proposals to initiate the hard fork can be submitted and voted on before all
core nodes are ready. After all, once a proposal is stably confirmed, it will
effectively remain so indefinitely until nodes endorse it (or it gets
superseded by another proposal). This means that nodes can vote to initiate
the hard fork, \emph{then} wait for everybody to update their software, and
once updated, the proposal is endorsed and eventually the hard fork is
initiated.

Endorsement is somewhat implicit. The node operator does not submit an
explicit ``endorsement transaction'', but instead restarts the
node\footnote{\label{byron:unnecessary-restarts}A node restart is necessary
for \emph{any} change to a protocol parameter, even though most parameters do
not require any change to the software at all.} (probably after a software
update that makes the node ready to support the hard fork) with a new
protocol version (as part of a configuration file or command line parameter),
which then gets included in the blocks that the node produces (this value is
the \lstinline!byronProtocolVersion! field in the static
\lstinline!ByronConfig!).

\subsection{Software versions}

The Byron ledger also records the latest version of the software on the
chain, in order to facilitate software discovering new versions and
subsequently updating itself. This would normally precede all of the above,
but as far as the consensus layer is concerned, this is entirely orthogonal.
It does not in any way interact with either the decision to hard fork or with
the moment of the hard fork. If we did forgo it, the discussion above would
still be entirely correct. As of Shelley, software discovery is done
off-chain.

The Byron \emph{block header} also records a software version
(\lstinline!headerSoftwareVersion!). This is a legacy concern only, and is
present in but ignored by the current Byron implementation, and entirely
absent from the Byron specification.
diff --git a/ouroboros-consensus/docs/report/chapters/appendix/shelley.tex b/ouroboros-consensus/docs/report/chapters/appendix/shelley.tex
new file mode 100644
index 00000000000..2248a274e9f
--- /dev/null
+++ b/ouroboros-consensus/docs/report/chapters/appendix/shelley.tex
@@ -0,0 +1,89 @@
\chapter{Shelley}

\section{Update proposals}
\label{shelley:hardfork}

\subsection{Moment of the hard fork}
\label{shelley:hardfork:moment}

Similar to the Byron ledger (\cref{byron:hardfork:moment}), the Shelley ledger
provides a ``current protocol version'', but it is a two-tuple (not a
three-tuple), containing only a \emph{hard fork} component and \emph{soft
fork} component:
%
\begin{lstlisting}
_protocolVersion :: (Natural, Natural)
\end{lstlisting}
%
in \lstinline!PParams!. The hard fork from Shelley to its successor will be
initiated once the hard fork component of this version gets incremented.

\subsection{The update mechanism for the protocol version}

The update mechanism in Shelley is simpler than it is in Byron. There is no
distinction between votes and proposals: to ``vote'' for a proposal one
merely submits the exact same proposal. There is also no separate endorsement
step (though see \cref{shelley:hardfork:initiating}).

The procedure is as follows:

\begin{enumerate}

\item
As in Byron, a proposal is a partial map from parameters to their values.

\item
During each epoch, a genesis key can submit (via its delegates) zero, one, or
many proposals; each submission overrides the previous one.

\item
``Voting'' (submitting of proposals) ends $6k/f$ slots before the end of the
epoch (i.e., twice the stability period, called \lstinline!stabilityWindow!
in the Shelley ledger implementation).

\item
At the end of an epoch, if the majority of nodes (as determined by the
\lstinline!Quorum! specification constant, which must be greater than half
the nodes) have most recently submitted the same exact proposal, then it is
adopted.

\item
The next epoch is always started with a clean slate: proposals from the
previous epoch that didn't make it are discarded.\footnote{Proposals
\emph{can} be explicitly marked to be for future epochs; in that case, these
are simply not considered until that epoch is reached.}

\end{enumerate}

The protocol version itself is also considered to be merely another
parameter, and parameters can change without changing the protocol version,
although a convention could be established that the protocol version must
change if any of the parameters do; but the specification itself does not
mandate this.

\subsection{Initiating the hard fork}
\label{shelley:hardfork:initiating}

The timing of the hard fork in Shelley is different to the one in Byron: in
Byron, we \emph{first} vote and then wait for people to get ready
(\cref{byron:hardfork:initiating}); in Shelley it is the other way around.

Core node operators will want to know that a significant majority of the core
nodes is ready (supports the hard fork) before initiating it. To make this
visible, Shelley blocks contain a protocol version. This is not related to
the current protocol version as reported by the ledger state
(\lstinline!_protocolVersion! as discussed in the previous section), but it
is the \emph{maximum} protocol version that the node which produced that
block can support.
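
As a sketch (with illustrative names and the \lstinline!Natural! components
shown above), a block advertises readiness for the hard fork when the hard
fork component of its maximum supported version exceeds the current one:
%
\begin{lstlisting}
advertisesReadiness ::
     (Natural, Natural)  -- current version (from the ledger state)
  -> (Natural, Natural)  -- maximum supported version (from the block)
  -> Bool
advertisesReadiness (curHard, _) (maxHard, _) = maxHard > curHard
\end{lstlisting}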

Once we see blocks from all or nearly all core nodes with the \emph{hard
fork} component of their protocol version equal to the post-hard-fork value,
nodes will submit their proposals with the required major version change to
initiate the hard fork.\footnote{This also means that unlike in Byron
(\cref{byron:unnecessary-restarts}), in Shelley there is no need to restart
the node merely to support a particular parameter change (such as a maximum
block size).}

\section{Forecasting}
\label{shelley:forecasting}

Discuss the fact that the effective maximum rollback in Shelley is $k - 1$,
not $k$; see also \cref{ledger:forecasting}.
diff --git a/ouroboros-consensus/docs/report/chapters/conclusions/conclusions.tex b/ouroboros-consensus/docs/report/chapters/conclusions/conclusions.tex
new file mode 100644
index 00000000000..211dd41138a
--- /dev/null
+++ b/ouroboros-consensus/docs/report/chapters/conclusions/conclusions.tex
@@ -0,0 +1 @@
\chapter{Conclusions}
diff --git a/ouroboros-consensus/docs/report/chapters/conclusions/technical.tex b/ouroboros-consensus/docs/report/chapters/conclusions/technical.tex
new file mode 100644
index 00000000000..fa36f356675
--- /dev/null
+++ b/ouroboros-consensus/docs/report/chapters/conclusions/technical.tex
@@ -0,0 +1,14 @@
\chapter{Technical design decisions}
\label{technical}

In this chapter we will discuss a number of interesting technical design
decisions that are not directly tied to any of the specific needs of the
consensus layer.

\section{Classes versus records}
\label{technical:classes-vs-records}

Discuss why classes are helpful (explicit about closures).

\section{Top-level versus associated type families}
\label{technical:toplevel-vs-associated}
diff --git a/ouroboros-consensus/docs/report/chapters/consensus/ledger.tex b/ouroboros-consensus/docs/report/chapters/consensus/ledger.tex
new file mode 100644
index 00000000000..5c0a11890b9
--- /dev/null
+++ b/ouroboros-consensus/docs/report/chapters/consensus/ledger.tex
@@ -0,0 +1,354 @@
\chapter{Interface to the ledger}
\label{ledger}

\section{Abstract interface}
\label{ledger:api}

In \cref{overview:ledger} we identified three responsibilities for the ledger
layer:
%
\begin{itemize}
\item ``Ticking'' the ledger state, applying any time related changes
(\cref{ledger:api:IsLedger}). This is independent of blocks, both at the
value level (we don't need a block in order to tick) and at the type level.
\item Applying and verifying blocks (\cref{ledger:api:ApplyBlock}). This
obviously connects a ledger and a block type, but we try to avoid talking
about \emph{the} ledger corresponding to a block, in order to improve
compositionality; we will see examples of where this comes in useful in the
definition of the extended ledger state (\cref{storage:extledgerstate}) and
the ledger database (\cref{ledgerdb}).
\item Projecting out the ledger view
(\cref{ledger:api:LedgerSupportsProtocol}), connecting a ledger to a
consensus protocol.
\end{itemize}
%
We will discuss these responsibilities one by one.

\subsection{Independent definitions}
\label{ledger:api:IsLedger}

We will start with the parts of the ledger API that can be defined
independently of a choice of block or a choice of consensus protocol.

\subsubsection{Configuration}

Like the other abstractions in the consensus layer, the ledger defines its
own type of required static configuration
%
\begin{lstlisting}
type family LedgerCfg l :: Type
\end{lstlisting}

\subsubsection{Tip}

We require that any ledger can report its tip as a \lstinline!Point!. A
\lstinline!Point! is either genesis (no blocks have been applied yet) or a
pair of a hash and slot number; it is parametric over $l$ in order to allow
different ledgers to use different hash types.
%
\begin{lstlisting}
class GetTip l where
  getTip :: l -> Point l
\end{lstlisting}

\subsubsection{Ticking}

We can now define the \lstinline!IsLedger! class as
%
\begin{lstlisting}
class (GetTip l, GetTip (Ticked l), ..) => IsLedger l where
  type family LedgerErr l :: Type
  applyChainTick :: LedgerCfg l -> SlotNo -> l -> Ticked l
\end{lstlisting}

The type of \lstinline!applyChainTick! is similar to the type of
\lstinline!tickChainDepState! we saw in \cref{consensus:class:state}.
Examples of the time-based changes in the ledger state include activating
delegation certificates in the Byron ledger, or paying out staking rewards
in the Shelley ledger.

Ticking is not allowed to fail (it cannot return an error). Consider what it
would mean if it \emph{could} fail: it would mean that a previous block was
accepted as valid, but set up the ledger state so that no matter what would
happen next, as soon as a particular moment in time is reached, the ledger
would fail to advance any further. Obviously, such a situation cannot be
permitted to arise (the block should have been rejected as invalid).

Note that ticking does not change the tip of the ledger: no blocks have been
applied (yet). This means that we should have

\begin{equation}
  \mathtt{getTip} \; l
= \mathtt{getTip} \; (\mathtt{applyChainTick}_\mathit{cfg} \; s \; l)
\end{equation}

\subsubsection{Ledger errors}

The inclusion of \lstinline!LedgerErr! in \lstinline!IsLedger! is perhaps
somewhat surprising. \lstinline!LedgerErr! is the type of errors that can
arise when applying blocks to the ledger, but block application is not yet
defined here. Nonetheless, a ledger can only be applied to a \emph{single}
type of block, and consequently can only have a \emph{single} type of error;
the only reason block application is defined separately is that a single type
of \emph{block} can be used with multiple ledgers (in other words, this is a
1-to-many relationship).\footnote{Defining \lstinline!LedgerErr! in
\lstinline!ApplyBlock! (\cref{ledger:api:ApplyBlock}) would result in
ambiguous types, since it would not refer to the \lstinline!blk! type
variable of that class.}

\subsection{Applying blocks}
\label{ledger:api:ApplyBlock}

If \lstinline!applyChainTick! was analogous to \lstinline!tickChainDepState!,
then \lstinline!applyLedgerBlock! and \lstinline!reapplyLedgerBlock! are
analogous to \lstinline!updateChainDepState! and
\lstinline!reupdateChainDepState!, respectively
(\cref{consensus:class:state}): they apply a block to an already ticked
ledger state:
%
\begin{lstlisting}
class (IsLedger l, ..)
      => ApplyBlock l blk where
  applyLedgerBlock ::
    LedgerCfg l -> blk -> Ticked l -> Except (LedgerErr l) l
  reapplyLedgerBlock ::
    LedgerCfg l -> blk -> Ticked l -> l
\end{lstlisting}
%
The discussion in \cref{consensus:class:state} of the difference between, and
motivation for, application versus reapplication of the consensus protocol
state applies here equally.

\subsection{Linking a block to its ledger}

We mentioned at the start of \cref{ledger:api} that a single block can be
used with multiple ledgers. Nonetheless, there is one ``canonical'' ledger
for each block; for example, the Shelley block is associated with the Shelley
ledger, even if it can also be applied to the extended ledger state or the
ledger DB. We express this through a data family linking a block to its
``canonical ledger state'':
%
\begin{lstlisting}
data family LedgerState blk :: Type
\end{lstlisting}
%
and then require that it must be possible to apply a block to its associated
ledger state
%
\begin{lstlisting}
class ApplyBlock (LedgerState blk) blk => UpdateLedger blk
\end{lstlisting}
%
(this is an otherwise empty class). For convenience, we then also introduce
some shorthand:
%
\begin{lstlisting}
type LedgerConfig blk      = LedgerCfg (LedgerState blk)
type LedgerError blk       = LedgerErr (LedgerState blk)
type TickedLedgerState blk = Ticked (LedgerState blk)
\end{lstlisting}

\subsection{Projecting out the ledger view}
\label{ledger:api:LedgerSupportsProtocol}

In \cref{overview:ledger} we mentioned that a consensus protocol may require
some information from the ledger, and in \cref{consensus:class:ledgerview} we
saw that this is modelled as the \lstinline!LedgerView! type family in the
\lstinline!ConsensusProtocol! class. A ledger and a consensus protocol are
linked through the block type (indeed, apart from the fundamental concepts we
have discussed so far, most of consensus is parameterised over blocks, not
ledgers or consensus protocols). Recall from \cref{BlockSupportsProtocol}
that the \lstinline!BlockProtocol! type family defines for each block what
the corresponding consensus protocol is; we can use this to define the
projection of the ledger view (defined by the consensus protocol) from the
ledger state as follows:
%
\begin{lstlisting}
class (..) => LedgerSupportsProtocol blk where
  protocolLedgerView ::
       LedgerConfig blk
    -> Ticked (LedgerState blk)
    -> Ticked (LedgerView (BlockProtocol blk))

  ledgerViewForecastAt ::
       LedgerConfig blk
    -> LedgerState blk
    -> Forecast (LedgerView (BlockProtocol blk))
\end{lstlisting}
%
The first method extracts the ledger view out of an already ticked ledger
state; think of it as the ``current'' ledger view. Forecasting deserves a
more detailed discussion and will be the topic of the next section.

\section{Forecasting}
\label{ledger:forecasting}

\subsection{Introduction}

In \cref{nonfunctional:network:headerbody} we discussed the need to validate
headers from upstream peers. In general, header validation requires
information from the ledger state. For example, in order to verify whether a
Shelley header was produced by the right node, we need to know the stake
distribution (recall that in Shelley the probability of being elected a
leader is proportional to the stake); this information is precisely what is
captured by the \lstinline!LedgerView! (\cref{consensus:class:ledgerview}).
However, we cannot update the ledger state with block headers only; we need
the block bodies: after all, to stay with the Shelley example, the stake
evolves based on the transactions that are made, which appear only in the
block bodies.

Not all is lost, however. The stake distribution used by the Shelley ledger
for the sake of the leadership check \emph{is not the \emph{current} stake
distribution}, but the stake distribution as it was at a specific point in
the past. Moreover, that same stake distribution is then used for all
leadership checks in a given period of time.\footnote{The exact details of
precisely \emph{how} the chain is split is not relevant to the consensus
layer, and is determined by the ledger layer.} In the depiction below, the
stake distribution as it was at point $b$ is used for the leadership checks
near the current tip, the stake distribution at point $a$ was used before
that, and so forth:
%
\begin{center}
\begin{tikzpicture}
\draw (-10, 0.5) node{\ldots};
\draw (-10, 0) -- (2, 0);
\draw (-10, 1) -- (2, 1);
\draw (-8, 0) -- (-8, 1);
\draw (-4, 0) -- (-4, 1);
\draw (0, 0) -- (0, 1);
\draw (2, 0) -- (2, 1) node[above]{tip};
\draw (-6, -0.2) node {$\underbrace{\hspace{3.8cm}}$};
\draw (-2, -0.2) node {$\underbrace{\hspace{3.8cm}}$};
\draw ( 2, -0.2) node {$\underbrace{\hspace{3.8cm}}$};
\draw [thick, arrows={-Triangle}] (-9, -1) node[fill=white] {$\ldots$}-- (-6, -1) -- (-6, -0.3);
\draw [thick, arrows={-Triangle}] (-5, 0.5) node[fill=white] {$\mathstrut a$} -- (-5, -1) -- (-2, -1) -- (-2, -0.3);
\draw [thick, arrows={-Triangle}] (-1, 0.5) node[fill=white] {$\mathstrut b$} -- (-1, -1) -- (2, -1) -- (2, -0.3);
\end{tikzpicture}
\end{center}
%
This makes it possible to \emph{forecast} what the stake distribution (i.e.,
the ledger view) will be at various points. For example, if the chain looks
like
%
\begin{center}
\begin{tikzpicture}
\draw (-10, 0.5) node{\ldots};
\draw (-10, 0) -- (-0.5, 0);
\draw (-10, 1) -- (-0.5, 1);
\draw (-8, 0) -- (-8, 1);
\draw (-4, 0) -- (-4, 1);
\draw (-0.5, 0) -- (-0.5, 1) node[above]{tip};
\draw (-6, -0.2) node {$\underbrace{\hspace{3.8cm}}$};
\draw (-2, -0.2) node {$\underbrace{\hspace{3.8cm}}$};
\draw ( 2, -0.2) node {$\underbrace{\hspace{3.8cm}}$};
\draw [thick, arrows={-Triangle}] (-9, -1) node[fill=white] {$\ldots$}-- (-6, -1) -- (-6, -0.3);
\draw [thick, arrows={-Triangle}] (-5, 0.5) node[fill=white] {$\mathstrut a$} -- (-5, -1) -- (-2, -1) -- (-2, -0.3);
\draw [thick, arrows={-Triangle}] (-1, 0.5) node[fill=white] {$\mathstrut b$} -- (-1, -1) -- (2, -1) -- (2, -0.3);
\draw (0, 0.5) node[left] {$\mathstrut c$};
\draw (0, 0.5) node[right] {$\mathstrut d$};
\draw (4, 0.5) node[right] {$\mathstrut e$};
\end{tikzpicture}
\end{center}
%
then we can ``forecast'' that the stake distribution at point $c$ will be the
one established at point $a$, whereas the stake distribution at point $d$
will be the one established at point $b$. The stake distribution at point $e$
is however not yet known; we say that $e$ is ``out of the forecast range''.
+
+\subsection{Code}
+
+Since we're always forecasting what the ledger would look like \emph{if it
+were advanced to a particular slot}, the result of forecasting is always
+something ticked:\footnote{Actually we never deal with an \emph{unticked}
+ledger view.}
+%
+\begin{lstlisting}
+data Forecast a = Forecast {
+      forecastAt  :: WithOrigin SlotNo
+    , forecastFor :: SlotNo -> Except OutsideForecastRange (Ticked a)
+    }
+\end{lstlisting}
+%
+Here \lstinline!forecastAt! is the tip of the ledger in which the forecast was
+constructed and \lstinline!forecastFor! constructs the forecast for a
+particular slot, possibly returning an error message if that slot is out of
+range. This terminology---a forecast constructed \emph{at} a slot and computed
+\emph{for} a slot---is used throughout both this technical report and the
+consensus layer code base.
+
+\subsection{Ledger view}
+\label{forecast:ledgerview}
+
+For the ledger view specifically, the \lstinline!LedgerSupportsProtocol!
+class (\cref{ledger:api:LedgerSupportsProtocol}) requires a function
+%
+\begin{lstlisting}
+ledgerViewForecastAt ::
+     LedgerConfig blk
+  -> LedgerState blk
+  -> Forecast (LedgerView (BlockProtocol blk))
+\end{lstlisting}
+%
+This function must satisfy two important properties:
+%
+\begin{description}
+\item[Sufficient range]
+
+When we validate headers from an upstream node, the most recent usable ledger
+state we have is the ledger state at the intersection of our chain and the
+chain of the upstream node. That intersection will be at most $k$ blocks back,
+because that is our maximum rollback (\cref{consensus:overview:k}).
+Furthermore, it is only useful to track an upstream peer if we might want to
+adopt their blocks, and we only switch to their chain if it is longer than
+ours (\cref{consensus:overview:chainsel}). This means that in the worst case
+scenario, with the intersection $k$ blocks back, we need to be able to
+evaluate $k + 1$ headers in order to adopt the alternative chain. However, the
+range of a forecast is based on \emph{slots}, not blocks; since not every slot
+may contain a block (\cref{time:slots-vs-blocks}), the range needs to be large
+enough to \emph{guarantee} that it contains at least $k + 1$
+blocks\footnote{Due to a misalignment between the consensus requirements and
+the Shelley specification, this is not the case for Shelley, where the
+effective maximum rollback is in fact $k - 1$; see \cref{shelley:forecasting}.};
+we will come back to this in \cref{future:block-vs-slot}.
+
+The network layer may have additional reasons for wanting a long forecast
+range; see \cref{nonfunctional:network:headerbody}.
+
+\item[Relation to ticking]
+Forecasting is not the only way that we can get a ledger view for a particular
+slot; alternatively, we can also \emph{tick} the ledger state, and then ask
+for the ledger view at that ticked ledger state. These two ways should give us
+the same answer:
+%
+\begin{equation}
+\begin{array}{lllll}
+\mathrm{whenever} &
+\mathtt{forecastFor} \; (\mathtt{ledgerViewForecastAt}_\mathit{cfg} \; l) \; s & = & \mathtt{Right} & l' \\
+\mathrm{then} & \mathtt{protocolLedgerView}_\mathit{cfg} \; (\mathtt{applyChainTick}_\mathit{cfg} \; s \; l) & = && l'
+\end{array}
+\end{equation}
+%
+In other words, whenever the ledger view for a particular slot is within the
+forecast range, then ticking the ledger state to that slot and asking for the
+ledger view at the tip should give the same answer.
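+
+Rendered as a QuickCheck-style property, this looks as follows (a sketch only,
+assuming \lstinline!Eq! and \lstinline!Show! instances for the ticked ledger
+view; this exact property is ours, not necessarily one from the test suite):
+%
+\begin{lstlisting}
+-- Whenever the forecast is defined, it must agree with ticking the
+-- ledger state and taking the ledger view at the resulting tip
+prop_forecastMatchesTick cfg l s =
+    case runExcept (forecastFor (ledgerViewForecastAt cfg l) s) of
+      Left _outsideRange -> property Discard  -- out of forecast range
+      Right view ->
+        view === protocolLedgerView cfg (applyChainTick cfg s l)
+\end{lstlisting}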
+Unlike forecasting, however, ticking has no maximum range. The reason is the
+following fundamental difference between these two concepts:
+%
+\begin{quote}
+\textbf{(Forecast vs. ticking)} When we \emph{forecast} a ledger view, we are
+predicting what that ledger view will be, \emph{no matter which blocks will be
+applied to the chain} between the current tip and the slot of the forecast. By
+contrast, when we \emph{tick} a ledger, we are applying any time-related
+changes to the ledger state in order to apply the \emph{next} block; in other
+words, when we tick to a particular slot, \emph{there \emph{are} no blocks in
+between the current tip and the slot we're ticking to}. Since there are no
+intervening blocks, there is no uncertainty, and hence no limit on the range.
+\end{quote}
+\end{description}
+
+\section{Queries}
+\label{ledger:queries}
+
+\section{Abandoned approach: historical states}
diff --git a/ouroboros-consensus/docs/report/chapters/consensus/protocol.tex b/ouroboros-consensus/docs/report/chapters/consensus/protocol.tex
new file mode 100644
index 00000000000..454dce70f50
--- /dev/null
+++ b/ouroboros-consensus/docs/report/chapters/consensus/protocol.tex
@@ -0,0 +1,685 @@
+\chapter{Consensus Protocol}
+\label{consensus}
+
+% TODO: what kind of variation does this design support?
+% (counter-example: genesis rule)
+
+% TODO: Describe API
+%
+% TODO: State invariants
+%
+% TODO: Discuss relationship to the Ouroboros papers. Where are the various parts
+% of the paper implemented? How do additional design constraints change this?
+% (e.g. header/body split)
+
+\section{Overview}
+
+\subsection{Chain selection}
+\label{consensus:overview:chainsel}
+
+Chain selection is the process of choosing between multiple competing chains,
+and is one of the most important responsibilities of a consensus protocol.
+When choosing between two chains, in theory any part of those chains could be
+relevant; indeed, the research literature typically describes chain selection
+as a comparison of two entire chains (\cref{bft-paper,praos-paper}). In
+practice that is not realistic: the node has to do chain selection frequently,
+and scanning millions of blocks each time to make the comparison is of course
+out of the question.
+
+The consensus layer keeps the most recent headers as a \emph{chain fragment}
+in memory (\cref{storage:inmemory}); the rest of the chain is stored on disk.
+Similarly, we keep a chain fragment of headers in memory for every (upstream)
+node whose chain we are following and whose blocks we may wish to adopt
+(\cref{chainsyncclient}). Before the introduction of the hard fork combinator,
+chain selection used to be given these fragments to compare; as we will
+discuss in \cref{hfc:intro}, however, this does not scale so well to hybrid
+chains.
+
+It turns out, however, that it suffices to look only at the headers at the
+very tip of the chain, at least for the class of consensus algorithms we need
+to support. The exact information we need about that tip varies from one
+protocol to the other, but at least for the Ouroboros family of consensus
+protocols the essence is always the same: we prefer longer chains over shorter
+ones (justifying \emph{why} this is the right choice is the domain of
+cryptographic research and well outside the scope of this report).
In the +simplest case, the length of the chain is \emph{all} that matters, and hence the +only thing we need to know about the blocks at the tips of the chains is their +block numbers.\footnote{This is not \emph{entirely} true, due to the presence of +EBBs; see \cref{ebb-chain-selection}.} + +This does beg the question of how to compare two chains when one (or both) of +them are empty, since now we have no header to compare. We will resolve this by +stating the following fundamental assumption about \emph{all} chain selection +algorithms supported by the consensus layer: + +\begin{assumption}[Prefer extension] +\label{prefer-extension} +The extension of a chain is always preferred over that chain. +\end{assumption} + +A direct consequence of \cref{prefer-extension} is that a non-empty chain is +always preferred over an empty one,\footnote{Comparing empty chain +\emph{fragments}, introduced in \cref{storage:fragments}, is significantly more +subtle, and will be discussed in \cref{chainsel:fragments}.} but we will +actually need something stronger than that: we insist that shorter chains can +never be preferred over longer ones: + +\begin{assumption}[Never Shrink] +\label{never-shrink} +A shorter chain is never preferred over a longer chain. +\end{assumption} + +\Cref{never-shrink} does not say anything about chains of equal length; this will +be important for Praos (\cref{praos}). An important side-note here is that +the Ouroboros Genesis consensus protocol includes a chain selection rule +(the genesis rule) that violates \cref{never-shrink} (though not \cref{prefer-extension}); it also cannot be defined by only looking at the tips of chains. +It will therefore require special treatment; we will come back to this in +\cref{genesis}. + +\subsection{The security parameter $k$} +\label{consensus:overview:k} + +TODO\todo{TODO}. + +\section{The \lstinline!ConsensusProtocol! Class} +\label{consensus:class} + +We model consensus protocols as a single class called +\lstinline!ConsensusProtocol!; this class can be considered to be the +central class within the consensus layer. + +\begin{lstlisting} +class (..) => ConsensusProtocol p where +\end{lstlisting} + +The type variable $p$ is a type-level tag describing a particular consensus +protocol; if Haskell had open kinds\footnote{We will come back to this in +\cref{future:openkinds}.}, we could say \lstinline!(p :: ConsensusProtocol)!. +All functions within this class take an argument of type +% +\begin{lstlisting} +data family ConsensusConfig p :: Type +\end{lstlisting} +% +This allows the protocol to depend on some static configuration data; what +configuration data is required will vary from protocol to +protocol.\footnote{Explicitly modelling such a required context could be avoided +if we used explicit records instead of type classes; we will discuss this point +in more detail in \cref{technical:classes-vs-records}.} The rest of the +consensus layer does not really do much with this configuration, except make it +available where required; however, we do require that whatever the configuration +is, we can extract $k$ from it: +% +\begin{lstlisting} +protocolSecurityParam :: ConsensusConfig p -> SecurityParam +\end{lstlisting} +% +For example, this is used by the chain database to determine when blocks can be +moved from the volatile DB to the immutable DB (\cref{storage:components}). In +the rest of this section we will consider the various parts of the +\lstinline!ConsensusProtocol! class one by one. 
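+
+To make this concrete, consider a hypothetical type-level tag for a toy
+round-robin protocol (this protocol is ours, purely for illustration; it is
+not part of the consensus layer). Its configuration might record the security
+parameter and the number of core nodes, with \lstinline!protocolSecurityParam!
+simply projecting out the former:
+%
+\begin{lstlisting}
+-- Hypothetical tag for a toy round-robin protocol
+data RoundRobin
+
+data instance ConsensusConfig RoundRobin = RoundRobinConfig {
+      rrSecurityParam :: SecurityParam
+    , rrNumCoreNodes  :: Word64
+    }
+\end{lstlisting}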
+
+\subsection{Chain selection}
+\label{consensus:class:chainsel}
+
+As mentioned in \cref{consensus:overview:chainsel}, chain selection will only
+look at the headers at the tip of the chain. Since we are defining consensus
+protocols independently of a concrete choice of ledger, however
+(\cref{decouple-consensus-ledger}), we cannot use a concrete block or header
+type. Instead, we merely say that chain selection requires \emph{some} view
+on headers that it needs to make its decisions:
+
+\begin{lstlisting}
+type family SelectView p :: Type
+type SelectView p = BlockNo
+\end{lstlisting}
+
+The default is \lstinline!BlockNo! because, as we have seen, this is all that
+is required for the most important chain selection rule, simply preferring
+longer chains over shorter ones. It is the responsibility of the glue code
+that connects a specific choice of ledger to a consensus protocol to define
+the projection from a concrete block type to this \lstinline!SelectView!
+(\cref{BlockSupportsProtocol}). We then require that these views must be
+comparable
+%
+\begin{lstlisting}
+class (Ord (SelectView p), ..) => ConsensusProtocol p where
+\end{lstlisting}
+%
+and say that one chain is (strictly) preferred over another if its
+\lstinline!SelectView! is greater. If two chains terminate in headers with
+the \emph{same} view, neither chain is preferred over the other, and we
+could pick either one (we say they are equally preferable).
+
+Later in this chapter we will discuss in detail how our treatment of
+consensus algorithms differs from the research literature (\cref{bft,praos}),
+and in \cref{chainsel} we will see the details of how chain selection is
+implemented in the chain database; it is worth pointing out here, however,
+that the comparison based on \lstinline!SelectView! is not intended to capture
+
+\begin{itemize}
+\item chain validity
+\item the intersection point (checking that the intersection point is not too
+far back, preserving the invariant that we never roll back more than $k$
+blocks, see \cref{consensus:overview:k})
+\end{itemize}
+
+Both of these responsibilities would require more than seeing just the tip of
+the chains. They are handled independently of the choice of consensus
+protocol by the chain database, as discussed in \cref{chainsel}.
+
+When two \emph{candidate} chains (that is, two chains that aren't our current
+chain) are equally preferable, we are free to choose either one. However,
+when a candidate chain is equally preferable to our current chain, we
+\emph{must} stick with our current chain. This is true for all Ouroboros
+consensus protocols, and we define it once and for all:
+
+\begin{lstlisting}
+preferCandidate ::
+     ConsensusProtocol p
+  => proxy p
+  -> SelectView p -- ^ Tip of our chain
+  -> SelectView p -- ^ Tip of the candidate
+  -> Bool
+preferCandidate _ ours cand = cand > ours
+\end{lstlisting}
+
+\subsection{Ledger view}
+\label{consensus:class:ledgerview}
+
+We mentioned in \cref{overview:ledger} that some consensus protocols may
+require limited information from the ledger; for instance, the Praos
+consensus protocol needs access to the stake distribution for the leadership
+check. In the \lstinline!ConsensusProtocol! abstraction, this is modelled as
+a \emph{view} on the ledger state
+
+\begin{lstlisting}
+type family LedgerView p :: Type
+\end{lstlisting}
+
+The ledger view will be required in only one function: when we ``tick'' the
+state of the consensus protocol. We will discuss this state management in
+more detail next.
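+
+Continuing the hypothetical round-robin example from above, the two type
+families introduced so far could be instantiated along the following lines: a
+fixed schedule needs only the default chain selection view, and no
+information from the ledger at all:
+%
+\begin{lstlisting}
+instance ConsensusProtocol RoundRobin where
+  -- Longest chain rule: the default view (BlockNo) suffices
+  type LedgerView RoundRobin = ()
+  -- A fixed round-robin schedule needs nothing from the ledger
+  type SelectView RoundRobin = BlockNo
+  protocolSecurityParam = rrSecurityParam
+  ..
+\end{lstlisting}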
+
+\subsection{Protocol state management}
+\label{consensus:class:state}
+
+Each consensus protocol has its own type of chain dependent
+state\footnote{We are referring to this as the ``chain dependent state'' to
+emphasise that this is state that evolves with the chain, and indeed is
+subject to rollback when we switch to alternative forks. This distinguishes
+it from chain \emph{independent} state such as evolving private keys, which
+are updated independently from blocks and are not subject to rollback.}
+
+\begin{lstlisting}
+type family ChainDepState p :: Type
+\end{lstlisting}
+
+The state must be updated with each block that comes in, but just like for
+chain selection, we don't work with a concrete block type but instead define
+a \emph{view} on blocks that is used to update the consensus state:
+
+\begin{lstlisting}
+type family ValidateView p :: Type
+\end{lstlisting}
+
+We're referring to this as the \lstinline!ValidateView! because updating the
+consensus state also serves as \emph{validation} of (that part of) the
+block; consequently, validation can also \emph{fail}, with protocol specific
+error messages:
+
+\begin{lstlisting}
+type family ValidationErr p :: Type
+\end{lstlisting}
+
+Updating the chain dependent state now comes as a pair of functions. As for
+the ledger (\cref{overview:ledger}), we first \emph{tick} the protocol state
+to the appropriate slot, passing the already ticked ledger view as an
+argument:\footnote{Throughout the consensus layer, the result of ticking is
+distinguished from the unticked value at the type level. This allows us to
+store additional (or indeed, less) information in the ticked ledger state,
+but also clarifies ordering. For example, it is clear in
+\lstinline!tickChainDepState! that the ledger view we pass as an argument is
+already ticked, as opposed to the \emph{old} ledger view.}
+
+\begin{lstlisting}
+tickChainDepState ::
+     ConsensusConfig p
+  -> Ticked (LedgerView p)
+  -> SlotNo
+  -> ChainDepState p
+  -> Ticked (ChainDepState p)
+\end{lstlisting}
+
+As an example, the Praos consensus protocol (\cref{praos}) derives its
+randomness from the chain itself. It does that by maintaining a set of random
+numbers called \emph{nonces}, which are used as seeds to pseudo-random number
+generators. Every so often the current nonce is swapped out for a new one;
+this does not depend on the specific block, but merely on a certain slot
+number being reached, and hence is an example of something that the ticking
+function should do.
+
+The (validation view on a) block can then be applied to the already ticked
+protocol state:
+
+\begin{lstlisting}
+updateChainDepState ::
+     ConsensusConfig p
+  -> ValidateView p
+  -> SlotNo
+  -> Ticked (ChainDepState p)
+  -> Except (ValidationErr p) (ChainDepState p)
+\end{lstlisting}
+
+Finally, there is a variant of this function that can be used to
+\emph{reapply} a known-to-be-valid block, potentially skipping expensive
+cryptographic checks, merely computing what the new state is:
+
+\begin{lstlisting}
+reupdateChainDepState ::
+     ConsensusConfig p
+  -> ValidateView p
+  -> SlotNo
+  -> Ticked (ChainDepState p)
+  -> ChainDepState p
+\end{lstlisting}
+
+Re-applying previously-validated blocks happens when we are replaying blocks
+from the immutable database when initialising the in-memory ledger state
+(\cref{ledgerdb:initialisation}).
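+
+Putting ticking and updating together: in hypothetical glue code (the actual
+call sites live in the header validation and chain selection logic),
+validating the protocol part of a block amounts to ticking the state to the
+block's slot and then applying the block's validation view:
+%
+\begin{lstlisting}
+-- Hypothetical helper, for illustration only
+tickThenUpdate ::
+     ConsensusProtocol p
+  => ConsensusConfig p
+  -> Ticked (LedgerView p)  -- ledger view at the block's slot
+  -> SlotNo                 -- slot of the block
+  -> ValidateView p         -- view on the block
+  -> ChainDepState p
+  -> Except (ValidationErr p) (ChainDepState p)
+tickThenUpdate cfg ledgerView slot view =
+      updateChainDepState cfg view slot
+    . tickChainDepState   cfg ledgerView slot
+\end{lstlisting}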
+Reapplication is also useful during chain selection (\cref{chainsel}):
+depending on the consensus protocol, we may end up switching relatively
+frequently between short-lived forks; when this happens, skipping expensive
+checks can improve the performance of the node. \todo{How does this relate to
+the best case == worst case thing? Or to the asymptotic attacker/defender
+costs?}
+
+\subsection{Leader selection}
+\label{consensus:class:leaderselection}
+
+The final responsibility of the consensus protocol is leader selection. It is
+entirely possible for nodes to track the blockchain without ever producing
+any blocks themselves; indeed, this will be the case for the majority of
+nodes.\footnote{Most ``normal'' users will not produce blocks themselves, but
+instead delegate their stake to stakepools who produce blocks on their
+behalf.} In order for a node to be able to lead at all, it may need access to
+keys and other configuration data; the exact nature of what is required is
+different from protocol to protocol, and so we model this as a type family
+
+\begin{lstlisting}
+type family CanBeLeader p :: Type
+\end{lstlisting}
+
+A value of \lstinline!CanBeLeader! merely indicates that the node has the
+required configuration to lead at all. It does \emph{not} necessarily mean
+that the node has the right to lead in any particular slot; \emph{this} is
+indicated by a value of type \lstinline!IsLeader!:
+
+\begin{lstlisting}
+type family IsLeader p :: Type
+\end{lstlisting}
+
+In simple cases \lstinline!IsLeader! can just be a unit value (``yes, you are
+a leader now'') but for more sophisticated consensus protocols such as Praos
+this will be a cryptographic proof that the node indeed has the right to lead
+in this slot. Checking whether a node that \emph{can} lead \emph{should} lead
+in a given slot is the responsibility of the final function in this class:
+
+\begin{lstlisting}
+checkIsLeader ::
+     ConsensusConfig p
+  -> CanBeLeader p
+  -> SlotNo
+  -> Ticked (ChainDepState p)
+  -> Maybe (IsLeader p)
+\end{lstlisting}
+
+\section{Connecting a block to a protocol}
+\label{BlockSupportsProtocol}
+
+Although a single consensus protocol might be used with many blocks, any
+given block is designed for a \emph{single} consensus protocol. The following
+type family witnesses this relation:\footnote{For a discussion about why we
+choose to make some type families top-level definitions rather than associate
+them with a type class, see \cref{technical:toplevel-vs-associated}.}
+%
+\begin{lstlisting}
+type family BlockProtocol blk :: Type
+\end{lstlisting}
+%
+Of course, for the block to be usable with that consensus protocol, we need
+functions that construct the \lstinline!SelectView!
+(\cref{consensus:class:chainsel}) and \lstinline!ValidateView!
+(\cref{consensus:class:state}) projections from that block:
+%
+\begin{lstlisting}
+class (..) => BlockSupportsProtocol blk where
+  validateView ::
+       BlockConfig blk
+    -> Header blk -> ValidateView (BlockProtocol blk)
+
+  selectView ::
+       BlockConfig blk
+    -> Header blk -> SelectView (BlockProtocol blk)
+\end{lstlisting}
+%
+The \lstinline!BlockConfig! is the static configuration required to work
+with blocks of this type; it's just another data family:
+%
+\begin{lstlisting}
+data family BlockConfig blk :: Type
+\end{lstlisting}
+
+\section{Design decisions constraining the Ouroboros protocol family}
+\label{design-decisions-constraining-ouroboros}
+
+\todo{TODO} TODO: Perhaps we should move this to conclusions; some of these
+requirements may only become clear in later chapters (like the forecasting
+range).
+
+\todo{TODO} TODO: The purpose of this section should be to highlight design
+decisions we're already covering in this chapter that impose constraints on
+existing or future members of the Ouroboros protocol family.
+
+For example, we at least have:
+\begin{itemize}
+\item Max-$k$ rollback: we insist that there be a maximum rollback length.
+This was true for Ouroboros Classic, but is not true for Praos/Genesis;
+nevertheless, we insist on this for our design. We should say why this is so
+helpful for our design. We should also admit that this is a fundamental
+decision on liveness vs consistency, and that we're picking consistency over
+liveness. The Ouroboros family is more liberal and different members of that
+family can and do make different choices, so some adaptation of protocols in
+papers may be needed to fit this design decision. In particular this is the
+case for Genesis. We cannot implement Genesis as described since it is not
+compatible with a rollback limit.
+
+\item We insist that we can compare chains based only on their tips. For
+example, even length is a property of the whole chain, not of a single block,
+but we insist that chains include their length in the blocks in a verifiable
+way, which enables this tip-only checking. Future Ouroboros family members
+may need some adaptation to fit into this constraint. In particular, the
+Genesis rule as described really is a whole-chain rule. Some creativity is
+needed to fit Genesis into our framework: e.g. perhaps seeing it not as a
+chain selection rule at all but as a different (coordinated) mode for
+following headers.
+
+\item We insist that a strict extension of a chain is always preferred over
+that chain.
+
+\item We insist that we never roll back to a strictly shorter chain.
+
+\item The minimum cyclic data dependency time: the minimum time we permit
+between some data going onto the chain and it affecting the validity of
+blocks or the choices made by chain selection. This one is a constraint on
+both the consensus algorithm and the ledger rules. For example this
+constrains the Praos epoch structure, but also ledger rules like the Shelley
+rule on when genesis key delegations or VRF key updates take effect. We
+should cover why we have this constraint: arising from wanting to do header
+validation sufficiently in advance of block download and validation that we
+can see that there's a potentially longer valid chain.
+
+\item The ledger must be able to look ahead sufficiently to validate $k + 1$
+headers (to guarantee a roll back of $k$). \todo{TODO} TODO: We should
+discuss this in more detail.
+\end{itemize}
+
+\section{Permissive BFT}
+\label{bft}
+
+Permissive BFT is defined in \cite{byron-chain-spec}. It is not to be
+confused with ``Practical BFT'' \cite{10.1145/571637.571640}.
+
+\subsection{Background}
+\label{bft:background}
+
+\duncan
+Discuss \emph{why} we started with Permissive BFT (backwards compatible with
+Ouroboros Classic).
+
+\subsection{Implementation}
+
+\subsection{Relation to the paper}
+\label{bft-paper}
+
+Permissive BFT is a variation on Ouroboros BFT, defined in
+\cite{cryptoeprint:2018:1049}. We have included the main protocol description
+from that paper as \cref{figure:bft} in this document; the only difference is
+that we've added a few additional labels so we can refer to specific parts of
+the protocol description below.
+
+It will be immediately obvious from \cref{figure:bft} that this description
+covers significantly more than what we consider to be part of the consensus
+protocol proper here. We will discuss the various parts of the BFT protocol
+description below.
+
+\begin{description}
+  \item[Clock update and network delivery] The BFT specification requires
+  that ``with each advance of the clock (..) a collection of transactions and
+  blockchains are pushed to the server''. We consider neither block
+  submission nor transaction submission to be within the scope of the
+  consensus algorithm; see
+  \cref{nonfunctional:network:blocksubmission,servers:blockfetch} and
+  \cref{servers:txsubmission} instead, respectively.
+
+  \item[Mempool update] (\cref{bft:mempool}). The design of the mempool is
+  the subject of \cref{mempool}. Here we only briefly comment on how it
+  relates to what the BFT specification assumes:
+%
+  \begin{itemize}
+  \item \textit{Consistency} (\cref{bft:mempool:consistency}). Our mempool
+  does indeed ensure consistency. In fact, we require something strictly
+  stronger; see \cref{mempool:consistency} for details.
+  \item \textit{Time-to-live (TTL)} (\cref{bft:mempool:ttl}). The BFT
+  specification requires that transactions stay in the mempool for a maximum
+  of $u$ rounds, for some configurable $u$. Our current mempool does not have
+  explicit support for a TTL parameter. The Shelley ledger will have support
+  for TTL starting with the ``Allegra'' era, so that transactions are only
+  valid within a certain slot window; this is part of the normal ledger
+  rules, however, and requires no explicit support from the consensus layer.
+  That's not to say that explicit support would not be useful; see
+  \cref{future:ttl} in the chapter on future work.
+  \item \textit{Receipts} (\cref{bft:mempool:receipts}). We do not offer any
+  kind of receipts for inclusion in the mempool. Clients such as wallets must
+  monitor the chain instead (see also \cite{wallet-spec}). The BFT
+  specification marks this as optional so this is not a deviation.
+  \end{itemize}
+%
+  \item[Blockchain update] (\cref{bft:update}). The BFT specification
+  requires that the node prefers any valid chain over its own, as long as it
+  is strictly longer. \emph{We do not satisfy this requirement.} The chain
+  selection rule for Permissive BFT is indeed the longest chain rule,
+  \emph{but} consensus imposes a global maximum rollback (the security
+  parameter $k$; \cref{consensus:overview:k}). In other words, nodes
+  \emph{will} prefer longer chains over their own, \emph{provided} that the
+  intersection between that chain and the node's own chain is no more than
+  $k$ blocks away from the node's tip.
+  \todo{Justify this maximum rollback?}
+
+  Moreover, our definition of validity is also different. We do require that
+  hashes line up (\cref{bft:update:hash}), although we do not consider this
+  part of the responsibility of the consensus protocol, but instead require
+  this independently of the choice of consensus protocol when updating the
+  header state (\cref{storage:headerstate}).
+  We do of course also require that the transactions in the block are valid
+  (\cref{bft:update:body}), but this is the responsibility of the ledger
+  layer instead (\cref{ledger}); the consensus protocol should be independent
+  from what's stored in the block body.
+
+  Permissive BFT is however different from BFT \emph{by design} in the
+  signatures we require.\footnote{\label{footnote:singlesignature}There is
+  another minor deviation from the specification: we don't require an
+  explicit signature on the block body. Instead, we have a single signature
+  over the header, and the header includes a \emph{hash} of the body.} BFT
+  requires that each block is signed strictly according to the round robin
+  schedule (\cref{bft:update:signatures}); the whole point of
+  \emph{permissive} BFT is that we relax this requirement and merely require
+  that blocks are signed by \emph{any} of the known core nodes.
+
+  Permissive BFT is however not \emph{strictly} more permissive than BFT:
+  although blocks do not need to be signed according to the round robin
+  schedule, there is a limit on the number of signatures by any given node in
+  a given window of blocks. When a node exceeds that threshold, its block is
+  rejected as invalid. Currently that threshold is set to 0.22
+  \cite[Appendix A, Calculating the $t$ parameter]{byron-chain-spec}, which
+  was considered to be the smallest value that would be sufficiently unlikely
+  to cause a chain generated by Ouroboros Classic to be considered invalid
+  (\cref{bft:background}), yet give as little leeway to a malicious node as
+  possible. This has an unfortunate side effect, however. BFT can always
+  recover from network partitions \cite[Section 1,
+  Introduction]{cryptoeprint:2018:1049}, but this is not true for PBFT: in a
+  setting with 7 core nodes (the same setting as considered in the PBFT
+  specification), a 4:3 network partition would quickly lead to \emph{both}
+  partitions being unable to produce more blocks; after all, the nodes in the
+  partition of 4 nodes would each sign 1/4th of the blocks, and the nodes in
+  the partition of 3 nodes would each sign 1/3rd, both of which exceed the
+  0.22 threshold. Both partitions would therefore quickly stop producing
+  blocks. Picking 0.25 for the threshold instead of 0.22 would alleviate this
+  problem, and would still conform to the PBFT specification, which says that
+  the value must be in the closed interval $[\frac{1}{5}, \frac{1}{4}]$.
+  Since PBFT is, however, no longer required (the Byron era is past and fresh
+  deployments would not need Permissive BFT but could use regular BFT), it's
+  probably not worth reconsidering this, although it \emph{is} relevant for
+  the consensus tests (\cref{testing:dire}).
+%
+  \item[Blockchain extension] (\cref{bft:extension}).
+  The leadership check implemented as part of PBFT conforms to the
+  specification (\cref{bft:leadershipcheck}). The rest of this section
+  matches the implementation, modulo some details, some of which we already
+  alluded to above:
+%
+  \begin{itemize}
+  \item The block format is slightly different; for instance, we only have a
+  single signature (\cref{footnote:singlesignature}).
+  \item Blocks in Byron have a maximum size, so we cannot necessarily take
+  \emph{all} valid transactions from the mempool.
+  \item Block diffusion is not limited to the suffix of the chain: clients
+  can request \emph{any} block that's on the chain. This is of course
+  critical to allow nodes to join the network later, something which the BFT
+  paper does not consider.
+  \end{itemize}
+%
+  It should also be pointed out that we consider neither block production nor
+  block diffusion to be part of the consensus protocol at all; only the
+  leadership check itself is.
+
+  \item[Ledger reporting].
+  Although we do offer a way to query the state of the ledger
+  (\cref{ledger:queries}), we do not offer a query to distinguish between
+  finalised/pending blocks.
+  \todo{TODO} TODO: It's also not clear to me why the BFT specification would
+  consider a block to be finalised as soon as it's $3t + 1$ blocks deep
+  (where $t$ is the maximum number of corrupted core nodes). The paper claims
+  that BFT can always recover from a network partition, and the chain
+  selection rule in the paper requires supporting infinite rollback.
+
+\end{description}
+
+\begin{figure}
+\small
+\hrule
+\textbf{Parameters}:
+
+\vspace{1em}
+
+\begin{tabular}{c|l}
+$n$ & total number of core nodes \\
+$t$ & maximum number of corrupted core nodes \\
+$u$ & time to live (TTL) of a transaction \\
+\end{tabular}
+
+\vspace{1em}
+
+\textbf{Protocol}: \\
+
+The $i$-th server locally maintains a blockchain $B_0 B_1 \ldots B_l$, an
+ordered sequence of transactions called a mempool, and carries out the
+following protocol:
+
+\begin{description}
+  \item[Clock update and network delivery] With each advance of the clock to
+  a slot $\mathit{sl}_j$, a collection of transactions and blockchains are
+  pushed to the server by the network layer. Following this, the server
+  proceeds as follows:
+  %
+  \begin{enumerate}
+    \item \textbf{Mempool update}.\label{bft:mempool}
+    \begin{enumerate}
+      \item \label{bft:mempool:consistency} Whenever a transaction
+      $\mathit{tx}$ is received, it is added to the mempool as long as it is
+      consistent with
+      \begin{enumerate}
+        \item the existing transactions in the mempool and
+        \item the contents of the local blockchain.
+      \end{enumerate}
+      \item \label{bft:mempool:ttl} The transaction is maintained in the
+      mempool for $u$ rounds, where $u$ is a parameter.
+      \item \label{bft:mempool:receipts} Optionally, when the transaction
+      enters the mempool the server can return a signed receipt back to the
+      client that is identified as the sender.
+    \end{enumerate}
+%
+    \item \textbf{Blockchain update}.\label{bft:update} Whenever the server
+    becomes aware of an alternative blockchain $B_0 B_1' \ldots B'_s$ with
+    $s > l$, it replaces its local chain with this new chain provided it is
+    valid, i.e. each one of its blocks
+    $(h, d, \mathit{sl}_j, \sigma_\mathit{sl}, \sigma_\mathrm{block})$
+%
+    \begin{enumerate}
+      \item \label{bft:update:signatures} contains proper signatures
+      \begin{enumerate}
+        \item one for time slot $\mathit{sl}_j$ and
+        \item one for the entire block
+      \end{enumerate}
+      by server $i$ such that $i - 1 = (j - 1) \bmod n$
+      \item \label{bft:update:hash} $h$ is the hash of the previous block,
+      and
+      \item \label{bft:update:body} $d$ is a valid sequence of transactions
+      w.r.t. the ledger defined by the transactions found in the previous
+      blocks
+    \end{enumerate}
+%
+    \item \textbf{Blockchain extension}.\label{bft:extension} Finally, the
+    server checks if it is responsible to issue the next block by testing if
+%
+    \begin{equation}
+    i - 1 = (j - 1) \bmod n
+    \label{bft:leadershipcheck}
+    \end{equation}
+%
+    In such case, this $i$-th server is the slot leader.
+    It
+%
+    \begin{itemize}
+    \item collects the set $d$ of all valid transactions from its mempool
+    and
+    \item appends the block $B_{l+1} = (h, d, \mathit{sl}_j,
+    \sigma_\mathit{sl}, \sigma_\mathrm{block})$ to its blockchain, where
+    \begin{equation*}
+    \begin{split}
+    \sigma_\mathit{sl}    & = \mathsf{Sign}_{\mathsf{sk}_i}(\mathit{sl}_j) \\
+    \sigma_\mathrm{block} & = \mathsf{Sign}_{\mathsf{sk}_i}(h, d, \mathit{sl}_j, \sigma_\mathit{sl}) \\
+    h                     & = H(B_l) \\
+    \end{split}
+    \end{equation*}
+    It then diffuses $B_{l+1}$ as well as any requested blocks from the
+    suffix of its blockchain that covers the most recent $2t + 1$ slots.
+    \end{itemize}
+
+  \end{enumerate}
+
+  \item[Ledger Reporting] Whenever queried, the server reports as
+  ``finalised'' the ledger of transactions contained in the blocks
+  $B_0 \ldots B_m, m \le l$, where $B_m$ has a slot time stamp more than
+  $3t + 1$ slots in the past. Blocks $B_{m+1} \ldots B_l$ are reported as
+  ``pending''.
+\end{description}
+
+\hrule
+\caption{\label{figure:bft}Ouroboros-BFT \cite[Figure 1]{cryptoeprint:2018:1049}}
+\end{figure}
+
+\section{Praos}
+\label{praos}
+
+TODO: Discuss $\Delta$: When relating the papers to the implementation, we
+loosely think of $\Delta$ as roughly having value 5, i.e., there is a maximum
+message delay of 5 slots. However, this link to the paper is tenuous at best:
+the messages the paper expects the system to send, and the messages that the
+system \emph{actually} sends, are not at all the same. Defining how these
+relate more precisely would be critical for a more formal statement of
+equivalence between the paper and the implementation, but such a study is
+well outside the scope of this report.
+
+\subsection{Active slot coefficient}
+\label{praos:f}
+
+\subsection{Implementation}
+
+\subsection{Relation to the paper}
+\label{praos-paper}
+
+\cite{cryptoeprint:2018:378}
+
+\section{Combinator: Override the leader schedule}
+\label{consensus:override-leader-schedule}
diff --git a/ouroboros-consensus/docs/report/chapters/consensus/serialisation.tex b/ouroboros-consensus/docs/report/chapters/consensus/serialisation.tex
new file mode 100644
index 00000000000..7f0af8845df
--- /dev/null
+++ b/ouroboros-consensus/docs/report/chapters/consensus/serialisation.tex
@@ -0,0 +1,515 @@
+\chapter{Serialisation abstractions}
+\label{serialisation}
+
+Some of the various pieces of data that are handled by consensus also need to
+be serialised to a binary format so that they can be:
+
+\begin{enumerate}
+\item written/read to/from \emph{storage} (see \cref{storage}); or
+\item sent/received across the \emph{network} (e.g., headers via the chain
+  sync protocol, \cref{chainsyncclient}).
+\end{enumerate}
+
+The two serialisation purposes above have different requirements and are
+independent of each other. For example, when establishing a network
+connection, a version number is negotiated. We can vary the network
+serialisation format depending on the version number, allowing us, for
+instance, to include some more information in the payload. A concrete example
+of this is that starting from a certain version, we include the block size in
+the payload when sending a Byron\todo{Can I talk about Byron here?} header
+across the network, as the header itself does not contain it. This kind of
+versioning only concerns the network and is independent of the storage layer.
+Hence we define separate abstractions for them, decoupling them from each
+other.
+
+For both abstractions, we use the CBOR (Concise Binary Object
+Representation)\todo{command for acronyms?} format, because it has the
+following benefits, paraphrasing the \texttt{cborg} library\todo{link?}:
+\begin{itemize}
+\item fast serialisation and deserialisation
+\item compact binary format
+\item stable format across platforms (32/64bit, big/little endian)
+\item potential to read the serialised format from other languages
+\item incremental or streaming (de)serialisation
+\item suitable to use with untrusted input (resistance to asymmetric resource
+  consumption attacks)
+\item ...
+\end{itemize}
+Moreover, CBOR was chosen for the initial implementation of the Cardano
+blockchain,\todo{correct?} with which we must maintain binary compatibility.
+While it was possible to switch to another format for the block types
+developed after the initial implementation, we saw no reason to switch.
+
+We will now discuss both serialisation abstractions in more detail.
+
+\section{Serialising for storage}
+\label{serialisation:storage}
+
+The following data is stored on disk (see \cref{storage}):
+
+\begin{itemize}
+\item Blocks
+\item The extended ledger state (\cref{storage:extledgerstate}) which is the
+  combination of:
+  \begin{itemize}
+  \item The header state (\cref{storage:headerstate})
+  \item The ledger state\todo{link?}
+  \end{itemize}
+\end{itemize}
+
+We use the following abstraction for serialising data to and from disk:
+
+\begin{lstlisting}
+class EncodeDisk blk a where
+  encodeDisk :: CodecConfig blk -> a -> Encoding
+
+class DecodeDisk blk a where
+  decodeDisk :: CodecConfig blk -> forall s. Decoder s a
+\end{lstlisting}
+
+\begin{itemize}
+\item These type classes have two type parameters: the block \lstinline!blk!,
+  over which most things are parameterised, and \lstinline!a!, the type to
+  (de)serialise. For example, \lstinline!a! can be the block type itself or
+  the type corresponding to the ledger state.
+\item \lstinline!CodecConfig blk! is a data family that defines the extra
+  configuration needed for (de)serialisation. For example, to deserialise an
+  EBB (\cref{ebbs}), the number of slots per epoch needs to be known
+  statically to compute the slot of the block based on the epoch number, as
+  the serialisation of an EBB does not contain its slot number, but the
+  in-memory representation does. This configuration is kept as small as
+  possible and is ideally empty.
+\item The \lstinline!a -> Encoding! and \lstinline!forall s. Decoder s a! are
+  the types of encoders and decoders, respectively, from the
+  \lstinline!cborg! library.\todo{link?}
+\item The encoder and decoder are split into two classes because they are not
+  always \emph{symmetric}: the instantiation of \lstinline!a! in the encoder
+  is not always the same as in the corresponding decoder. This is because
+  blocks are \emph{annotated} with their serialisation. We discuss this in
+  more detail in \cref{serialisation:annotations}.
+\end{itemize}
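+
+As an illustration, here is what a pair of instances might look like for a
+hypothetical block type \lstinline!MyBlock! whose decoder uses an annotation
+(all names here are made up for exposition; the real instances live with each
+concrete block type):
+%
+\begin{lstlisting}
+instance EncodeDisk MyBlock MyBlock where
+  -- encodeMyBlock is a hypothetical CBOR encoder for MyBlock
+  encodeDisk _cfg = encodeMyBlock
+
+-- Note the asymmetry: the decoder produces a function awaiting the
+-- original bytes, which become the block's annotation
+instance DecodeDisk MyBlock (ByteString -> MyBlock) where
+  -- decodeMyBlock is the corresponding hypothetical decoder
+  decodeDisk _cfg = decodeMyBlock
+\end{lstlisting}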
+
+\subsection{Nested contents}
+\label{serialisation:storage:nested-contents}
+
+By writing a block to disk we automatically have written the block's header
+to disk, as the header is a part of the block. While we never write just a
+header, we do support \emph{reading} just the header. This is more efficient
+than reading the entire block and then extracting the header, as fewer bytes
+have to be read from disk and deserialised.
+
+\begin{center}
+\begin{tikzpicture}
+\draw (0, 0) -- (10, 0);
+\draw (0, 1) -- (10, 1);
+\draw (0, 0) -- (0, 1);
+\draw (1, 0) -- (1, 1);
+\draw (4, 0) -- (4, 1);
+\draw (10, 0) -- (10, 1);
+\draw (5, 1.2) node {$\overbrace{\hspace{9.8cm}}$};
+\draw (5, 1.6) node[fill=white] {$\mathstrut block$};
+\draw (0.5, -0.2) node {$\underbrace{\hspace{0.8cm}}$};
+\draw (0.5, -0.6) node[fill=white] {$\mathstrut envelope$};
+\draw (2.5, -0.2) node {$\underbrace{\hspace{2.8cm}}$};
+\draw (2.5, -0.6) node[fill=white] {$\mathstrut header$};
+\draw (7, -0.2) node {$\underbrace{\hspace{5.8cm}}$};
+\draw (7, -0.6) node[fill=white] {$\mathstrut body$};
+\end{tikzpicture}
+\end{center}
+
+Extracting the header from a block on disk can be very simple, like in the
+figure above. The block starts with an envelope, which is followed by the
+block header and the block body. In this case, we read the bytes starting
+from the start of the header until the end of the header, which we then
+decode. We use the following abstraction to represent this information:
+
+\begin{lstlisting}
+data BinaryBlockInfo = BinaryBlockInfo {
+      headerOffset :: !Word16
+    , headerSize   :: !Word16
+    }
+
+class HasBinaryBlockInfo blk where
+  getBinaryBlockInfo :: blk -> BinaryBlockInfo
+\end{lstlisting}
+
+As the size of a header can vary on a per-block basis, we maintain this
+information \emph{per block} in the storage layer.\todo{link?} We trade four
+extra bytes of storage and memory space for faster reading of headers.
+
+However, it is not the case for every block type that the encoding of a
+header can literally be sliced out of the encoding of the corresponding
+block. The serialisation of a header when embedded in a block might be
+different from the serialisation of a header on its own. For example, the
+standalone header might require an additional envelope or a different one
+than the block's envelope.
+
+Concrete examples of this are the Byron blocks and headers. A Byron block is
+either a regular block or an epoch boundary block (EBB) (discussed in
+\cref{ebbs}). A regular block has a different header than an EBB;
+consequently, their encoding differs. The envelope of the encoding of a Byron
+block includes a tag indicating whether the block is a regular block or an
+EBB, so that the decoder knows what kind of header and body to expect. For
+the same reason, the envelope of the encoding of a standalone Byron header
+includes the same tag. However, when we slice out the header from the Byron
+block and feed that to the decoder for Byron headers, the envelope containing
+the tag will be \emph{missing}.
+
+The same problem presents itself for the hard fork combinator (\cref{hfc}):
+when using the hard fork combinator to combine two block types, A and B, into
+one, the block's envelope will (typically) indicate whether it is a block of
+type A or B. The header corresponding to such a block will have a similar
+envelope. When we slice the header out of such a block, the required envelope
+will be missing. The right envelope has to be prepended so that the header
+decoder knows whether it should expect A or B.
+
+The header is \emph{nested} inside the block and to be able to decode it, we
+need some more \emph{context}, i.e., the envelope of the header. In the
+storage layer (\cref{storage}), we store the context of each block in an
+index (in-memory or on-disk, depending on the database) so that after reading
+both the context and the sliced header, we can decode the header without
+having to read and decode the entire block.
+We capture this idea in the following abstractions.
+
+\begin{lstlisting}
+data family NestedCtxt_ blk :: (Type -> Type) -> (Type -> Type)
+\end{lstlisting}
+As usual, we parameterise over the block type. We also parameterise over
+another functor, e.g., \lstinline!f!, which in practice will be instantiated
+to \lstinline!Header!, but in the future, there might be more types of nested
+contents, other than headers, e.g., block bodies. The constructors of this
+data family will represent the different types of context available, e.g.,
+for Byron a context for regular blocks and a context for EBBs.
+
+\lstinline!NestedCtxt! is indexed by \lstinline!blk!: it is the block that
+determines this type. However, we often want to partially apply the second
+argument (the functor), leaving the block type not yet defined, hence we
+define:
+\begin{lstlisting}
+newtype NestedCtxt f blk a = NestedCtxt {
+      flipNestedCtxt :: NestedCtxt_ blk f a
+    }
+\end{lstlisting}
+The \lstinline!a! type index will correspond to the raw, sliced header that
+requires the additional context. It can vary with the context, e.g., the
+context for a Byron EBB will fix \lstinline!a! to a raw EBB header (without
+the necessary envelope).
+
+Now that we have defined \lstinline!NestedCtxt!, we can define the class that
+allows us to separate the nested type (the header) into the context and the
+raw, sliced type (the raw header, \lstinline!a!), as well as the inverse:
+\begin{lstlisting}
+class (..) => HasNestedContent f blk where
+  unnest :: f blk -> DepPair (NestedCtxt f blk)
+  nest   :: DepPair (NestedCtxt f blk) -> f blk
+\end{lstlisting}
+\lstinline!DepPair! is a dependent pair that allows us to hide the type
+parameter \lstinline!a!. When writing a block, \lstinline!unnest! is used to
+extract the context so that it can be stored in the appropriate index. When
+reading a header, \lstinline!nest! is used to combine the context, read from
+the appropriate index, with the raw header into the header.
+
+In certain scenarios, we do not have access to the separately stored context
+of the block, but we do have access to the encoded block, in which case we
+should be able to extract the context directly from the encoded block,
+without having to decode it entirely. We use the
+\lstinline!ReconstructNestedCtxt! class for this:
+\begin{lstlisting}
+class HasNestedContent f blk => ReconstructNestedCtxt f blk where
+  reconstructPrefixLen  :: proxy (f blk) -> PrefixLen
+  reconstructNestedCtxt ::
+       proxy (f blk)
+    -> ShortByteString
+    -> ..
+    -> SomeSecond (NestedCtxt f) blk
+\end{lstlisting}
+The \lstinline!PrefixLen! is the number of bytes extracted from the beginning
+of the encoded block required to reconstruct the context. The
+\lstinline!ShortByteString! corresponds to these bytes. The
+\lstinline!reconstructNestedCtxt! method will parse this bytestring and
+return the corresponding context. The \lstinline!SomeSecond! type is used to
+hide the type parameter \lstinline!a!.
+
+As these contexts and context-dependent types do not fit the mould of the
+\lstinline!EncodeDisk! and \lstinline!DecodeDisk! classes described in
+\cref{serialisation:storage}, we define variants of these classes:
+\begin{lstlisting}
+class EncodeDiskDepIx f blk where
+  encodeDiskDepIx :: CodecConfig blk
+                  -> SomeSecond f blk -> Encoding
+
+class DecodeDiskDepIx f blk where
+  decodeDiskDepIx :: CodecConfig blk
+                  -> Decoder s (SomeSecond f blk)
+
+class EncodeDiskDep f blk where
+  encodeDiskDep :: CodecConfig blk -> f blk a
+                -> a -> Encoding
+
+class DecodeDiskDep f blk where
+  decodeDiskDep :: CodecConfig blk -> f blk a
+                -> forall s. Decoder s (ByteString -> a)
+\end{lstlisting}
+\todo{explain?}
+
+\section{Serialising for network transmission}
+\label{serialisation:network}
+
+The following data is sent across the network:
+\begin{itemize}
+\item Header hashes
+\item Blocks
+\item Headers
+\item Transactions
+\item Transaction IDs
+\item Transaction validation errors
+\item Ledger queries
+\item Ledger query results
+\end{itemize}
+\todo{less whitespace}
+
+We use the following abstraction for serialising data to and from the
+network:
+
+\begin{lstlisting}
+class SerialiseNodeToNode blk a where
+  encodeNodeToNode :: CodecConfig blk
+                   -> BlockNodeToNodeVersion blk
+                   -> a -> Encoding
+  decodeNodeToNode :: CodecConfig blk
+                   -> BlockNodeToNodeVersion blk
+                   -> forall s. Decoder s a
+
+class SerialiseNodeToClient blk a where
+  encodeNodeToClient :: CodecConfig blk
+                     -> BlockNodeToClientVersion blk
+                     -> a -> Encoding
+  decodeNodeToClient :: CodecConfig blk
+                     -> BlockNodeToClientVersion blk
+                     -> forall s. Decoder s a
+\end{lstlisting}
+
+These classes are similar to the ones used for storage
+(\cref{serialisation:storage}), but there are some important differences:
+
+\begin{itemize}
+\item The encoders and decoders are always symmetric, which means we do not
+  have to separate encoders from decoders and can merge them in a single
+  class. Nevertheless, some of the types sent across the network still have
+  to deal with annotations (\cref{serialisation:annotations}); we discuss how
+  we solve this in \cref{serialisation:network:cbor-in-cbor}.
+\item We have separate classes for \emph{node-to-node} and
+  \emph{node-to-client} serialisation.\todo{link?}
+  By separating them, we are more explicit about which data is serialised for
+  which type of connection. Node-to-node protocols and node-to-client
+  protocols have different properties and requirements. This also gives us
+  the ability to, for example, use a different encoding for blocks for
+  node-to-node protocols than for node-to-client protocols.
+\item The methods in these classes all take a \emph{version} as argument. We
+  will discuss versioning in \cref{serialisation:network:versioning}.
+\end{itemize}
+
+\subsection{Versioning}
+\label{serialisation:network:versioning}
+
+As requirements evolve, features are added, data types change, constructors
+are added and removed. For example, adding the block size to the Byron
+headers, adding new ledger query constructors, etc. This affects the data we
+send across the network. In a distributed network of nodes, it is a given
+that not all nodes will simultaneously upgrade to the latest released version
+and that nodes running different versions of the software, i.e., different
+versions of the consensus layer, will try to communicate with each other.
+They should of course be able to communicate with each other; otherwise, the
+different versions would cause partitions in the network.
+
+This means we should be careful to maintain binary compatibility between
+versions.
+The network layer is faced with the same issue: as requirements evolve,
+network protocols (block fetch, chain sync\todo{link?}) are modified
+(messages are added or removed, etc.), network protocols are added or
+retired, and so on. While the network layer is responsible for the network
+protocols and the encoding of their messages, the consensus layer is
+responsible for the encoding of the data embedded in these messages. Changes
+to either should be possible without losing compatibility: a node should be
+able to communicate successfully with other nodes that run a newer or older
+version of the software, up to certain limits (old versions can be retired
+eventually).
+
+To accomplish this, the network layer uses \emph{versions}, one for each
+bundle of protocols:
+\begin{lstlisting}
+data NodeToNodeVersion
+    = NodeToNodeV_1
+    | NodeToNodeV_2
+    | ..
+
+data NodeToClientVersion
+    = NodeToClientV_1
+    | NodeToClientV_2
+    | ..
+\end{lstlisting}
+For each backwards-incompatible change, either a change in the network
+protocols or in the encoding of the consensus data types, a new version
+number is introduced in the corresponding version data type. When the network
+layer establishes a connection with another node or client, it will negotiate
+a version number during the handshake: the highest version that both parties
+can agree on. This version number is then passed to any client and server
+handlers, which decide based on the version number which protocols to start
+and which protocol messages (not) to send. A new protocol message would only
+be sent when the version number is greater than or equal to the one with
+which it was introduced.
+
+This same network version is passed to the consensus layer, so we can follow
+the same approach. However, we decouple the network version numbers from the
+consensus version numbers for the following reason. A new network version
+number is needed for each backwards-incompatible change to the network
+protocols or the encoding of the consensus data types. This is clearly a
+strict superset of the changes caused by consensus. When the network layer
+introduces a new protocol message, this does not necessarily mean anything
+changes in the encoding of the consensus data types. This means multiple
+network versions can correspond to the same consensus-side encoding or
+\emph{consensus version}. In the other direction, each change to the
+consensus-side encodings should result in a new network version. We capture
+this in the following abstraction:
+\begin{lstlisting}
+class (..) => HasNetworkProtocolVersion blk where
+  type BlockNodeToNodeVersion   blk :: Type
+  type BlockNodeToClientVersion blk :: Type
+
+class HasNetworkProtocolVersion blk
+   => SupportedNetworkProtocolVersion blk where
+  supportedNodeToNodeVersions ::
+       Proxy blk -> Map NodeToNodeVersion (BlockNodeToNodeVersion blk)
+  supportedNodeToClientVersions ::
+       Proxy blk -> Map NodeToClientVersion (BlockNodeToClientVersion blk)
+\end{lstlisting}
+The \lstinline!HasNetworkProtocolVersion! class has two associated types to
+define the consensus version number types for the given block. When no
+versioning is needed, one can use the unit type as the version number. The
+\lstinline!SupportedNetworkProtocolVersion! class defines the mapping between
+the network and the consensus version numbers. Note that this does not have
+to be an injection, as multiple network versions can most certainly map to
+the same consensus version. Nor does this have to be a surjection, as old
+network and consensus versions might be removed from the mapping when the old
+version no longer needs to be supported. This last reason is also why this
+mapping is modelled with a \lstinline!Map! instead of a function: it allows
+enumerating a subset of all defined versions, which is not possible with a
+function.
+
+\todo{TODO} Global numbering vs multiple block types
+
+The \lstinline!SerialiseNodeToNode! and \lstinline!SerialiseNodeToClient!
+instances can then branch on the passed version to introduce changes to the
+encoding format, for example, the inclusion of the block size in the Byron
+header encoding.
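+As a sketch (the version constructor and helper names below are hypothetical;
+the real Byron instances are more involved), such an instance might look as
+follows:
+%
+\begin{lstlisting}
+instance SerialiseNodeToNode ByronBlock (Header ByronBlock) where
+  encodeNodeToNode cfg version hdr = case version of
+    -- Hypothetical: later versions include the block size in the payload
+    ByronNodeToNodeVersion1 -> encodeByronHeaderWithoutSize cfg hdr
+    ByronNodeToNodeVersion2 -> encodeByronHeaderWithSize    cfg hdr
+  decodeNodeToNode cfg version = case version of
+    ByronNodeToNodeVersion1 -> decodeByronHeaderWithoutSize cfg
+    ByronNodeToNodeVersion2 -> decodeByronHeaderWithSize    cfg
+\end{lstlisting}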
+
+Consider the following scenario: a change is made to one of the consensus
+data types; for example, a new query constructor is added to the ledger query
+data type. This requires a new consensus and thus network version number, as
+older versions will not be able to decode it. What should be done when the
+new query constructor is sent to a node that does not support it (the
+negotiated version is older than the one in which the constructor was added)?
+If it is encoded and sent, the receiving side will fail to decode it and
+terminate its connection.\todo{right?} This is rather confusing to the
+sender, as they are left in the dark. Instead, we let the \emph{encoder}
+throw an exception in this case, terminating that connection, so that the
+sender is at least notified of this. \todo{TODO} Ideally, we could statically
+prevent such cases.
+
+\subsection{CBOR-in-CBOR}
+\label{serialisation:network:cbor-in-cbor}
+
+In \cref{serialisation:annotations}, we explain why the result of the decoder
+for types using \emph{annotations} needs to be passed the original encoding
+as a bytestring. When reading from disk, we already have the entire
+bytestring in memory,\todo{explain why} so it can easily be passed to the
+result of the decoder. However, this is not the case when receiving a message
+via the network layer: the entire message, containing the annotated type(s),
+is decoded incrementally.\todo{right?} When decoding CBOR, it is not possible
+to obtain the bytestring corresponding to what the decoder is decoding. To
+work around this, we use \emph{CBOR-in-CBOR}: we encode the original data as
+CBOR and then encode the resulting bytestring as CBOR \emph{again}. When
+decoding CBOR-in-CBOR, after decoding the outer CBOR layer, we have exactly
+the bytestring that we will need for the annotation. Next, we feed this
+bytestring to the original decoder, and, finally, we pass the bytestring to
+the function returned by the decoder.
+
+\subsection{Serialised}
+\label{serialisation:network:serialised}
+
+One of the duties of the consensus layer is to serve blocks and headers to
+other nodes in the network.\todo{link?} To serve, for example, a block, we
+read it from disk, deserialise it, and then serialise it again and send it
+across the network. The costly deserialisation and serialisation steps cancel
+each other out and are thus redundant. We perform this optimisation in the
+following way. When reading such a block from storage, we do not read the
+\lstinline!blk!, but the \lstinline!Serialised blk!, which is a phantom type
+around a raw, still serialised bytestring:
+\begin{lstlisting}
+newtype Serialised a = Serialised ByteString
+\end{lstlisting}
+To send this serialised block over the network, we have to encode this
+\lstinline!Serialised blk!.
As it happens, we use CBOR-in-CBOR to send both
+blocks and headers over the network, as described in
+\cref{serialisation:network:cbor-in-cbor}. This means the serialised block
+corresponds to the inner CBOR layer and that we only have to encode the
+bytestring again as CBOR, which is cheap.
+
+This optimisation is only used to \emph{send} and thus encode blocks and
+headers, not when \emph{receiving} them, because each received block or header
+will have to be inspected and validated, and thus deserialised anyway.
+
+As discussed in \cref{serialisation:storage:nested-contents}, reading a header
+(nested in a block) from disk requires reading the context and the raw header,
+and then combining them before we can deserialise the header. This means the
+approach for serialised headers differs slightly:
+\begin{lstlisting}
+newtype SerialisedHeader blk = SerialisedHeaderFromDepPair {
+      serialisedHeaderToDepPair :: GenDepPair Serialised
+                                      (NestedCtxt Header blk)
+    }
+\end{lstlisting}
+This is similar to the \lstinline!DepPair (NestedCtxt f blk)! type from
+\cref{serialisation:storage:nested-contents}, but this time the raw header is
+wrapped in \lstinline!Serialised! instead of being deserialised.
+
+\section{Annotations}
+\label{serialisation:annotations}
+
+\todo{TODO} move up? The previous two sections refer to this
+
+The following technique is used in the Byron and Shelley ledgers for a number of
+data types like blocks, headers, transactions, \ldots The in-memory representation
+of, for example, a block consists not only of the typical fields describing the
+block (header, transactions, \ldots), but also of the \emph{serialisation} of the
+block in question. The block is \emph{annotated} with its serialisation.
+
+The principal reason for this is that it is possible that multiple
+serialisations, each with a different hash, correspond to the same logical
+block. For example, a client sending us the block might encode a number using a
+binary type that is wider than necessary (e.g., encoding the number 0 using four
+bytes instead of a single byte). CBOR defines a \emph{canonical format}; we call
+an encoding that is in CBOR's canonical format a \emph{canonical
+encoding}.\todo{link?}
+
+When we deserialise a block in a non-canonical encoding and then serialise it
+again, we end up with a different encoding, namely the canonical encoding,
+as we stick to the canonical format. This means the hash, which is part of the
+blockchain, is now different and can no longer be verified.
+
+For this reason, when deserialising a block, the original, possibly
+non-canonical encoding is retained and used to annotate the block. To compute
+the hash of the block, one can hash the annotated serialisation.
+
+Besides solving the issue with non-canonical encodings, this has a performance
+advantage, as encoding such a block is very cheap: it is just a matter of
+copying the in-memory annotation.
+
+\todo{TODO} We rely on it being cheap in a few places, mention that/them?
+
+\todo{TODO} extra memory usage
+
+This means that the result of the decoder must be passed the original encoding
+as a bytestring to use as the annotation of the block or other type in question.
+Hence the decoder corresponding to the encoder \lstinline!blk -> Encoding! has
+type \lstinline!forall s. Decoder s (ByteString -> blk)!, which is a different
+instantiation of the type \lstinline!a!, explaining the split of the
+serialisation classes used for storage (\cref{serialisation:storage}).
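+
+As a rough sketch of this asymmetry (assuming the cborg library, and glossing
+over the slicing question discussed at the end of this section), running such a
+decoder against a bytestring that we already hold in memory might look as
+follows:
+\begin{lstlisting}
+{-# LANGUAGE RankNTypes #-}
+import           Codec.CBOR.Decoding (Decoder)
+import qualified Codec.CBOR.Read as CBOR
+import qualified Data.ByteString.Lazy as BSL
+
+-- Sketch: run an annotation-style decoder and feed the original bytes
+-- back to the function it returns. We ignore slicing here and simply
+-- pass the entire input.
+runAnnotatedDecoder ::
+     (forall s. Decoder s (BSL.ByteString -> blk))
+  -> BSL.ByteString
+  -> Either CBOR.DeserialiseFailure blk
+runAnnotatedDecoder decoder bytes =
+    case CBOR.deserialiseFromBytes decoder bytes of
+      Left  err            -> Left err
+      Right (_rest, mkBlk) -> Right (mkBlk bytes)
+\end{lstlisting}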
The +original encoding is then applied to the resulting function to obtain the +annotated block. This asymmetry is handled in a different way for the network +serialisation, namely using CBOR-in-CBOR +(\cref{serialisation:network:cbor-in-cbor}). + +\subsection{Slicing} + +\todo{TODO} discuss the slicing of annotations with an example. What is the +relation between the decoded bytestring and the bytestring passed to the +function the decoder returns? Talk about compositionality. diff --git a/ouroboros-consensus/docs/report/chapters/future/ebbs.tex b/ouroboros-consensus/docs/report/chapters/future/ebbs.tex new file mode 100644 index 00000000000..376af1e673e --- /dev/null +++ b/ouroboros-consensus/docs/report/chapters/future/ebbs.tex @@ -0,0 +1,192 @@ +\chapter{Epoch Boundary Blocks} +\label{ebbs} + +\section{Introduction} + +Recall that when a new epoch begins, the active stake distribution in the new +epoch---that is, the stake distribution used to determine the leader schedule--- +is not the stake distribution as it was at the end of the last epoch, but +rather as it was some distance back: +% +\begin{center} +\begin{tikzpicture} +\draw + (0 , 0) + -- (0.5 , 0) node{$\bullet$} + -- (1.5 , 0) node{$\bullet$} + -- (2 , 0) node{$\bullet$} + -- (2.5 , 0); +\node at (3,0) {$\cdots$}; +\draw + (3.5 , 0) + -- (4 , 0) node{$\bullet$} + -- (6 , 0) node{$\bullet$} + -- (7 , 0) node[right]{$\cdots$}; +\draw [very thick] (4.5,0.5) -- (4.5,-1) node[below] {epoch boundary}; +\draw [->, dotted] (5,0) to [out=135,in=45] (0.6,0.1); +\path (5,0) -- (0.6,0.1) node[pos=0.5,above=1] {stake distribution from}; +\end{tikzpicture} +\end{center} +% +This means that blocks cannot influence the active stake distribution until +some time in the future. That is important, because when a malicious node +forks off from the honest chain, the leadership schedule near the intersection +point cannot be influenced by the attacker, allowing us to compare chain +density and choose the honest chain (which will be denser because of the +assumed honest majority); see \cref{genesis} for an in-depth discussion. + +In the literature, the term ``epoch boundary block'' (or EBB for short) normally +simply refers to the last block in any given epoch (for example, +see~\cite{buterin2020combining}). It might therefore be a bit surprising to find +the term in this report since the final block in an epoch is not of special +interest in the Ouroboros family of consensus protocols. However, in the first +implementation of the Byron ledger (using the original Ouroboros protocol +\cite{cryptoeprint:2016:889}, which we now refer to as ``Ouroboros Classic''), a +decision was made to include the leadership schedule for each new epoch as an +explicit block on the blockchain; the term EBB was used to refer to this special +kind of block:\footnote{It is not entirely clear if an EBB should be regarded as +the final block in an epoch, or as the first block in the next epoch. The name +would suggest that the former interpretation is more appropriate; as it turns +out, however, the very first epoch on the chain \emph{starts} with an EBB, +recording the leadership schedule derived from the genesis block. 
We will
+therefore regard the EBB as starting an epoch, rather than ending one.}
+%
+\begin{center}
+\begin{tikzpicture}
+\draw
+     (0   , 0)
+  -- (0.5 , 0) node{$\bullet$}
+  -- (1.5 , 0) node{$\bullet$}
+  -- (2   , 0) node{$\bullet$}
+  -- (2.5 , 0);
+\node at (3,0) {$\cdots$};
+\draw
+     (3.5 , 0)
+  -- (4   , 0) node{$\bullet$}
+  -- (6   , 0) node{$\bullet$}
+  -- (7   , 0) node[right]{$\cdots$};
+\draw [very thick] (4.5,0.5) -- (4.5,-1) node[below] {epoch boundary};
+\draw [->, dotted] (5,0) to [out=135,in=45] (0.6,0.1);
+\path (5,0) -- (0.6,0.1) node[pos=0.5,above=1] {records leadership schedule based on};
+\node at (5,0) {$\blacksquare$};
+\node [below=0.1] at (5,0) {EBB};
+\end{tikzpicture}
+\end{center}
+
+Having the leadership schedule explicitly recorded on-chain turns out not to
+be particularly useful, however, and the code was modified not to produce EBBs
+anymore even before we switched from Byron to Shelley (as part of the OBFT hard
+fork, see \cref{overview:history}); these days, the contents of the existing
+EBBs on the chain are entirely ignored. Unfortunately, we cannot forget about
+EBBs altogether because---since they are an actual block on the
+blockchain---they affect the chain of block hashes: the first ``real'' block in
+each epoch points to the EBB as its predecessor, which in turn points to
+the final block in the previous epoch.
+
+So far, none of this is particularly problematic to the consensus layer. Having
+multiple types of blocks in a ledger presents some challenges for serialisation
+(\cref{serialisation:storage:nested-contents}), but does not otherwise affect
+consensus much: after all, blocks are interpreted by the ledger layer, not by
+the consensus layer. Unfortunately, however, the design of the Byron EBBs has an
+odd quirk: an EBB has the same block number as its \emph{predecessor}, and the
+same slot number as its \emph{successor}:
+%
+\begin{center}
+\begin{tikzpicture}
+\draw
+     (0   , 0)
+  -- (0.5 , 0) node{$\bullet$}
+  -- (1.5 , 0) node{$\bullet$}
+  -- (2   , 0) node{$\bullet$}
+  -- (2.5 , 0);
+\node at (3,0) {$\cdots$};
+\draw
+     (3.5 , 0)
+  -- (4   , 0) node{$\bullet$}
+  -- (6   , 0) node{$\bullet$}
+  -- (7   , 0) node[right]{$\cdots$};
+\draw [very thick] (4.5,0.5) -- (4.5,-0.5);
+\node at (5,0) {$\blacksquare$};
+\node [below=0.1] at (5,0) {EBB};
+%
+\draw [dotted]
+     (3.75, -0.2)
+  -- ++(0, 1)
+  -- ++(1.5, 0) node[pos=0.5,above] {same block number}
+  -- ++(0, -1)
+  -- cycle;
+\draw [dotted]
+     (4.75, -0.8)
+  -- ++(0, 1)
+  -- ++(1.5, 0)
+  -- ++(0, -1)
+  -- cycle node[pos=0.5,below] {same slot number};
+\end{tikzpicture}
+\end{center}
+%
+This turns out to be a huge headache. When we started the rewrite, I think we
+underestimated quite how many parts of the system would be affected by the
+possibility of having multiple blocks with the same block number and
+multiple blocks with the same slot number on a single chain. Some examples
+include:
+
+TODO: List of examples
+
+In hindsight, we should have tried harder to eliminate EBBs from the get-go. In
+this chapter, we will discuss two options for modifying the existing design to
+reduce the impact of EBBs (\cref{ebbs:logical}), or indeed eliminate them
+altogether (\cref{ebbs:elimination}).
+
+\section{Logical slot/block numbers}
+\label{ebbs:logical}
+
+\section{Eliminating EBBs altogether}
+\label{ebbs:elimination}
+
+
+
+
+
+
+
+
+
+% For the Ouroboros family of consensus
+% protocols, the last block in an epoch is not of special interest, so it
+% might be surprising to see the term EBB in this report.
+% +% When a new epoch +% begins---that is, at an epoch boundary---the active stake distribution shifts, +% but it is not based on the final block in the previous epoch, but instead on +% the ledger state as it was quite a bit earlier. As a consequence, blocks cannot +% have an effect on the leadership schedule until they are a certain depth into +% the chain. This is important, because it means that if there is a fork in the +% chain, the leadership schedule after the intersection point will be determined +% by the common prefix of both chains. + + +% +% +% section 3 (The Ouroboros Protocol) +% +% Stage 1: "There is an initial stake distribution which is hardcoded into the genesis block" + + +% Discuss that although EBBs are a Byron concern, their presence has far reaching +% consequences on the consensus later. In hindsight, we should have tried harder +% to not deal with them at all from the beginning; we did not anticipate quite how +% bad the situation would be. We now have a plan for getting rid of them +% (\cref{decontamination-plan}) but it will be a fairly long term thing and it +% might not happen at all, depending on quite how much time is available for +% removing tech debt. +% +% +% \section{Introduction} +% +% \section{Consequences} +% +% \subsection{Chain selection} +% \label{ebb-chain-selection} +% +% \section{Elimination} +% \label{decontamination-plan} diff --git a/ouroboros-consensus/docs/report/chapters/future/genesis.tex b/ouroboros-consensus/docs/report/chapters/future/genesis.tex new file mode 100644 index 00000000000..ee9daf619e8 --- /dev/null +++ b/ouroboros-consensus/docs/report/chapters/future/genesis.tex @@ -0,0 +1,1387 @@ +\newcommand{\RequiredPeers}{\ensuremath{N_\mathit{rs}}} + +\chapter{Ouroboros Genesis} +\label{genesis} + +\section{Introduction} + +\subsection{Background: understanding the Longest Chain rule} +\label{genesis:background:longest-chain} + +Recall the Praos chain selection rule: + +\begin{definition}[Longest Chain Rule] +\label{longest-chain-rule} +A candidate chain is preferred over our current chain if +% +\begin{enumerate} +\item it is longer than our chain, and +\item the intersection point is no more than $k$ blocks away from our tip. +\end{enumerate} +\end{definition} + +The purpose of chain selection is to resolve temporary forks that arise from the +normal operation of the protocol (such as when there are multiple leaders in a +single slot), and---importantly---to distinguish honest chains from chains +forged by malicious nodes. It is not a priori clear why choosing longer chains +over shorter chains would help distinguish malicious chains from honest chains: +why would an honest chain be longer? + +Recall that the leadership schedule is based on stake: a node's probability of +being elected a leader in a given slot is proportional to their stake. By +assumption, the malicious nodes in the system together have less stake than the +honest nodes; security of the system as a whole critically depends on the +presence of this honest majority. This means that when a malicious node extends +the chain they can only produce a chain with relatively few filled slots: the +honest chain will be \emph{denser}. At least, this will be true near the +intersection point: as we get further away from that intersection point, the +malicious node can attempt to influence the leadership schedule for future slots +to their advantage. 
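+
+To make the rule concrete, the following is a small self-contained sketch of
+the Longest Chain rule; the types and names are invented for this report and
+are not taken from the implementation:
+\begin{lstlisting}
+import Data.Word (Word64)
+
+type BlockCount = Word64
+
+-- Sketch of the Longest Chain rule: the candidate is preferred iff it
+-- is strictly longer and switching to it rolls back at most k blocks.
+-- 'rollback' is the number of blocks between our tip and the
+-- intersection with the candidate.
+preferByLongestChain ::
+     BlockCount  -- security parameter k
+  -> BlockCount  -- length of our current chain
+  -> BlockCount  -- length of the candidate chain
+  -> BlockCount  -- rollback required to switch
+  -> Bool
+preferByLongestChain k ourLength candLength rollback =
+    candLength > ourLength && rollback <= k
+\end{lstlisting}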
+ +The Praos security analysis \cite{cryptoeprint:2017:573} tells us that provided +all (honest) nodes are online all the time, they will all share the same chain, +except for blocks near the tips of those chains. Moreover, blocks with a slot +number ahead of the wall clock are considered invalid. This means that the only +way\footnote{The chain sync client does actually allow for some clock skew. +Headers that exceed the clock skew are however not included in chain selection.} +for one chain to be longer than another is by having more filled slots between +the tip of the shared prefix and ``now'': in other words, they must be +\emph{denser}. +% +\begin{center} +\begin{tikzpicture} +\draw (0,0) -- (5,0) coordinate(branch) node{$\bullet$} node[pos=0.5,below]{$\underbrace{\hspace{5cm}}_\text{shared prefix}$}; +\draw (branch) -- ++(1, 0.9) -- ++(2,0); +\draw (branch) -- ++(1, 0.3) -- ++(2,0); +\draw (branch) -- ++(1, -0.3) -- ++(2,0); +\draw (branch) -- ++(1, -0.9) -- ++(2,0); +\draw [ultra thick] (8,-1.5) -- (8,1.5) node[above]{now}; +\end{tikzpicture} +\end{center} +% +This motivates the first part of the Longest Chain rule: chain length is a +useful proxy for chain density. The second part of the rule---the intersection +point is no more than $k$ blocks away from our tip---is important because we can +only meaningfully compare density \emph{near the intersection point}. As we get +further away from the intersection point, an adversary can start to influence +the leadership schedule. This means that if the adversary's chain forks off from +the honest chain far back enough, they can construct a chain that is longer than +the honest chain. The Longest Chain rule therefore restricts rollback, so that +we will simply not even consider chains that fork off that far back. We can +still resolve minor forks that happen in the honest chain during the normal +operation of the protocol, because---so the analysis guarantees---those will not +be deeper than $k$ blocks. + +\subsection{Nodes joining late} +\label{genesis:background:joining-late} + +When new nodes join the network (or rejoin after having been offline for a +while), they don't have the advantage of having been online since the start of +the system, and have no sufficiently long prefix of the honest chain available. +As we saw at the end of \cref{genesis:background:longest-chain}, simply looking +at chain length is insufficient to distinguish the honest chain from malicious +chains: given enough time, an adversary can produce a chain that is longer than +the honest chain: +% +\begin{center} +\begin{tikzpicture}[yscale=0.5] +\draw (-2,0) node{$\bullet$} -- (0,0); +\draw (0,0) node{$\bullet$} -- (6,0) node{$\bullet$} node[right] {honest chain}; +\draw (0,0) -- (0,-1) -- (8,-1) node{$\bullet$} node[right]{adversary's chain}; +\path (0,0) -- (6,0) node[pos=0.5, above]{$\overbrace{\hspace{6cm}}^{\text{$\gg k$ blocks}}$}; +\end{tikzpicture} +\end{center} +% +When a node's current chain is somewhere along that common prefix and uses the +longest chain rule, they will choose the adversary's chain rather than the +honest chain. Moreover, they will now be unable to switch to the honest chain, +because the intersection point with that chain is more than $k$ blocks ago. 
If a
+node were given a ``leg up'' in the form of a reliable message telling it which
+chain to adopt when joining the network (such a message is known as a
+``checkpoint'' in the consensus literature), the Praos rule from that point
+forward would prevent it from (permanently) adopting the wrong chain, but
+Praos cannot be used to help nodes ``bootstrap'' when they are behind.
+
+So far we have just been discussing Praos as it is described in theory. The
+situation in practice is worse. In the abstract models of the consensus
+algorithm, it is assumed that entire chains are broadcast and validated. In
+reality, chains are downloaded and validated one block at a time.
+We therefore don't see a candidate chain's \emph{true} length; instead, the
+length of a candidate we see depends on how much of that candidate's chain we
+have downloaded\footnote{Even if nodes did report their ``true length'' we would
+have no way of verifying this information until we have seen the entire chain,
+so we can make no use of this information for the purpose of chain selection.}.
+Defining chain selection in terms of chain length, where our \emph{perceived}
+chain length depends on what we decide to download, is obviously rather
+circular. In terms of the above discussion, it means that the adversary's chain
+doesn't even need to be longer than the honest chain:
+%
+\begin{center}
+\begin{tikzpicture}[yscale=0.5]
+\draw (-2,0) node{$\bullet$} -- (0,0);
+\draw (0,0) node{$\bullet$} -- (10,0) node{$\bullet$} node[right] {honest chain};
+\draw
+     (0,0)
+  -- (0,-1)
+  -- (4,-1)
+     node{$\bullet$}
+     node[right]{adversary's chain}
+     node[pos=0.5, below]{$\underbrace{\hspace{4cm}}_{\text{$> k$ blocks}}$};
+\end{tikzpicture}
+\end{center}
+%
+If the adversary's chain contains more than $k$ blocks after the intersection
+point, and we \emph{happen} to download that chain first, we would adopt it and
+subsequently be unable to switch to the honest chain; after all, that would
+involve a rollback of more than $k$ blocks, which the Praos rule forbids.
+
+\clearpage
+
+\subsection{The Density rule}
+\label{genesis:background:density-rule}
+
+It is therefore clear that we need a different chain selection rule, and the
+Ouroboros Genesis paper \cite{cryptoeprint:2018:378} proposes one, shown in
+\cref{genesis:maxvalid-bg}. In this chapter we will work with a slightly
+simplified (though equivalent) form of this rule, which we will term the
+Density Rule:
+%
+\begin{definition}[Density Rule]
+A candidate chain is preferred over our current chain if it is denser
+(contains more blocks) in a window of $s$ slots anchored at the intersection
+between the two chains.
+\end{definition}
+%
+(We will briefly discuss the differences between the rule in the paper and this
+one in \cref{genesis:original}.) Technically speaking, $s$ is a
+parameter of the rule, but the following default is a suitable choice both from
+a chain security perspective and from an implementation perspective:\footnote{If
+we change the epoch size, this value might have to be reconsidered, along with
+the ledger's stability window.}
+
+\begin{definition}[Genesis window size]
+The genesis window size $s$ will be set to $s = 3k/f$.
+\end{definition}
+
+Unlike the Longest Chain rule, the Density rule does not impose a maximum
+rollback. It does not need to, as it always considers density \emph{at the
+intersection point}.
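+
+As a similarly simplified sketch (again with invented types, representing a
+fragment just by the slots of its blocks counted relative to the intersection
+point), the Density rule compares block counts within the window rather than
+total chain lengths:
+\begin{lstlisting}
+import Data.Word (Word64)
+
+type SlotNo   = Word64
+type Fragment = [SlotNo]  -- slots of the blocks after the intersection,
+                          -- ascending; slot 0 is the intersection
+
+-- Number of blocks within the window of s slots after the intersection.
+densityInWindow :: SlotNo -> Fragment -> Int
+densityInWindow s = length . takeWhile (< s)
+
+-- Sketch of the Density rule: the candidate is preferred iff it has
+-- strictly more blocks in the window anchored at the intersection.
+preferByDensity :: SlotNo -> Fragment -> Fragment -> Bool
+preferByDensity s ours candidate =
+    densityInWindow s candidate > densityInWindow s ours
+\end{lstlisting}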
In particular, this means that in a situation such as
+%
+\begin{center}
+\begin{tikzpicture}[yscale=0.5]
+\draw (-2,0) node{$\bullet$} -- (0,0);
+\draw (0,0) node{$\bullet$} -- (10,0) node{$\bullet$} node[right] {honest chain};
+\draw
+     (0,0)
+  -- (0,-1)
+  -- (4,-1)
+     node{$\bullet$} node[right]{adversary's chain}
+     node[pos=0.5, below]{$\underbrace{\hspace{4cm}}_{\text{$> k$ blocks}}$};
+\end{tikzpicture}
+\end{center}
+%
+if we happen to see and adopt the adversary's chain first, we can still adopt
+the honest chain (which will be denser at the intersection point, because of the
+honest majority). This would however involve a rollback of more than $k$ blocks;
+we will discuss how we can avoid such long rollbacks in
+\cref{genesis:avoiding-long-rollbacks}.
+
+\section{Properties of the Density rule}
+
+In this section we will study the Density rule and prove some of its
+properties. The improved understanding of the rule will be beneficial in the
+remainder of this chapter.
+
+\subsection{Equivalence to the Longest Chain rule}
+
+For nodes that are up to date, the Density rule does not change how chain
+selection works.
+
+\begin{lemma}
+\label{lemma:tip-density-is-chain-length}
+When comparing two chains with an intersection that is at most $s$ slots away
+from the two tips, the Density rule just prefers the longer chain.
+\end{lemma}
+
+\begin{proof}
+The two chains share a common prefix, and differ only in the blocks
+within the last $s$ slots:
+%
+\begin{center}
+\begin{tikzpicture}[yscale=0.5]
+\draw (0,0) -- (6,0) coordinate (I);
+\draw (I) -- ++(1,1) -- ++(1.5,0);
+\draw (I) -- ++(1,-1) -- ++ (2,0);
+\draw [dashed]
+     (I)
+  -- ++(0, 1.5)
+  -- ++(4, 0) node[pos=0.5,above]{$s$ slots}
+  -- ++(0, -3)
+  -- ++(-4, 0)
+  -- cycle;
+\end{tikzpicture}
+\end{center}
+%
+Since the chain length in this case is simply the length of the
+common prefix plus the number of blocks in the window (i.e., their density),
+the longer chain will also be denser in the window.
+
+Just to be very explicit, \cref{lemma:tip-density-is-chain-length} does
+\emph{not} hold when the intersection is more than $s$ slots away:
+%
+\begin{center}
+\begin{tikzpicture}[yscale=0.5]
+\draw (0,0) -- (3,0) coordinate (I);
+\draw (I) -- ++(1,1) -- ++(5.5,0);
+\draw (I) -- ++(1,-1) -- ++ (6,0);
+\draw [dashed]
+     (I)
+  -- ++(0, 1.5)
+  -- ++(4, 0) node[pos=0.5,above]{$s$ slots}
+  -- ++(0, -3)
+  -- ++(-4, 0)
+  -- cycle;
+\end{tikzpicture}
+\end{center}
+%
+In this case of course the longer chain may well not be denser in the window.
+\end{proof}
+
+\clearpage
+
+\begin{lemma}[Rule Equivalence]
+\label{lemma:rule-equivalence}
+When comparing two chains with an intersection that is at most $k$ blocks away
+from the two tips, the Density rule and the Longest Chain rule are equivalent.
+\end{lemma}
+
+\begin{proof}
+First, observe that since the intersection is at most $k$ blocks away, the
+maximum rollback condition of the Longest Chain rule is trivially satisfied.
+It remains to show that the intersection is at most $s$ slots away, so that we can
+apply \cref{lemma:tip-density-is-chain-length}. This is easiest to see by
+contradiction: suppose the intersection is \emph{more} than $s$ slots away. Then
+we would have a section on the chain which is more than $s$ slots long but
+contains fewer than $k$ blocks; the analysis
+\cite{cryptoeprint:2017:573,cryptoeprint:2018:378} tells us that the probability
+of this happening is negligibly small (provided $s$ is at least $3k/f$).
+\end{proof}
+
+\Cref{lemma:rule-equivalence} has a corollary that is of practical importance
+for the consensus layer, as it re-establishes an invariant that we rely on
+(\cref{never-shrink}):
+
+\begin{lemma}
+Alert nodes (that is, honest nodes that have consistently been online)
+will never have to switch to a shorter chain.
+\end{lemma}
+
+\begin{proof}
+The Ouroboros Genesis analysis \cite{cryptoeprint:2018:378} tells us that
+alert nodes will never have to roll back by more than $k$ blocks. In other
+words, the intersection between their current chain and any chain they might
+have to switch to will be at most $k$ blocks ago. The lemma now follows
+from \cref{lemma:rule-equivalence}.
+\end{proof}
+
+\subsection{Honest versus adversarial blocks}
+
+In this section we will consider what kinds of chains an adversary might try to
+construct.
+
+\begin{lemma}
+\label{lemma:adversarial-before-k}
+An adversary cannot forge a chain that forks off more than $k$ blocks from
+an alert node's tip and is denser than the alert node's chain at
+the intersection between the two chains.
+\end{lemma}
+
+\begin{proof}
+This is an easy corollary of the Ouroboros Genesis analysis. If the adversary
+were able to construct such a chain, then by the Density rule the node
+should switch to it. As mentioned above, however, the analysis tells us that
+alert nodes never have to roll back more than $k$ blocks.
+\end{proof}
+
+\Cref{lemma:adversarial-before-k} has a useful specialization for chains that an
+adversary might try to construct near the wallclock:
+
+\begin{lemma}
+\label{lemma:adversarial-within-s}
+An adversary cannot forge a chain that satisfies all of the below:
+%
+\begin{itemize}
+\item It forks off at most $s$ slots from the wallclock.
+\item It forks off more than $k$ blocks before the tip of an alert node.
+\item It is longer than the alert node's current chain.
+\end{itemize}
+\end{lemma}
+
+\begin{proof}
+The situation looks like this:
+%
+\begin{center}
+\begin{tikzpicture}[yscale=0.75]
+\draw
+     (0,0)
+  -- (6,0) coordinate (s-back) node{$\bullet$}
+  -- (7,0) coordinate (k-back) node{$\bullet$};
+\draw
+     (k-back)
+  -- ++(2, 1)
+  -- ++(0, -2) node[pos=0.5,right=0.5]{honest chains}
+  -- cycle;
+\draw
+     (s-back)
+  -- ++(1, -2)
+  -- ++(1, 0) node[right=1.5cm]{adversary's chain};
+\path
+     (k-back)
+  -- ++(0, 1)
+  -- ++(2, 0) node[pos=0.5,above]{$\overbrace{\hspace{2cm}}^{\text{$\le k$ blocks}}$};
+\path
+     (s-back)
+  -- ++(0, 1.75)
+  -- ++(3, 0) node[pos=0.5,above]{$\overbrace{\hspace{3cm}}^{\text{$\le s$ slots}}$};
+\draw [very thick] (9, -2.5) -- (9, 2.5) node[above]{now};
+\end{tikzpicture}
+\end{center}
+%
+The intersection between the alert node's chain and the adversarial chain is
+within $s$ slots from the wallclock. This means that density at the intersection
+point is just chain length (\cref{lemma:tip-density-is-chain-length}), and
+hence the property follows from \cref{lemma:adversarial-before-k}.
+\end{proof}
+
+\subsection{The original genesis rule}
+\label{genesis:original}
+
+The Density rule is a simplification of the rule as presented in the paper
+\cite{cryptoeprint:2018:378}.
The original rule is shown in
+\cref{genesis:maxvalid-bg}, and paraphrased below:
+
+\begin{definition}[Genesis chain selection rule, original version]
+\label{genesis:originalrule}
+A candidate chain is preferred over our current chain if
+
+\begin{itemize}
+\item the intersection between the candidate chain and our chain is \textbf{no
+more than $k$} blocks back, and the candidate chain is strictly \textbf{longer}
+than our chain; or
+
+\item the intersection \emph{is} \textbf{more than $k$} blocks back, and the
+candidate chain is \textbf{denser} (contains more blocks) than our chain in
+a region of $s$ slots starting at the intersection.
+\end{itemize}
+\end{definition}
+
+As we saw in \cref{lemma:rule-equivalence}, the Density rule is equivalent
+to the Longest Chain rule if the intersection is within $k$ blocks, so the
+original rule and the simplified form are in fact equivalent.
+
+For completeness' sake, we should note that this equivalence only holds for a
+suitable choice of $s$. If $s$ is much smaller (for example, the paper uses $s =
+\frac{1}{4}(k/f)$ in some places), then we might have a situation such as the
+following, where we have two chains $A$ and $B$; $A$ is denser than $B$ at the
+intersection with $B$, but $B$ is longer:
+%
+\begin{center}
+\begin{tikzpicture}
+\path (0, 0) coordinate (tip) node{$\bullet$};
+\draw (tip) -- ++(1.0, 0.5) -- ++(2.5, 0) coordinate(C1) node[right]{$A$};
+\draw (tip) -- ++(1.0, -0.5) -- ++(3.5, 0) coordinate(C2) node[right]{$B$};
+\draw [red, very thick] (tip) -- ++(1.0, 0.5) -- ++(2.0, 0);
+\draw [dashed]
+     (tip)
+  -- ++(0, 0.75)
+  -- ++(3, 0)
+  -- ++(0, -1.5)
+  -- ++(-3, 0) node[pos=0.5, below]{$\underbrace{\hspace{3cm}}_{\text{$s$ slots}}$}
+  -- cycle;
+\path (tip) -- (C1) node[pos=0.5, above=0.5cm]{$\overbrace{\hspace{3.5cm}}^{\text{fewer than $k$ blocks}}$};
+\path (tip) -- (C2) node[pos=0.5, below=1.1cm]{$\underbrace{\hspace{4.5cm}}_{\text{more than $k$ blocks}}$};
+\draw (tip) + (-3,0) node{$\bullet$} -- (tip);
+\end{tikzpicture}
+\end{center}
+%
+In this case, the original rule ends up preferring either $A$ or $B$ depending
+on the order in which we consider them, whereas the Density Rule would simply
+pick $A$. (As we will see in \cref{density-ordering-sensitivity}, however, the
+Density rule is unfortunately not immune to order sensitivity either.)
+
+\begin{figure}
+\hrule
+
+\textbf{Parameters} \\[0.5em]
+\begin{tabular}{ll}
+$C_\mathit{loc}$ & Current chain \\
+$\mathcal{N} = \{C_1, \ldots, C_M\}$ & All possible chains (including our own) \\
+$k$ & Security parameter (\cref{consensus:overview:k}) \\
+$s$ & Genesis window size (Genesis rule specific parameter) \\
+$f$ & Active slot coefficient (\cref{praos:f}) \\[1em]
+\end{tabular}
+
+\textbf{Algorithm}
+
+\begin{lstlisting}[escapeinside={(*}{*)}, language={}, keywords={for,do,if,then,else,end,return}]
+// Compare (*$C_\mathit{max}$*) to each (*$C_i \in \mathcal{N}$*)
+Set (*$C_\mathit{max} \leftarrow C_\mathit{loc}$*)
+for (*$i = 1$*) to (*$M$*) do
+  if (*$(C_i \text{ forks from } C_\mathit{max} \text{ at most } k \text{ blocks})$*) then
+    if (*$|C_i| > |C_\mathit{max}|$*) then // Condition A
+      Set (*$C_\mathit{max} \leftarrow C_i$*).
+  else
+    Let (*$j \leftarrow \max \Bigl\{ j' \ge 0 \mathrel{\Bigl\lvert} C_\mathit{max} \text{ and } C_i \text{ have the same block in } \mathtt{sl}_{j'} \Bigr\} $*)
+    if (*$|C_i[0 : j + s]| > |C_\mathit{max}[0 : j + s]|$*) then // Condition B
+      Set (*$C_\mathit{max} \leftarrow C_i$*).
+return (*$C_\mathit{max}$*)
+\end{lstlisting}
+
+\hrule
+\caption{\label{genesis:maxvalid-bg}Algorithm \texttt{maxvalid-bg}}
+\end{figure}
+
+\section{Fragment selection}
+\label{genesis:fragment-selection}
+
+While the literature on Ouroboros compares entire chains, we will want to
+compare \emph{fragments} of chains: if at all possible we would prefer not to
+have to download and verify entire chains before we can make any decisions. As
+we have discussed in \cref{genesis:background:joining-late}, comparing fragments
+(prefixes) of chains using the Longest Chain rule does not actually make much
+sense, but in this section we will see that the situation is fortunately much
+better when we use the Density rule.
+
+\begin{definition}[Preferred fragment]
+Let $\mathcal{S}$ be a set of chain fragments, all anchored at the same point
+(that is, the fragments share a common ancestor), corresponding to some set of
+chains $\mathcal{C}$. Then $A$ is a preferred fragment in $\mathcal{S}$ if and
+only if $A$ is a fragment of a preferred chain in $\mathcal{C}$.
+\end{definition}
+
+We will now establish a (sufficient) condition for fragment preference to be
+decidable. First, if we have to choose between two chains, we
+must see enough of those chains to do a density comparison.
+
+\pagebreak
+
+\begin{definition}[Known density]
+We say that a chain fragment has a \emph{known density} at some point $p$
+if either of the following conditions holds:
+
+\begin{enumerate}
+\item The fragment contains a block after at least $s$ slots:
+\begin{center}
+\begin{tikzpicture}[yscale=0.75]
+\path (0,0) -- (9,0); % adjust bounding box
+\path (0,0) -- (1,0) node[pos=0.5]{$\cdots$};
+\draw
+     (1,0)
+  -- (3,0) node{$\bullet$} node[above left]{$p$} coordinate(p)
+  -- (3.5,0) node{$\bullet$}
+  -- (4,0);
+\path
+     (4,0)
+  -- (5,0) node[pos=0.5]{$\cdots$};
+\draw
+     (5,0)
+  -- (5,0) node{$\bullet$}
+  -- (6,0) node{$\bullet$}
+  -- (7,0) node[red]{$\bullet$}
+  -- (8,0) node[right]{$\cdots$};
+\draw [dashed]
+     (p)
+  -- ++(0, 1)
+  -- ++(3.5, 0) node[pos=0.5,above]{$s$ slots}
+  -- ++(0, -2)
+  -- ++(-3.5, 0)
+  -- cycle;
+\end{tikzpicture}
+\end{center}
+
+\item The chain (not just the fragment\footnote{We can distinguish between these
+two cases because nodes report the tip of their chain as part of the chain sync
+protocol, independently of the headers that we have downloaded from those
+nodes.}) terminates within the window:
+\begin{center}
+\begin{tikzpicture}[yscale=0.75]
+\path (0,0) -- (9,0); % adjust bounding box
+\path (0,0) -- (1,0) node[pos=0.5]{$\cdots$};
+\draw
+     (1,0)
+  -- (3,0) node{$\bullet$} node[above left]{$p$} coordinate(p)
+  -- (3.5,0) node{$\bullet$}
+  -- (4,0);
+\path
+     (4,0)
+  -- (5,0) node[pos=0.5]{$\cdots$};
+\draw
+     (5,0)
+  -- (6,0) node[red]{$\bullet$};
+\draw [dashed]
+     (p)
+  -- ++(0, 1)
+  -- ++(3.5, 0) node[pos=0.5,above]{$s$ slots}
+  -- ++(0, -2)
+  -- ++(-3.5, 0)
+  -- cycle;
+\end{tikzpicture}
+\end{center}
+\end{enumerate}
+\end{definition}
+
+\begin{definition}[Look-ahead closure]
+\label{lookahead-closure}
+Let $\mathcal{S}$ be a set of chain fragments all anchored at the same point. We
+say that $\mathcal{S}$ is \emph{look-ahead closed} if whenever there are two
+fragments $A, B \in \mathcal{S}$, the densities of $A$ and $B$ are known at
+their intersection.
+\end{definition}
+
+\begin{lemma}[Look-ahead closure is sufficient for fragment selection]
+\label{lemma:fragment-selection}
+Let $\mathcal{S}$ be a look-ahead closed set of chain fragments.
Then +we can always choose a preferred fragment in $\mathcal{S}$. +\end{lemma} + +\begin{proof}[Proof (sketch)] +In order to be able to pick a chain, we need to resolve forks. In order to +resolve forks using the Density Rule, we need to know the density, but that +is precisely what is guaranteed by look-ahead closure. +\end{proof} + +\pagebreak + +\Cref{lemma:fragment-selection} is relevant because it reflects how the +consensus layer uses chain selection: + +\begin{enumerate} +\item We maintain a fragment of the chain for each upstream peer we track +(\cref{chainsyncclient}). The block fetch client +(\cref{chainsyncclient:plausiblecandidates}) then picks a preferred fragment and +downloads that. +\item When the chain database needs to construct the current chain +(\cref{chainsel}), it constructs a set of chain fragments through the volatile +DB, all anchored at the tip of the immutable database, picks a preferred +fragment, and adopts that as the node's current chain. +\end{enumerate} + +However, \cref{lemma:fragment-selection} is less useful than it might seem: +the look-ahead closure requirement means that in the worst case, we still need +to see entire chains before we can make a decision: every new intersection point +requires us to see $s$ more slots: +% +\begin{center} +\begin{tikzpicture}[yscale=0.5] +\path (0,0) coordinate(I); +% +\node at (I) {$\bullet$}; +\draw (I) -- ++(2.5, 0); +\draw (I) -- ++(1,-1) -- ++(0.5,0) coordinate(A) -- ++(2,0); +\draw [dashed] + (I) + -- ++(0,0.25) + -- ++(2,0) + -- ++(0,-1.5) + -- ++(-2,0) node[pos=0.5,below]{$\underbrace{\hspace{2cm}}_\text{$s$ slots}$} + -- cycle; +% +\node at (A) {$\bullet$}; +\draw (A) -- ++(2.5, 0); +\draw (A) -- ++(1,-1) -- ++(0.5,0) coordinate(B) -- ++(2,0); +\draw [dashed] + (A) + -- ++(0,0.25) + -- ++(2,0) + -- ++(0,-1.5) + -- ++(-2,0) node[pos=0.5,below]{$\underbrace{\hspace{2cm}}_\text{$s$ slots}$} + -- cycle; +% +\node at (B) {$\bullet$}; +\draw (B) -- ++(2.5, 0); +\draw (B) -- ++(1,-1) -- ++(0.5,0) coordinate(C) -- ++(2,0) node[above=0.5cm, right]{$\cdots$}; +\draw [dashed] + (B) + -- ++(0,0.25) + -- ++(2,0) + -- ++(0,-1.5) + -- ++(-2,0) node[pos=0.5,below]{$\underbrace{\hspace{2cm}}_\text{$s$ slots}$} + -- cycle; +\end{tikzpicture} +\end{center} +% +Moreover, due to the header/body split +(\cref{nonfunctional:network:headerbody}), when we are tracking the headers from +an upstream peer, we cannot (easily) verify headers that are more than $3k/f$ +slots away from the intersection between our chain and their chain (see also +\cref{low-density}). In the next section we will therefore consider how we can +drop the look-ahead closure requirement. + +In case it is not obvious why we must only compare density at intersection +points, in the remainder of this section we will consider an example that will +hopefully clarify it. 
Suppose a malicious node with some stake intentionally
+skips its slot, after which the chain continues to grow as normal:
+%
+\begin{center}
+\begin{tikzpicture}
+\draw
+     (0,0) node{$\bullet$}
+  -- ++(1,0) node{$\bullet$}
+  -- ++(1,0) node{$\bullet$}
+  -- ++(1,0) node{$\bullet$}
+  -- ++(1,0) node[above]{$\times$}
+  -- ++(1,0) node{$\bullet$}
+  -- ++(1,0) node{$\bullet$}
+  -- ++(1,0) node{$\bullet$};
+\end{tikzpicture}
+\end{center}
+%
+It is now trivial for the adversary to create an alternative chain that
+\emph{does} have a block in that slot; if other nodes switched to the denser
+chain the moment they saw a window of $s$ slots that is denser, they would adopt
+the adversary's chain; after all, it has one more block in the window than the
+honest chain does:
+%
+\begin{center}
+\begin{tikzpicture}[yscale=0.5]
+\draw
+     (0,0) node{$\bullet$}
+  -- ++(1,0) node{$\bullet$} coordinate(s-anchor)
+  -- ++(1,0) node{$\bullet$}
+  -- ++(1,0) node{$\bullet$} coordinate(branch)
+  -- ++(1,0) node[above]{$\times$}
+  -- ++(1,0) node{$\bullet$}
+  -- ++(1,0) node{$\bullet$}
+  -- ++(1,0) node{$\bullet$};
+\draw
+     (branch)
+  -- ++(1, -1) node{$\bullet$}
+  -- ++(3, 0) node{$\bullet$};
+\draw [dashed]
+     (s-anchor)
+  -- ++(0,1)
+  -- ++(3.5,0)
+  -- ++(0,-2.5)
+  -- ++(-3.5,0) node[below, pos=0.5]{$\underbrace{\hspace{3.5cm}}_{\text{$s$ slots}}$}
+  -- cycle;
+\end{tikzpicture}
+\end{center}
+%
+Instead, we must postpone such a comparison until we have reached
+the intersection point:
+%
+\begin{center}
+\begin{tikzpicture}[yscale=0.5]
+\draw
+     (0,0) node{$\bullet$}
+  -- ++(1,0) node{$\bullet$}
+  -- ++(1,0) node{$\bullet$}
+  -- ++(1,0) node{$\bullet$} coordinate(branch)
+  -- ++(1,0) node[above]{$\times$}
+  -- ++(1,0) node{$\bullet$}
+  -- ++(1,0) node{$\bullet$}
+  -- ++(1,0) node{$\bullet$};
+\draw
+     (branch)
+  -- ++(1, -1) node{$\bullet$}
+  -- ++(3, 0) node{$\bullet$};
+\draw [dashed]
+     (branch)
+  -- ++(0,1)
+  -- ++(3.5,0)
+  -- ++(0,-2.5)
+  -- ++(-3.5,0) node[below, pos=0.5]{$\underbrace{\hspace{3.5cm}}_{\text{$s$ slots}}$}
+  -- cycle;
+\end{tikzpicture}
+\end{center}
+%
+Since the adversary does not have sufficient stake, their chain will be less
+dense and we will therefore not select it.
If the adversary creates another fork +earlier on the chain, then we will resolve that fork when we encounter it using +a window of $s$ slots \emph{anchored at that fork}, and then later resolve the +second fork using a \emph{different} window of $s$ slots, anchored at the second +fork: +% +\begin{center} +\begin{tikzpicture}[yscale=0.5] +\draw + (0,0) node{$\bullet$} + -- ++(1,0) node{$\bullet$} coordinate(s-anchor) + -- ++(1,0) node{$\bullet$} + -- ++(1,0) node{$\bullet$} coordinate(branch) + -- ++(1,0) node[above]{$\times$} + -- ++(1,0) node{$\bullet$} + -- ++(1,0) node{$\bullet$} + -- ++(1,0) node{$\bullet$}; +\draw + (branch) + -- ++(1, -1) node{$\bullet$} + -- ++(3, 0) node{$\bullet$}; +\draw + (s-anchor) + -- ++(1, -1) node{$\bullet$}; +\draw [dashed] + (s-anchor) + -- ++(0,1) + -- ++(3.5,0) + -- ++(0,-2.5) + -- ++(-3.5,0) node[below, pos=0.5]{$\underbrace{\hspace{3.5cm}}_{\text{$s$ slots}}$} + -- cycle; +\draw [dotted] + (branch) ++ (0, 0.1) + -- ++(0,1) + -- ++(3.5,0) node[above, pos=0.5]{$\overbrace{\hspace{3.5cm}}^{\text{$s$ slots}}$} + -- ++(0,-2.5) + -- ++(-3.5,0) + -- cycle; +\end{tikzpicture} +\end{center} + +\pagebreak + +\section{Prefix selection} +\label{genesis:prefix-selection} + +\subsection{Preferred prefix} + +When a set $\mathcal{S}$ of chain fragments is not look-ahead closed, we may +not be able to pick a best fragment. For example, in +% +\begin{equation*} +\mathcal{S} = \left\{ \; +\begin{tikzpicture}[baseline=0pt, xscale=0.5,yscale=0.5] +\draw [very thick, red] (-2,0) -- (0,0); +\draw (0,0) -- (1, 1) -- (6, 1) node[right]{$A$}; +\draw (0,0) -- (1,-1) -- (7, -1) node[right]{$B$}; +\draw [dashed] (-2,0) -- ++(0,1.5) -- ++(5,0) node[pos=0.5,above]{$\overbrace{\hspace{2cm}}^\text{$s$ slots}$} -- ++(0,-3) -- ++(-5,0) -- cycle; +\end{tikzpicture} +\right\} +\end{equation*} +% +we cannot choose between $A$ and $B$; what we \emph{can} say however is that no +matter which of $A$ and $B$ turns out to be the better fragment, the common +prefix of $A$ and $B$ (shown in red) will definitely be a prefix of that +fragment. This example provides the intuition for the definition of a preferred +prefix: + +\begin{definition}[Preferred prefix] +Given a set $\mathcal{S}$ of chain fragments, all anchored at the same point, a +preferred prefix is a prefix $\Pi$ of one of the fragments in $\mathcal{S}$, +such that $\Pi$ is guaranteed to be a prefix of a preferred fragment in the +lookahead-closure of $\mathcal{S}$. +\end{definition} + +In other words, we may not be able to pick the best fragment out of +$\mathcal{S}$, but we \emph{can} pick a prefix which is guaranteed to be a +prefix of whatever turns out to be the best fragment. Obviously, the empty +fragment is always a valid choice, albeit not a particularly helpful one. +Ideally, we would choose the \emph{longest} preferred prefix. Fortunately, +constructing such a prefix is not difficult. + +\subsection{Algorithm} + +We will now consider how we can choose the longest preferred prefix. + +\begin{definition}[Prefix selection] +\label{prefix-selection} +Let $\mathcal{S}$ be a set of chain fragments all anchored at the same point $a$, +such that all fragments have known density at point $a$. 
Then we can construct
+the longest preferred prefix in two steps:
+%
+\begin{enumerate}
+\item \emph{Resolve initial fork.}
+Suppose $\mathcal{S}$ looks like this:
+%
+\begin{center}
+\begin{tikzpicture}
+\node at (0,0) [left] {$a$};
+\node at (0,0) {$\bullet$};
+%
+\draw (0,0) -- (1, 0.5) coordinate(x);
+\draw (0,0) -- (1,-0.5) coordinate(y);
+\draw [dotted] (0,0) -- (1,-1) node[below right]{$\cdots$};
+%
+\draw (x) -- ++(2, 0.25) -- ++(0, -0.5) node[pos=0.5,right]{$\vec{x}$} -- cycle;
+\draw (y) -- ++(2, 0.25) -- ++(0, -0.5) node[pos=0.5,right]{$\vec{y}$} -- cycle;
+%
+\draw [dashed]
+     (0,0)
+  -- ++(0, 1)
+  -- ++(2.5, 0) node[pos=0.5,above]{$s$ slots}
+  -- ++(0, -2.5)
+  -- ++(-2.5, 0)
+  -- cycle;
+\end{tikzpicture}
+\end{center}
+%
+where (without loss of generality) the common prefixes are non-empty. Suppose
+one of the $\vec{x}$ has the highest density\footnote{If two fragments in
+different forks have exactly the same density, we need a tie-breaker in
+order to be able to make progress. The genesis paper does not prefer either
+chain in such a scenario, switching only if another chain is strictly denser. We
+can therefore follow suit, and just focus on one of the two chains arbitrarily.}
+at point $a$; let's call it $x_i$. That means if we ever were to adopt any of
+the $\vec{y}, \ldots$, and then compared our chain to $x_i$, we would find that
+$x_i$ is denser at the intersection point (which is precisely what we are
+comparing in this window here), and therefore switch to it. This means we can
+focus our attention on the $\vec{x}$. (Unfortunately\todo{TODO}, the existing
+density rule does not always pick a unique best chain, so we cannot prove this
+algorithm correct. See \cref{density-ordering-sensitivity}.)
+
+\item \emph{Adopt common prefix.}
+Most of the time, the density of the $\vec{x}$ will not yet be known. This means
+we do not yet know which $x_i$ will turn out to be the best, but we \emph{do}
+know that whichever it turns out to be, it will have the common prefix from $a$
+to $b$, so we choose this as the longest preferred prefix:
+
+\begin{center}
+\begin{tikzpicture}
+\path (0,0) node{$\bullet$} node[left]{$a$};
+\path (1,0.5) node{$\bullet$} node[below]{$b$};
+%
+\draw [very thick, red] (0,0) -- (1, 0.5) coordinate(x);
+%
+\draw (x) -- ++(2, 0.25) -- ++(0, -0.5) node[pos=0.5,right]{$\vec{x}$} -- cycle;
+%
+\draw [dashed]
+     (0,0)
+  -- ++(0, 1)
+  -- ++(2.5, 0) node[pos=0.5,above]{$s$ slots}
+  -- ++(0, -1.5)
+  -- ++(-2.5, 0)
+  -- cycle;
+\end{tikzpicture}
+\end{center}
+%
+One useful special case is when all of the $\vec{x}$ terminate within the
+window. In this case, their density \emph{is} known, and we can just pick the
+densest fragment as the preferred prefix; the preferred prefix is then in fact
+the preferred fragment.
+\end{enumerate}
+\end{definition}
+
+Prefix selection fits very naturally with the needs of the consensus layer
+(\emph{cf.} the description of how consensus uses fragment selection in
+\cref{genesis:fragment-selection}):
+%
+\begin{enumerate}
+\item When blockfetch applies prefix selection to the set of fragments of
+the upstream peers we are tracking, the computed prefix is precisely the set
+of blocks that blockfetch should download.
+\item When the chain database applies prefix selection to the set of fragments
+through the volatile database, the computed prefix is precisely the chain that
+we should adopt as the node's current chain.
+\end{enumerate}
+
+\begin{lemma}
+If the intersection point between the chains of our upstream peers is at most
+$k$ blocks away from their tips, prefix selection will choose the longest chain.
+\end{lemma}
+
+\begin{proof}[Proof (sketch)]
+The proof is very similar to the proof of \cref{lemma:rule-equivalence}.
+If the intersection is at most $k$ blocks away, it will be less than $s$ slots
+away; therefore prefix selection will be able to see the chains to their tip,
+the density of all chains will be known, and the densest fragment equals the
+longest one.
+\end{proof}
+
+\subsection{Prefix selection on headers}
+
+When the chain sync client is deciding which of the chains of its upstream peers
+to download, it does so based on chains of \emph{headers}. It does not download
+block bodies and so cannot verify them. As such, it bases its decisions
+on header validity \emph{only}. However, this is then only used to tell
+the block fetch client which blocks to \emph{download}
+(\cref{chainsyncclient:plausiblecandidates}); it does not necessarily mean that
+those blocks will be \emph{adopted}. When the chain database performs chain
+selection (\cref{chainsel}), it will verify the blocks and discard any that turn
+out to be invalid. If any blocks \emph{are} invalid, then the chain sync client
+will disconnect from the nodes that provided them, which in turn may change
+which prefix is chosen by prefix selection.
+
+Indeed, the only reason to validate headers at all is to avoid a denial
+of service attack where an adversary might cause us to waste resources.
+It may be worth reconsidering this risk, and balancing it against the costs
+for the implementation; the only reason we need forecasting, and a special
+treatment of low density chains (\cref{low-density}), is that we insist on
+validating headers independently of blocks.
+
+\subsection{Known density}
+
+\Cref{prefix-selection} requires known density at the anchor $a$ of the set, so
+that it can resolve the initial fork. This means we have to wait until
+we have downloaded enough headers from each peer: the last header we downloaded
+must either be outside the $s$ window, or else it must be the last header on that
+peer's chain (in which case the peer will tell us so). If a peer tells us it has
+more headers but then does not provide them, we should disconnect from it
+after some time-out to avoid a potential denial of service attack.
+
+Note that if that last header is \emph{not} the last header on a peer's chain,
+we will not be able to validate it: since it is more than $s$ slots away from
+the anchor point (and we only have a ledger state at the anchor point), it falls
+outside the ledger's forecast range. However, this does not matter: the presence
+of this header only tells us that we have seen everything we need to see to
+compute the density within the window; an invalid header after that window
+cannot increase the density \emph{within} the window.
+
+\pagebreak
+
+We can make two useful exceptions to the known density requirement:
+
+\begin{enumerate}
+\item If there \emph{is} no initial fork, we do not need to know the density at
+all and can go straight to step (2). This will allow a node to sync faster:
+consider a new node that is joining the network. In most cases, all of the
+node's upstream peers will report the \emph{same} blocks for all but the last
+few blocks on the chain. Since there is no fork to resolve, we can start
+adopting blocks the moment we see them.
+
+\item When we say that the density is \emph{known}, we mean that we have seen
+the last required header and validated it. However, consider what happens when
+the node is up to date, and a new block is produced (by some other node).
+Strictly speaking we must now wait for \emph{all} peers to have provided us with
+this header, and have validated all those headers, before we would consider the
+density to be known and prefix selection can make progress. This is however
+unnecessary: when node $A$ provides us with a header, and then node $B$ reports
+the exact same tip, we know that node $B$'s density cannot exceed node $A$'s, and
+so we can go ahead and select node $A$'s chain.
+\end{enumerate}
+
+\begin{figure}[p]
+\hrule
+\vspace{0.5em}
+
+The justification for step 1 in \cref{prefix-selection} does depend on ordering
+(\emph{first} adopt any of the $\vec{y}$, \emph{then} compare to $x_i$).
+Unfortunately we cannot do better than that with the existing Density rule (or
+indeed with the original Genesis chain selection rule from the paper, \emph{cf.}
+\cref{genesis:maxvalid-bg}). Consider three chains $A, B, C$ in the following
+example:
+%
+\begin{center}
+\begin{tikzpicture}
+\draw
+     (0, 0) node {$\bullet$} coordinate (a)
+  -- (2, 1) node {$\bullet$} coordinate (b) node[above left]{$ab$};
+\draw
+     (a)
+  -- ++(1, -0.5) node {$\bullet$}
+  -- ++(2,  0  ) node {$\bullet$}
+  -- ++(4,  0  ) node[right]{$C$};
+\draw
+     (b)
+  -- ++(0.5, 0.5) node {$\bullet$}
+  -- ++(0.5, 0  ) node {$\bullet$}
+  -- ++(0.5, 0  ) node {$\bullet$}
+  -- ++(3.5, 0  ) node[right]{$A$};
+\draw
+     (b)
+  -- ++(0.5 , -0.5)
+  -- ++(1.25,  0  )
+  -- ++(0.5 ,  0  ) node {$\bullet$}
+  -- ++(0.5 ,  0  ) node {$\bullet$}
+  -- ++(0.5 ,  0  ) node {$\bullet$}
+  -- ++(0.5 ,  0  ) node {$\bullet$}
+  -- ++(1.25,  0  ) node[right]{$B$};
+%
+\draw [dashed]
+     (a)
+  -- ++(0, 2)
+  -- ++(4, 0)
+  -- ++(0, -3)
+  -- ++(-4, 0) node[pos=0.5,below]{$\underbrace{\hspace{4cm}}_\text{$s$ slots}$}
+  -- cycle;
+\draw [dashed]
+     (b)
+  -- ++(0, 1.5)
+  -- ++(4, 0) node[pos=0.5,above]{$\overbrace{\hspace{4cm}}^\text{$s$ slots}$}
+  -- ++(0, -2.5)
+  -- ++(-4, 0)
+  -- cycle;
+\end{tikzpicture}
+\end{center}
+%
+Depending on the order in which we consider the chains, we might pick any of
+these three chains:
+%
+\begin{center}
+\begin{tabular}{l|l}
+\textbf{order} & \textbf{selected chain} \\ \hline
+$B, C, A$ & $A$ \\
+$C, A, B$ & $B$ \\
+$A, B, C$ & $C$ \\
+\end{tabular}
+\end{center}
+%
+This is rather unfortunate, but resolving this would require input from the
+Ouroboros researchers. The algorithm described in \cref{prefix-selection}
+essentially makes local decisions: every time it sees a fork, it moves towards
+the densest chain in the window. For the example above, it would proceed as
+follows:
+%
+\begin{itemize}
+\item When resolving the initial fork, it would notice that $A$ is the densest
+fragment. Since it doesn't have sufficient look-ahead to compare $A$ and $B$,
+it would defer that decision, but it would discard $C$ and adopt the block $ab$
+in the common prefix of $A$ and $B$.
+\item When resolving the second fork, it would notice that $B$ is the densest
+fragment, and discard $A$. In this example, there are no further forks, and
+so it would adopt the entire fragment of $B$ in the window.
+\end{itemize}
+\hrule
+\caption{\label{density-ordering-sensitivity}Ordering sensitivity of the Density rule}
+\end{figure}
+
+\section{Avoiding long rollbacks}
+\label{genesis:avoiding-long-rollbacks}
+
+\subsection{Representative sample}
+
+At the end of \cref{genesis:background:density-rule} we mentioned that if
+we have a situation such as
+%
+\begin{center}
+\begin{tikzpicture}[yscale=0.5]
+\draw (-2,0) node{$\bullet$} -- (0,0);
+\draw (0,0) node{$\bullet$} -- (10,0) node{$\bullet$} node[right] {honest chain};
+\draw
+     (0,0)
+  -- (0,-1)
+  -- (4,-1)
+     node{$\bullet$}
+     node[right]{adversary's chain}
+     node[pos=0.5, below]{$\underbrace{\hspace{4cm}}_{\text{$> k$ blocks}}$};
+\end{tikzpicture}
+\end{center}
+%
+and we happen to adopt the adversary's chain first and later compare it to the
+honest chain, the Density Rule will prefer the honest chain and so we should
+switch, at the cost of a rollback of more than $k$ blocks.
+
+Such long rollbacks are problematic; we depend on having a maximum rollback of
+$k$ blocks in many ways
+(\cref{consensus:overview:k,storage:components,chainsyncclient:validation,chainsyncclient:forecasting}
+and others), and we will want to \emph{continue} to depend on it. However, we
+needed this long rollback in this example only because we \emph{first} adopted
+the adversary's chain, and \emph{then} compared it to the honest chain. Even in
+the consensus layer as it is today, we do not do chain selection in such
+a chain-by-chain manner; as we saw in \cref{genesis:fragment-selection} and
+again in \cref{genesis:prefix-selection}, we instead pick the best out of all
+currently known candidates. This means that as long as we are \emph{aware} of
+both chains before we adopt either one, we will just pick the honest chain
+straight away, and we avoid the rollback.
+
+This then is the solution to avoiding longer-than-$k$ rollbacks: as long as we
+are not yet up to date with the main chain, we must make sure that we connect to
+a representative sample of $\RequiredPeers$ upstream peers, such that the
+probability that \emph{none} of them will serve us the honest chain is negligible
+(presumably aided by a probabilistic way of choosing upstream peers in the
+network layer), and avoid doing prefix selection until we have reached
+this threshold.
+
+The only remaining decision to make is when we can \emph{drop} this requirement:
+at which point is the state of the node comparable to the state of an alert node
+(a node that has consistently been online)?
+
+\subsection{Becoming alert}
+\label{genesis:becoming-alert}
+
+Every block that we adopt through prefix selection that is more than $s$ slots
+away from the wall clock must be a block on the honest chain, because at every
+fork we will only adopt blocks from the denser fragment. Moreover, all honest
+parties (all alert nodes) will agree on blocks that are more than $s$ slots away
+from their tip. This means that we will not need to roll back at all: any block
+that we adopt which is more than $s$ slots away from the wall-clock is a block
+that we can be sure about.
+
+\pagebreak
+
+It is tempting to conclude from \cref{lemma:adversarial-within-s} that as soon
+as we have reached $s$ slots from the wallclock, we can drop the requirement to be
+connected to $\RequiredPeers$ nodes and restrict rollback to $k$ blocks. This
+is however not the case.
Recall what the situation looks like: +% +\begin{center} +\begin{tikzpicture}[yscale=0.5] +\draw + (0,0) + -- (6,0) coordinate (s-back) node{$\bullet$} + -- (7,0) coordinate (k-back) node{$\bullet$}; +\draw + (k-back) + -- ++(2, 1) + -- ++(0, -2) node[pos=0.5,right=0.5]{honest chains} + -- cycle; +\draw + (s-back) + -- ++(1, -2) + -- ++(1, 0) node[right=1.5cm]{adversary's chain}; +\path + (k-back) + -- ++(0, 1) + -- ++(2, 0) node[pos=0.5,above=-0.15]{$\overbrace{\hspace{2cm}}^{\text{$\le k$ blocks}}$}; +\path + (s-back) + -- ++(0, 1.75) + -- ++(3, 0) node[pos=0.5,above]{$\overbrace{\hspace{3cm}}^{\text{$\le s$ slots}}$}; +\draw [very thick] (9, -2.5) -- (9, 2.5) node[above]{now}; +\end{tikzpicture} +\end{center} +% +\Cref{lemma:adversarial-within-s} tells us that the adversary cannot construct a +chain that forks off more than $k$ blocks from an alert node's chain and is +longer than that chain. It does \emph{not} tell us that it cannot contain more +than $k$ blocks after the intersection.\footnote{A worst-case adversary with +near 50\% stake would be able to construct $0.5 \times 3k = 1.5k$ blocks in +$3k/f$ slots. The reasoning would be simpler if the adversary has at most 33\% +stake, as then they could indeed only construct $k$ blocks in $3k/f$ slots.} +This means that if we dropped the requirement that we see all chains and then +see and adopt the adversary's chain, we would be stuck, as switching to the +honest chain would involve a rollback of more than $k$ blocks. + +\Cref{lemma:adversarial-within-s} would only help us if we are somehow +guaranteed that we have adopted one of the alert nodes' chains. But how can we +tell? Fortunately, here we have a rare example of reality serendipitously lining +up with theory. In the theoretical model, nodes collect entire chains broadcast +by their peers, and then use the density rule to select the best one. Once they +have done that, they have selected the same chain that any of the alert nodes +might have selected, and so from this point forward their state is effectively +indistinguishable from the state of an alert node. + +In reality of course we cannot collect entire chains, and so we introduced +the concept of prefix selection in order to be able to apply the Density rule +incrementally. However, notice what happens when we have reached $s$ slots +from the wallclock: once we have filled our look-ahead window, \emph{we will +have seen every chain to its very tip}. Every fragment terminates within the +window, which means that prefix selection will not just pick the preferred +\emph{prefix}, and not just the preferred \emph{fragment}, but in fact the +preferred \emph{chain}. This means that just like the theory assumes, we have +selected the best chain out of all possible chains, which means we can +conclude we are now completely up to date and can resume normal operation. + +\begin{definition}[Recovering ``alert'' status] +\label{recover-alert-status} +When prefix selection can see all available chains to their tip, and we have +selected and adopted the best one, the node is up to date. +\end{definition} + +(TODO\todo{TODO}: Christian tells me that the genesis proof also depends on +such a ``final chain selection''. Would be good to refer to that, but I'm +not sure where that happens in the paper.) 
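+
+As a rough sketch of this check (all types and names below are invented for
+illustration; the real implementation will differ), recovering ``alert'' status
+amounts to verifying that every candidate chain is visible to its claimed tip:
+\begin{lstlisting}
+-- For each upstream peer we track the headers we have validated plus
+-- the tip the peer claims via the chain sync protocol.
+data Candidate hdr = Candidate
+  { candidateHeaders    :: [hdr]      -- validated headers, oldest first
+  , candidateClaimedTip :: Maybe hdr  -- Nothing until a tip is reported
+  }
+
+-- We have seen a chain to its very tip when the peer's claimed tip is
+-- the last header we have validated.
+seenToTip :: Eq hdr => Candidate hdr -> Bool
+seenToTip c =
+    case (reverse (candidateHeaders c), candidateClaimedTip c) of
+      (h : _, Just tip) -> h == tip
+      _                 -> False
+
+-- Once every chain is visible to its tip, prefix selection picks a
+-- preferred chain (not merely a prefix), and we are up to date.
+canConcludeUpToDate :: Eq hdr => [Candidate hdr] -> Bool
+canConcludeUpToDate = all seenToTip
+\end{lstlisting}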
+
+\subsection{Avoiding DoS attacks}
+\label{genesis:becoming-alert:DoS}
+
+Malicious nodes cannot abuse \cref{recover-alert-status} in an attempt to
+prevent us from concluding we are up to date: as we saw, all blocks that get us
+to within $s$ slots from the wallclock come from the honest chain, and once we
+have reached $s$ slots from the wallclock, an adversary cannot present us with a
+chain that exceeds the window, since blocks with a slot number after the
+wallclock are invalid.
+
+We do have to be careful if we allow for clock skew however: if a malicious node
+presents us with a header past the $s$ window (and hence past the wallclock,
+though within permissible skew), we would not be able to conclude that we have
+become alert. This would not stop prefix selection from doing its job---after
+all, a header after the window means that we now have a known density---and so
+we would continue to adopt blocks from the honest chain; however, the malicious
+node could keep presenting us with a header that is just out of reach,
+preventing us from ever concluding we are up to date and hence from producing
+new blocks. The only reason we allow for clock skew at all, however, is to avoid
+branding nodes as adversarial when in fact it's just that our clocks are
+misaligned. This must therefore be best-effort only: allow for clock skew, but
+not if this would exceed $s$ slots from the intersection point.
+
+\subsection{Falling behind}
+
+\Cref{recover-alert-status} gives us a way to discover that we are up to date.
+Deciding when we are \emph{not} up to date is less critical. One option is to
+simply use the inverse of \cref{recover-alert-status} and say we are not up to
+date when one of our peers provides us with a chain that we cannot see until its
+tip. Another option is to assume we are not up to date when we boot up the
+node, and then only conclude that we have somehow fallen behind again if we
+notice that our tip is more than a certain distance away from the wallclock (at
+most $s$ slots). This may need some special care; if nodes stop producing blocks
+for a while, we might end up in a state in which we both conclude that we are up
+to date (because we can see all of our peers' chains to their tip) and not up to
+date (because our tip is too far from the wallclock). However, this scenario
+needs special care anyway; we will come back to it in \cref{low-density}.
+
+\subsection{Block production}
+
+In principle, block production can happen even when the node is not up to date.
+There is however not much point: any blocks that the node produces while it
+is not up to date are likely to be discarded almost immediately after
+production, because the node will prefer the existing (honest) chain over the
+tiny fork that it itself created. Moreover, blocks that we produce while we are
+not up to date may in fact be helpful for an adversary. We should therefore
+disable block production while the node is not up to date.
+
+\section{Implementation concerns}
+
+\subsection{Chain selection in the chain database}
+\label{genesis:chain-database}
+
+We mentioned in \cref{genesis:prefix-selection} that the prefix selection
+interface works equally well for the chain sync client and the chain database.
+They are however not doing the same thing; this is a subtle point that deserves
+to be spelled out in detail.
+
+It comes back to the difference between perceived chain length and actual chain
+length (\cref{genesis:background:joining-late}).
In the chain sync client this
+is a meaningful and useful difference: since we are tracking chains from
+individual peers, it makes sense to distinguish between having seen the tip of
+that particular chain, or only seeing a prefix of that chain. However, unless we
+completely change the API to the chain database, the chain database just sees
+individual blocks, without knowing anything about their provenance; it therefore
+does not know whether those blocks are the tip of ``their'' chains; it's not
+even clear what that would mean.
+
+Of course, when the chain database is constructing fragments of chains through
+its volatile database, it knows if a particular block is the tip of any
+constructed fragment. However, that is \emph{perceived} chain length: it might
+be a tip just because we haven't downloaded any more blocks yet. The difference
+is important. Consider two forks like this:
+%
+\begin{center}
+\begin{tikzpicture}[yscale=0.75]
+\draw (0, 0) -- (4, 0) coordinate(i);
+\draw (i) -- ++(1, 1) -- ++(6,0) node[right]{$A$};
+\draw (i) -- ++(1, -1) -- ++(6,0) node[right]{$B$};
+\draw [dashed]
+  (i)
+  -- ++( 0, 2)
+  -- ++( 4, 0) node[pos=0.5,above]{$\overbrace{\hspace{4cm}}^\text{$s$ slots}$}
+  -- ++( 0, -4)
+  -- ++(-4, 0)
+  -- cycle;
+\path
+  (i)
+  -- ++(0, 1)
+  -- ++(4, 0) node[pos=0.5,above]{$\overbrace{\hspace{4cm}}^\text{denser}$};
+\path
+  (i)
+  -- ++(0, -1)
+  -- ++(4, 0) node[pos=0.5,below]{$\underbrace{\hspace{4cm}}_\text{$> k$ blocks}$};
+\end{tikzpicture}
+\end{center}
+%
+Chain $A$ is denser in the window, but $B$ nonetheless has more than $k$ blocks
+in the window (this is entirely possible; a chain of normal density would have
+$3k$ blocks in the window). The chain sync client knows about both chains, knows
+that the chains extend past the window, will wait until it has seen sufficient
+headers from both chains (that is, for the first header outside the window),
+then do a density comparison, find that $A$ is denser, and choose to download
+the blocks from chain $A$.
+
+\pagebreak
+
+But the chain database cannot replicate any of that reasoning. When blocks from
+either chain $A$ or chain $B$ are added, as far as the chain database is
+concerned, those \emph{are} the tips of those chains. This means that it is not
+doing a density comparison, but a chain length comparison. What's worse, if more
+than $k$ blocks from chain $B$ are added before it sees any blocks from chain
+$A$, then it will from that point forward be unable to switch to chain $A$, as
+this would involve a rollback of more than $k$ blocks.
+
+This is not necessarily problematic: since the chain sync client has more
+context, it will make the right decision, and only present blocks from chain $A$
+to the database. Indeed, as we saw in \cref{genesis:becoming-alert}, we will
+in fact download \emph{only} blocks from the honest chain until we are $s$
+slots away from the wallclock, at which point we do one final chain selection,
+and we are up to date. At this point the Density Rule is \emph{anyway} just
+selecting the longest chain, so the fact that the chain database is effectively
+doing longest chain selection \emph{always} does not matter.
+
+It does however mean that the block fetch client becomes an important
+``guardian'' of the chain database; they become more tightly coupled than they
+are currently. This is unfortunate, but not disastrous; it ``only'' makes the
+system more difficult to understand.
Solving this problem would require
+rethinking how the chain database works; this is well outside the scope of this
+chapter.
+
+There is one advantage to this. \Cref{genesis:prefix-selection} describes how
+the chain database could \emph{in principle} use prefix selection: compute all
+paths through the volatile database and then use prefix selection to construct a
+prefix that the node should adopt as its current chain. While this is a useful
+\emph{specification} of what the chain database must do, in practice we will
+probably need an equivalent of \cref{focusonnewblock} that will allow us to
+avoid looking at the entire volatile database every time a new block is added to
+the database. If we however decide that the chain database is just selecting
+based on chain length \emph{anyway}, then the existing lemma and existing
+implementation can be used as-is.
+
+
+\subsection{Abstract chain selection interface}
+
+The current chain selection API compares two chains at a time, and only looks at
+the tips of those two chains (\cref{consensus:overview:chainsel}). This will
+obviously have to change; depending on how exactly we want to modify the chain
+database (\cref{genesis:chain-database}), we must either replace the existing
+interface with prefix selection as the primitive operation, or else add prefix
+selection as an additional concept.
+
+One of the reasons we currently only look at the tips of chains is that this
+simplified the treatment of chain selection in the hard fork combinator. This by
+itself might not be too difficult to change; for example, we could set the
+\lstinline!SelectView! of the hard fork block to be an $n$-ary sum of the
+\lstinline!SelectView!s of the various eras. However, it is not a priori clear
+what it would mean to apply, say, the Praos rule in one era on the chain,
+and the Genesis rule in another. This will require some careful thought,
+though we can probably just sidestep the entire issue and pretend we were
+using the Genesis rule all along.
+
+\subsection{Possible optimisations}
+\label{genesis:optimizations}
+
+Chain selection while we are not up to date has some properties that might
+enable us to implement some performance optimisations. Here we just list some of
+the possibilities:
+
+\begin{itemize}
+
+\item When a node is operational, we try to line up its average-case performance
+requirements with its worst-case performance requirements, since this avoids
+an attack vector: if the average-case performance would be significantly better
+than the worst case, it is likely that nodes would be optimised for the average
+case (for instance, run on hardware that can handle the average case, but not
+necessarily the worst case); then if a malicious node can intentionally cause
+the worst case, they might be able to bring down parts of the network.
+
+For this reason we don't normally share work between various peers; when
+multiple upstream peers all send us the same header, we verify the header
+each time. This means that the average case (most upstream chains are the same)
+and the worst case (every upstream chain is different) are equal.
+
+\pagebreak
+
+However, it is less important that we can predict accurately how long it takes
+a node (that isn't up to date) to sync with the network. Such a node is anyway
+not producing blocks; here, faster is simply better. This means that while we
+are not up to date we could share the validation work across upstream peers:
+when two peers send us the same header, we do not need to validate it twice.
+
+This is \emph{especially} important when we are not up to date, because due to
+the requirement to have at least $\RequiredPeers$ upstream peers, we might be
+connecting to more peers than usual. Moreover, under normal circumstances we
+expect all of these peers to present us with exactly the same chain (and
+finally, these cryptographic checks are expensive).
+
+\item Similarly, since we expect all upstream nodes to report the same chain,
+if we receive a bunch of headers from peer 1, we can just ask peer 2
+whether they have the most recent of those headers on their chain, thereby
+skipping over large chunks of the chain altogether.
+
+\item Since we only ever fetch blocks strictly in order, we can simplify
+the interaction with the block fetch client: it might be easier to generate
+longer fetch ranges, as well as spread the load more evenly across the peers.
+
+\item We saw in \cref{genesis:becoming-alert} that any blocks that we download
+while syncing which are more than $s$ slots away from the wallclock will be
+blocks from the common prefix of the honest chains and will not have to be
+rolled back. It might therefore be possible to bypass the volatile database
+entirely. However, how this works when we switch back from being up to date
+to not being up to date would require careful thought.
+
+\end{itemize}
diff --git a/ouroboros-consensus/docs/report/chapters/future/lowdensity.tex b/ouroboros-consensus/docs/report/chapters/future/lowdensity.tex
new file mode 100644
index 00000000000..11ec26851ce
--- /dev/null
+++ b/ouroboros-consensus/docs/report/chapters/future/lowdensity.tex
@@ -0,0 +1,953 @@
+\chapter{Dealing with extreme low-density chains}
+\label{low-density}
+
+\section{Introduction}
+
+As we saw in \cref{genesis}, chain density is our principal means for
+distinguishing chains forged by malicious nodes from the honest chain: due to
+the fundamental assumption that there is an honest majority, the chain forged by
+the honest nodes will be denser than any chain forged by an adversary. If,
+therefore, the honest nodes in the system stop producing blocks due to some
+network-wide problem---pervasive node misconfiguration, bug in the ledger,
+etc.---the security of the system is at risk. Even when the nodes start
+producing blocks again, the low-density region of the chain left behind by the
+problem will remain an issue for security. If an adversary forks off
+their own chain at the point where the honest majority stopped producing blocks,
+then new nodes joining the network (using the genesis rule, \cref{genesis}) will
+adopt the adversarial chain instead of the honest chain:
+%
+\begin{center}
+\begin{tikzpicture}
+\draw
+  (0,0)
+  -- (3,0) node{$\bullet$} coordinate(i);
+\draw
+  (i)
+  -- ++(3,0) node[pos=0.5,above]{$\overbrace{\hspace{3cm}}^\text{no blocks produced}$}
+  -- ++(4,0) node[pos=0.5,above]{$\overbrace{\hspace{4cm}}^\text{chain resumes}$}
+     node[right]{honest chain};
+\draw
+  (i)
+  -- ++(0, -1)
+  -- ++(4, 0) node[pos=0.5,below]{$\underbrace{\hspace{4cm}}_\text{$s$ slots}$}
+  -- ++(2, 0) node[right]{adversarial chain};
+\draw [dashed] (7,-2) -- (7,1);
+\end{tikzpicture}
+\end{center}
+%
+The \emph{Disaster Recovery Plan}\footnote{Currently not available as a public
+document.} sketches how we might ``patch the chain back up'' when a major
+problem like this occurs. This must happen out-of-band with the cooperation of
+the major stakeholders, and is (mostly) outside the scope of the
+Consensus Layer report.
That said, the options for disaster recovery are currently limited
+due to a technical limitation in the consensus layer, which we will discuss now.
+
+From a chain security point of view, it makes little difference if the honest
+chain has a region of $s$ slots containing \emph{one} block or \emph{zero}
+blocks; both are equally terrible. However, this makes a big
+difference to the implementation as it currently stands: as long as there is at
+least one block in every $s$ slots, the system can continue; but when there is a
+gap of more than $s$ slots anywhere on the chain, the system will grind to a
+halt. As we will see in this chapter, this is a consequence of the fact that we
+validate headers independently of blocks, the so-called header/body split
+(see also \cref{nonfunctional:network:headerbody}). The main goal of this
+chapter is to discuss how we can address this, allowing the system to continue
+irrespective of any gaps on the chain. This is important for a number of
+reasons:
+
+\begin{enumerate}
+\item It makes disaster recovery less immediately urgent: if the honest nodes
+stop producing blocks for whatever reason, the problem can be resolved, the
+system restarted, and blocks can be produced again. Disaster recovery,
+and patching the chain back up, can then be considered as the system is running
+again, and put into motion when the various stakeholders are ready.
+\item It also opens up more avenues for disaster recovery. If the consensus
+layer can't skip past large gaps on the chain, then the chain \emph{must} be
+patched. However, if we lift this restriction, then there are other ways in
+which we might address the problem. For example, we could (even if just
+temporarily) simply record the low-density area of the chain within the code
+itself and hardcode a preference for (this part of the) ``honest but sparse''
+chain in chain selection.
+\item Chain regions with extremely low density are difficult to avoid
+in our consensus tests (\cref{testing:consensus}).
+\end{enumerate}
+
+Even \emph{if} it is desirable that the system stops when the chain density
+falls below a certain threshold, it does not make sense to set that threshold at
+the ``less than 1 block per $s$ slots'' boundary. This should be defined and
+implemented as an external policy, not dictated by implementation details.
+Moreover, even with an explicit stop, we might like the ability to mark the
+known-to-be-low-density chain and restart the system (point 2, above). It is
+also far from clear how to prevent adversarial nodes from taking advantage of
+such ``automatic'' stops (how do we prevent adversaries from producing blocks?).
+Either way, such concerns are well outside the scope of this chapter. Here we
+address just one question: how can we allow the system to continue when there
+are larger-than-$s$-slot gaps on the chain?
+
+\section{Background}
+
+\subsection{Recap: ledger state, ledger view and forecasting}
+
+Blocks are validated against the state of the ledger
+(\cref{ledger:api:ApplyBlock}). For example, we check that inputs spent by
+transactions in the block are available in the UTxO in the ledger state.
+Depending on the choice of consensus protocol, we may also need part of the
+ledger state to be able to validate the block \emph{header}. For example, in
+Praos and Genesis we need to know the active stake distribution in order to be
+able to verify that whoever produced the block had a right to do so.
We call the part
+of the ledger state that we need to validate block headers the \emph{ledger
+view} (\cref{consensus:class:ledgerview}).
+
+We call it a ledger \emph{view} because it is a projection out of the full
+ledger state. Unfortunately, we cannot compute the \emph{next} ledger view based
+only on the header; there is nothing that corresponds to the dotted arrow in
+this diagram:
+%
+\begin{center}
+\begin{tikzpicture}[block/.style={rectangle}]
+\node at (0, 2) (state1) [block] {ledger state};
+\node at (7, 2) (state2) [block] {ledger state};
+\node at (0, 0) (view1) [block] {ledger view};
+\node at (7, 0) (view2) [block] {ledger view};
+\draw [->] (state1.south) -- (view1.north) node[pos=0.5,left]{project};
+\draw [->] (state2.south) -- (view2.north) node[pos=0.5,right]{project};
+\draw [->] (state1.east) -- (state2.west) node[pos=0.5,above]{apply block};
+\draw [->, dotted] (view1.east) -- (view2.west) node[pos=0.5,below]{(cannot apply header)};
+\end{tikzpicture}
+\end{center}
+%
+Let's recall the Praos example again: we can compute the active stake
+distribution from the ledger state, but in order to understand how the active
+stake distribution evolves, we need to know how the full UTxO evolves, and for
+that we need the full blocks. (We discussed this also in
+\cref{hfc:failed:forecasting}.)
+
+Let's stay with Praos a little longer. The active stake distribution changes
+only at epoch boundaries. Therefore we will know the active stake distribution
+at least until the end of the epoch. Moreover, once we get close enough to the
+epoch boundary, we also know the stake distribution for the \emph{next} epoch.
+The range over which we know the active stake distribution therefore evolves as
+follows:
+%
+\begin{center}
+\begin{tikzpicture}[yscale=0.75]
+%
+\draw (0, 0) -- (2, 0) node{$\bullet$} node[above left]{tip};
+\path (2, 0) -- (6, 0) node[pos=0.5,above]{$\overbrace{\hspace{4cm}}^\text{known}$};
+%
+\draw (0, -1) -- (3, -1) node{$\bullet$} node[above left]{tip};
+\path (3, -1) -- (6, -1) node[pos=0.5,above]{$\overbrace{\hspace{3cm}}^\text{known}$};
+%
+\draw (0, -2) -- (4, -2) node{$\bullet$} node[above left]{tip};
+\path (4, -2) -- (6, -2) node[pos=0.5,above]{$\overbrace{\hspace{2cm}}^\text{known}$};
+%
+\draw (0, -3) -- (5, -3) node{$\bullet$} node[above left]{tip};
+\path (9, -3) -- (6, -3) node[pos=0.5,above]{$\overbrace{\hspace{5cm}}^\text{known}$};
+%
+\draw [dashed] ( 2, -3.2) -- ( 2, 0.7) node[above]{epoch};
+\draw [dashed] ( 6, -3.2) -- ( 6, 0.7) node[above]{epoch};
+\draw [dashed] (10, -3.2) -- (10, 0.7) node[above]{epoch};
+\end{tikzpicture}
+\end{center}
+
+\pagebreak
+
+The range over which we know the active stake distribution shrinks and then
+grows again, but never falls below a certain minimum size. We abstract from this
+process in the consensus layer, and say we can \emph{forecast} the ledger view
+from a particular ledger state over a certain \emph{forecast range}
+(\cref{ledger:api:LedgerSupportsProtocol}). This does not necessarily mean the
+ledger view is constant during that range, but merely that any changes are
+\emph{known} (for example, see the last line in the diagram above).
+
+If we change our perspective slightly, we can say that blocks on the chain
+cannot influence the ledger view (active stake distribution) until a certain
+period of time (in slots) has passed. We call this the \emph{stability window}
+of the ledger, and will study it in more detail in the next section.
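+
+To fix notation, the sketch below shows the shape that such a forecasting
+abstraction might take. This is a simplification for illustration only (using
+\lstinline!Maybe! where the real consensus layer uses an explicit error type),
+and the names are not meant to match the actual API.
+%
+\begin{lstlisting}
+import Data.Word (Word64)
+
+type SlotNo = Word64
+
+-- | A forecast is anchored at the slot of the underlying ledger state,
+-- and can produce the ledger view for any slot within its range.
+data Forecast view = Forecast {
+      forecastAt  :: SlotNo               -- ^ Anchor of the forecast
+    , forecastFor :: SlotNo -> Maybe view -- ^ Nothing outside the range
+    }
+
+-- | A forecast with a fixed horizon: the ledger view is known (though
+-- not necessarily constant) for every slot before the horizon.
+boundedForecast :: SlotNo -> SlotNo -> (SlotNo -> view) -> Forecast view
+boundedForecast at horizon view = Forecast {
+      forecastAt  = at
+    , forecastFor = \for -> if for < horizon
+                              then Just (view for)
+                              else Nothing
+    }
+\end{lstlisting}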
+ +\subsection{Recap: stability windows} +\label{low-density:recap-stability-window} + +Blocks are validated against ledger states; each block is validated against the +ledger state as it was after applying the previous block. This means that when +we validate block $B$ in the example below, we use the ledger state after +applying block $A$; for block $C$, we use the ledger state after applying block +$B$: +% +\begin{center} +\begin{tikzpicture} + [block/.style={rectangle,draw=black,minimum size=5mm} + ,baseline=0pt] +\node at (0,0) (A) [block] {A}; +\node at (2,0) (B) [block] {B}; +\node at (5,0) (C) [block] {C}; +\draw (-2,0) -- (A.west); +\draw (A.east) -- (B.west) node[pos=0.5,above=5mm]{\small validated against}; +\draw (B.east) -- (C.west) node[pos=0.5,above=5mm]{\small validated against}; +\draw (C.east) -- ++(2,0); +% +\draw [->, dotted] (B.west) to [out=135,in=90] (A.east); +\draw [->, dotted] (C.west) to [out=135,in=90] (B.east); +\end{tikzpicture} +\qquad +\begin{minipage}{0.25\textwidth} +\emph{Horizontal axis represents time (in slots)} +\end{minipage} +\end{center} +% +In the chain sync client (\cref{chainsyncclient}) we are however not validating +blocks, but block \emph{headers}. As we saw, in order to validate a header we +only need part of the ledger state, known as the ledger \emph{view}. We also saw +that despite the fact that we only need part of the ledger state, we cannot +\emph{update} the ledger view using only headers: we still need the full block. +This means that if we have block $A$, but only block \emph{headers} $B$ and $C$, +we have a problem: +% +\begin{center} +\begin{tikzpicture} + [block/.style={rectangle,draw=black,minimum size=5mm}] +\path (-2,0) -- (11,0); % adjust bounding box +\node at (0,0) (A) [block] {A}; +\node at (2,0) (B) [block, dashed] {B}; +\node at (5,0) (C) [block, dashed] {C}; +\draw (-2,0) -- (A.west); +\draw (A.east) -- (B.west) node[pos=0.5,above=5mm]{\small validated against}; +\draw (B.east) -- (C.west) node[pos=0.5,above=5mm]{\small validated against}; +\draw (C.east) -- ++(2,0); +% +\draw [->, dotted] (B.west) to [out=135,in=90] (A.east); +\draw [->, dotted] (C.west) to [out=135,in=90] (B.east); +\end{tikzpicture} +\end{center} +% +Validating header $B$ is unproblematic, since we have the ledger state available +after applying block $A$. However, since we don't have block $B$, we can't +compute the ledger state after block $B$ to validate header $C$. 
We are saved by
+the fact that we can \emph{forecast} the ledger view required to validate
+header $C$ from the ledger state after $A$:
+%
+\begin{center}
+\begin{tikzpicture}
+  [block/.style={rectangle,draw=black,minimum size=5mm}]
+\path (-2,0) -- (11,0); % adjust bounding box
+\node at (0,0) (A) [block] {A};
+\node at (2,0) (B) [block, dashed] {B};
+\node at (5,0) (C) [block, dashed] {C};
+\draw (-2,0) -- (A.west);
+\draw (A.east) -- (B.west) node[pos=0.55,below=5mm]{\small forecast};
+\draw (B.east) -- (C.west);
+\draw (C.east) -- ++(2,0);
+%
+\draw [->, dotted] (B.west) to [out=135,in=90] (A.east);
+\draw [->, dotted] (C.west) to [out=135,in=90] (B.east);
+%
+\draw [->, dotted] (A.east) to [out=270,in=270] (B.east);
+\end{tikzpicture}
+\end{center}
+%
+We can do this because of a restriction on the ledger: blocks cannot affect
+the ledger view until a \emph{stability window} has passed:
+%
+\begin{center}
+\begin{tikzpicture}
+  [block/.style={rectangle,draw=black,minimum size=5mm}]
+\path (-2,0) -- (11,0); % adjust bounding box
+\node at (0,0) (A) [block] {A};
+\node at (2,0) (B) [block, dashed] {B};
+\node at (5,0) (C) [block, dashed] {C};
+\node at (8,0) (D) [block, dashed] {D};
+\draw (-2,0) -- (A.west);
+\draw (A.east) -- (B.west) node[pos=0.55,below=5mm]{\small forecast};
+\draw (B.east) -- (C.west);
+\draw (C.east) -- (D.west);
+\draw (D.east) -- ++(2,0);
+%
+\draw [->, dotted] (B.west) to [out=135,in=90] (A.east);
+\draw [->, dotted] (C.west) to [out=135,in=90] (B.east);
+%
+\draw [->, dotted] (A.east) to [out=270,in=270] (B.east);
+\node at (B.east) [below=0.6, right] {$\underbrace{\hspace{4cm}}_\text{stability window}$};
+\node at (7,0) {$\times$};
+\end{tikzpicture}
+\end{center}
+%
+We can use the ledger state after applying block $A$ (which we
+have complete knowledge of) to validate any header up to the end of $B$'s
+stability window: any changes that $A$ (or any block before $A$)
+initiates we know about, and any changes that $B$ initiates cannot take effect
+until that stability window ends. Therefore we can validate header $C$, but not
+header $D$: block $B$ might have scheduled some changes to take effect at the
+slot marked as $(\times)$ in the diagram, and we do not know what those effects
+are.\footnote{It might be tempting to think that we can validate $D$ because if
+we did have blocks $B$ and $C$, block $D$ would be evaluated against the ledger
+state as it was after applying $C$, which is still within $B$'s stability
+window. However, the slot number of $D$ (its location on the $x$-axis in the
+diagram) matters, because changes are scheduled for slots.}
+
+In chain sync we do not currently take advantage of the knowledge of the
+location of header $B$.\footnote{\label{footnote:anchor-after-first-header}We
+should change this. By anchoring the stability window at the last known block,
+we only have a guarantee that we can validate $k$ headers, but we should really
+be able to validate $k + 1$ headers in order to get a chain that is longer than
+our own (\cref{low-density:tension}). If we anchored the stability window after
+the first unknown header, where it \emph{should} be anchored, we could validate
+$k$ headers \emph{after} the first unknown header, and hence $k + 1$ in total.
+Concretely, we would have to extend the \lstinline!LedgerSupportsProtocol! class
+with a function that forecasts the ledger view given a \emph{ticked} ledger
+state.
Taking advantage of this would then just be a minor additional
+complication in the chain sync client.} This means we have to be conservative:
+all we know is that there could be \emph{some} block in between $A$ and $C$ that
+might schedule some changes that are relevant for validating header $C$. In this
+case we therefore assume that the stability window extends from $A$ instead:
+%
+\begin{center}
+\begin{tikzpicture}
+  [block/.style={rectangle,draw=black,minimum size=5mm}]
+\path (-2,0) -- (11,0); % adjust bounding box
+\node at (0,0) (A) [block] {A};
+\node at (2,0) (B) [block, dashed] {B};
+\node at (5,0) (C) [block, dashed] {C};
+\node at (8,0) (D) [block, dashed] {D};
+\draw (-2,0) -- (A.west);
+\draw (A.east) -- (B.west);
+\draw (B.east) -- (C.west);
+\draw (C.east) -- (D.west);
+\draw (D.east) -- ++(2,0);
+%
+\node at (A.east) [below=0.6, right] {$\underbrace{\hspace{4cm}}_\text{stability window}$};
+\end{tikzpicture}
+\end{center}
+%
+In this example, that means we can validate $B$, but not $C$ (nor
+$D$).\footnote{We could in principle shift this up by 1 slot: after all, the
+very next block after $A$ cannot be in the same slot as $A$. While EBBs
+are an exception to that rule (\cref{ebbs}), we do not need to validate EBBs so
+this is a rare example where EBBs do not cause a problem.}
+
+\subsection{Tension with chain selection}
+\label{low-density:tension}
+
+Changes that affect the ledger view are scheduled for slots (often
+for epoch boundaries, which happen at particular slots); the stability window
+must therefore be defined in terms of slots as well. This means that
+the number of \emph{headers} we can validate within a given stability window
+depends on the density of that chain; if the chain we considered at the end
+of the previous section looks like this instead
+%
+\begin{center}
+\begin{tikzpicture}
+  [block/.style={rectangle,draw=black,minimum size=5mm}]
+\path (-2,0) -- (11,0); % adjust bounding box
+\node at (0,0) (A) [block] {A};
+\node at (1.5,0) (B) [block, dashed] {B};
+\node at (3,0) (C) [block, dashed] {C};
+\node at (8,0) (D) [block, dashed] {D};
+\draw (-2,0) -- (A.west);
+\draw (A.east) -- (B.west);
+\draw (B.east) -- (C.west);
+\draw (C.east) -- (D.west);
+\draw (D.east) -- ++(2,0);
+%
+\node at (A.east) [below=0.6, right] {$\underbrace{\hspace{4cm}}_\text{stability window}$};
+\end{tikzpicture}
+\end{center}
+%
+we can validate headers $B$ and $C$ (but still not $D$).
+
+There is a fundamental tension between the stability window defined in
+\emph{slots}, and chain selection preferring longer chains: chains that have
+more \emph{blocks}. In order to be able to do a meaningful comparison between
+our chain and the candidate chain, we must be able to verify enough of that
+candidate chain that the length of that verified prefix exceeds the length of
+our own chain. Since the maximum rollback we support is $k$
+(\cref{consensus:overview:k}), that means we must be able to validate at least
+$k + 1$ headers. The tension is resolved by a theoretical result that says that
+within $3k/f$ slots we \emph{will} see more than $k$ blocks (more precisely, the
+probability that we see fewer than $k$ blocks in $3k/f$ slots is negligibly
+small; \cite{cryptoeprint:2017:573}). This therefore provides us with a suitable
+choice for a stability window.
+
+Unfortunately, while in theory there is no difference between theory and
+practice, there is in practice.
Currently, when all nodes in the system are
+unable to produce blocks for an extended period of time, the system grinds to a
+halt. Even if the underlying problem is resolved, nodes will refuse to create a
+block if the distance between that block and the previous block exceeds the
+stability window; after all, if they did produce a block, other nodes would be
+unable to validate it. The former is easily resolved, as it is merely a check in
+the block production code; resolving the latter is the topic of this
+chapter.
+
+It would be preferable to avoid the tension altogether, and schedule
+changes that affect the ledger view for particular \emph{blocks} instead
+(and consequently, have epoch boundaries also happen at certain blocks). This
+however requires backing from theoretical research; we will come back to this
+in \cref{future:block-vs-slot}.
+
+\pagebreak
+
+\subsection{Single-gap case}
+
+It is tempting to think that when there is only a \emph{single} large gap
+on the chain, there is no problem:
+%
+\begin{center}
+\begin{tikzpicture}
+  [block/.style={rectangle,draw=black,minimum size=5mm}]
+\path (-2,0) -- (11,0); % adjust bounding box
+\node at (0,0) (A) [block] {A};
+\node at (6,0) (B) [block, dashed] {B};
+\node at (7,0) (C) [block, dashed] {C};
+\node at (8,0) (D) [block, dashed] {D};
+\draw (-2,0) -- (A.west);
+\draw (A.east) -- (B.west);
+\draw (B.east) -- (C.west);
+\draw (C.east) -- (D.west);
+\draw (D.east) -- ++(2,0);
+%
+\node at (B.east) [below=0.6, right] {$\underbrace{\hspace{4cm}}_\text{stability window}$};
+\end{tikzpicture}
+\end{center}
+%
+The gap between $A$ and $B$ exceeds the stability window, but this
+should not matter: it's not the stability window after $A$ that
+matters, but the stability window after $B$. This seems to be a useful special
+case: if a problem \emph{does} arise that prevents nodes from producing blocks
+for an extended period of time, one might hope that this problem does not
+immediately arise again after the nodes resume producing blocks.
+
+As we saw, the consensus layer always conservatively anchors the stability
+window at the last known block rather than the first header after the tip. We
+could change this (and probably should; see
+\cref{footnote:anchor-after-first-header}), but it turns out this does not
+actually help very much for this particular problem. To see this, suppose there
+is another node in the system which is currently on a fork that intersects with
+this chain at some block $I$ before the gap:
+%
+\begin{center}
+\begin{tikzpicture}[yscale=0.5,block/.style={rectangle,draw=black,minimum size=5mm}]
+\node at (-2,-1) (I) [block] {I};
+\node at (0,0) (A) [block, dashed] {A};
+\node at (6,0) (B) [block, dashed] {B};
+\node at (7,0) (C) [block, dashed] {C};
+\node at (8,0) (D) [block, dashed] {D};
+\draw (-4,-1) -- (I.west);
+\draw (I.east) -- (A.west);
+\draw (A.east) -- (B.west);
+\draw (B.east) -- (C.west);
+\draw (C.east) -- (D.west);
+\draw (D.east) -- ++(2,0);
+%
+\node at (A.east) [below=0.6, right] {$\underbrace{\hspace{4cm}}_\text{stability window}$};
+%
+\node at (0, -2) (A') [block] {A$'$};
+\draw (I.east) -- (A'.west);
+\end{tikzpicture}
+\end{center}
+%
+The second node must execute a rollback to $I$ in order to be able to adopt
+the new chain, but from \emph{its} perspective the first unknown block is $A$,
+not $B$: hence the stability window \emph{must} be anchored at $A$, and the
+node will be unable to bridge the gap.
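+
+The two anchoring policies can be summarised in a small sketch. These helper
+functions are hypothetical (and the exact boundary conventions are elided);
+they merely restate the rules discussed above:
+%
+\begin{lstlisting}
+import Data.Word (Word64)
+
+type SlotNo = Word64
+
+-- Conservative rule, as currently implemented: anchor the stability
+-- window at the last block we actually have (A).
+canValidateConservative ::
+     SlotNo -- ^ Slot of the last known block (A)
+  -> SlotNo -- ^ Stability window, in slots
+  -> SlotNo -- ^ Slot of the header we want to validate
+  -> Bool
+canValidateConservative lastBlock window hdr = hdr < lastBlock + window
+
+-- Improved rule (see the footnote above): anchor the stability window
+-- at the first unknown header (B) instead, so that we can validate k
+-- headers after that header, and hence k+1 in total.
+canValidateImproved ::
+     SlotNo -- ^ Slot of the first unknown header (B)
+  -> SlotNo -- ^ Stability window, in slots
+  -> SlotNo -- ^ Slot of the header we want to validate
+  -> Bool
+canValidateImproved firstHeader window hdr = hdr < firstHeader + window
+\end{lstlisting}
+%
+As the single-gap example shows, however, even the improved rule does not help
+a node whose first unknown block lies \emph{before} the gap.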
+ +\section{Pre-genesis} +\label{low-density:pre-genesis} + +In this section we will consider how we might allow nodes to recover from a low +density chain, prior to the implementation of the genesis rule. An obvious +solution suggests itself: we could just allow chain sync to download blocks +when it needs to validate a header which is beyond its forecast range. + +\subsection{Damage mitigation} + +The reason the chain sync client doesn't normally download blocks is to limit +the amount of unnecessary work an attacker can make it do (prevent DoS attacks, +\cref{nonfunctional:network:headerbody}). We might therefore consider if we can +restrict \emph{when} we allow the chain sync client to download blocks. Ideally +we would do this only ``when necessary'': to bridge the gap on the honest chain. +Unfortunately, it is difficult to come up with a criterion that +approximates this ideal. Consider how the situation evolves from the point of +view of a single node: + +\begin{center} +\begin{tikzpicture}[yscale=0.25] +% +\path (0, 0) coordinate(imm1) node{$\bullet$} node[above]{imm}; +\draw (imm1) -- ++(-1,0); +\draw (imm1) -- ++(0.5,0); +\path (imm1) -- ++(0.5,0) -- ++(1,0) node[pos=0.5]{$\cdots$} -- ++(0.5,0) coordinate(cp1); +\draw (cp1) -- ++(-0.5,0); +\draw (cp1) -- ++(1, 1.5) -- ++(1, 0) node{$\bullet$}; +\draw (cp1) -- ++(1, 0.5) -- ++(1.5, 0) node{$\bullet$}; +\draw (cp1) -- ++(1,-0.5) -- ++(0.5, 0) node{$\bullet$}; +\draw (cp1) -- ++(1,-1.5) -- ++(1, 0) node{$\bullet$}; +\draw [very thick] (4.5,-2) -- (4.5,2) node[above]{now}; +% +\path (0, -7) coordinate(imm2) node{$\bullet$} node[above]{imm}; +\draw (imm2) -- ++(-1,0); +\draw (imm2) -- ++(0.5,0); +\path (imm2) -- ++(0.5,0) -- ++(1,0) node[pos=0.5]{$\cdots$} -- ++(0.5,0) coordinate(cp2); +\draw (cp2) -- ++(-0.5,0); +\draw (cp2) -- ++(1, 1.5) -- ++(1, 0) node{$\bullet$}; +\draw (cp2) -- ++(1, 0.5) -- ++(1.5, 0) node{$\bullet$}; +\draw (cp2) -- ++(1,-0.5) -- ++(0.5, 0) node{$\bullet$}; +\draw (cp2) -- ++(1,-1.5) -- ++(1, 0) node{$\bullet$}; +\draw [very thick] (5.5,-9) -- (5.5,-5) node[above]{now}; +% +\path (0, -14) coordinate(imm3) node{$\bullet$} node[above]{imm}; +\draw (imm3) -- ++(-1,0); +\draw (imm3) -- ++(0.5,0); +\path (imm3) -- ++(0.5,0) -- ++(1,0) node[pos=0.5]{$\cdots$} -- ++(0.5,0) coordinate(cp3); +\draw (cp3) -- ++(-0.5,0); +\draw (cp3) -- ++(1, 1.5) -- ++(1, 0) node{$\bullet$}; +\draw (cp3) -- ++(1, 0.5) -- ++(1.5, 0) node{$\bullet$}; +\draw (cp3) -- ++(1,-0.5) -- ++(0.5, 0) node{$\bullet$}; +\draw (cp3) -- ++(1,-1.5) -- ++(1, 0) node{$\bullet$}; +\path (4.5,-12.5) -- (7.5,-12.5) node[pos=0.5,above=-0.1]{$\overbrace{\hspace{3cm}}^\text{$> s$ slots}$}; +\draw [very thick] (7.5,-16) -- (7.5,-12) node[above]{now}; +% +\path (0, -21) coordinate(imm4) node{$\bullet$} node[above]{imm}; +\draw (imm4) -- ++(-1,0); +\draw (imm4) -- ++(0.5,0); +\path (imm4) -- ++(0.5,0) -- ++(1,0) node[pos=0.5]{$\cdots$} -- ++(0.5,0) coordinate(cp4); +\draw (cp4) -- ++(-0.5,0); +\draw (cp4) -- ++(1, 1.5) -- ++(1, 0) node{$\bullet$} coordinate(before1); +\draw (cp4) -- ++(1, 0.5) -- ++(1.5, 0) node{$\bullet$}; +\draw (cp4) -- ++(1,-0.5) -- ++(0.5, 0) node{$\bullet$} coordinate(before2); +\draw (cp4) -- ++(1,-1.5) -- ++(1, 0) node{$\bullet$}; +\path (4.5,-19.5) -- (7.5,-19.5) node[pos=0.5,above=-0.1]{$\overbrace{\hspace{3cm}}^\text{$> s$ slots}$}; +\draw [very thick] (10,-23) -- (10,-19) node[above]{now}; +\path (before1) -- ++(5,0) coordinate(after1); +\path (before2) -- ++(6,0) coordinate(after2); +\draw (after1) node{$\bullet$} -- ++(2,0); +\draw 
(after2) node{$\bullet$} -- ++(2,0); +\end{tikzpicture} +\end{center} + +\pagebreak + +The node is tracking the chains of a number of upstream peers. These chains will +share some common prefix, which must at least include the tip of our own +immutable database (that is, the block $k$ blocks away from our tip), marked +``imm''. When block production is halted due to some problem, the gap between +the tips of the chains and the wallclock will start to increase; at some point +this gap will exceed the stability window. Finally, when the problem is resolved +the nodes will start producing blocks again. + +\begin{assumption} +\label{never-only-malicious} +In the period where the honest nodes cannot produce any blocks, malicious nodes +cannot either. If that is not the case, we are in trouble anyway; that is a +problem which is well outside the scope of this chapter. +\end{assumption} + +\Cref{never-only-malicious} seems to give some hope. We may not be able to +decide for any \emph{particular} chain if that chain happens to be the honest +chain. However, if \emph{none} of the chains contain any blocks in the gap, then +eventually it will be true for \emph{all} upstream peers that the gap from the +tip of that peer's chain to the wallclock exceeds the stability window. This +might suggest the following rule: + +\begin{failedattempt} +Only allow the chain sync client to download blocks if this would be required +for \emph{all} peers. +\end{failedattempt} + +Unfortunately, this rule does not work because as soon as we bridge the gap for +\emph{one} of our peers, that condition no longer holds: +% +\begin{center} +\begin{tikzpicture}[yscale=0.25] +\path (0, -21) coordinate(imm4) node{$\bullet$} node[above]{imm}; +\draw (imm4) -- ++(-1,0); +\draw (imm4) -- ++(0.5,0); +\path (imm4) -- ++(0.5,0) -- ++(1,0) node[pos=0.5]{$\cdots$} -- ++(0.5,0) coordinate(cp4); +\draw (cp4) -- ++(-0.5,0); +\draw (cp4) -- ++(1, 1.5) -- ++(1, 0) node{$\bullet$} coordinate(before1); +\draw (cp4) -- ++(1, 0.5) -- ++(1.5, 0) node{$\bullet$}; +\draw (cp4) -- ++(1,-0.5) -- ++(0.5, 0) node{$\bullet$} coordinate(before2); +\draw (cp4) -- ++(1,-1.5) -- ++(1, 0) node{$\bullet$}; +\path (4.5,-19.5) -- (7.5,-19.5) node[pos=0.5,above=-0.1]{$\overbrace{\hspace{3cm}}^\text{$> s$ slots}$}; +\draw [very thick] (10,-23) -- (10,-19) node[above]{now}; +\path (before1) -- ++(5,0) coordinate(after1); +\draw (before2) -- ++(6,0) coordinate(after2); +\draw (after1) node{$\bullet$} -- ++(2,0); +\draw (after2) node{$\bullet$} -- ++(2,0); +\end{tikzpicture} +\end{center} +% +Now one of our chains has a tip which is near the wallclock, and so the +condition no longer holds. Okay, you might say, but it was true at \emph{some} +point, and when it was true, it would have allowed the chain sync client to +download blocks for \emph{any} peer. Thus, we could try the following rule: + +\begin{failedattempt} +When we detect that the tips of all upstream peers are more than the stability +window away from the wallclock, give the chain sync client a chance to download +blocks for \emph{all} peers. +\end{failedattempt} + +This \emph{might} work, but it's very stateful. What does ``all peers'' mean +exactly? All peers we are currently connected to? What if we connect to another +peer later? What if the node has restarted in the meantime, do we need to +persist this state? Will we need some notion of peer identity? Perhaps all of +these questions have answers, but this does not seem like a clean solution. 
+
+As a final attempt, we might try to ensure that there is only a \emph{single}
+chain after we resolve the problem that was preventing block production.
+Suppose this could somehow be guaranteed (out-of-band communication to agree on
+a block in the common prefix, using a BFT-like leadership selection for a while,
+etc.). Then we could try the following rule:
+
+\begin{failedattempt}
+When we detect that the tips of all upstream peers are more than the stability
+window away from the wallclock, allow the chain sync client to download enough
+blocks to bridge the gap for \emph{one} peer. Allow the other peers to bridge
+the gap only if they contain the \emph{same} header after the gap.
+\end{failedattempt}
+
+Unfortunately, this still cannot work. Even if the honest nodes agree to only
+produce a single chain after the gap, we cannot prevent an adversary from
+constructing another chain. If the node then happens to pick the adversary's
+chain as the one-and-only allowed header to jump the gap, it would then be
+unable to switch to the honest chain later.
+
+\pagebreak
+
+\subsection{Damage analysis}
+
+If we cannot limit when the chain sync client is allowed to download and
+validate blocks, then let's analyse exactly what the possibility for
+denial-of-service attacks really is.
+
+\begin{lemma}
+When the node is up to date, the chain sync client will never have to download
+any blocks.
+\end{lemma}
+
+\begin{proof}
+The Praos analysis \cite{cryptoeprint:2017:573} tells us that the honest chains
+will not diverge by more than $k$ blocks, and that this means that their
+intersection cannot be more than $3k/f$ slots away from the wallclock (provided
+block production is not halted, of course). This means that any header that
+would be more than the stability window away from the intersection point
+would have a slot number past the wallclock, and would therefore be
+invalid.\footnote{Though we allow for some minimal clock skew, headers past
+the wallclock should be considered invalid if this exceeds $s$ slots from the
+immutable tip, even if they would still fall within the permissible clock
+skew. This is an edge case that was important for the implementation of genesis
+as well; see \cref{genesis:becoming-alert:DoS}.}
+\end{proof}
+
+This means that we only have to worry about DoS attacks while a node is syncing.
+As a first observation, node performance is less critical here. The node is
+anyway not producing blocks while syncing, so causing the node to slow down
+temporarily is not a huge deal (\emph{cf.} also \cref{genesis:optimizations}
+where we argue it's less essential during syncing to make the worst-case
+performance and the normal-case performance the same).
+
+It will therefore suffice to simply \emph{bound} the amount of work a malicious
+node can make us do. We have to make sure that we can see at least $k+1$ headers
+from each peer (we want to support a rollback of $k$ blocks, and chain selection
+is based on length, so if we can validate $k+1$ headers, we have seen enough to
+do a length comparison and decide we want to switch to the other chain). This
+means we would need to download at most $k$ blocks.
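+
+Spelled out (a trivial sketch with hypothetical names; the real code does not
+define such helpers): the first of the $k+1$ headers is within the forecast
+range of our own tip's ledger state, and each further header may require the
+block of its predecessor.
+%
+\begin{lstlisting}
+import Data.Word (Word64)
+
+-- | Headers needed to do a length comparison after a rollback of k.
+maxHeadersToValidate :: Word64 -> Word64
+maxHeadersToValidate k = k + 1
+
+-- | Blocks needed in the worst case: one per header after the first,
+-- to advance the ledger state far enough to validate the next header.
+maxBlocksToDownload :: Word64 -> Word64
+maxBlocksToDownload k = k
+\end{lstlisting}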
+
+This bounds the amount of \emph{memory} we might need to dedicate to any
+chain,\footnote{Currently the length of the fragments we keep in memory for each
+upstream peer is bounded by the forecast range, but that natural bound would of
+course no longer work if we allow the chain sync client to download blocks.} but
+does not limit how much \emph{work} they can make us do: an attacker with even a
+small amount of stake could construct lots of chains that fork off the main
+chain, and so we'd end up downloading and validating lots of blocks. We can
+limit the impact of this by rate-limiting rollback messages, which would be
+useful for other purposes as well.\footnote{For example, it can help avoid a DoS
+attack where an attacker attempts to flood our volatile DB with lots of useless
+blocks.} Moreover, there is no real asymmetry here between the attacker and the
+defender: the cost of downloading and validating a block on our side is not too
+dissimilar from the cost of producing and providing that block on the side of
+the attacker, and all the attacker would gain in doing so is slowing down a
+node's syncing. (Admittedly, if we adopt more than $k$ blocks from the
+adversarial chain we'd be in trouble, but that is a problem solved by the
+Genesis chain selection rule.)
+
+\pagebreak
+
+\section{Post-genesis}
+\label{low-density:post-genesis}
+
+With the implementation of the genesis rule, discussed in detail in
+\cref{genesis}, some things get easier, but unfortunately some things get more
+difficult.
+
+\subsection{Pre-disaster genesis window}
+
+Suppose the chain is operating as normal until disaster strikes and the nodes
+stop producing blocks:
+%
+\begin{center}
+\begin{tikzpicture}[yscale=0.5]
+\draw
+  (0,0)
+  -- (3,0) node{$\bullet$} coordinate(i);
+\draw (i) -- ++(1, 1) -- ++(1, 0);
+\draw (i) -- ++(1, 0) -- ++(1.5, 0);
+\draw (i) -- ++(1, -1) -- ++(0.5, 0);
+\path
+  (i)
+  -- ++(2.5, 0) node[pos=0.5,above=0.5cm]{$\overbrace{\hspace{2.5cm}}^\text{$\le k$ blocks}$};
+\draw [very thick] (6,-1.5) -- (6,2) node[above]{disaster};
+\end{tikzpicture}
+\end{center}
+%
+While the Genesis analysis \cite{cryptoeprint:2018:378} tells us that the
+common intersection point is \emph{at most} $k$ blocks away, in practice it will
+actually be much less than $k$ most of the time, a handful of blocks in typical
+cases. This means that when the nodes start producing blocks again, chain
+selection will be looking at a window of $s$ slots where all chains have very
+low density:\footnote{Prefix selection does a length comparison when we can see
+all chains to their tip, meaning all chains terminate within the $s$ window. It
+is important that we don't reinterpret that as ``all chains are less than $k$
+\emph{blocks} away from the intersection point''.
If we did, we would conclude
+in this case that we can still do a length comparison when the chains continue
+after the end of the disaster period; that is not correct: it would mean that
+while the chains are growing we would come to one conclusion, but then
+once the chains grow past the window of $k$ blocks, we would switch to comparing
+density and might come to a \emph{different} conclusion.}
+%
+\begin{center}
+\begin{tikzpicture}[yscale=0.5]
+\draw
+  (0,0)
+  -- (3,0) node{$\bullet$} coordinate(i);
+\draw (i) -- ++(0.25, 1) -- ++(0.5, 0);
+\draw (i) -- ++(0.25, 0) -- ++(0.75, 0);
+\draw (i) -- ++(0.25, -1) -- ++(0.25, 0);
+\draw [very thick] (4.5,-1.5) -- (4.5,2.5) node[above]{disaster\vphantom{y}};
+\draw [very thick] (6.5,-1.5) -- (6.5,2.5) node[above]{recovery};
+\draw [dashed]
+  (i)
+  -- ++( 0, 2)
+  -- ++( 3, 0)
+  -- ++( 0, -4)
+  -- ++(-3, 0) node[pos=0.5,below]{$\underbrace{\hspace{3cm}}_\text{$s$ slots}$}
+  -- cycle;
+%
+\draw (6.5, 1) -- (8.5, 1);
+\draw (6.5, 0) -- (7.5, 0);
+\draw (6.5, -1) -- (8, -1);
+\end{tikzpicture}
+\end{center}
+%
+In effect we are doing a density comparison over very short fragments. In
+general this is not meaningful; in the extreme case, where that fragment
+contains only a single slot, density will either be 100\% or 0\%.
+It is tempting to think that we could just \emph{grow} the genesis window to
+include part of the post-disaster chain. Growing the genesis window is however
+not sound: once we get more than $s$ slots away from the intersection point, an
+adversary can start to influence the leadership schedule and so density
+comparisons are no longer meaningful.
+
+Essentially what this means is that after disaster recovery we arbitrarily pick
+any of the chains from before the disaster to continue. This probably does not
+matter too much; at worst more blocks are lost than strictly necessary, but
+those transactions can be resubmitted and we're anyway talking about disaster
+recovery; some loss is acceptable.\todo{Verify}
+
+It might \emph{even} be okay if the chain we happened to pick was constructed by
+an adversarial node. After all, at most they can have constructed $k$ blocks,
+and all they can do is selectively \emph{omit} transactions; if we continue the
+chain based on such an adversarial chain, the damage they can do is very
+limited.\todo{Verify}
+
+\emph{However.} Suppose we do make an arbitrary choice and the chain resumes.
+Nothing is preventing an adversary from forking off a new chain just prior to
+the disaster region \emph{after the fact}. If they do, and new nodes joining
+the system end up choosing that chain, they are in serious trouble; now they
+are following a chain that is basically under the control of the adversary.
+
+\pagebreak
+
+This ability of adversaries to construct new forks before areas of low density
+on the chain means that these areas are a serious risk to security. Indeed,
+somewhat ironically this risk is made \emph{worse} by the genesis rule. If we
+look at chain length only, the honest chain will probably be longer than
+whatever chain an attacker forges; but if we look at density, an attacker that
+can produce even a single block in $s$ slots might already have a sufficient
+advantage.
+
+This means that some kind of disaster recovery becomes even more important
+after we implement the genesis rule. Ideally we would patch the chain up,
+but there is an easier option which can work (at least as a temporary
+solution): it suffices to hardcode a pre-disaster block as the agreed-on
+pre-disaster tip.
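+
+As a sketch of what such a hardcoded preference might look like (the names and
+values here are hypothetical; this is an illustration, not an actual consensus
+feature): a candidate chain that extends past the agreed-on slot is acceptable
+only if it actually contains the agreed-on block.
+%
+\begin{lstlisting}
+import Data.Word (Word64)
+
+type SlotNo = Word64
+type Hash   = String -- stand-in for a real block hash
+
+-- | The agreed-on pre-disaster tip, distributed out-of-band (for
+-- example, hardcoded in the node configuration). Hypothetical values.
+checkpoint :: (SlotNo, Hash)
+checkpoint = (1000000, "c0ffee")
+
+-- | A candidate chain, given as (slot, hash) pairs, that extends past
+-- the checkpoint slot must go through the checkpointed block; chains
+-- forking off before the disaster region are rejected outright.
+respectsCheckpoint :: [(SlotNo, Hash)] -> Bool
+respectsCheckpoint chain
+  | any (\(slot, _) -> slot > fst checkpoint) chain = checkpoint `elem` chain
+  | otherwise                                       = True
+\end{lstlisting}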
+ +\subsection{Post-disaster genesis window} + +So far we've been talking about the genesis window as we approach the disaster. +Suppose we choose \emph{some} block as our pre-disaster tip; either by randomly +selecting one of the chains (or if by luck all chains happen to converge +pre-disaster) or by hardcoding a preference for a certain block: +% +\begin{center} +\begin{tikzpicture}[yscale=0.5] +\draw (0,0) -- (3,0) coordinate(i) node{$\bullet$}; +\draw [dotted] (i) -- ++(0.25, 1) -- ++(0.5, 0); +\draw + (i) + -- ++(0.25, 0) + -- ++(0.75, 0) node{$\bullet$} coordinate(pre-disaster-tip); +\draw [dotted] (i) -- ++(0.25, -1) -- ++(0.25, 0); +\draw [very thick] (4.5,-1.5) -- (4.5,2.5) node[above]{disaster\vphantom{y}}; +\draw [very thick] (6.5,-1.5) -- (6.5,2.5) node[above]{recovery}; +\draw [dashed] + (pre-disaster-tip) + -- ++( 0, 2) + -- ++( 3, 0) + -- ++( 0, -4) + -- ++(-3, 0) node[pos=0.5,below]{$\underbrace{\hspace{3cm}}_\text{$s$ slots}$} + -- cycle; +% +\draw (6.5, 1) -- (8.5, 1); +\draw (6.5, 0) -- (7.5, 0); +\draw (6.5, -1) -- (8, -1); +\end{tikzpicture} +\end{center} +% +Having made this choice, we are \emph{again} faced with a comparison between +chains which all have very low density within the window (in the extreme case, +even zero). This means that here we effectively have a \emph{second} arbitrary +choice between chains, with all the same dangers (in particular the danger of an +attacker forking off a new chain after the fact). However, in this case we have +a way out: +% +\begin{lemma} +\label{lemma:shift-genesis-window} +Suppose we have decided on a particular pre-disaster tip, and the chains we see +look like this: +% +\begin{center} +\begin{tikzpicture}[yscale=0.5] +\draw + (0,0) + -- (3,0) node{$\bullet$} node[above left]{tip} coordinate(tip); +\draw + (tip) + -- ++(4,1.5) node{$\bullet$} + -- ++(2.5,0) node[right]{$\cdots$}; +\draw + (tip) + -- ++(6,0.5) node{$\bullet$} + -- ++(0.5,0) node[right]{$\cdots$}; +\draw + (tip) + -- ++(5,-0.5) node{$\bullet$} + -- ++(1.5,0) node[right]{$\cdots$}; +\draw + (tip) + -- ++(3,-1.5) node{$\bullet$} + -- ++(3.5,0) node[right]{$\cdots$}; +% +\draw [dashed] + (tip) + -- ++(0,2) + -- ++(4.5,0) node[pos=0.5, above]{$\overbrace{\hspace{4.5cm}}^\text{$s$ slots}$} + -- ++(0,-4) + -- ++(-4.5,0) + -- cycle; +\end{tikzpicture} +\end{center} +% +Then we can shift up the genesis lookahead window until it starts at the +first block after the tip: +% +\begin{center} +\begin{tikzpicture}[yscale=0.5] +\draw + (0,0) + -- (3,0) node{$\bullet$} node[above left]{tip} coordinate(tip); +\draw + (tip) + -- ++(4,1.5) node{$\bullet$} + -- ++(2.5,0) node[right]{$\cdots$}; +\draw + (tip) + -- ++(6,0.5) node{$\bullet$} + -- ++(0.5,0) node[right]{$\cdots$}; +\draw + (tip) + -- ++(5,-0.5) node{$\bullet$} + -- ++(1.5,0) node[right]{$\cdots$}; +\draw + (tip) + -- ++(3,-1.5) node{$\bullet$} + -- ++(3.5,0) node[right]{$\cdots$}; +% +\draw [dashed] + (tip) ++(3,0) + -- ++(0,2) + -- ++(4.5,0) node[pos=0.5, above]{$\overbrace{\hspace{4.5cm}}^\text{$s$ slots}$} + -- ++(0,-4) + -- ++(-4.5,0) + -- cycle; +\end{tikzpicture} +\end{center} +\end{lemma} + +\begin{proof} +The first block that could be produced by an adversary is the first block after +the tip. This adversarial block cannot influence the leadership schedule until +at least $3k/f$ slots later, which is also the size of the lookahead window +($s$). Therefore a density comparison within the shifted window will still +favour the honest chains. 
+\end{proof}
+%
+\Cref{lemma:shift-genesis-window} means that we can shift the genesis window
+until after the disaster, and avoid the second arbitrary choice between chains.
+In particular, it means we can definitely make it across the gap safely if we
+\emph{mark} the pre-disaster block (to avoid picking an adversary's block).
+
+\pagebreak
+
+\subsection{(No) need for gap jumping}
+
+In \cref{low-density:pre-genesis} we discuss that prior to the implementation
+of the genesis rule, we sometimes need to allow the chain sync client to
+download blocks. Since chain selection is based on length there, we need to be
+able to validate a sufficient number of headers to get a fragment that is longer
+than our current chain; in the case of a disaster, that might mean validating
+a header that is more than $s$ slots away from our latest usable ledger state,
+and hence we may need to download some blocks.
+
+The genesis rule, in principle, \emph{never needs to look past $s$ slots}.
+It makes all of its decisions based on a window of $s$ slots; if a node reports
+a header past the end of that window, that just tells us we have seen everything
+we need to see about that chain within the window. There is no need to validate
+this header: any headers \emph{within} the window contribute to the density
+of the chain and are validated, any headers \emph{past} the window just cap
+that density; nodes cannot increase their chain's density with an invalid
+header past the window, and so nothing can go wrong if we do not validate that
+header.
+
+This changes however if we want to make use of \cref{lemma:shift-genesis-window}.
+It is of course necessary that we validate the headers \emph{within} the window;
+if we shift the window, we are no longer guaranteed that ``within the window''
+is synonymous with ``at most $s$ slots away from the ledger state we have
+available''.
+
+Whether or not this opens us up to denial-of-service attacks depends
+on when exactly we shift the window. However, if we do this only if we have some
+kind of explicit disaster recovery (where we mark the pre-disaster block),
+or if the density in the window we see drops below a certain threshold, then
+the scope for a denial-of-service attack is very limited indeed.
+
+\subsection{In the absence of adversaries}
+
+In the consensus tests (\cref{testing:consensus}) periods where no blocks are
+being produced are hard to avoid. However, we do not (currently) model
+adversarial behaviour. This means that any kind of explicit disaster recovery is
+not needed: if pre-disaster and post-disaster we end up picking an ``arbitrary''
+chain, consensus is still guaranteed. After all, the choice is not ``arbitrary''
+in the sense that different nodes may pick different chains; it is only
+``arbitrary'' in the sense that we are doing a density comparison on a fragment
+that is too short (it may be necessary to add a deterministic tie-breaker in
+case there are multiple fragments with equal density).
diff --git a/ouroboros-consensus/docs/report/chapters/future/misc.tex b/ouroboros-consensus/docs/report/chapters/future/misc.tex
new file mode 100644
index 00000000000..d93524b6bca
--- /dev/null
+++ b/ouroboros-consensus/docs/report/chapters/future/misc.tex
@@ -0,0 +1,125 @@
+\chapter{Miscellaneous}
+
+TODO\todo{TODO}: This is a mess at the moment.
+
+\section{On abstraction}
+
+Ledger integration: while things were changing a lot, it made sense for
+consensus to define the ledger API internally and have the integration be done
+on the consensus side. But as things are stabilising, it might make more sense
+for that abstraction to live externally, so that Shelley could literally be
+plugged into consensus without us having to do anything.
+
+\section{On-disk ledger state}
+
+\duncan
+
+Sketch out what we think it could look like.
+Consequences for the design.
+
+\section{Transaction TTL}
+\label{future:ttl}
+
+Describe that the mempool could have explicit support for TTL, but that right
+now we don't (and why this is OK: the ledger checks transaction TTL anyway). We
+should discuss why this is not an attack vector (transactions will either be
+included in the blockchain or else be chucked out because some of their inputs
+will have been used).
+
+\section{Block based versus slot based}
+\label{future:block-vs-slot}
+
+\section{Eliminating safe zones}
+\label{future:eliminating-safezones}
+
+Are they really needed? Consensus doesn't really look ahead anymore
+(headers are not checked for time; leadership is ticking, not forecasting).
+Does the wallet really need them? What about the ledger?
+
+Other thought: what if we split slots into ``microslots'', 20 microslots to a
+slot? Now the slot/time mapping is \emph{always} known, and for Shelley etc.\
+we don't actually need to know the global microslot; all we care about is
+the microslot within a slot (and hence it is independent of when Shelley
+starts). This would make time conversion no longer state dependent.
+
+\section{Eliminating forecasting}
+\label{future:eliminating-forecasting}
+
+This is a stronger version of \cref{future:eliminating-safezones}, where
+we eliminate \emph{all} forecasting. Specifically, this means that we don't
+do header validation anymore, relying on the chain DB to do block validation.
+This would be an important simplification of the consensus layer, but we'd
+need to analyse what the ``benefit'' of this simplification is for an
+attacker. Personally, I think it'll be okay.
+
+The most important analysis we need to do here is how this affects the memory
+usage of the chain sync client. Note that we already skip the ahead-of-time
+in-future check, deferring it until we have the full block and validate it.
+We should discuss that somewhere as well.
+
+\section{Open kinds}
+\label{future:openkinds}
+
+Avoid type errors such as trying to apply a ledger to a block instead of an era
+(or an era instead of crypto, or \ldots).
+
+\section{Relax requirements on time conversions}
+\label{future:relax-time-requirements}
+
+Perhaps it would be okay if time conversions were strictly relative to a ledger
+state, rather than ``absolute'' (\cref{time:ledgerrestrictions}).
+
+\section{Configuration}
+
+What a mess.
+
+\section{Specialised chain selection data structure}
+
+In \cref{chainsel:spec} we describe how chain selection is implemented. In an
+ideal world, however, we would have some kind of specialised data structure
+supporting
+
+\begin{itemize}
+\item Efficient insertion of new blocks
+\item Efficient computation of the best chain
+\end{itemize}
+
+It is, however, not at all clear what such a data structure would look like if
+we don't want to hard-code the specific chain selection rule.
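+
+To get a feel for the problem, the interface that such a data structure would
+have to offer might look something like the following sketch (hypothetical
+names, not part of the actual implementation; \lstinline!AnchoredFragment! is
+the chain fragment type we use elsewhere):
+
+\begin{lstlisting}
+-- | Index of the blocks in the volatile DB, maintained in such a way
+-- that the best chain through those blocks can be computed cheaply.
+-- Sketch only; how to realise this is precisely the open question.
+data ChainIndex blk
+
+-- | Efficient insertion of a new block
+insertBlock :: blk -> ChainIndex blk -> ChainIndex blk
+
+-- | Efficient computation of the best chain through the index
+bestChain :: ChainIndex blk -> AnchoredFragment blk
+\end{lstlisting}
+
+The crux is that \lstinline!insertBlock! would have to incrementally maintain
+whatever summary \lstinline!bestChain! needs, and what that summary looks like
+depends on the specific chain selection rule (length only, or also issue
+numbers, density, \ldots); it is this dependence that makes a generic design
+elusive.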
+
+\section{Dealing with clock changes}
+\label{future:clockchanges}
+
+When the user changes their system clock, blocks that we previously adopted
+into our current chain might now be ahead of the system clock
+(\cref{chainsel:infuture}) and should no longer be part of the chain, and vice
+versa.
+
+When the system clock of a node is moved \emph{forward}, we should run chain
+selection again: blocks that we stored because they were in the future may now
+have become valid. Since this could be any number of blocks, on any fork, it is
+probably easiest to just do a full chain selection cycle (starting from the tip
+of the immutable database).
+
+When the clock is moved \emph{backwards}, we may have accepted blocks that we
+should not have. Put another way, an attacker might have taken advantage of the
+fact that the clock was wrong to get the node to accept blocks in the future. In
+this case we therefore really should roll back---but this is a weird kind of
+rollback, one that might result in a strictly smaller current chain. We can only
+do this by re-initialising the chain DB from scratch (the ledger DB does not
+support such rollback directly). Worse still, we may have decided that some
+blocks were immutable which really weren't.
+
+Unlike the data corruption case, here we should really endeavour to get to a
+state in which it is as if the clock was never ``wrong'' in the first place;
+this may mean we might have to move some blocks back from the immutable DB to
+the volatile DB, depending on exactly how far the clock was moved back and how
+big the overlap between the immutable DB and volatile DB is.
+
+It is therefore good to keep in mind that the overlap between the immutable DB
+and volatile DB does make it a bit easier to deal with relatively small clock
+changes; it may be worth ensuring that, say, the overlap is at least a few days
+so that we can deal with people turning back their clock a day or two without
+having to truncate the immutable database. Indeed, in a first implementation,
+this may be the \emph{only} thing we support, though we will eventually have to
+lift that restriction.
+
+Right now, we do nothing special when the clock moves forward (we will discover
+the now-valid blocks on the next call to \lstinline!addBlock!,
+\cref{chainsel:addblock}). When the clock is reset \emph{backwards}, the node
+will currently (intentionally) crash; we make no attempt to reset the state
+(the current slot number moving backwards might cause difficulties in many
+places). Unfortunately, if the clock is moved so far back that blocks in the
+\emph{immutable database} are now considered to be ahead of the wall clock, we
+will not currently detect this (\cref{time:imm-tip-in-future}).
diff --git a/ouroboros-consensus/docs/report/chapters/hfc/misc.tex b/ouroboros-consensus/docs/report/chapters/hfc/misc.tex
new file mode 100644
index 00000000000..8eeee5eda55
--- /dev/null
+++ b/ouroboros-consensus/docs/report/chapters/hfc/misc.tex
@@ -0,0 +1,132 @@
+\chapter{Misc stuff. To clean up.}
+\label{hfc:misc}
+
+\todo{This is just a collection of random snippets right now.}
+
+\section{Ledger}
+
+\subsection{Invalid states}
+\label{hfc:ledger:invalid-states}
+
+\todo{This came from the Byron/Shelley appendix. Need to generalise a bit or provide context.}
+In a way, it is somewhat strange to have the hard fork mechanism be part of the
+Byron (\cref{byron:hardfork}) or Shelley ledger (\cref{shelley:hardfork})
+itself, rather than some overarching ledger on top.
For Byron, a Byron ledger
+state where the \emph{major} version is the (predetermined) moment of the hard
+fork is basically an invalid state, used only once to translate to a Shelley
+ledger. Similarly, the \emph{hard fork} part of the Shelley protocol version
+will never increase during Shelley's lifetime; the moment it \emph{does}
+increase, that Shelley state will be translated to the (initial) state of the
+post-Shelley ledger.
+
+\section{Keeping track of time}
+\label{hfc:time}
+
+EpochInfo
+
+\section{Failed attempts}
+
+\subsection{Forecasting}
+\label{hfc:failed:forecasting}
+
+As part of the integration of any ledger in the consensus layer (not HFC
+specific), we need a projection from the ledger \emph{state} to the consensus
+protocol ledger \emph{view}
+(\cref{ledger:api:LedgerSupportsProtocol}).
+As we have seen\todo{Once we write these sections, add back references here},
+the HFC additionally requires, for each pair of consecutive eras, a \emph{state}
+translation function as well as a \emph{projection} from the state of the old
+era to the ledger view of the new era. This means that if we have $n + 1$ eras,
+we need $n$ across-era projection functions, in addition to the $n + 1$
+projection functions we already have \emph{within} each era.
+
+This might feel a bit cumbersome; perhaps a more natural approach would be to
+only have within-era projection functions, but require a function to translate
+the ledger view (in addition to the ledger state) for each pair of eras.
+We initially tried this approach; when projecting from an era to the next,
+we would first ask the old era to give us the final ledger view in that era,
+and then translate this final ledger view across the era boundary:
+
+\begin{center}
+\begin{tikzpicture}[
+square/.style={rectangle, draw},
+]
+% old ledger
+\node[square] (Astate) {old ledger state};
+\node[square] (Aview1) [below=of Astate] {view};
+\node[square] (Aview2) [right=of Aview1] {view};
+\node (Adots) [right=of Aview2] {$\ldots$};
+\node[square] (AviewN) [right=of Adots] {view};
+\draw[->] (Astate.south) -- (Aview1.north);
+\draw[->] (Astate.south) .. controls +(down:1cm) and +(up:1cm).. (Aview2.north);
+\draw[->] (Astate.south) .. controls +(down:1cm) and +(up:1cm).. (AviewN.north);
+%
+% some intermediate nodes for positioning
+\node (AstateN) [above=of AviewN] {};
+\node (mid) [right=of AstateN] {};
+\node (midH) [above=of mid] {era boundary};
+\node (midM) [below=of mid] {};
+\node (midL) [below=of midM] {};
+%
+% new ledger
+\node[square] (Bstate) [right=of mid] {new ledger state};
+\node[square] (Bview1) [below=of Bstate] {view};
+\node[square] (Bview2) [right=of Bview1] {view};
+\node (Bdots) [right=of Bview2] {$\ldots$};
+\node[square] (BviewN) [right=of Bdots] {view};
+\draw[->] (Bstate.south) -- (Bview1.north);
+\draw[->] (Bstate.south) .. controls +(down:1cm) and +(up:1cm).. (Bview2.north);
+\draw[->] (Bstate.south) .. controls +(down:1cm) and +(up:1cm).. (BviewN.north);
+%
+\draw[dotted] (midH) -- (midL);
+\draw[->, dashed] (AviewN.south) .. controls +(down:1cm) and +(down:1cm) .. (Bview2.south) node[pos=0.5, below] {\emph{translate}};
+\end{tikzpicture}
+\end{center}
+
+The problem with this approach is that the ledger view only contains a small
+subset of the ledger state: the old ledger state might contain information about
+scheduled changes that should be taken into account when constructing the ledger
+view in the new era, but the final ledger view in the old era might not have
+that information.
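+
+To make the contrast concrete, the two alternatives have roughly the following
+shapes (a sketch with hypothetical names and simplified types, not the actual
+HFC API):
+
+\begin{lstlisting}
+-- Adopted: for each pair of consecutive eras, project directly from
+-- the ledger *state* of the old era to the ledger *view* of the new
+-- era (n of these functions for n + 1 eras)
+forecastAcrossEras ::
+     LedgerState era -> SlotNo -> Maybe (LedgerView (NextEra era))
+
+-- Rejected: project within the old era as usual, then translate the
+-- final ledger *view*; but that view may be missing scheduled changes
+-- which are recorded only in the ledger *state*
+translateLedgerView ::
+     LedgerView era -> LedgerView (NextEra era)
+\end{lstlisting}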
+
+Indeed, a moment's reflection reveals that this cannot be the right approach.
+After all, we cannot step the ledger view; the dashed arrow in
+%
+\begin{center}
+\begin{tikzpicture}[
+square/.style={rectangle, draw},
+]
+\node[square] (state) {ledger state at anchor};
+\node[square] (view1) [below=of state] {view};
+\node[square] (view2) [right=of view1] {view};
+\node (dots) [right=of view2] {$\ldots$};
+\node[square] (viewN) [right=of dots] {view};
+\draw[->] (state.south) -- (view1.north);
+\draw[->] (state.south) .. controls +(down:1cm) and +(up:1cm).. (view2.north);
+\draw[->] (state.south) .. controls +(down:1cm) and +(up:1cm).. (viewN.north);
+\draw[->, dashed] (view1.south) .. controls +(down:1cm) and +(down:1cm) .. (view2.south) node[pos=0.5, below] {\emph{(impossible)}};
+\end{tikzpicture}
+\end{center}
+%
+is not definable: scheduled changes are recorded in the ledger state, not in
+the ledger view. If we cannot even do this \emph{within} an era, there is no
+reason to assume it would be possible \emph{across} eras.
+
+We cannot forecast directly from the old ledger state to the new era either:
+this would result in a ledger view from the old era in the new era, violating
+the invariant we discussed in \cref{hfc:ledger:invalid-states}.
+
+Both approaches---forecasting the final ledger view in the old era and then
+translating, or forecasting directly across the era boundary and then
+translating---also suffer from another problem: neither approach would compute
+correct forecast bounds. Correct bounds depend on properties of both the old
+and the new ledger, as well as on the distance of the old ledger state to that
+final ledger view. For example, if that final ledger view is right at the edge
+of the forecast range of the old ledger state, we should not be able to give a
+forecast in the new era at all.
+
+Requiring a special forecasting function for each pair of eras is of course in
+a way cheating: it pushes the complexity of doing this forecasting onto the
+specific ledgers that the HFC is instantiated at. As it turns out, however,
+this function tends to be easy to define for any pair of concrete ledgers; it
+is just hard to define in a completely general way.
diff --git a/ouroboros-consensus/docs/report/chapters/hfc/overview.tex b/ouroboros-consensus/docs/report/chapters/hfc/overview.tex
new file mode 100644
index 00000000000..7ff9a157863
--- /dev/null
+++ b/ouroboros-consensus/docs/report/chapters/hfc/overview.tex
@@ -0,0 +1,17 @@
+\chapter{Overview}
+\label{hfc}
+
+\section{Introduction}
+\label{hfc:intro}
+
+\todo{} We should discuss terminology here: what we mean by a hard fork,
+and how that is different from how the word is usually used.
+
+We should mention that era transitions happen at epoch boundaries only.
+ +Mention that we had to adjust the consensus layer in some ways: + +\begin{itemize} +\item Simplified chain selection (tip only; \cref{consensus:overview:chainsel}) +\item Remove the assumption slot/time conversion is always possible (\cref{time}) +\end{itemize} diff --git a/ouroboros-consensus/docs/report/chapters/hfc/time.tex b/ouroboros-consensus/docs/report/chapters/hfc/time.tex new file mode 100644 index 00000000000..a6677b904a7 --- /dev/null +++ b/ouroboros-consensus/docs/report/chapters/hfc/time.tex @@ -0,0 +1,720 @@ +\newcommand{\timeconv}[2]{\ensuremath{\mathtt{Conv}_{#1}(#2)}} +\newcommand{\applyBlocks}[2]{\ensuremath{\mathtt{apply}_\mathit{#1}(#2)}} +\newcommand{\ledgertip}[1]{\ensuremath{\mathtt{tip}(#1)}} +\newcommand{\switch}[3]{\ensuremath{\mathtt{switch}_{(\mathit{#1},\;\mathit{#2})}(#3)}} + +\chapter{Time} +\label{time} + +\section{Introduction} +\label{time:introduction} + +A fundamental property of the Ouroboros family of consensus protocols is that +they all divide time into discrete chunks called \emph{slots}; typically the +duration of a slot is on the order of seconds. In most Ouroboros protocols slots +are grouped into \emph{epochs}, with certain changes to the consensus chain +state happening at various points in an epoch. All nodes running the blockchain +agree on a \emph{system start time} (as a UTC time) through the chain's genesis +configuration, making the translation from a particular wallclock time to a slot +number easy: subtract the system start time from the wall clock time, and +divide by the slot length. This assumption that the mapping between wall clock +and slot or epoch numbers is always available permeated the consensus layer. +Unfortunately, it is not a valid assumption in the presence of hard forks. + +It's not difficult to illustrate this with an example. Suppose we want to know +which slot time $t$ corresponds to in: +% +\begin{center} +\begin{tikzpicture} +\draw (0,0) -- (330pt, 0); +\draw [dotted] (180pt,20pt) node[above] {era transition} -- (180pt,-30pt); +\node at (273 pt,0) {$\bullet$}; +\node at (273 pt,0) [above] {$t$}; +% era 1 +% slot length 6 +% epoch size 10 +% 3 epochs +\foreach \x in {0, 6, ..., 180} { + \draw (\x pt, 0) -- +(0, -3pt); +} +\foreach \x in {0, 60, ..., 180} { + \draw (\x pt, 0) -- +(0, -10pt); +} +% era 2 +% slot length 3 +% epoch size 16 +% 3+ epochs +\foreach \x in {180, 183, ..., 330} { + \draw (\x pt, 0) -- +(0, -3pt); +} +\foreach \x in {180, 228, ..., 330} { + \draw (\x pt, 0) -- +(0, -10pt); +} +\end{tikzpicture} +\end{center} +% +We can read off from this depiction that $t$ is in epoch 1 \emph{of the second +era}, and relative slot 14 within that epoch. Since there are 16 slots to an +epoch in that era, that makes it slot $1 \times 16 + 14 = 30$ within that era. +The second era was preceded by three epochs in the first era, each of which +contained 10 slots, which means that time $t$ was slot $3 \times 10 + 30 = 60$ +globally. 
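+
+The arithmetic is simple enough to capture in a few lines of Haskell (a
+self-contained sketch with a hypothetical \lstinline!Era! record; the real
+conversion machinery lives in the hard fork combinator, \cref{hfc:time}):
+
+\begin{lstlisting}
+-- | An era that has already completed
+data Era = Era { epochSize :: Word, epochsInEra :: Word }
+
+-- | Global slot of a time that falls in the era following the given
+-- completed eras, at the given epoch and relative slot of that era
+globalSlot :: [Era]  -- ^ completed eras
+           -> Word   -- ^ epoch size in the current era
+           -> Word   -- ^ epoch, relative to the start of the era
+           -> Word   -- ^ relative slot within that epoch
+           -> Word
+globalSlot completed curEpochSize epoch relSlot =
+    sum [ epochSize e * epochsInEra e | e <- completed ]
+  + epoch * curEpochSize
+  + relSlot
+
+-- globalSlot [Era 10 3] 16 1 14 == 60   (the example above)
+-- globalSlot [Era 10 4] 16 0 11 == 51   (the variation below)
+\end{lstlisting}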
+
+But now consider how this calculation changes if the era transition had
+happened one epoch later:
+%
+\begin{center}
+\begin{tikzpicture}
+\draw (0,0) -- (330pt, 0);
+\draw [dotted] (240pt,20pt) node[above] {era transition} -- (240pt,-30pt);
+\node at (273 pt,0) {$\bullet$};
+\node at (273 pt,0) [above] {$t$};
+% era 1
+% slot length 6
+% epoch size 10
+% 4 epochs
+\foreach \x in {0, 6, ..., 240} {
+  \draw (\x pt, 0) -- +(0, -3pt);
+}
+\foreach \x in {0, 60, ..., 240} {
+  \draw (\x pt, 0) -- +(0, -10pt);
+}
+% era 2
+% slot length 3
+% epoch size 16
+% 1+ epochs
+\foreach \x in {240, 243, ..., 330} {
+  \draw (\x pt, 0) -- +(0, -3pt);
+}
+\foreach \x in {240, 288, ..., 330} {
+  \draw (\x pt, 0) -- +(0, -10pt);
+}
+\end{tikzpicture}
+\end{center}
+%
+Time $t$ is now in epoch 0 of the second era, with relative
+slot 11, making it slot $0 \times 16 + 11 = 11$ within the second era.
+Since the second era was preceded by \emph{four} epochs of the first era,
+that makes time $t$ global slot $4 \times 10 + 11 = 51$.
+
+All of this would be no more than a minor complication if the exact moment of
+the era transition were statically known. This, however, is not the case: the
+moment of the era transition is decided \emph{on the chain itself}. This leads
+to the inevitable conclusion that time/slot conversions depend on the ledger
+state, and may indeed be impossible: the slot at time $t$ is \emph{simply not
+yet known} if the transition to era 2 has not been decided yet.
+
+\section{Slots, blocks and stability}
+\label{time:slots-vs-blocks}
+
+In \cref{consensus:overview:k} we discussed the fundamental parameter $k$:
+blocks that are more than $k$ blocks away from the tip of the chain are
+considered to be immutable by the consensus layer and no longer subject to
+rollback. We say that such blocks are \emph{stable}.
+
+The ledger layer itself also depends on stability; for example, in Shelley the
+stake distribution to be used for the leadership check needs to be stable before
+it is adopted (this prevents malicious nodes from inspecting the leadership
+schedule and then trying to cause a rollback if that leadership schedule is not
+beneficial to them).
+
+The ledger layer however does not use block numbers to determine stability, but
+approximates it using slot numbers instead. This ultimately comes from the fact
+that in Ouroboros the length of an \emph{epoch} is measured in slots, not
+blocks, although this is something we may wish to revisit
+(\cref{future:block-vs-slot}).
+
+Depending on the particular choice of consensus algorithm, not all slots contain
+blocks. For example, in Praos only a relatively small percentage of slots
+contain blocks, depending on the Praos $f$ parameter (in Shelley, $f$ is set to
+5\%). However, the various Ouroboros protocols come with proofs (actually, a
+probabilistic argument) providing a window of a certain number of slots that is
+guaranteed to contain at least $k$ blocks; for example, for Ouroboros Classic
+that window is $2k$ slots\footnote{Without much justification, we adopt this
+same window for PBFT as well. It is almost certainly a gross overestimation.},
+and for Ouroboros Praos that window is $3k/f$. Stability requirements in the
+ledger then take the form ``at least $3k/f$ slots must have passed'' instead of
+``at least $k$ blocks must have been applied''.
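+
+For a concrete sense of scale: with the Cardano mainnet parameters $k = 2160$
+and $f = 0.05$, and the one-second Shelley slot length, this window is
+%
+\begin{equation*}
+3k/f = (3 \times 2160) / 0.05 = 129600 \text{ slots} = 36 \text{ hours}.
+\end{equation*}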
+ +\section{Definitions} + +\subsection{Time conversion} + +As we saw in \cref{time:introduction}, we cannot do time conversions independent +of a ledger state. This motivates the following definition: + +\begin{definition}[Time conversion] +Let $\timeconv{\sigma}{t}$ be the function that converts time $t$, with $t$ +either specified as a wallclock time, a slot number, or an epoch number, to a +triplet +\begin{center} +(wallclock time, slot number, epoch number) +\end{center} +if $\sigma$ contains sufficient information to do so; $\timeconv{\sigma}{t}$ +is undefined otherwise. +\end{definition} + +Since all past era transitions are (obviously) known, time conversion should +always be possible for points in the past: + +\begin{property}[Conversion for past points] +$\timeconv{\sigma}{t}$ should be defined for all $t \le \ledgertip{\sigma}$. +\end{property} + +Furthermore, we assume that time conversion is monotone: + +\begin{property}[Monotonicity of time conversion] +\label{time-conversion-monotone} +If $\timeconv{\sigma}{t}$ is defined, then $\timeconv{\applyBlocks{bs}{\sigma}}{t}$ must be as well and +\begin{equation*} +\timeconv{\applyBlocks{bs}{\sigma}}{t} = \timeconv{\sigma}{t} +\end{equation*} +\end{property} + +\subsection{Forecast range} + +Under certain conditions a ledger state may be usable to do time conversions +for slots ahead of the ledger state. + +\begin{definition}[Forecast range] +We say that time $t > \ledgertip{\sigma}$ is within the forecast range of +$\sigma$ if \timeconv{\sigma}{t} is defined. +\end{definition} + +Note that monotonicity (\cref{time-conversion-monotone}) should still +apply. + +\subsection{Safe zone} + +In order to be able to have a non-empty forecast range, we need to restrict +when era transitions can occur. + +\begin{definition}[Safe zone] +A \emph{safe zone} is a period of time ahead of a ledger's tip in which an +era transition is guaranteed not to occur if it is not yet known. +\end{definition} + +Intuitively, a non-empty safe zone means that there will be time between an +era transition being announced and it happening, no matter how the chain +is extended (no matter which blocks are applied): +% +\begin{equation} +\begin{tikzpicture}[baseline=0pt] +\draw [thick] (-50pt,0) -- (50pt, 0) coordinate (tip); +\draw (tip) -- ++(25pt, 15pt) -- ++(40pt, 0pt); +\draw (tip) -- ++(25pt, -15pt) -- ++(40pt, 0pt); +\node at (tip) {$\bullet$}; +\node at (tip) [above left] {ledger tip}; +\draw [dashed] (tip) + -- ++(0pt, 20pt) node[above right] {safe zone} + -- ++(50pt, 0pt) -- ++(0pt, -40pt) -- ++(-50pt, 0pt) -- cycle; +\end{tikzpicture} +\end{equation} + +\section{Ledger restrictions} +\label{time:ledgerrestrictions} + +\subsection{Era transitions must be stable} + +Monotonicity (\cref{time-conversion-monotone}) only talks about a chain's linear +history; since the consensus layer needs to deal with rollbacks (switching to +alternative chains) too, we will actually need a stronger property. Clearly, +time conversions cannot be invariant under switching to arbitrary chains; after +all, alternative chains might have era transitions in different places. The +consensus layer however does not \emph{support} switching to arbitrary +alternative chains; we have a maximum rollback (\cref{consensus:overview:k}), +and we never switch to a shorter chain (\cref{consensus:overview:chainsel}, +\cref{never-shrink}). 
This means that we can model switching to an alternative
+chain as $$\switch{n}{bs}{\sigma}$$ where $n \le k$ indicates how many blocks we
+want to roll back, $\mathit{bs}$ is a list of new blocks we want to apply, and
+$\mathtt{length} \; \mathit{bs} \ge n$.
+
+\begin{property}[Time conversions stable under chain evolution]
+\label{time-stable-under-evolution}
+If \timeconv{\sigma}{t} is defined, then so is
+\timeconv{\switch{n}{bs}{\sigma}}{t}
+and moreover
+\begin{equation*}
+  \timeconv{\sigma}{t}
+= \timeconv{\switch{n}{bs}{\sigma}}{t}
+\end{equation*}
+\end{property}
+
+Intuitively, \cref{time-stable-under-evolution} says that we might not be able
+to do time conversion for some time $t$ because it's outside our current
+forecast range, but \emph{if} it is within the forecast range, then we don't
+need to earmark the answers we get from conversion as ``subject to rollback'':
+either we don't know, or we know for sure. This property may not be strictly
+\emph{required} for consensus to operate
+(\cref{future:relax-time-requirements}), but it is a useful assumption which
+simplifies reasoning about time both within consensus and within clients of
+the consensus layer such as the wallet.
+
+The existence of safe zones alone is not sufficient to establish this stronger
+property, for two reasons:
+
+\begin{itemize}
+\item If we switch from a chain where an era transition is already known but
+far in the future, to a chain on which the era transition happens much sooner
+(or indeed, to a chain on which the era transition is not yet known), then
+the forecast range will shrink and hence
+\timeconv{\switch{n}{bs}{\sigma}}{t}
+might not be defined, even if \timeconv{\sigma}{t} is.
+\item Conversely, if we switch from a chain on which the era transition is
+happening relatively soon, to a chain on which the era transition is happening
+later, then the forecast range will not shrink, but the time conversions on
+both chains will not agree with each other.\footnote{Going from a
+chain on which the era transition is not yet known to one in which it \emph{is}
+known is not problematic, due to safe zones.}
+\end{itemize}
+
+The key problem is that switching to an alternative chain can change our
+information about future era transitions, and hence result in different time
+conversions. We therefore insist that an era transition is not considered
+``known'' until the block confirming the era transition is stable (no longer
+subject to rollback). This means that the minimum distance from the announcement
+of the era transition to the actual era transition must be $k$ plus the width of
+the safe zone:
+%
+\begin{equation}
+\begin{tikzpicture}[baseline=0pt]
+\draw [thick] (-50pt,0) -- (50pt, 0) coordinate (tip);
+\draw (tip) -- ++(25pt, 15pt) -- ++(40pt, 0pt);
+\draw (tip) -- ++(25pt, -15pt) -- ++(40pt, 0pt);
+\node at (tip) {$\bullet$};
+\node at (tip) [above left] {ledger tip};
+\draw [dashed] (tip)
+  -- ++(0pt, 20pt) node[above right] {safe zone}
+  -- ++(40pt, 0pt) coordinate (transition)
+  -- ++(0pt, -40pt) -- ++(-40pt, 0pt) -- cycle;
+\draw [dotted] (transition) ++(0pt, 20pt) node[above] {era transition}
+  -- ++(0pt, -70pt);
+\draw [<-] (tip) ++(-50pt, 0)
+  -- +(0,-40pt) node[below] {transition announced};
+% again, cheating...
+\node at (25pt, -10pt) {$\underbrace{\hspace{50pt}}_\textrm{$k$ blocks}$};
+\end{tikzpicture}
+\end{equation}
+%
+Many ledgers set the width of the safe zone such that it guarantees at least $k$
+blocks, but \emph{in principle} there is no need for the width of the safe zone
+to be related to $k$ at all, although other parts of consensus might impose
+their own requirements on it; we will discuss that in the next
+section (\cref{time:ledgerrestrictions:safezones}).
+
+\subsection{Size of the safe zones}
+\label{time:ledgerrestrictions:safezones}
+
+The most important example of where we might need to do time translation for
+blocks ahead of the ledger's tip is forecasting the Shelley ledger view
+(\cref{ledger:forecasting}). The Shelley ledger view contains an abstraction
+called \lstinline!EpochInfo! allowing the ledger to do time conversions, for
+example to decide when rewards should be allocated.
+
+As discussed in \cref{forecast:ledgerview}, it is important that the forecast
+range of the ledger allows us to validate at least $k + 1$ blocks after the
+ledger tip; consequently, the safe zone of the ledger must be wide enough to
+guarantee that it can span $k + 1$ blocks. This combination of the requirements
+of the ledger with the header/body split
+(\cref{nonfunctional:network:headerbody}) means that in practice the width of
+the safe zone should be at least equal to the forecast range of the ledger, and
+hence defined in terms of $k$ after all.
+
+\subsection{Stability should not be approximated}
+
+We discussed in \cref{time:slots-vs-blocks} that the ledger uses slot
+numbers to approximate stability. Such an approximation would violate
+\cref{time-stable-under-evolution}, however. Although we never switch to a
+shorter chain in terms of blocks, it is certainly possible that we might switch
+to a chain with a smaller \emph{slot} number at its tip: this happens
+whenever we switch to a longer but denser chain. If stability were based on
+slot numbers, this might mean that we could go from a situation in which the era
+transition is considered known (and hence the forecast range extends into the
+next era) to a situation in which the era transition is not yet considered known
+(and hence the forecast range only includes the safe zone in the current era).
+
+Admittedly such a reduction of the forecast range would be temporary, and once
+the era transition is considered known again, it will be in the same location;
+after all, the block that confirmed the era transition \emph{is} stable. This
+means that any previously executed time conversions would remain valid;
+however, the fact that the forecast range shrinks might lead to unexpected
+surprises. Using blocks rather than slot numbers to determine stability avoids
+this problem.
+
+\section{Properties}
+
+\subsection{Forecast ranges arising from safe zones}
+
+Slot length and epoch size can only change at era transitions. This means that
+if the transition to the next era is not yet known, any time $t$ within the
+era's safe zone is guaranteed to be within the era's forecast range.
If the
+transition to the next era \emph{is} known, the safe zone of the current era is
+not relevant, but the safe zone of the next era is:
+%
+\begin{equation}
+\begin{tikzpicture}[baseline=0pt]
+\draw [thick] (-50pt,0) -- (50pt, 0) coordinate (tip);
+\draw (tip) -- ++(25pt, 15pt) -- ++(40pt, 0pt);
+\draw (tip) -- ++(25pt, -15pt) -- ++(40pt, 0pt);
+\node at (tip) {$\bullet$};
+\node at (tip) [above left] {ledger tip};
+\draw [dashed] (tip) ++(20pt, 0pt) coordinate (transition)
+  -- ++(0pt, 20pt) node[above right] {safe zone}
+  -- ++(20pt, 0pt) -- ++(0pt, -40pt) -- ++(-20pt, 0pt) -- cycle;
+\draw [dotted] (transition) ++(0pt, 40pt) node[above] {era transition}
+  -- ++(0pt, -70pt);
+\end{tikzpicture}
+\end{equation}
+%
+The safe zone of the next era might be smaller or larger than (or indeed the
+same size as) the safe zone of the previous era; in this example it happens to
+be smaller.
+
+\Cref{hfc:era-transition-becoming-known} shows how the forecast range changes as
+the next era transition becomes known; as shown, the next era starts at the
+earliest possible moment (right after the safe zone); in general it could start
+later than that, but of course not earlier (that would violate the definition of
+the safe zone).
+
+\begin{figure}
+
+\begin{equation}
+\begin{tikzpicture}[baseline=0pt]
+\path (0,0) -- ++(200pt, 0pt); % adjust bounding box
+\draw [thick] (-50pt,0) -- (50pt, 0) coordinate (tip);
+\draw (tip) -- ++(25pt, 15pt) -- ++(50pt, 0pt);
+\draw (tip) -- ++(25pt, -15pt) -- ++(50pt, 0pt);
+\node at (tip) {$\bullet$};
+\node at (tip) [above left] {ledger tip};
+\draw [dashed] (tip)
+  -- ++(0pt, 20pt) node[above right] {safe zone}
+  -- ++(50pt, 0pt)
+  -- ++(0pt, -40pt)
+  -- ++(-50pt, 0pt) coordinate[pos=0.5] (safezone)
+  -- cycle;
+\node at (safezone) [below] {$\underbrace{\hspace{50pt}}_\textrm{forecast range}$};
+\end{tikzpicture}
+\end{equation}
+
+\begin{equation}
+\begin{tikzpicture}[baseline=0pt]
+\path (0,0) -- ++(200pt, 0pt); % adjust bounding box
+\draw [gray] (-50pt,0) -- (50pt, 0) coordinate (oldtip);
+\draw [gray, name path=chaintop] (oldtip) -- ++(25pt, 15pt) coordinate[pos=0.25] (tip) -- ++(50pt, 0pt);
+\draw [gray] (oldtip) -- ++(25pt, -15pt) -- ++(50pt, 0pt);
+\draw [thick] (-50pt,0) -- (50pt, 0) -- (tip);
+\node at (tip) {$\bullet$};
+\node at (tip) [above left] {ledger tip};
+\draw (tip) -- ++(25pt, 25pt) -- +(50pt, 0pt);
+\draw (tip) -- ++(25pt, -5pt) -- +(50pt, 0pt);
+\draw [dotted, name path=transition]
+  (oldtip) ++(50pt, 60pt) node[above] {era transition}
+  -- ++(0pt, -90pt);
+\path [name intersections={of=transition and chaintop}]
+  (intersection-1) coordinate (safezone);
+\draw [dashed] (safezone)
+  -- ++(0pt, 20pt) node[above right] {safe zone}
+  -- ++(20pt, 0pt)
+  -- ++(0pt, -40pt)
+  -- ++(-20pt, 0pt) coordinate[pos=0.5] (safezone)
+  -- cycle;
+
+% cheat: we should compute this of course :)
+\node at (90pt,-20pt) [below] {$\underbrace{\hspace{60pt}}_\textrm{forecast range}$};
+\end{tikzpicture}
+\label{forecast-range-known-era-transition}
+\end{equation}
+\caption{\label{hfc:era-transition-becoming-known}Era transition becoming known}
+\end{figure}
+
+\subsection{Cross-fork conversions}
+\label{time:cross-fork}
+
+\begin{lemma}[Cross fork conversions]
+Suppose we have the ledger state at some point $P$, and want to do time
+conversions for a time $t$ at a point $Q$ on a different fork of the chain:
+
+\begin{center}
+\begin{tikzpicture}
+\draw (0,0) -- (50pt, 0) coordinate (A);
+\draw (A)
+  -- ++(20pt, 20pt)
+  -- ++(30pt, 0) coordinate(P)
+  -- ++(30pt, 0);
+\draw (A)
+  -- ++(20pt, -20pt)
+  -- ++(10pt, 0) coordinate(Q1)
+  -- ++(40pt, 0) coordinate(Q2)
+  -- ++(10pt, 0);
+\node at (A) {$\bullet$};
+\node at (A) [above left] {$A$};
+\node at (P) {$\bullet$};
+\node at (P) [above] {$P$};
+\node at (Q1) {$\bullet$};
+\node at (Q1) [below] {$Q$};
+\draw [dashed] (A) -- ++(0, 40pt) node[above right] {forecast range}
+  -- ++(40pt, 0)
+  -- ++(0, -80pt)
+  -- ++(-40pt, 0)
+  -- cycle;
+\draw [dotted] (Q1) -- +(0, 80pt) -- +(0, -30pt) node[below] {$t$};
+\end{tikzpicture}
+\end{center}
+
+Provided that $Q$ is within the forecast range of the common ancestor $A$
+of $P$ and $Q$, the ledger state at $P$ can be used to do time conversions
+for time $t$.
+\end{lemma}
+
+\begin{proof}
+Since $t$ is within the forecast range at $A$, by definition $\timeconv{A}{t}$
+is defined. By monotonicity (\cref{time-conversion-monotone}) we must have
+\begin{align*}
+\timeconv{A}{t} & = \timeconv{P}{t} \\
+\timeconv{A}{t} & = \timeconv{Q}{t}
+\end{align*}
+It follows that $\timeconv{P}{t} = \timeconv{Q}{t}$.
+\end{proof}
+
+\section{Avoiding time}
+\label{hfc:avoiding-time}
+
+Time is complicated, and time conversions were pervasive throughout the
+consensus layer. Despite the exposition above and our increased understanding,
+we have nonetheless attempted to limit the use of time as much as possible, to
+simplify reasoning wherever we can. The use of time within the core consensus
+layer is now very limited indeed:
+
+\begin{enumerate}
+\item When we check if we are a slot leader and need to produce a block, we
+need to know the current time as a slot number (\todo{TODO.}We should discuss
+this somewhere. The chapter on the consensus protocol discusses the protocol
+side of things, but not the actual ``fork block production'' logic.)
+\item When we add new blocks to the chain DB, we need to check if their slot
+number is ahead of the wallclock (\cref{chainsel:infuture}).
+\item Specific consensus protocols may need to do time conversions; for example,
+Praos needs to know when various points in an epoch have been reached in order
+to update nonces, switch stake distribution, etc.
+\end{enumerate}
+
+None of these use cases require either forecasting or cross-fork conversions.
+The most important example of where forecasting is required is in projecting
+the ledger view, as discussed in \cref{time:ledgerrestrictions:safezones}.
+Cross-fork conversions (\cref{time:cross-fork}) may arise, for example, when
+the consensus layer makes time conversions available to tooling such as the
+wallet, which may use them to show the wallclock time of blocks that do not
+necessarily live on the current chain.
+
+Keeping track of era transitions, and providing time conversions that take
+them into account, is the responsibility of the hard fork combinator and
+we will discuss it in more detail in \cref{hfc:time}.
+
+In the remainder of this section we will discuss some simplifications
+that reduced the reliance on time within the consensus layer.
+
+\subsection{``Header in future'' check}
+\label{time:header-infuture-check}
+
+Recall from \cref{nonfunctional:network:headerbody} that block downloading
+proceeds in two steps: first, the chain sync client downloads the block header
+and validates it; if it finds that the header is valid, the block download logic
+may decide to also download the block body, depending on chain selection
+(\cref{consensus:overview:chainsel,consensus:class:chainsel}).
+
+Suppose the node's own ledger state is at point $P$, and the incoming header is
+at point $Q$. In order to validate the header, we need a ledger \emph{view} at
+point $Q$ without having the ledger \emph{state} at point $Q$; this means that
+point $Q$ must be within the ledger's forecast range at the common ancestor $A$
+of $P$ and $Q$ (\cref{ledger:forecasting}):
+
+\begin{center}
+\begin{tikzpicture}
+\draw (0, 0) -- (50pt, 0) coordinate (A);
+\draw (A) -- ++(20pt, 20pt) -- ++(20pt, 0) coordinate (P) -- ++(40pt, 0);
+\draw (A) -- ++(20pt, -20pt) -- ++(40pt, 0) coordinate (Q) -- ++(20pt, 0);
+\node at (P) {$\bullet$};
+\node at (Q) {$\bullet$};
+\node at (A) [above left] {$A$};
+\node at (P) [above] {$P$};
+\node at (Q) [below] {$Q$};
+\draw [dashed] (A) -- ++(0, 40pt) node[above right] {ledger forecast range}
+  -- ++(70pt, 0)
+  -- ++(0, -80pt)
+  -- ++(-70pt, 0)
+  -- cycle;
+\end{tikzpicture}
+\end{center}
+
+As we have seen in \cref{time:cross-fork}, if $Q$ is within the \emph{time}
+forecast range at $A$---put another way, if the time forecast range is at least
+as wide as the ledger forecast range---then we can also use the ledger state at
+$P$ to do time conversions at point $Q$. Moreover, as we saw in
+\cref{time:ledgerrestrictions:safezones}, for many ledgers that inclusion
+\emph{must} hold. If we make this a requirement for \emph{all} ledgers, in
+principle the chain sync client could do a header-in-future check.
+
+For simplicity, however, we omit the check. As we will see in the
+next section, the chain database must repeat this check \emph{anyway}, and so
+doing it ahead of time in the chain sync client does not help very much;
+skipping it avoids one more use of time within the consensus layer. Indeed, a
+case could be made that we could skip header validation altogether, which would
+alleviate the need for forecasting \emph{at all}; we will come back to this in
+\cref{future:eliminating-forecasting}.
+
+\subsection{Ahead-of-time ``block in future'' check}
+\label{time:block-infuture-check}
+
+In the original design of the chain database, when a new block was added we
+first checked if the block's slot number was ahead of the wallclock, before
+considering it for chain selection. If it was ahead of the wallclock by a small
+amount (within the permissible clock skew), we then scheduled an action to
+reconsider the block when its slot arrived.
+
+In order to compare the block's slot number to the wallclock, we can either
+convert the block's slot to a wallclock time, or convert the current wallclock
+time to a slot number. Both are problematic: the only ledger state we have
+available is our own current ledger state, which may not be usable to translate
+the current wallclock time to a slot number, and since we don't know anything
+about the provenance of the block (where the block came from), that ledger state
+may also not be usable to translate the block's slot number to a wallclock
+time. We now circumvent this problem by delaying the in-future check until we
+have validated the block, and so can use the block's \emph{own} ledger state to
+do the time conversion (\cref{chainsel:infuture}).
+
+We saw in the previous section that the chain sync client \emph{could} do the
+in-future check on headers, but the chain sync client is not the only way that
+blocks can be added to the chain database, so simply skipping the check in the
+chain database altogether, stipulating as a \emph{precondition} that the block
+is not ahead of the wallclock, is not a good idea.
Nonetheless it is worth
+considering whether we could use a weaker precondition, merely requiring that
+the node's current ledger tip be usable for time conversions for the slot
+number of the new block. Specifically, can we guarantee that we can satisfy this
+precondition in the chain sync client, if we do the in-future check on headers
+after all?
+
+It turns out that in general we cannot, not even in relatively common cases.
+Consider again the diagram from \cref{time:header-infuture-check}, but
+specialised to the typical case that the upstream node is on the same chain as
+we are, but a bit ahead of us:
+
+\begin{center}
+\begin{tikzpicture}
+\path (0,0) -- ++(200pt, 0pt); % adjust bounding box
+\draw (0, 0) -- (50pt, 0) coordinate (A) coordinate (P);
+\draw (A) -- ++(20pt, 0pt) -- ++(20pt, 0) -- ++(40pt, 0);
+\draw (A) -- ++(20pt, 0pt) -- ++(40pt, 0) coordinate (Q) -- ++(20pt, 0);
+\node at (P) {$\bullet$};
+\node at (Q) {$\bullet$};
+\node at (A) [above left] {$A$};
+\node at (A) {$\bullet$};
+\node at (P) [below left] {$P$};
+\node at (Q) [below] {$Q$};
+\draw [dashed] (A) -- ++(0, 20pt) node[above right] {forecast range}
+  -- ++(70pt, 0)
+  -- ++(0, -40pt)
+  -- ++(-70pt, 0)
+  -- cycle;
+\end{tikzpicture}
+\end{center}
+
+Since $P$ and $Q$ are on the same chain, point $P$ is necessarily also the
+``intersection'' point, and the distance between $P$ and $Q$ can only arise from
+the block download logic lagging behind the chain sync client.
+Now consider what happens when the node switches to an alternative fork:
+
+\begin{center}
+\begin{tikzpicture}
+\path (0,0) -- ++(200pt, 0pt); % adjust bounding box
+\draw (0, 0) -- (30pt, 0) coordinate (A);
+\draw (A) -- ++(20pt, 20pt) -- ++(20pt, 0) coordinate (P) -- ++(60pt, 0);
+\draw (A) -- ++(20pt, -20pt) -- ++(60pt, 0) coordinate (Q) -- ++(20pt, 0);
+\node at (P) {$\bullet$};
+\node at (Q) {$\bullet$};
+\node at (A) [above left] {$A$};
+\node at (P) [above] {$P$};
+\node at (Q) [below] {$Q$};
+\draw [dashed] (A) -- ++(0, 40pt) node[above right] {forecast range}
+  -- ++(70pt, 0)
+  -- ++(0, -80pt)
+  -- ++(-70pt, 0)
+  -- cycle;
+\end{tikzpicture}
+\end{center}
+
+Note what happens: since the node is switching to another fork, it must roll
+back some blocks and then roll forward; consequently, the intersection point $A$
+moves back, and $P$ moves forward (albeit on a different chain). $Q$ stays the
+same, \emph{but might have fallen out of the forecast range at $A$}.
+
+This means that even if the chain sync client was able to verify that a header
+(at point $Q$) was not ahead of the wallclock, if the node switches to a
+different fork before the block download logic has downloaded the corresponding
+block, when it presents that downloaded block to the chain database, the block
+might no longer be within the forecast range of the node's current ledger and
+the chain database will not be able to verify (ahead of time) whether or not the
+block is ahead of the wallclock. What's worse, unlike the chain sync client, the
+chain database has no access to the intersection point $A$; all it has is the
+ledger's current tip at point $P$ and the new block at point $Q$. It therefore
+has no reliable way of even determining \emph{if} it can do time conversions for
+the new block.
+
+\subsection{``Immutable tip in future'' check}
+\label{time:imm-tip-in-future}
+
+The chain database never adopts blocks from the future
+(\cref{chainsel:infuture}).
Nevertheless, it is possible that if the user sets
+their computer system clock back by (the equivalent of) more than $k$ blocks,
+the immutable database (\cref{storage:components}) might contain blocks
+whose slot numbers are ahead of the wall clock. We cannot verify this during a
+regular integrity check of the immutable database because, as we have seen in
+this chapter, we would need a ledger state to do so, which we do not construct
+during that integrity check. For now, we simply omit this check altogether,
+declaring it instead to be the user's responsibility to do a fresh install if
+they do reset their clock by this much.
+
+However, in principle this check is not difficult: we initialise the immutable
+DB \emph{without} doing the check, then initialise the ledger DB, passing it the
+immutable DB (which it needs to replay the most recent blocks, see
+\cref{ledgerdb}), and then ask the ledger DB for the ledger state
+corresponding to the tip of the immutable database. That ledger state will then
+allow us to do time conversions for any of the blocks in the immutable DB,
+trimming any blocks that are ahead of the wallclock.
+
+\subsection{Scheduling actions for slot changes}
+\label{time:scheduling-actions}
+
+The consensus layer provides an abstraction called \lstinline!BlockchainTime!
+that provides access to the current slot number. It also offers an interface
+for scheduling actions to be run on every slot change. However, if the node
+is still syncing with the chain, and does not have a recent ledger state
+available, the current slot number, and indeed the current slot length,
+are simply unknown. In this case the blockchain time will report the current
+slot number as unavailable, and any scheduled actions will not be run.
+
+We therefore limit the use of this scheduler to a single application only:
+it is used to trigger the leadership check (and corresponding block
+production, if we find we are a leader). This means that the leadership
+check will not be run if we are still syncing with the chain and have no
+recent ledger state available, but that is correct: producing blocks based on
+ancient ledger states is not useful anyway.
+
+\subsection{Switching on ``deadline mode'' in the network layer}
+
+Under normal circumstances, the priority of the network layer is to reduce
+\emph{latency}: when a block is produced, it must be distributed across the
+network as soon as possible, so that the next slot leader can construct the
+\emph{next} block as this block's successor; if the block arrives too late,
+the next slot leader will construct their block as the successor of the previous
+block instead, and the chain temporarily forks.
+
+When we are far behind, however, the priority is not to reduce latency, but
+rather to improve \emph{throughput}: we want to catch up as quickly as we can
+with the chain, and aren't producing blocks anyway
+(\cref{time:scheduling-actions}).
+
+In order to switch between these two modes we want to know if we are near the
+tip of the ledger---but how can we tell? If we know the current slot number
+(the slot number corresponding to the current wall clock), we can compare
+that current slot number to the slot number at the tip of the ledger. But,
+as we mentioned before, if we are far behind, the current slot number is
+simply unknown.
Fortunately, we can use this to our advantage: if the
+slot number is unknown, we \emph{must} be far behind, and hence we can use
+this to drive the decision, turning on deadline mode only if the slot number
+is known \emph{and} within a certain distance from the ledger tip.
diff --git a/ouroboros-consensus/docs/report/chapters/intro/intro.tex b/ouroboros-consensus/docs/report/chapters/intro/intro.tex
new file mode 100644
index 00000000000..38be113cd4b
--- /dev/null
+++ b/ouroboros-consensus/docs/report/chapters/intro/intro.tex
@@ -0,0 +1,59 @@
+\chapter{Introduction}
+
+The Cardano Consensus and Storage layer, or \emph{the consensus layer} for
+short, is a critical piece of infrastructure in the Cardano Node. It
+orchestrates between the \emph{network layer} below it and the
+\emph{ledger layer} above it.
+
+The network layer is a highly concurrent piece of software that deals with
+low-level concerns; its main responsibility is to transmit data efficiently
+across the network. Although it primarily transmits blocks and block headers, it
+does not interpret them and does not need to know much about them. In the few
+cases where it \emph{does} need to make some block-specific decisions, it
+calls back into the consensus layer to do so.
+
+The ledger layer by contrast exclusively deals with high-level concerns. It is
+entirely stateless: its main responsibility is to define a single pure
+function describing how the ledger state is transformed by blocks (verifying
+that blocks are valid in the process). It is only concerned with linear history;
+it is not aware of the presence of multiple competing chains or the rollbacks
+required when switching from one chain to another. We do require that the ledger
+layer provides limited \emph{lookahead}, computing (views on near)
+\emph{future} ledger states (required to be able to validate block headers
+without access to the corresponding block bodies).
+
+The consensus layer mediates between these two layers. It includes a
+bespoke storage layer that provides efficient access to the current ledger state
+as well as recent \emph{past} ledger states (required in order to be able
+to validate and switch to competing chains). The storage layer also
+provides direct access to the blocks on the blockchain itself, so that they can
+be efficiently streamed to clients (via the network layer). When there are
+competing chains, the consensus layer decides which chain is preferable and
+should be adopted, and it decides when to \emph{contribute} to the chain
+(produce new blocks). All ``executive decisions'' about the chain are made in
+and by the consensus layer.
+
+Lastly, as we will see, the consensus layer is highly abstract and places a
+strong emphasis on compositionality, making it usable with many different
+consensus algorithms and ledgers. Importantly, compositionality enables the
+\emph{hard fork combinator} to combine multiple ledgers and regard them as a
+single blockchain.
+
+The goal of this document is to outline the design goals for the consensus
+layer, how we achieved them, and where there is still scope for improvement. We
+will both describe \emph{what} the consensus layer is, and \emph{why} it is the
+way it is. Throughout we will also discuss what \emph{didn't} work, approaches
+we considered but rejected, or indeed adopted but later abandoned; discussing
+these dead ends is sometimes at least as informative as discussing the solution
+that did work.
+
+We will consider some of the trade-offs we have had to make, how they
+affected the development, and discuss which of these trade-offs should perhaps
+be reconsidered. We will also take a look at how the design can scale to
+facilitate future requirements, and which requirements will be more problematic
+and require more large-scale refactoring.
+
+The target audience for this document is primarily developers working on the
+consensus layer. It may also be of broader interest to people generally
+interested in the Cardano blockchain, although we will assume that the
+reader has a technical background.
diff --git a/ouroboros-consensus/docs/report/chapters/intro/nonfunctional.tex b/ouroboros-consensus/docs/report/chapters/intro/nonfunctional.tex
new file mode 100644
index 00000000000..1335925d372
--- /dev/null
+++ b/ouroboros-consensus/docs/report/chapters/intro/nonfunctional.tex
@@ -0,0 +1,91 @@
+\chapter{Non-functional requirements}
+\label{nonfunctional}
+
+This whole chapter is Duncan-suitable :)
+\duncan
+
+\section{Network layer}
+\label{nonfunctional:network}
+
+This report is not intended as a comprehensive discussion of the network layer;
+see \cite{network-spec} instead. However, in order to understand
+some of the design decisions in the consensus layer we need to understand some
+of the requirements imposed on it by the network layer.
+
+TODOs:
+
+\begin{itemize}
+\item Highlight relevant aspects of the design of the network layer.
+\item Discuss requirements this imposes on the consensus layer.
+Primary example: forecasting.
+\item How do we keep the overlap between network and consensus as small
+as possible? Network protocols do not involve consensus protocols
+(the chain sync client is not dependent on chain selection). Chain sync
+client + ``pre chain selection'' + block download logic keeps things isolated.
+\item Why do we even want to validate headers ahead of time? (Threat model etc.)
+(Section for Duncan?)
+Section with a sketch of an analysis of the amortised cost for attackers versus
+our own costs to defend against it (a ``budget for work'' that grows and shrinks
+as you interact with a node).
+\end{itemize}
+
+\subsection{Header/Body Split (aka: Header submission)}
+\label{nonfunctional:network:headerbody}
+
+Discuss the chain fragments that we store per upstream node.
+Discuss why we want to validate headers here, without a full ledger state
+(necessarily so: without block bodies we can't update the ledger state): to
+prevent DoS attacks.
+(\cref{ledger:forecasting} contains a discussion of this from the point of view
+of the ledger.)
+Forward reference to the chain sync client (\cref{chainsyncclient}).
+Discuss why it's useful if the chain sync client can race ahead for
+\emph{performance} (why it's required for chain selection is discussed in
+\cref{forecast:ledgerview}).
+
+See also section on avoiding the stability window
+(\cref{low-density:pre-genesis}).
+
+\subsection{Block submission}
+\label{nonfunctional:network:blocksubmission}
+
+Forward reference to \cref{servers:blockfetch}.
+
+\subsection{Transaction submission}
+\label{nonfunctional:network:txsubmission}
+
+Mention that these are defined entirely network side, with no consensus
+involvement (just an abstraction over the mempool).
+
+\section{Security ``cost'' concerns}
+
+TODO: Look through the code and git history to find instances where we do
+things one way but not the other because the alternative would give an attacker
+an easy way to make the node do lots of work (there were many such instances).
+
+Fragile.
Future work: how might we make this less brittle?
+Or indeed, how might we test this?
+
+Counter-examples (things we don't want to do):
+
+\begin{itemize}
+\item Parallel validation of an entire epoch of data (say, crypto only).
+You might do a lot of work before realising that that work was not needed
+because of an invalid block in the middle.
+\end{itemize}
+
+Future work: opportunities for parallelism that we don't yet exploit
+(important example: script evaluation in Goguen).
+
+\section{Hard time constraints}
+
+Must produce a block on time and get it to the next slot leader.
+
+Bad counter-example: reward calculation in the Shelley ledger
+(give examples of why).
+
+\section{Predictable resource requirements}
+
+Make best == worst.
+
+(Not \emph{just} a security concern: a concern even if every node is honest.)
diff --git a/ouroboros-consensus/docs/report/chapters/intro/overview.tex b/ouroboros-consensus/docs/report/chapters/intro/overview.tex
new file mode 100644
index 00000000000..d17bff17c13
--- /dev/null
+++ b/ouroboros-consensus/docs/report/chapters/intro/overview.tex
@@ -0,0 +1,251 @@
+\chapter{Overview}
+
+\section{Components}
+
+\subsection{Consensus protocols}
+\label{overview:consensus}
+
+The consensus protocol has two primary responsibilities:
+\label{consensus-responsibilities}
+
+\begin{description}
+\item[Chain selection] Competing chains arise when two or more nodes extend the
+chain with different blocks. This can happen when nodes are not aware of each
+other's blocks due to temporary network delays or partitioning, but depending
+on the particular choice of consensus algorithm it can also happen in the normal
+course of events. When it happens, it is the responsibility of the consensus
+protocol to choose between these competing chains.
+
+\item[Leadership check] In proof-of-work blockchains any node can produce a
+block at any time, provided that they have sufficient hashing power. By
+contrast, in proof-of-stake time is divided into \emph{slots}, and each slot has
+a number of designated \emph{slot leaders} who can produce blocks in that slot.
+It is the responsibility of the consensus protocol to decide on this mapping
+from slots to slot leaders.
+\end{description}
+
+The consensus protocol will also need to maintain its own state; we will discuss
+state management in more detail in \cref{storage:inmemory}.
+
+\subsection{Ledger}
+\label{overview:ledger}
+
+The role of the ledger is to define what is stored \emph{on} the blockchain.
+From the perspective of the consensus layer, the ledger has four primary
+responsibilities:
+
+\begin{description}
+\item[Applying blocks] The most obvious and most important responsibility of
+the ledger is to define how the ledger state changes in response to new blocks,
+validating blocks as it goes and rejecting invalid blocks.
+
+\item[Applying transactions] Similar to applying blocks, the ledger layer also
+must provide an interface for applying a single transaction to the ledger state.
+This is important, because the consensus layer does not just deal with
+previously constructed blocks, but also constructs \emph{new} blocks.
+
+\item[Ticking time] Some parts of the ledger state change due to the passage of
+time only. For example, blocks might \emph{schedule} some changes to be applied
+later, and then when the relevant slot arrives those changes should be applied,
+independent from any blocks.
+
+\item[Forecasting] Some consensus protocols require limited information from the
+ledger.
+In Praos, for example, a node's probability of being a slot leader is
+proportional to its stake, but the stake distribution is something that the
+ledger keeps track of. We refer to this as a \emph{view} on the ledger, and we
+require not just that the ledger can give us a view on the \emph{current}
+ledger state, but also that it can \emph{predict} what that view will be for
+slots in the near future. We will discuss the motivation for this requirement
+in \cref{nonfunctional:network:headerbody}.
+\end{description}
+
+The primary reason for separating out ``ticking'' from applying blocks is that
+the consensus layer is responsible for the leadership check
+(\cref{consensus-responsibilities}), and when we need to decide if we should be
+producing a block in a particular slot, we need to know the ledger state at that
+slot (even though we don't have a block for that slot \emph{yet}). It is also
+required in the mempool; see \cref{mempool}.
+
+\section{Design Goals}
+
+\subsection{Multiple consensus protocols}
+\label{multiple-consensus-protocols}
+
+From the beginning it was clear that we would need support for multiple
+consensus algorithms: the Byron era uses a consensus algorithm called
+(Permissive) BFT (\cref{bft}) and the Shelley era uses a consensus algorithm
+called Praos (\cref{praos}). Moreover, the Cardano blockchain is a \emph{hybrid}
+chain where the prefix of the chain runs Byron (and thus uses BFT), and then
+continues with Shelley (and thus uses Praos); we will come back to the topic of
+composing protocols when we discuss the hard fork combinator (\cref{hfc}). It is
+therefore important that the consensus layer abstracts over a choice of
+consensus protocol.
+
+\subsection{Support for multiple ledgers}
+\label{multiple-ledgers}
+
+For much the same reason that we must support multiple consensus protocols, we
+also have to support multiple ledgers. Indeed, we expect more changes in the
+ledger than in the consensus protocol; currently the Cardano blockchain starts
+with a Byron ledger and then transitions to a Shelley ledger, but further
+changes to the ledger have already been planned (some intermediate ledgers
+currently code-named Allegra and Mary, as well as larger updates to Goguen,
+Basho and Voltaire). All of the ledgers (Shelley up to and including Voltaire)
+use the Praos consensus algorithm (potentially extended with the genesis chain
+selection rule, see \cref{genesis}).
+
+\subsection{Decouple consensus protocol from ledger}
+\label{decouple-consensus-ledger}
+
+As we saw above (\cref{multiple-ledgers}), we have multiple ledgers that all
+use the same consensus protocol. We should therefore be able to define the
+consensus protocol \emph{independently} of a particular choice of ledger,
+merely defining what the consensus protocol expects from the ledger
+(we will see what this interface looks like in \cref{ledger}).
+
+\subsection{Testability}
+\label{testability}
+
+The consensus layer is a critical component of the Cardano Node, the software
+that runs the Cardano blockchain. Since the blockchain is used to run the ada
+cryptocurrency, it is of the utmost importance that this node is reliable;
+network downtime or, worse, corruption of the blockchain, cannot be tolerated.
+As such the consensus layer is subject to much stricter correctness criteria
+than most software, and must be tested thoroughly. To make this possible, we
+have to design for testability.
+
+\begin{itemize}
+\item We must be able to simulate various kinds of failures (disk
+failures, network failures, etc.)
+and verify that the system can recover.
+\item We must be able to run \emph{lots} of tests, which means that tests need
+to be cheap. This in turn requires, for example, the ability to swap the
+cryptographic algorithms out for much faster ``mock'' crypto algorithms.
+\item We must be able to test how the system behaves under certain
+expected-but-rare circumstances. For example, under the Praos consensus
+protocol it can happen that a particular slot has multiple leaders. We should
+be able to test what happens when this occurs repeatedly, but leader selection
+is a probabilistic process; it would be difficult to set up test scenarios to
+test for this specifically, and even more difficult to set things up so that
+those scenarios are \emph{shrinkable} (leading to minimal test cases). We must
+therefore be able to ``override'' the behaviour of the consensus protocol (or
+indeed the ledger) at specific points.
+\item We must be able to test components individually (rather than just the
+system as a whole), so that if a test fails, it is much easier to see where
+something went wrong.
+\end{itemize}
+
+\subsection{Adaptability and Maintainability}
+\label{adaptability}
+
+The Cardano Node began its life as an ambitious replacement of the initial
+implementation of the Cardano blockchain, which had been developed by Serokell.
+At the time, the Shelley ledger was no more than an on-paper design, and
+the Praos consensus protocol existed only as a research paper. Moreover, since
+the redesign would be unable to reuse any parts of the initial implementation,
+even the Byron ledger did not yet exist when the consensus layer was started.
+It was therefore important from the get-go that the consensus layer was not
+written for a specific ledger, but rather abstracted over a choice of ledger
+and defined precisely what the responsibilities of that ledger were.
+
+This abstraction over both the consensus algorithm and the ledger is important
+for other reasons, too. As we've mentioned, although initially developed to
+support the Byron ledger and the (Permissive) BFT consensus algorithm, the goal
+was to move to Shelley/Praos as quickly as possible. Moreover, additional
+ledgers had already been planned (Goguen, Basho and Voltaire), and research on
+consensus protocols was (and still is) ongoing. It was therefore important that
+the consensus layer could easily be adapted.
+
+Admittedly, adaptability does not \emph{necessarily} require abstraction. We
+could have built the consensus layer against the Byron ledger initially
+(although we might have had to wait for it to be at least partially completed),
+and then generalised as we went. There are however a number of downsides to
+this approach.
+
+\begin{itemize}
+\item When working with a concrete interface, it is difficult to avoid certain
+assumptions creeping in that may hold for this ledger but will not necessarily
+hold for other ledgers. When such assumptions go unnoticed, it can be costly
+to adjust later. (For one example of such an assumption that nonetheless
+\emph{did} go unnoticed, despite best efforts, and took a lot of work to
+resolve, see \cref{time} on removing the assumption that we can always
+convert between wallclock time and slot number.)
+
+\item IOHK is involved in the development of blockchains other than the public
+Cardano instance, and from the start of the project, the hope was that the
+consensus layer could be used in those projects as well. Indeed, it is currently
+being integrated into various other IOHK projects.
+
+\item Perhaps most importantly, if the consensus layer only supports a single,
+concrete ledger, it would be impossible to \emph{test} the consensus layer with
+any ledgers other than that concrete ledger. But this means that all consensus
+tests need to deal with all the complexities of the real ledger. By contrast,
+by staying abstract, we can run a lot of consensus tests with mock ledgers that
+are easier to set up, easier to reason about, more easily instrumented and more
+amenable to artificially creating rare circumstances (see \cref{testability}).
+\end{itemize}
+
+Of course, abstraction is also just good engineering practice. Programming
+against an abstract interface means we are clear about our assumptions,
+decreases dependence between components, and makes it easier to understand and
+work with individual components without necessarily having to understand the
+entire system as a whole.
+
+\subsection{Composability}
+\label{composability}
+
+The consensus layer is a complex piece of software; at the time of writing
+this technical report, it consists of roughly 100,000 lines of code. It is
+therefore important that we split it into small components that can be
+understood and modified independently of the rest of the system. Abstraction,
+discussed in \cref{adaptability}, is one technique to do that, but by no means
+the only one. One other technique that we make heavy use of is composability.
+We will list two examples here:
+
+\begin{itemize}
+\item As discussed in \cref{multiple-consensus-protocols} and
+\cref{multiple-ledgers}, the Cardano blockchain has a prefix that runs the BFT
+consensus protocol and the Byron ledger, and then continues with the Praos
+consensus protocol and the Shelley ledger. We do not however define a consensus
+protocol that is the combination of BFT and Praos, nor a ledger that is the
+combination of Byron and Shelley. Instead, the \emph{hard fork combinator}
+(\cref{hfc}) makes it possible to \emph{compose} consensus protocols and
+ledgers: construct the hybrid consensus protocol from an implementation of BFT
+and an implementation of Praos, and similarly for the ledger.
+
+\item We mentioned in \cref{testability} that it is important that we can
+test the behaviour of the consensus layer under rare-but-possible circumstances,
+and that it is therefore important that we can override the behaviour of the
+consensus algorithm in tests. We do not accomplish this however by adding
+special hooks to the Praos consensus algorithm (or any other); instead we define
+another combinator that takes the implementation of a consensus algorithm and
+\emph{adds} additional hooks for the sake of the testing infrastructure. This
+means that the implementation of Praos does not have to be aware of testing
+constraints, and the combinator that adds these testing hooks does not need to
+be aware of the details of how Praos is implemented.
+\end{itemize}
+
+\subsection{Predictable Performance}
+
+Make sure node operators do not set up nodes for ``normal circumstances'' only
+for the network to go down when something infrequent (but expected) occurs.
+(This is not about malicious behaviour; that's the next section.)
+
+\duncan
+
+\subsection{Protection against DoS attacks}
+
+Brief introduction to asymptotic attacker/defender costs. (This is just an
+overview; we come back to these topics in more detail later.)
+
+\duncan
+
+\section{History}
+\label{overview:history} % OBFT references refer to this section as well
+
+\duncan
+
+\begin{itemize}
+\item Briefly describe the old system (\lstinline!cardano-sl!) and the decision
+to rewrite it.
+\item Briefly discuss the OBFT hard fork.
+\end{itemize}
diff --git a/ouroboros-consensus/docs/report/chapters/miniprotocols/chainsyncclient.tex b/ouroboros-consensus/docs/report/chapters/miniprotocols/chainsyncclient.tex
new file mode 100644
index 00000000000..0af28ef8de5
--- /dev/null
+++ b/ouroboros-consensus/docs/report/chapters/miniprotocols/chainsyncclient.tex
@@ -0,0 +1,28 @@
+\chapter{Chain sync client}
+\label{chainsyncclient}
+
+\section{Header validation}
+\label{chainsyncclient:validation}
+
+Discuss the fact that we validate headers (maybe a forward reference to the genesis chapter, where this becomes critical).
+
+Discuss that this means we need efficient access to the $k$ most recent ledger states (we refer to this section for that).
+
+\section{Forecasting requirements}
+\label{chainsyncclient:forecasting}
+
+Discuss that forecasting must have sufficient range to validate a chain longer than our own chain, so that we can meaningfully apply chain selection.
+
+NOTE: Currently \cref{low-density} contains such a discussion.
+
+\section{Trimming}
+\label{chainsyncclient:trimming}
+
+\section{Interface to the block fetch logic}
+\label{chainsyncclient:plausiblecandidates}
+
+We should discuss here the (very subtle!) reasoning about how we establish
+the precondition that allows us to compare candidates
+(\cref{chainsel:fragments:precondition}). See
+\lstinline!plausibleCandidateChain! in \lstinline!NodeKernel!
+(PR \#2735).
diff --git a/ouroboros-consensus/docs/report/chapters/miniprotocols/servers.tex b/ouroboros-consensus/docs/report/chapters/miniprotocols/servers.tex
new file mode 100644
index 00000000000..1a0364df224
--- /dev/null
+++ b/ouroboros-consensus/docs/report/chapters/miniprotocols/servers.tex
@@ -0,0 +1,31 @@
+\chapter{Mini protocol servers}
+\label{servers}
+
+The division of work between the network layer and the consensus layer when it
+comes to the implementation of the clients and servers of the mini protocols is
+somewhat pragmatic. Servers and clients that do significant amounts of network
+layer logic (such as the block fetch client, which makes delta-Q related
+decisions, or the node-to-node transaction server and client, which deal with
+transaction windows) live in the network layer. Clients and servers that
+primarily deal with consensus side concerns live in the consensus layer; the
+chain sync client (\cref{chainsyncclient}) is the primary example of this.
+There are also a number of servers for the mini protocols that do little more
+than provide glue code between the mini protocol and the consensus interface;
+these servers are described in this chapter.
+
+\section{Local state query}
+\label{servers:lsq}
+
+\section{Chain sync}
+\label{servers:chainsync}
+
+\section{Local transaction submission}
+\label{servers:txsubmission}
+
+Unlike remote (node to node) transaction submission, local (client to node)
+transaction submission does not deal with transaction windows, and is
+consequently much simpler; it therefore lives consensus side rather than
+network side.
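+
+To give a feel for how thin this glue is, the following minimal sketch (with
+hypothetical, simplified types rather than the actual interfaces) shows the
+essence of such a server: it forwards each submitted transaction to the
+mempool and reports the result back to the client.
+
+\begin{lstlisting}
+-- Illustration only: simplified, hypothetical types.
+newtype Mempool m tx err = Mempool
+  { addTx :: tx -> m (Either err ()) }
+
+newtype LocalTxServer m tx err = LocalTxServer
+  { recvTx :: tx -> m (Either err (), LocalTxServer m tx err) }
+
+-- Glue code: submit each transaction to the mempool, report the
+-- result to the client, and continue serving.
+localTxSubmissionServer :: Monad m => Mempool m tx err -> LocalTxServer m tx err
+localTxSubmissionServer mempool = go
+  where
+    go = LocalTxServer $ \tx -> do
+      result <- addTx mempool tx
+      pure (result, go)
+\end{lstlisting}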
+
+\section{Block fetch}
+\label{servers:blockfetch}
diff --git a/ouroboros-consensus/docs/report/chapters/storage/chaindb.tex b/ouroboros-consensus/docs/report/chapters/storage/chaindb.tex
new file mode 100644
index 00000000000..25af08d8592
--- /dev/null
+++ b/ouroboros-consensus/docs/report/chapters/storage/chaindb.tex
@@ -0,0 +1,141 @@
+\chapter{Chain Database}
+\label{chaindb}
+
+TODO\todo{TODO}: This is currently a disjoint collection of snippets.
+
+\section{Block processing queue}
+\label{chaindb:queue}
+
+Discuss the chain DB's block processing queue, the futures/promises/events,
+concurrency concerns, etc.
+
+Discuss the problem of the effective queue size (\#2721).
+
+\section{Marking invalid blocks}
+\label{chaindb:invalidblocks}
+
+The chain database keeps a set of hashes of known-to-be-invalid blocks.
+This information is used by the chain sync client (\cref{chainsyncclient}) to
+terminate connections to nodes with a chain that contains an invalid block.
+
+\begin{lemma}
+\label{chaindb:dont-mark-invalid-successors}
+When the chain database discovers an invalid block $X$, it is sufficient
+to mark only $X$; there is no need to additionally mark any successors of $X$.
+\end{lemma}
+
+\begin{proof}[Proof (sketch).]
+The chain sync client maintains a chain fragment corresponding to some suffix
+of the upstream node's chain, and it preserves the invariant that that suffix
+must intersect with the node's own current chain. It can therefore never be
+the case that the fragment contains a successor of $X$ but not $X$ itself:
+since $X$ is invalid, the node will never adopt it, and so a fragment that
+intersects the node's current chain and includes a successor of $X$ \emph{must}
+also contain $X$.
+\end{proof}
+
+TODO\todo{TODO}: We should discuss how this relates to GC (\cref{chaindb:gc}).
+
+\section{Effective maximum rollback}
+
+The maximum rollback we can support is bounded by the length of the current
+fragment. This will be less than $k$ only if
+
+\begin{itemize}
+\item We are near genesis and the immutable database is empty, or
+\item Due to data corruption the volatile database lost some blocks
+\end{itemize}
+
+Only the latter case is cause for concern: we are in a state where
+conceptually we \emph{could} roll back up to $k$ blocks, but due to how we chose
+to organise the data on disk (the immutable/volatile split) we cannot. One
+option here would be to move blocks \emph{back} from the immutable DB to the
+volatile DB under these circumstances, and indeed, if there were other parts of
+the system where rollback might be instigated, that would be the right thing to
+do: those other parts of the system should not be aware of the particulars of
+the disk layout.
+
+However, since the chain database is \emph{exclusively} in charge of switching
+to forks, all the logic can be isolated to the chain database. So, when we have
+a short volatile fragment, we will simply not roll back more than the length of
+that fragment. This can also be justified conceptually: the fact that $I$ is
+the tip of the immutable DB means that \emph{at some point} it was in our chain
+at least $k$ blocks back, and so we considered it to be immutable; the fact
+that some data loss occurred does not really change that. We may still roll
+back more than $k$ blocks when disk corruption occurs in the immutable
+database, of course.
+
+One use case of the current fragment merits a closer examination.
+When the chain sync client (\cref{chainsyncclient}) looks for an intersection
+between our chain and the chain of the upstream peer, it sends points from our
+chain fragment. If the volatile fragment is shorter than $k$ due to data
+corruption, the client would have fewer points to send to the upstream node.
+However, this is the correct behaviour: it would mean we cannot connect to
+upstream nodes whose chains fork off more than $k$ blocks before what
+\emph{used to be} our tip prior to the data corruption, even if that is no
+longer where our tip is. In the extreme case, if the volatile database gets
+entirely erased, only a single point is available (the tip of the immutable
+database $I$), and hence we can only connect to upstream nodes that have $I$ on
+their chain. This is precisely stating that we can only sync with upstream
+nodes whose chain extends our immutable chain.
+
+\section{Garbage collection}
+\label{chaindb:gc}
+
+Blocks on chains that are never selected, or indeed blocks whose
+predecessor we never learn, will eventually be garbage collected when their
+slot number is more than $k$ away from the tip of the selected
+chain.\footnote{This is slot based rather than block based for historical
+reasons only; we should probably change this.}
+
+\begin{bug}
+The chain DB (more specifically, the volatile DB) can still grow without bound
+if we allow upstream nodes to rapidly switch between forks; this should be
+addressed at the network layer (for instance, by introducing rate limiting for
+rollback in the chain sync client, \cref{chainsyncclient}).
+\end{bug}
+
+Although this is GC of the volatile DB, I feel it belongs here more than in
+the volatile DB chapter because here we know \emph{when} we could GC.
+But perhaps it should be split into two: a section in the volatile DB chapter
+on how GC is implemented, and then a section here on how it's used in the
+chain DB. References from elsewhere in the report to GC should probably
+refer here, though, not to the vol DB chapter.
+
+\subsection{GC delay}
+
+For performance reasons neither the immutable DB nor the volatile DB ever makes
+explicit \lstinline!fsync! calls to flush data to disk. This means that when the
+node crashes, recently added blocks may be lost. When this happens in the
+volatile DB it's not a huge deal: when the node starts back up and the chain
+database is initialised we just run chain selection on whatever blocks still
+remain; in typical cases we end up with a slightly shorter chain.
+
+However, when this happens in the immutable database the impact may be larger.
+In particular, if we delete blocks from the volatile database as soon as we add
+them to the immutable database, then data loss in the immutable database would
+result in a gap between the volatile database and the immutable database, making
+\emph{all} blocks in the volatile database unusable. We can recover from this,
+but it would result in a large rollback (in particular, one larger than $k$).
+
+To avoid this, we currently have a delay between adding blocks to the immutable
+DB and removing them from the volatile DB (garbage collection).
+The delay is configurable, but should be set in such a way that the possibility
+that the block has not yet been written to disk at the time of garbage
+collection is minimised; a relatively short delay should suffice (currently we
+use a delay of 1 minute), though there are other reasons for preferring a
+longer delay:
+
+\begin{itemize}
+\item Clock changes can more easily be accommodated with more overlap
+(\cref{future:clockchanges}).
+\item The time delay also determines the worst-case validity of iterators
+(TODO\todo{TODO}: reference to relevant section).
+\end{itemize}
+
+Larger delays will of course result in more overlap between the two databases.
+During normal node operation this might not be much, but the overlap might be
+more significant during bulk syncing.
+
+Notwithstanding the above discussion, an argument could be made that the
+additional complexity due to the delay is not worth it; even a ``rollback'' of
+more than $k$ is easily recovered from\footnote{Note that the node will never
+actually notice such a rollback: the node would crash when discovering data
+loss, and then restart with a smaller chain.}, and clock changes, as well as
+iterators asking for blocks that now live on distant chains, are not important
+use cases. We could therefore decide to remove the delay altogether.
diff --git a/ouroboros-consensus/docs/report/chapters/storage/chainselection.tex b/ouroboros-consensus/docs/report/chapters/storage/chainselection.tex
new file mode 100644
index 00000000000..07656280089
--- /dev/null
+++ b/ouroboros-consensus/docs/report/chapters/storage/chainselection.tex
@@ -0,0 +1,628 @@
+\newcommand{\chainle}{\ensuremath{\mathrel{\sqsubseteq}}}
+\newcommand{\chainlt}{\ensuremath{\mathrel{\sqsubset}}}
+\newcommand{\chainnotlt}{\ensuremath{\mathrel{\nsqsubset}}}
+\newcommand{\wehave}{.\;}
+\newcommand{\suchthat}{.\;}
+\newcommand{\app}{\ensuremath{\mathrel{\triangleright}}}
+\newcommand{\length}[1]{\ensuremath{\mathrm{length} \; #1}}
+\newcommand{\ifthen}[2]{\ensuremath{\mathrm{if} \quad #1 \quad \mathrm{then} \quad #2}}
+\renewcommand{\iff}{\ensuremath{\qquad\mathrm{iff}\qquad}}
+\newcommand{\candidates}[2]{\ensuremath{\mathsf{candidates}_#1(#2)}}
+\newcommand{\blockNo}[1]{\ensuremath{\mathtt{blockNo}(#1)}}
+\newcommand{\selectviewle}{\ensuremath{\precsim}}
+
+\chapter{Chain Selection}
+\label{chainsel}
+
+Chain selection is one of the central responsibilities of the chain database
+(\cref{chaindb}). It of course depends on chain selection as it is defined by
+the consensus protocol (\cref{consensus:class:chainsel}), but it also needs to
+take care of a number of operational concerns. In this chapter we will take a
+closer look at the implementation of chain selection in the chain database,
+and state some properties and sketch some proofs to motivate it.
+
+\section{Comparing anchored fragments}
+\label{chainsel:fragments}
+
+\subsection{Introduction}
+
+Recall from \cref{consensus:overview:chainsel} that while in the literature
+chain selection is defined in terms of comparisons between entire chains, we
+instead opted to model it in terms of a comparison between the \emph{headers} at
+the tip of those chains (or rather, a \emph{view} on those headers defined by
+the specific consensus protocol).
+
+We saw in \cref{storage:inmemory} (specifically, \cref{storage:fragments}) that
+the consensus layer stores chain fragments in memory (the most recent headers on
+a chain), both for the node's own current chain as well as for upstream nodes
+(which we refer to as ``candidate chains'').
+Defining chain selection in terms of fragments is straightforward when those
+fragments are non-empty: we simply take the most recent header, extract the
+view required by the consensus protocol (\cref{BlockSupportsProtocol}), and
+then use the consensus protocol's chain selection interface to compare them.
+The question is, however, how to compare two fragments when one (or both) of
+them is \emph{empty}. This problem is more subtle than it might seem at first
+sight, and requires careful consideration.
+
+We mentioned in \cref{consensus:overview:chainsel} that consensus imposes a
+fundamental assumption that the strict extension of a chain is always (strictly)
+preferred over that chain (\cref{prefer-extension}), and that consequently we
+\emph{always} prefer a non-empty chain over an empty one (and conversely we
+\emph{never} prefer an empty chain over a non-empty one). However, chain
+fragments are mere proxies for their chains, and the fragment might be empty
+even if the chain is not. This means that in principle it's possible we do not
+prefer a non-empty fragment over an empty one, or indeed prefer an empty
+fragment over a non-empty one. However, when a fragment is empty, we cannot rely
+on the consensus protocol's chain selection because we have no header to give
+it.
+
+Let's consider under which conditions these fragments might be empty:
+
+\begin{description}
+\item[Our fragment]
+Our own fragment is a path through the volatile database, anchored at the tip of
+the immutable database (\cref{storage:fragments}). Under normal circumstances,
+it will be empty only if our \emph{chain} is empty; we will refer to such empty
+fragments as \emph{genuinely empty}.\footnote{We can distinguish between an empty
+fragment of a non-empty chain and a (necessarily) empty fragment of an empty
+chain by looking at the anchor point: if it is the genesis point, the chain must
+be empty.} However, our fragment can also be empty even when our chain is not,
+if due to data loss the volatile database is empty (or contains no blocks that
+fit onto the tip of the immutable database).
+
+\item[Candidate fragment]
+A \emph{genuinely} empty candidate fragment, representing an empty candidate
+chain, is never preferred over our chain. Unfortunately, however, the candidate
+fragment as maintained by the chain sync client (\cref{chainsyncclient}) can
+essentially be empty at any point due to the way that a switch-to-fork is
+implemented in terms of rollback followed by roll forward: after a maximum
+rollback (and before the roll forward), the candidate fragment is empty.
+\end{description}
+
+\subsection{Precondition}
+\label{chainsel:fragments:precondition}
+
+Since neither of these circumstances can be avoided, we must impose a
+precondition for chain selection between chain fragments to be definable:
+
+\begin{definition}[Precondition for comparing chain fragments]
+The two fragments must either both be non-empty, or they must intersect.
+\end{definition}
+
+In this chapter, we establish this precondition in two different ways:
+
+\begin{enumerate}
+\item When we construct candidate chains (potential chains that we may wish
+to replace our own chain with), those candidate chains must intersect with
+our own chain within $k$ blocks from its tip; after all, if that were not the
+case, we would induce a rollback of more than $k$ blocks
+(\cref{consensus:overview:k}).
+
+\item When we compare fragments to each other, we only compare fragments from a
+set of fragments that are all anchored at the same point (i.e., the anchor of
+all fragments in the set is the same, though it might be different from the
+anchor of our current fragment). Since they are all anchored at the same point,
+they trivially all intersect with each other.
+\end{enumerate}
+
+There is one more use of fragment selection, which is rather more subtle;
+we will come back to this in \cref{chainsyncclient:plausiblecandidates}.
+
+TODO\todo{TODO}: Throughout we are talking about \emph{anchored} fragments
+here. We should make sure that we discuss those somewhere.
+
+\subsection{Definition}
+\label{chainsel:fragments:definition}
+
+We will now show that this precondition suffices to compare two fragments,
+whether or not they are empty; we'll consider each case in turn.
+
+\begin{description}
+
+\item[Both fragments empty]
+Since the two fragments must intersect, that intersection point can only
+be the two anchor points, which must therefore be equal. This means that
+the two fragments represent the same chain: neither fragment is preferred
+over the other.
+
+\item[First fragment non-empty, second fragment empty]
+Since the two fragments must intersect, that intersection can only be the
+anchor of the second fragment, which can lie anywhere on the first fragment.
+
+\begin{itemize}
+\item If it lies at the \emph{tip} of the first fragment, the two fragments
+represent the same chain, and neither is preferred over the other.
+\item If it lies \emph{before} the tip of the first fragment, the first
+fragment is a strict extension of the second, and is therefore preferred over
+the second.
+\end{itemize}
+
+\item[First fragment empty, second fragment non-empty]
+This case is entirely symmetric to the previous one; if the intersection is the
+tip of the second fragment, the fragments represent the same chain. Otherwise,
+the second fragment is a strict extension of the first, and is therefore
+preferred.
+
+\item[Both fragments non-empty]
+In this case, we can simply use the consensus protocol's chain selection API
+to compare the most recent headers of the two fragments.
+
+\end{description}
+
+Note that this relies critically on the ``prefer extension'' rule
+(\cref{prefer-extension}).
+
+\section{Preliminaries}
+\label{chainsel:spec}
+
+Recall from \cref{storage:components} that the immutable database stores a
+linear chain, terminating in the \emph{tip} $I$ of the immutable database. The
+volatile database stores a (possibly fragmented) tree of extensions to that
+chain:
+%
+\begin{center}
+\begin{tikzpicture}
+\draw (0,0) -- (100pt, 0) coordinate (immtip) node{$\bullet$} node[above left] {$I$};
+\draw (immtip) -- ++(40pt, 40pt);
+\draw (immtip) -- ++(40pt, -40pt);
+\draw [dotted] (immtip) -- ++(0, 40pt);
+\draw [dotted] (immtip) -- ++(0, -50pt);
+\node at (50pt, -40pt) [below] {$\underbrace{\hspace{100pt}}_\textrm{immutable}$};
+\node at (120pt, -40pt) [below] {$\underbrace{\hspace{40pt}}_\textrm{volatile}$};
+\end{tikzpicture}
+\end{center}
+%
+The node's \emph{current chain} is stored in memory as a chain fragment through
+the volatile database, anchored at $I$. When we start up the node, the chain
+database must find the best possible path through the volatile database and
+adopt that as our current fragment; every time a new block is added to the
+volatile database, we have to recompute the new best possible path.
+In other words, we maintain the following invariant:
+
+\begin{definition}[Current chain invariant]
+\label{current-chain-invariant}
+The current chain is the best possible path through the volatile DB.
+\end{definition}
+
+``Best'' of course is according to the chain selection rule defined by the
+consensus protocol (\cref{consensus:class:chainsel}). In this section we
+describe how the chain database establishes and preserves this invariant.
+
+\subsection{Notation}
+
+So far we have been relatively informal in our description of chain selection,
+but in order to precisely describe the algorithm and state some of its
+properties, we have to introduce some notation.
+
+\begin{definition}[Chain selection]
+We will model chain selection as an irreflexive, transitive binary relation
+(\chainlt) between valid chains (it is undefined for invalid chains), and let
+$C \chainle C'$ if and only if $C \chainlt C'$ or $C = C'$. It follows that
+(\chainle) is a partial order (reflexive, antisymmetric, and transitive).
+\end{definition}
+
+For example, the simple ``prefer longest chain'' chain selection rule could be
+given as
+%
+\begin{equation*}
+\tag{longest chain rule}
+C \chainlt C' \iff \length{C} < \length{C'}
+\end{equation*}
+
+In general of course the exact rule depends on the choice of consensus protocol.
+\Cref{prefer-extension} (\cref{consensus:overview:chainsel}) can now be
+rephrased as
+%
+\begin{equation}
+\label{eq:prefer-extension}
+\forall C, B \wehave C \chainlt (C \app B)
+\end{equation}
+
+We will not be comparing whole chains, but rather chain fragments
+(we will leave the anchor of fragments implicit):
+%
+\begin{definition}[Fragment selection]
+We lift $\chainlt$ to chain fragments in the manner described in
+\cref{chainsel:fragments}; this means that $\chainlt$ is undefined for two
+fragments if they do not intersect (\cref{chainsel:fragments:precondition}).
+\end{definition}
+
+We also lift $\chainle$ to \emph{sets} of fragments, intuitively indicating that
+a particular fragment is the ``best choice'' out of a set $\mathcal{S}$ of
+candidate fragments:
+%
+\begin{definition}[Optimal candidate]
+\begin{equation*}
+\mathcal{S} \chainle F \iff \nexists F' \in \mathcal{S} \suchthat F \chainlt F'
+\end{equation*}
+(in other words, if additionally $F \in \mathcal{S}$, then $F$ is a maximal
+element of $\mathcal{S}$). This inherits all the preconditions of $\chainle$ on
+chains and fragments.
+\end{definition}
+
+Finally, we will introduce some notation for \emph{computing} candidate
+fragments:\footnote{In order to compute these candidates efficiently, the
+volatile database must support a ``forward chain index'', able to efficiently
+answer the question ``which blocks succeed this one?''.}
+
+\begin{definition}[Construct set of candidates]
+Given some set of blocks $V$, and some anchor $A$ (with $A$ either a block or
+the genesis point), $$\candidates{A}{V}$$ is the set of chain fragments
+anchored at $A$ using blocks picked from $V$.
+\end{definition}
+
+By construction all fragments in $\candidates{A}{V}$ have the same anchor, and
+hence all intersect (at $A$); this will be important for the use of the
+$\chainle$ operator.
+
+\subsection{Properties}
+
+\begin{lemma}[Properties of the set of candidates]
+\label{candidates:properties}
+The set of candidates computed by $\candidates{A}{V}$ has the following
+properties.
+
+\begin{enumerate}
+
+\item \label{candidates:prefixclosed}
+It is prefix closed:
+\begin{equation*}
+\forall F, B \wehave
+\ifthen
+  {(F \app B) \in \candidates{A}{V}}
+  {F \in \candidates{A}{V}}
+\end{equation*}
+
+\item \label{candidates:appendnew}
+If we add a new block into the set, we can append that block to existing
+candidates (where it fits):
+\begin{equation*}
+\ifthen
+  {F \in \candidates{A}{V}}
+  {F \app B \in \candidates{A}{V \cup \{ B \}}}
+\end{equation*}
+provided $F$ can be extended with $B$.
+
+\item \label{candidates:monotone}
+Adding blocks doesn't remove any candidates:
+\begin{equation*}
+\candidates{A}{V} \subseteq \candidates{A}{V \cup \{B\}}
+\end{equation*}
+
+\item \label{candidates:musthavenew}
+If we add a new block, then any new candidates must involve that new block:
+\begin{equation*}
+\ifthen
+  {F \in \candidates{A}{V \cup \{B\}}}
+  {F \in \candidates{A}{V} \text{ or } F = (\ldots \app B \app \ldots)}
+\end{equation*}
+
+\end{enumerate}
+\end{lemma}
+
+The next lemma says that if we have previously found some optimal candidate $F$,
+and subsequently learn of a new block $B$, it suffices to find a locally optimal
+candidate \emph{amongst the candidates that involve $B$}; this new candidate
+will also be a globally optimal candidate.
+
+\begin{lemma}[Focus on new block]
+\label{focusonnewblock}
+Suppose we have $F, F_\mathit{new}$ such that
+\begin{enumerate}
+\item \label{focusonnewblock:previouslyoptimal}
+$\candidates{A}{V} \chainle F$
+\item \label{focusonnewblock:optimalamongstnew}
+$(\candidates{A}{V \cup \{B\}} \setminus \candidates{A}{V}) \chainle F_\mathit{new}$
+\item \label{focusonnewblock:betterthanold}
+$F \chainle F_\mathit{new}$
+\end{enumerate}
+Then
+\begin{equation*}
+\candidates{A}{V \cup \{B\}} \chainle F_\mathit{new}
+\end{equation*}
+\end{lemma}
+
+\begin{proof}
+Suppose there exists $F' \in \candidates{A}{V \cup \{B\}}$ such that
+$F_\mathit{new} \chainlt F'$. By transitivity and
+assumption~\ref{focusonnewblock:betterthanold}, $F \chainlt F'$. As
+shown in \cref{candidates:properties} (\cref{candidates:musthavenew}), there are
+two possibilities:
+
+\begin{itemize}
+\item $F' \in \candidates{A}{V}$, which would violate
+assumption~\ref{focusonnewblock:previouslyoptimal}, or
+\item $F'$ must contain block $B$, making it a \emph{new} candidate, which
+would violate assumption~\ref{focusonnewblock:optimalamongstnew}.
+\end{itemize}
+
+Either way we arrive at a contradiction, so no such $F'$ can exist.
+\end{proof}
+
+\section{Initialisation}
+\label{chainsel:init}
+
+The initialisation of the chain database proceeds as follows.
+
+\begin{enumerate}
+
+\item
+\label{chaindb:init:imm}
+Initialise the immutable database, determine its tip $I$, and ask the
+ledger DB for the corresponding ledger state $L$.
+
+\item Compute the set of candidates anchored at the immutable database's tip
+\label{chaindb:init:compute}
+$I$ using blocks from the volatile database $V$
+$$\candidates{I}{V}$$
+ignoring known-to-be-invalid blocks (if any; see \cref{chaindb:invalidblocks})
+and order them using $(\chainlt)$ so that we process better candidates
+first.\footnote{Technically speaking we should \emph{first} validate all
+candidates, and only then apply selection to the valid chains. We perform chain
+selection first, because that is much cheaper. Both approaches are semantically
+equivalent, since \lstinline!sortBy f . filter p = filter p . sortBy f!
+due to the stability of \lstinline!sortBy!.} Candidates that are strict
+prefixes of other candidates can be ignored (as justified by the ``prefer
+extension'' assumption, \cref{prefer-extension}).\footnote{The implementation
+does not compute candidates, but rather ``maximal'' candidates, which do not
+include such prefixes.}
+
+\item
+\label{chaindb:init:select}
+Not all of these candidates may be valid, because the volatile database stores
+blocks whose \emph{headers} have been validated, but whose \emph{bodies} are
+still unverified (other than to check that they correspond to their headers).
+We therefore validate each candidate chain fragment, starting with $L$ (the
+ledger state at the tip of the immutable database) each time.\footnote{We make
+no attempt to share ledger states between candidates, even if they share a
+common prefix, trading runtime performance for lower memory pressure.}
+
+As soon as we find a candidate that is valid, we adopt it as our current chain.
+If we find a candidate that is \emph{invalid}, we mark the invalid
+block\footnote{There is no need to mark any successors of invalid blocks; see
+\cref{chaindb:dont-mark-invalid-successors}.} (unless it is invalid due to
+potential clock skew, see \cref{chainsel:infuture}), and go back to
+step~\ref{chaindb:init:compute}. It is important to recompute the set of
+candidates after marking some blocks as invalid because those blocks may also
+exist in other candidates and we do not know how the valid prefixes of those
+candidates should now be ordered.
+
+\end{enumerate}
+
+\section{Adding new blocks}
+\label{chainsel:addblock}
+
+When a new block $B$ is added to the chain database, we need to add it to the
+volatile DB and recompute our current chain. We distinguish between the
+following different cases.
+
+Before we process the new block, we first run chain selection on any blocks
+that had previously been temporarily shelved because their slot number was
+(just) ahead of the wallclock (\cref{chainsel:infuture}). We do this
+independently of what we do with the new block.\footnote{In a way, calls to
+\lstinline!addBlock! are how the chain database sees time advance. It does not
+rely on slot length to do so, because slot length is ledger state dependent.}
+
+The implementation of \lstinline!addBlock! additionally provides client code
+with various notifications throughout the process (``block added'', ``chain
+selection run'', etc.). We will not describe these notifications here.
+
+\subsection{Ignore}
+
+We can just ignore the block if any of the following is true.
+
+\begin{itemize}
+
+\item
+The block is already in the immutable DB, \emph{or} it belongs to a branch which
+forks more than $k$ blocks away from our tip, i.e.\footnote{The check is a
+little more complicated in the presence of EBBs (\cref{ebbs}). This is relevant
+if we switch to an alternative fork after a maximum rollback, and that
+alternative fork starts with an EBB. It is also relevant when due to data
+corruption the volatile database is empty and the first block we add when we
+continue to sync the chain happens to be an EBB.}
+%
+\begin{equation*}
+\blockNo{B} \leq \blockNo{I}
+\end{equation*}
+%
+We could distinguish between the block being on our chain and it being on a
+distant fork by doing a single query on the immutable database, but it does not
+matter: either way we do not care about this block.
+
+We don't expect the chain sync client to feed us such blocks under normal
+circumstances, though it's not impossible: by the time a block is downloaded
+it's conceivable, albeit unlikely, that that block is now older than $k$.
+
+\item
+The block was already in the volatile database, i.e.
+%
+\begin{equation*}
+B \in V
+\end{equation*}
+
+\item
+The block is known to be invalid (\cref{chaindb:invalidblocks}).
+
+\end{itemize}
+
+\subsection{Add to current chain}
+\label{chainsel:addtochain}
+
+If $B$ fits onto the end of our current fragment $F$ (and hence onto our
+current chain), i.e.
+%
+\begin{itemize}
+\item $F$ is empty, and $B_\mathit{pred} = I$
+(where $I$ must necessarily also be the anchor of the fragment), or
+\item $\exists F' \suchthat F = F' \app B_\mathit{pred}$
+\end{itemize}
+%
+then any new candidates must be equal to or an extension of $F \app B$
+(\cref{candidates:properties}, \cref{candidates:musthavenew}); this set is
+computed by
+%
+\begin{equation*}
+(F \app B \app \candidates{B}{V \cup \{B\}})
+\end{equation*}
+%
+Since all of these candidates are strictly preferred over $F$ (because they are
+extensions of $F$), by \cref{focusonnewblock} it suffices to pick the best
+candidate amongst these extensions. Apart from the starting point, chain
+selection then proceeds in the same way as when we are initialising the database
+(\cref{chainsel:init}).
+
+This takes care of the common case where we just add a block to our chain,
+as well as the case where we stay with the same branch but receive some blocks
+out of order. Moreover, we can use the \emph{current} ledger state as the
+starting point for validation.
+
+\subsection{Store, but don't change current chain}
+
+When we are missing one of the (transitive) predecessors of the block, we store
+the block but do nothing else. We can check this by following back pointers
+until we reach a block $B'$ such that $B' \notin V$ and $B' \neq I$. The cost of
+this is bounded by the length of the longest fragment in the volatile DB, and
+will typically be low; moreover, the chain fragment we are constructing this way
+will be used in the switch-to-fork case
+(\cref{chainsel:switchtofork}).\footnote{The function that constructs these
+fragments is called \lstinline!isReachable!.}
+
+At this point we \emph{could} do a single query on the immutable DB to check if
+$B'$ is in the immutable DB or not. If it is, then this block is on a distant
+branch that we will never switch to, and so we can ignore it. If it is not, we
+may or may not need this block later and we must store it; if it turns out we
+will never need it, it will eventually be garbage collected (\cref{chaindb:gc}).
+
+An alternative and easier approach is to omit the check on the immutable DB,
+simply assuming we might need the block, and rely on garbage collection to
+eventually remove it if we don't. This is the approach we currently use.
+
+\subsection{Switch to a fork}
+\label{chainsel:switchtofork}
+
+If none of the cases above apply, we have a block $B$ such that
+
+\begin{enumerate}
+\item \label{chainsel:switchtofork:notinvoldb}
+$B \notin V$
+\item \label{chainsel:switchtofork:notinimmdb}
+$\blockNo{B} > \blockNo{I}$ (and hence $B$ cannot be in the immutable DB)
+\item \label{chainsel:switchtofork:connected}
+For all transitive predecessors $B'$ of $B$ we have $B' \in V$ or $B' = I$.
+In other words, we must have a fragment
+$$F_\mathit{prefix} = I \app \ldots \app B$$
+in $\candidates{I}{V \cup \{B\}}$.
+\item \label{chainsel:switchtofork:doesnotfit}
+(Either $F$ is empty and $B_\mathit{pred} \neq I$, or) $\exists F', B' \suchthat
+F = F' \app B'$ where $B' \neq B_\mathit{pred}$; i.e., the block does not fit
+onto our current chain.\footnote{\Cref{chainsel:switchtofork:connected} rules
+out the first option: if $B_\mathit{pred} \neq I$ then we must have
+$B_\mathit{pred} \in V$ and moreover this must form some kind of chain back to
+$I$; this means that the preferred candidate cannot be empty.}
+\end{enumerate}
+
+We proceed in a similar fashion to the case where the block fits onto the tip
+of our chain (\cref{chainsel:addtochain}). The new candidates in
+$\candidates{I}{V \cup \{B\}}$ must involve $B$ (\cref{candidates:properties},
+\cref{candidates:musthavenew}), which in this case means they must all be
+extensions of $F_\mathit{prefix}$; we can compute these candidates
+using\footnote{The implementation of the chain database actually does not
+construct fragments that go back to $I$, but rather to the intersection point
+with the current chain. This can be considered to be an optimisation of what we
+describe here.}
+$$I \app \ldots \app B \app \candidates{B}{V \cup \{B\}}$$
+Not all of these fragments will necessarily be preferred over the current
+chain; we filter out those that are not.\footnote{Recall that the current chain
+gets special treatment: when two candidates are equally preferable, we can pick
+either one, but when a candidate and the current chain are equally preferable,
+we must stick with the current chain.} We then proceed as usual, considering
+each of the remaining fragments in $(\chainle)$ order, and appeal to
+\cref{focusonnewblock} again to conclude that the fragment we find in this way
+will be an optimal candidate across the entire volatile database.
+
+
+%
+%   *
+%
+%   *
+%
+%
+% It is worth pointing out that we do _not_ rely on `F_prefix` being longer than
+% the current chain. Indeed, it may not be: if two leaders are selected for the
+% same slot, and we _receive_ a block for the current slot before we can _produce_
+% one, our current chain will contain the block from the other leader; when we
+% then produce our own block, we end up in the switch-to-fork case; here it is
+% important that `preferCandidate` would prefer a candidate chain (the chain that
+% contains our own block) over our current chain, even if they are of the same
+% length, if the candidate ends in a block that we produced (and the current chain
+% does not); however, the `ChainDB` itself does not need to worry about this
+% special case.
+%
+% [let's be explicit about the difference between current chain and self
+% produced blocks]
+%
+
+\section{In-future check}
+\label{chainsel:infuture}
+
+As we saw in \cref{chainsel:spec}, the chain DB performs full
+block validation during chain selection. When we have validated a block, we then
+do one additional check, and verify that the block's slot number is not ahead of
+the wallclock time (for a detailed discussion of why we require the block's
+ledger state for this, see \cref{time}, especially
+\cref{time:block-infuture-check}). If the block is far ahead of the wallclock,
+we treat this as any other validation error and mark the block as invalid.
+
+Marking a block as invalid will cause the network layer to disconnect from the
+peer that provided the block to us, since non-malicious (and non-faulty) peers
+should never send invalid blocks to us.
+It is however possible that an upstream peer's clock is not perfectly aligned
+with ours, and so they might produce a block which \emph{we} think is ahead of
+the wallclock but \emph{they} do not. To avoid regarding such peers as
+malicious, the chain database supports a configurable \emph{permissible clock
+skew}: blocks that are ahead of the wallclock by an amount less than this
+permissible clock skew are not marked as invalid, but neither will chain
+selection adopt them; instead, they simply remain in the volatile database,
+available for the next chain selection.
+
+It is instructive to consider what happens if \emph{our} clock is off, in
+particular, when it is slow. In this scenario \emph{every} (or almost every)
+block that the node receives will be considered to be in the future. Suppose we
+receive two consecutive blocks $A$ and $B$. When we receive $A$, chain selection
+runs, we find that $A$ is ahead of the clock but within the permissible clock
+skew, and we don't adopt it. When we then receive $B$, chain selection runs
+again, and we now discover the $A, B$ extension to our current chain; during
+validation we cut off this chain at $B$ because it is ahead of the clock, but we
+adopt $A$ because it is now valid. In other words, we are always one block
+behind, adopting each block only when we receive the \emph{next} block.
+
+\begin{bug}
+One problem with this scheme is that if we receive a block $B$ which is ahead of
+the clock when we receive it, we might never notice it if block $B$ is not (yet)
+connected to our chain: by the time we receive the missing blocks (which connect
+$B$ to our chain), $B$ might no longer be ahead of the clock and we might adopt
+it, even if $B$ was ahead by more than the permissible clock skew.
+
+We could avoid this problem if we stored the time we received a block alongside
+the block in the volatile database, but in the current design, the volatile
+database does not store \emph{any} information on disk apart from raw blocks,
+so this would be quite a significant design change.
+\end{bug}
+
+\section{Sorting}
+
+In this chapter we have modelled chain selection as a partial order
+$(\chainle)$. This suffices for the formal treatment, and in theory also
+suffices for the implementation. However, at various points during the chain
+selection process we need to \emph{sort} candidates in order of preference. We
+can of course sort values based on a preorder only (topological sorting), but we
+can do slightly better. Recall from \cref{consensus:class:chainsel} that we
+require that the \lstinline!SelectView! on headers must be a total order. We can
+therefore define
+
+\begin{definition}[Same select view]
+Let $C \selectviewle C'$ if the select view at the tip of $C$ is less than
+or equal to the select view at the tip of $C'$.
+\end{definition}
+
+(\selectviewle) forms a total preorder (though not a partial order); if $C
+\selectviewle C'$ \emph{and} $C' \selectviewle C$ then the select views at the
+tips of $C$ and $C'$ are equal (though they might be different chains, of
+course). Since $C \selectviewle C'$ implies $C' \chainnotlt C$, we can use this
+preorder to sort candidates (in other words, we will sort them \emph{on} their
+select view, in Haskell parlance).
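+
+As a small illustrative sketch (the names are hypothetical; we only assume, as
+above, that the select view admits a total \lstinline!Ord! instance), sorting
+candidates on the select view at their tips might look as follows:
+
+\begin{lstlisting}
+import Data.List (sortBy)
+import Data.Ord  (Down (..), comparing)
+
+-- Sort candidate fragments in decreasing order of the select view at
+-- their tips, so that preferred candidates are processed first.
+sortCandidates :: Ord sv => (frag -> sv) -> [frag] -> [frag]
+sortCandidates selectViewAtTip = sortBy (comparing (Down . selectViewAtTip))
+\end{lstlisting}
+
+Note that \lstinline!sortBy! is stable, so candidates with equal select views
+keep their relative order; this matters for the rule that ties between a
+candidate and the current chain are resolved in favour of the current chain.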
diff --git a/ouroboros-consensus/docs/report/chapters/storage/immutabledb.tex b/ouroboros-consensus/docs/report/chapters/storage/immutabledb.tex
new file mode 100644
index 00000000000..54e0174c45b
--- /dev/null
+++ b/ouroboros-consensus/docs/report/chapters/storage/immutabledb.tex
@@ -0,0 +1,2 @@
+\chapter{Immutable Database}
+\label{immutable}
diff --git a/ouroboros-consensus/docs/report/chapters/storage/ledgerdb.tex b/ouroboros-consensus/docs/report/chapters/storage/ledgerdb.tex
new file mode 100644
index 00000000000..5841c8df397
--- /dev/null
+++ b/ouroboros-consensus/docs/report/chapters/storage/ledgerdb.tex
@@ -0,0 +1,8 @@
+\chapter{Ledger Database}
+\label{ledgerdb}
+
+\section{Initialisation}
+\label{ledgerdb:initialisation}
+
+Describe why it is important that we store a single snapshot and then replay
+ledger events to construct the ledger DB.
diff --git a/ouroboros-consensus/docs/report/chapters/storage/mempool.tex b/ouroboros-consensus/docs/report/chapters/storage/mempool.tex
new file mode 100644
index 00000000000..5feadb7221e
--- /dev/null
+++ b/ouroboros-consensus/docs/report/chapters/storage/mempool.tex
@@ -0,0 +1,7 @@
+\chapter{Mempool}
+\label{mempool}
+
+\section{Consistency}
+\label{mempool:consistency}
+
+Discuss that we insist on \emph{linear consistency}, and why.
diff --git a/ouroboros-consensus/docs/report/chapters/storage/overview.tex b/ouroboros-consensus/docs/report/chapters/storage/overview.tex
new file mode 100644
index 00000000000..39fd40e3eb7
--- /dev/null
+++ b/ouroboros-consensus/docs/report/chapters/storage/overview.tex
@@ -0,0 +1,40 @@
+\chapter{Overview}
+\label{storage}
+
+\section{Components}
+\label{storage:components}
+
+\begin{tikzpicture}
+\draw [dotted]
+     (-50pt, -65pt)
+  -- ++(0, 90pt) node[above right] {chain database}
+  -- ++(150pt, 0)
+  -- ++(0, -90pt)
+  -- cycle;
+\node [draw, shape=rectangle, minimum width=80pt, minimum height=30pt] at (0,0) {immutable};
+\node [draw, shape=rectangle, minimum width=50pt, minimum height=30pt] at (65pt, 0) {volatile};
+\node [draw, shape=rectangle, minimum width=50pt, minimum height=30pt] at (65pt, - 40pt) {ledger};
+\end{tikzpicture}
+
+Discuss the immutable/volatile split (we reference this section for that).
+
+\section{In memory}
+\label{storage:inmemory}
+
+TODO: After we have discussed the components, we should give an overview of
+everything we store in memory in each component, so that we have a better
+understanding of the memory usage of the chain DB as a whole.
+
+\subsection{Chain fragments}
+\label{storage:fragments}
+
+\subsection{Extended ledger state}
+\label{storage:extledgerstate}
+\label{storage:headerstate}
+
+TODO: Is there a more natural place to talk about this? Introducing the
+header state when introducing the storage layer does not feel quite right.
+The storage layer might be storing the header state, but that doesn't
+explain its existence.
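+
+As a rough sketch of how the state types listed below fit together (a
+simplification for illustration, not the actual definitions), the extended
+ledger state pairs the ledger state proper with the state needed to validate
+headers:
+
+\begin{lstlisting}
+-- Simplified sketch; the real types are parameterised by the block type.
+data ExtLedgerState lgrSt hdrSt = ExtLedgerState
+  { ledgerState :: lgrSt  -- state maintained by the ledger rules
+  , headerState :: hdrSt  -- protocol state needed to validate headers
+  }
+\end{lstlisting}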
+
+ChainDepState, (ChainIndepState), LedgerState, ExtLedgerState
diff --git a/ouroboros-consensus/docs/report/chapters/storage/volatiledb.tex b/ouroboros-consensus/docs/report/chapters/storage/volatiledb.tex
new file mode 100644
index 00000000000..46aa1786c53
--- /dev/null
+++ b/ouroboros-consensus/docs/report/chapters/storage/volatiledb.tex
@@ -0,0 +1,2 @@
+\chapter{Volatile Database}
+\label{volatile}
diff --git a/ouroboros-consensus/docs/report/chapters/testing/consensus.tex b/ouroboros-consensus/docs/report/chapters/testing/consensus.tex
new file mode 100644
index 00000000000..3a05a916878
--- /dev/null
+++ b/ouroboros-consensus/docs/report/chapters/testing/consensus.tex
@@ -0,0 +1,7 @@
+\chapter{Reaching consensus}
+\label{testing:consensus}
+
+\section{Dire-but-not-too-dire}
+\label{testing:dire}
+
+We should mention the PBFT threshold here (\cref{bft-paper}).
diff --git a/ouroboros-consensus/docs/report/chapters/testing/storage.tex b/ouroboros-consensus/docs/report/chapters/testing/storage.tex
new file mode 100644
index 00000000000..bcda786eb26
--- /dev/null
+++ b/ouroboros-consensus/docs/report/chapters/testing/storage.tex
@@ -0,0 +1,2 @@
+\chapter{The storage layer}
+\label{testing:storage}
diff --git a/ouroboros-consensus/docs/report/cleanbuild.sh b/ouroboros-consensus/docs/report/cleanbuild.sh
new file mode 100755
index 00000000000..a2053fd3afe
--- /dev/null
+++ b/ouroboros-consensus/docs/report/cleanbuild.sh
@@ -0,0 +1,11 @@
+#!/bin/bash
+
+SOURCES=`find . -name '*.tex'`
+MAIN=report.tex
+
+rm -f *.aux *.log *.out *.pdf *.toc *.bbl *.blg *.nav *.snm
+
+pdflatex -halt-on-error $MAIN >/dev/null
+bibtex report
+pdflatex -halt-on-error $MAIN >/dev/null
+pdflatex -halt-on-error $MAIN >pdflatex.log
diff --git a/ouroboros-consensus/docs/report/genesis.tex b/ouroboros-consensus/docs/report/genesis.tex
new file mode 100644
index 00000000000..476aa2e0fd9
--- /dev/null
+++ b/ouroboros-consensus/docs/report/genesis.tex
@@ -0,0 +1,627 @@
+\documentclass[usenames,dvipsnames,t]{beamer}
+\usetheme{Rochester}
+
+\usepackage{tikz}
+\usepackage[utf8]{inputenc}
+\usepackage{setspace}
+\usepackage{bbding}
+\usepackage{microtype}
+
+%Information to be included in the title page:
+\title{Implementing the Genesis Chain Selection Rule}
+\author{Edsko de Vries}
+\institute{Well-Typed}
+\date{November 2020}
+
+\begin{document}
+
+\frame{\titlepage}
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+
+\begin{frame}
+\frametitle{The Genesis Rule}
+
+\begin{alertblock}{Genesis chain selection rule}
+A candidate chain is preferred over our current chain if
+
+\begin{itemize}
+\item The intersection between the candidate chain and our chain is \textbf{no
+more than $k$} blocks back, and the candidate chain is strictly \textbf{longer}
+than our chain, or
+
+\item The intersection \emph{is} \textbf{more than $k$} blocks back, and the
+candidate chain is \textbf{denser} (contains more blocks) than our chain in
+a region of $s$ slots starting at the intersection.
+\end{itemize}
+\end{alertblock}
+
+\end{frame}
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+
+\begin{frame}
+\frametitle{The Genesis Rule}
+
+\begin{alertblock}{Alternative genesis rule}
+A candidate chain is preferred over our current chain if
+
+\begin{itemize}
+\item The intersection between the candidate chain and our chain is
+\textbf{at least $s$ slots} back, and the candidate chain is denser in a window
+of $s$ slots at the intersection, or
+
+\item The intersection between the candidate chain and our chain is \textbf{no
+more than $k$ blocks} back, and the candidate chain is strictly \textbf{longer}
+than our chain.
+\end{itemize}
+
+\end{alertblock}
+
+\end{frame}
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+
+\begin{frame}
+\frametitle{Fundamental Assumptions within the Consensus Layer}
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+
+\begin{onlyenv}<1>
+
+\begin{alertblock}{Invariant}
+We never roll back more than $k$ blocks.
+\end{alertblock}
+
+This invariant is used to
+
+\begin{itemize}
+\item \textbf{Organise on-disk and in-memory block and ledger storage}: blocks older
+than $k$ are stored in the \emph{immutable} database, the remainder in the
+\emph{volatile} database.
+\item \textbf{Guarantee efficient block validation}: we have access to the $k$
+most recent ledger states.
+\item \textbf{Bound memory usage for tracking peers}: we need to track at most $k + 1$
+blocks per upstream peer to be able to decide if we prefer their chain over
+ours (apply the longest chain rule).
+\item \dots
+\end{itemize}
+
+\end{onlyenv}
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+
+\begin{onlyenv}<2>
+\begin{alertblock}{Invariant}
+We never switch to a shorter chain.
+\end{alertblock}
+
+Without this invariant, the previous invariant (never roll back
+more than $k$ blocks) is not very useful.
+
+\begin{itemize}
+\item If we \emph{could} switch to a shorter chain but continue to support a
+rollback of $k$, the \emph{effective} maximum rollback is infinite.
+\item We would need efficient access to \emph{all} past ledger states.
+\item We would have to move blocks \emph{back} from the immutable database to the volatile database.
+\item \dots
+\end{itemize}
+\end{onlyenv}
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+
+\begin{onlyenv}<3>
+
+\begin{alertblock}{Invariant}
+The strict extension of a chain is always preferred over that chain.
+\end{alertblock}
+
+\begin{itemize}
+\item Used to make some local chain selection decisions.
+\item (I \emph{think} this one is compatible with Genesis.)
+\end{itemize}
+
+\end{onlyenv}
+
+\end{frame}
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+
+\begin{frame}{Towards an Alternative}
+
+\begin{center}
+\begin{tikzpicture}
+\path (0, 0) coordinate (tip) node{$\bullet$} node[below left]{tip};
+\draw (tip) + (-3,0) -- (tip);
+\onslide<2->{\draw (tip) -- ++(1.0, 1.0) coordinate (ab);}
+\onslide<3->{\draw (tip) -- ++(1.5, -0.5) coordinate (cd);}
+\onslide<4->{\draw (ab) -- ++(0.5, 0.5) -- ++(2.0, 0);}
+\onslide<5->{\draw (ab) -- ++(0.5, -0.5) -- ++(2.0, 0);}
+\onslide<6->{\draw (cd) -- ++(0.5, 0.5) -- ++(1.5, 0);}
+\onslide<7->{\draw (cd) -- ++(0.5, -0.5) -- ++(1.5, 0);}
+\onslide<8->{\draw [dashed] (tip) -- ++(0, 1.75) -- ++(3, 0) -- ++(0, -3) -- ++(-3, 0) -- cycle;}
+\end{tikzpicture}
+\end{center}
+
+\end{frame}
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+
+\begin{frame}
+\frametitle{Towards an Alternative}
+
+\begin{onlyenv}<1>
+
+\begin{alertblock}{Key Idea: Delay the decision}
+Rather than adopting chain $A$ as soon as we see it,
+and later switching to chain $B$ (possibly incurring a large rollback), \emph{wait}:
+don't adopt \emph{either} $A$ \emph{or} $B$ until we know which one we want.
+\end{alertblock}
+
+Assumptions:
+
+\begin{itemize}
+\item We can guarantee that we see (a representative sample of) all chains
+in the network. An attacker \textbf{can't eclipse} us.
+\item We can \textbf{detect when} we should delay because the genesis condition
+might apply. \\
+(We will come back to this.) \\
+\end{itemize}
+
+\end{onlyenv}
+
+\end{frame}
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+
+\begin{frame}
+
+\frametitle<1-4>{Choosing between forks: at genesis}
+\frametitle<5>{Choosing between forks: general case}
+
+\begin{center}
+\begin{tikzpicture}
+\path (0, 0) coordinate (tip) node{$\bullet$} node[below left]{tip};
+\draw (tip) -- ++(1.0, 1.0) coordinate (ab) node{$\bullet$} node[above left]{$ab$};
+\draw [dotted] (tip) -- ++(1.5, -0.5) coordinate (cd);
+\draw (ab) -- ++(0.5, 0.5) -- ++(2.0, 0) coordinate(A) node[right]{$A$};
+\draw (ab) -- ++(0.5, -0.5) -- ++(2.0, 0) node[right]{$B$};
+\draw [dotted] (cd) -- ++(0.5, 0.5) -- ++(1.5, 0) node[right]{$C$};
+\draw [dotted] (cd) -- ++(0.5, -0.5) -- ++(1.5, 0) node[right]{$D$};
+\draw [dashed]
+     (tip)
+  -- ++(0, 1.75)
+  -- ++(3, 0)
+  -- ++(0, -3)
+  -- ++(-3, 0) node[pos=0.5, below]{$\underbrace{\hspace{3cm}}_{\text{$s$ slots}}$}
+  -- cycle;
+\path (tip) -- (A) node[pos=0.5, above=1cm]{$\overbrace{\hspace{3.5cm}}^{\text{$> k$ blocks}}$};
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+
+\onslide<1-2>{\draw (tip) -- ++(1.5, -0.5) node{$\bullet$};}
+\onslide<1-2>{\draw (cd) -- ++(0.5, 0.5) -- ++(1.5, 0);}
+\onslide<1-2>{\draw (cd) node[below left]{$cd$} -- ++(0.5, -0.5) -- ++(1.5, 0);}
+\onslide<2-5>{\draw [red, very thick] (tip) -- (ab) -- ++(0.5, 0.5) -- ++(1.5, 0) ;}
+\onslide<5>{\draw (tip) + (-3,0) node{$\bullet$} -- (tip);}
+\end{tikzpicture}
+\end{center}
+
+\vspace{-1em}
+
+\onslide<4-5>{
+\begin{alertblock}{Committing}
+By assumption, we have seen all relevant chains. We will \emph{never} be
+interested in $C$ or $D$, so we can disconnect from them.
+\end{alertblock}
+}
+
+\end{frame}
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+
+\begin{frame}
+
+\frametitle<1-4>{Common prefix: at genesis}
+\frametitle<5>{Common prefix: general case}
+
+\begin{center}
+\begin{tikzpicture}
+\path (0, 0) coordinate (tip) node{$\bullet$};
+\draw (tip) -- ++(1.0, 0.0) coordinate (branch) node{$\bullet$};
+\draw (branch) -- ++(1.0, 0.9) -- ++ (1.5, 0) node[right]{A};
+\draw (branch) -- ++(1.0, 0.3) -- ++ (1.5, 0) node[right]{B};
+\draw (branch) -- ++(1.0, -0.3) -- ++ (1.5, 0) node[right]{C};
+\draw (branch) -- ++(1.0, -0.9) -- ++ (1.5, 0) node[right]{D};
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+
+\onslide<1-2>{\node [left] at (tip) {tip};}
+\onslide<2>{\draw [red, very thick] (tip) -- ++(1.0, 0.0);}
+\onslide<3-5>{\node [above left] at (branch) {tip};}
+\onslide<1-2>{\draw [dashed]
+     (tip)
+  -- ++(0, 1.5)
+  -- ++(3, 0)
+  -- ++(0, -3)
+  -- ++(-3, 0) node[pos=0.5, below]{$\underbrace{\hspace{3cm}}_s$}
+  -- cycle;}
+\onslide<3-5>{\draw [dashed]
+     (branch)
+  -- ++(0, 1.5)
+  -- ++(2, 0)
+  -- ++(0, -3)
+  -- ++(-2, 0) node[pos=0.25, below]{$\underbrace{\hspace{3cm}}_s$}
+  -- cycle;}
+\onslide<5>{\draw (tip) + (-3,0) node{$\bullet$} -- (tip);}
+
+\end{tikzpicture}
+\end{center}
+
+\onslide<4-5>{
+\begin{alertblock}{Committing}
+By assumption, we have seen all relevant chains. They all share a common prefix.
+We can \emph{commit to} the blocks on that common prefix: they will never
+be rolled back.
+\end{alertblock}
+}
+
+\end{frame}
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+
+\begin{frame}
+
+\frametitle{Insufficient peers}
+
+\begin{center}
+\begin{tikzpicture}[yscale=0.75]
+\path (0, 0) -- (6,0) node[left] {\color{red} \XSolidBrush};
+\path (0, 0) coordinate (tip) node{$\bullet$} node[below left]{tip};
+\draw (tip) -- ++(1.0, 1.0) coordinate (ab) node{$\bullet$} node[above left]{$ab$};
+\draw (tip) -- ++(1.5, -0.5) coordinate (cd);
+\draw (ab) -- ++(0.5, 0.5) -- ++(2.0, 0) coordinate(A) node[right]{$A$};
+\draw (ab) -- ++(0.5, -0.5) -- ++(2.0, 0) node[right]{$B$};
+\draw (cd) -- ++(0.5, 0.5) -- ++(1.5, 0) node[right]{$C$};
+\path (cd) -- ++(0.5, -0.5) -- ++(1.5, 0) coordinate (D);
+\path (tip) -- (A) node[pos=0.5, above=0.6cm]{$\overbrace{\hspace{3.5cm}}^{\text{$> k$ blocks}}$};
+
+\draw [dashed]
+     (tip)
+  -- ++(0, 1.75)
+  -- ++(3, 0)
+  -- ++(0, -3)
+  -- ++(-3, 0) node[pos=0.5, below]{$\underbrace{\hspace{3cm}}_s$}
+  -- cycle;
+\draw [dotted] (D) -- ++(-2.5,0) -- ++(-0.25,0.25);
+
+\draw [red, very thick] (tip) -- (ab) -- ++(0.5, 0.5) -- ++(1.5, 0) ;
+\draw (tip) + (-3,0) node{$\bullet$} -- (tip);
+\end{tikzpicture}
+\end{center}
+
+\begin{center}
+\begin{tikzpicture}[yscale=0.75]
+\path (0, 0) -- (6,0) node[left] {\color{red} \XSolidBrush};
+\path (0, 0) coordinate (tip) node{$\bullet$};
+\draw (tip) -- ++(1.0, 0.0) coordinate (branch) node{$\bullet$};
+\draw (branch) -- ++(1.0, 0.9) -- ++ (1.5, 0) node[right]{A};
+\draw (branch) -- ++(1.0, 0.3) -- ++ (1.5, 0) node[right]{B};
+\draw (branch) -- ++(1.0, -0.3) -- ++ (1.5, 0) node[right]{C};
+\path (branch) -- ++(1.0, -0.9) -- ++ (1.5, 0) coordinate(D);
+
+\draw [dotted] (D) -- ++(-2.5,0) -- ++(-0.25,0.25);
+\node [below left] at (tip) {tip};
+\draw [red, very thick] (tip) -- ++(1.0, 0.0);
+\draw [dashed]
+     (tip)
+  -- ++(0, 1.25)
+  -- ++(3, 0)
+  -- ++(0, -2.5)
+  -- ++(-3, 0) node[pos=0.5, below]{$\underbrace{\hspace{3cm}}_s$}
+  -- cycle;
+\draw (tip) + (-3,0) node{$\bullet$} -- (tip);
+
+\end{tikzpicture}
+\end{center}
+
+\end{frame}
+
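+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+
+\begin{frame}[fragile]
+
+\frametitle{The alternative rule, as a sketch}
+
+A minimal Haskell sketch of the alternative genesis rule. All names
+(\texttt{slotsBack}, \texttt{blocksBack}, \texttt{densityAfter},
+\texttt{chainLength}) are hypothetical helpers, not the implementation;
+when both conditions apply, this sketch arbitrarily lets the density rule win:
+
+{\small
+\begin{verbatim}
+-- Do we prefer the candidate over our current chain,
+-- given the intersection of the two?
+-- (s, k: the protocol parameters, in slots resp. blocks)
+preferCandidate cur cand isect
+  | slotsBack isect cur >= s    -- density rule
+  = densityAfter s isect cand > densityAfter s isect cur
+  | blocksBack isect cur <= k   -- longest chain rule
+  = chainLength cand > chainLength cur
+  | otherwise
+  = False
+\end{verbatim}
+}
+
+\end{frame}
+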
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+
+\begin{frame}
+
+\frametitle{Insufficient blocks}
+
+\begin{center}
+\begin{tikzpicture}[yscale=0.75]
+\path (0, 0) -- (6,0) node[left] {\color{red} \XSolidBrush};
+\path (0, 0) coordinate (tip) node{$\bullet$} node[below left]{tip};
+\draw (tip) -- ++(1.0, 1.0) coordinate (ab) node{$\bullet$} node[above left]{$ab$};
+\draw (tip) -- ++(1.5, -0.5) coordinate (cd);
+\draw (ab) -- ++(0.5, 0.5) -- ++(2.0, 0) coordinate(A) node[right]{$A$};
+\draw (ab) -- ++(0.5, -0.5) -- ++(2.0, 0) node[right]{$B$};
+\draw (cd) -- ++(0.5, 0.5) -- ++(1.5, 0) node[right]{$C$};
+\draw (cd) -- ++(0.5, -0.5) node{$\bullet$} node[right]{$D$};
+\path (tip) -- (A) node[pos=0.5, above=0.6cm]{$\overbrace{\hspace{3.5cm}}^{\text{$> k$ blocks}}$};
+
+\draw [dashed]
+     (tip)
+  -- ++(0, 1.75)
+  -- ++(3, 0)
+  -- ++(0, -3.25)
+  -- ++(-3, 0) node[pos=0.5, below]{$\underbrace{\hspace{3cm}}_s$}
+  -- cycle;
+
+\draw [red, very thick] (tip) -- (ab) -- ++(0.5, 0.5) -- ++(1.5, 0) ;
+\draw (tip) + (-3,0) node{$\bullet$} -- (tip);
+\end{tikzpicture}
+\end{center}
+
+\begin{center}
+\begin{tikzpicture}[yscale=0.75]
+\path (0, 0) -- (6,0) node[left] {\color{ForestGreen} \CheckmarkBold};
+\path (0, 0) coordinate (tip) node{$\bullet$};
+\draw (tip) -- ++(1.0, 0.0) coordinate (branch) node{$\bullet$};
+\draw (branch) -- ++(1.0, 0.9) -- ++ (1.5, 0) node[right]{A};
+\draw (branch) -- ++(1.0, 0.3) -- ++ (1.5, 0) node[right]{B};
+\draw (branch) -- ++(1.0, -0.3) -- ++ (1.5, 0) node[right]{C};
+\draw (branch) -- ++(1.0, -0.9) node{$\bullet$} node[right]{D};
+
+\node [below left] at (tip) {tip};
+\draw [red, very thick] (tip) -- ++(1.0, 0.0);
+\draw [dashed]
+     (tip)
+  -- ++(0, 1.25)
+  -- ++(3, 0)
+  -- ++(0, -2.5)
+  -- ++(-3, 0) node[pos=0.5, below]{$\underbrace{\hspace{3cm}}_s$}
+  -- cycle;
+\draw (tip) + (-3,0) node{$\bullet$} -- (tip);
+
+\end{tikzpicture}
+\end{center}
+
+\end{frame}
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+
+\begin{frame}
+
+\frametitle{Threshold for sufficient blocks}
+
+\begin{center}
+\begin{tikzpicture}[yscale=0.75]
+\draw (0,0) -- (2,0) node{$\bullet$} coordinate (tip);
+\draw [dotted] (tip) -- ++(1,1) -- ++ (0.5,0) node[right]{$\ldots$};
+\draw [dotted] (tip) -- ++(1,0.5) -- ++ (0.5,0) node[right]{$\ldots$};
+\draw (tip) -- ++(1,-0.25) node {$\bullet$} coordinate (a);
+\onslide<2-6>{\draw (a) -- ++(0.5, 0) node {$\bullet$} coordinate (b);}
+\onslide<3-6>{\draw (b) -- ++(1.0, 0) node {$\bullet$} coordinate (c);}
+\onslide<4-6>{\draw (c) -- ++(0.25, 0) node {$\bullet$} coordinate (d);}
+\onslide<5-6>{\draw (d) -- ++(1, 0) node {$\bullet$} coordinate (e);}
+\draw [dashed]
+     (tip)
+  -- ++(0, 1.5)
+  -- ++(3, 0)
+  -- ++(0, -2.25)
+  -- ++(-3, 0) node[pos=0.5, below]{$\underbrace{\hspace{3cm}}_s$}
+  -- cycle;
+\end{tikzpicture}
+\end{center}
+
+\onslide<6>{
+\begin{alertblock}{Dealing with nodes that don't provide sufficient blocks}
+As part of the protocol, nodes report their tip. This used to be an
+optimisation; now it becomes essential:
+
+\begin{itemize}
+\item Disconnect from nodes that report a chain shorter than $s$.
+\item Time out nodes that report a chain longer than $s$ but don't send
+us blocks (DoS attack).
+\end{itemize}
+\end{alertblock}
+}
+
+\end{frame}
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+
+\begin{frame}
+
+\frametitle{Detecting when to delay}
+
+\begin{itemize}
+\item \textbf{\alert{Cannot} apply density rule when we are closer than $s$ slots to the
+wallclock slot.}
+\\ (We would be unable to fill the window, by definition.)
+\item \textbf{\alert{Don't need} to apply density rule when within $k = 2160$ blocks of
+the wallclock slot.} [Handwavy] \\
+(Too?) liberal rephrasing of Theorem 2 of the genesis paper.
+\item \textbf{Always sound to delay chain selection} \\
+(Unless \emph{really} near tip and we might have to forge a block)
+\item Paper suggests $s = \frac{1}{4}(k/f) = 10{,}800$ slots. \\
+(I.e., $s \times f = \frac{1}{4}k = 540$ blocks on average.)
+\end{itemize}
+
+\begin{alertblock}{}
+\textbf{Delay if more than $s$ slots from the wallclock.} \\
+(If wallclock slot unknown, must be more than $(3k/f) > s$ slots.)
+\end{alertblock}
+
+\end{frame}
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+
+\begin{frame}
+
+\frametitle{Generalising delay mode}
+
+\begin{center}
+\begin{tikzpicture}[yscale=0.75]
+\path (0, 0) coordinate (tip) node{$\bullet$} node[below left]{tip};
+\draw (tip) -- ++(1.0, 1.0) coordinate (ab) node{$\bullet$} node[above left]{$ab$};
+\draw [dotted] (tip) -- ++(1.5, -0.5) coordinate (cd);
+\draw (ab) -- ++(0.5, 0.5) -- ++(2.0, 0) coordinate(A) node[right]{$A$};
+\draw (ab) -- ++(0.5, -0.5) -- ++(2.0, 0) node[right]{$B$};
+\draw [dotted] (cd) -- ++(0.5, 0.5) -- ++(1.5, 0) node[right]{$C$};
+\draw [dotted] (cd) -- ++(0.5, -0.5) -- ++(1.5, 0) node[right]{$D$};
+\draw [dashed]
+     (tip)
+  -- ++(0, 1.75)
+  -- ++(3, 0)
+  -- ++(0, -3)
+  -- ++(-3, 0) node[pos=0.5, below]{$\underbrace{\hspace{3cm}}_{\text{$s$ slots}}$}
+  -- cycle;
+\path (tip) -- (A) node[pos=0.5, above=0.7cm]{$\overbrace{\hspace{3.5cm}}^{\text{may be fewer than $k$ blocks}}$};
+
+\draw (tip) -- ++(1.5, -0.5) node{$\bullet$};
+\draw (cd) -- ++(0.5, 0.5) -- ++(1.5, 0);
+\draw (cd) node[below left]{$cd$} -- ++(0.5, -0.5) -- ++(1.5, 0);
+\draw [red, very thick] (tip) -- (ab) -- ++(0.5, 0.5) -- ++(1.5, 0) ;
+\draw (tip) + (-3,0) node{$\bullet$} -- (tip);
+\end{tikzpicture}
+\end{center}
+
+\vspace{-1em}
+
+\begin{itemize}
+\item \textbf{Cannot reliably detect} whether we have more than $k$ blocks \\
+(node reports tip but we cannot verify)
+\item \textbf{Can still apply genesis condition}, independent of \# blocks \\
+(justified by alternative genesis rule)
+\end{itemize}
+
+\end{frame}
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+
+\begin{frame}
+
+\frametitle{Header/Body split: choosing between forks}
+
+\begin{center}
+\begin{tikzpicture}[yscale=0.75]
+\path (0, 0) coordinate (tip) node{$\bullet$} node[below left]{tip};
+\draw (tip) -- ++(1.0, 1.0) coordinate (ab) node{$\bullet$} node[above left]{$ab$};
+\draw [dotted] (tip) -- ++(1.5, -0.5) coordinate (cd);
+\draw (ab) -- ++(0.5, 0.5) -- ++(2.0, 0) coordinate(A) node[right]{$A$};
+\draw (ab) -- ++(0.5, -0.5) -- ++(2.0, 0) node[right]{$B$};
+\draw [dotted] (cd) -- ++(0.5, 0.5) -- ++(1.5, 0) node[right]{$C$};
+\draw [dotted] (cd) -- ++(0.5, -0.5) -- ++(1.5, 0) node[right]{$D$};
+\draw [dashed]
+     (tip)
+  -- ++(0, 1.75)
+  -- ++(3, 0)
+  -- ++(0, -3)
+  -- ++(-3, 0) node[pos=0.5, below]{$\underbrace{\hspace{3cm}}_{\text{$s$ slots}}$}
+  -- cycle;
+
+\draw [red, very thick] (tip) -- (ab) -- ++(0.5, 0.5) -- ++(1.5, 0) ;
+\draw (tip) + (-3,0) node{$\bullet$} -- (tip);
+\end{tikzpicture}
+\end{center}
+
+\vspace{-1em}
+
+\begin{itemize}
+\item What if we find an invalid block on $A$ after discarding $C$, $D$?
+\item Header validation justifies deciding before block validation.
+\begin{minipage}{0.9\textwidth}
+\tiny\linespread{0.5} Christian: ``Intuitively, right after the forking point, the lottery to elect slot leaders is still the same on both chains, and there, no adversarial chain can be denser.''
+\end{minipage}
+\item Header validation (as separate from block validation) is critical. \\
+{\small (So far it was ``merely'' required to guard against DoS attacks.)}
+\end{itemize}
+
+\end{frame}
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+
+\begin{frame}
+
+\frametitle{Header/Body split: common prefix}
+
+\begin{center}
+\begin{tikzpicture}[yscale=0.75]
+\path (0, 0) coordinate (tip) node{$\bullet$};
+\draw (tip) -- ++(1.0, 0.0) coordinate (branch) node{$\bullet$};
+\draw (branch) -- ++(1.0, 0.9) -- ++ (1.5, 0) node[right]{A};
+\draw (branch) -- ++(1.0, 0.3) -- ++ (1.5, 0) node[right]{B};
+\draw (branch) -- ++(1.0, -0.3) -- ++ (1.5, 0) node[right]{C};
+\draw (branch) -- ++(1.0, -0.9) -- ++ (1.5, 0) node[right]{D};
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+
+\node [below left] at (tip) {tip};
+\draw [red, very thick] (tip) -- ++(1.0, 0.0);
+\draw [dashed]
+     (tip)
+  -- ++(0, 1.25)
+  -- ++(3, 0)
+  -- ++(0, -2.5)
+  -- ++(-3, 0) node[pos=0.5, below]{$\underbrace{\hspace{3cm}}_s$}
+  -- cycle;
+\draw (tip) + (-3,0) node{$\bullet$} -- (tip);
+
+\end{tikzpicture}
+\end{center}
+
+\begin{itemize}
+\item Blocks from the common prefix will be validated by the chain database before adoption.
+\item If found to be invalid, something went horribly wrong and we are eclipsed by
+an attacker after all. Disconnect from all peers and start over.
+\end{itemize}
+
+\end{frame}
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+
+\begin{frame}
+
+\frametitle{Open questions}
+
+\begin{itemize}
+\item The assumption is that when we see $n$ peers, we get a representative sample of all chains in the network. Does that mean that after we discard some peers (not dense enough), we do not have to look for more peers (apart from performance reasons, perhaps)?
+\item Detection of genesis mode OK?
+\item Applying genesis condition even if fork closer than $k$ okay?
+\item Concerns about invalid blocks with valid headers?
+\item Anything else\dots?
+\end{itemize}
+
+\end{frame}
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+
+\begin{frame}
+
+\frametitle{Flip-flopping}
+
+\begin{center}
+\begin{tikzpicture}
+\path (0, 0) coordinate (tip) node{$\bullet$} node[below left]{tip};
+\draw (tip) -- ++(1.0, 0.5) -- ++(2.5, 0) coordinate(A) node[right]{$A$};
+\draw (tip) -- ++(1.0, -0.5) -- ++(3.5, 0) coordinate(B) node[right]{$B$};
+\draw [red, very thick] (tip) -- ++(1.0, 0.5) -- ++(2.0, 0);
+\draw [dashed]
+     (tip)
+  -- ++(0, 0.75)
+  -- ++(3, 0)
+  -- ++(0, -1.5)
+  -- ++(-3, 0) node[pos=0.5, below]{$\underbrace{\hspace{3cm}}_{\text{$s$ slots}}$}
+  -- cycle;
+\path (tip) -- (A) node[pos=0.5, above=0.5cm]{$\overbrace{\hspace{3.5cm}}^{\text{fewer than $k$ blocks}}$};
+\path (tip) -- (B) node[pos=0.5, below=1.1cm]{$\underbrace{\hspace{4.5cm}}_{\text{more than $k$ blocks}}$};
+\draw (tip) + (-3,0) node{$\bullet$} -- (tip);
+\end{tikzpicture}
+\end{center}

+\pause
+
+$A$ is preferred over $B$, and $B$ is preferred over $A$!
+
+{\small ($A$ wins by the density rule: the intersection is more than $k$
+blocks from $B$'s tip and $A$ is denser in the $s$-slot window; $B$ wins by
+the longest chain rule: the intersection is no more than $k$ blocks from
+$A$'s tip and $B$ is strictly longer.)}
+ + +\end{frame} + +\end{document} diff --git a/ouroboros-consensus/docs/report/references.bib b/ouroboros-consensus/docs/report/references.bib new file mode 100644 index 00000000000..adec80cb4f8 --- /dev/null +++ b/ouroboros-consensus/docs/report/references.bib @@ -0,0 +1,107 @@ +@misc{cryptoeprint:2018:1049, + author = {Aggelos Kiayias and Alexander Russell}, + title = {{Ouroboros-BFT}: A Simple {Byzantine} Fault Tolerant Consensus Protocol}, + howpublished = {Cryptology ePrint Archive, Report 2018/1049}, + year = {2018}, + note = {\url{https://eprint.iacr.org/2018/1049}}, +} + +@misc{cryptoeprint:2018:378, + author = {Christian Badertscher and Peter Gazi and Aggelos Kiayias and Alexander Russell and Vassilis Zikas}, + title = {{Ouroboros Genesis}: Composable Proof-of-Stake Blockchains with Dynamic Availability}, + howpublished = {Cryptology ePrint Archive, Report 2018/378}, + year = {2018}, + note = {\url{https://eprint.iacr.org/2018/378}}, +} + +@misc{cryptoeprint:2017:573, + author = {Bernardo David and Peter Ga{\v{z}}i and Aggelos Kiayias and Alexander Russell}, + title = {{Ouroboros Praos}: An adaptively-secure, semi-synchronous proof-of-stake protocol}, + howpublished = {Cryptology ePrint Archive, Report 2017/573}, + year = {2017}, + note = {\url{https://eprint.iacr.org/2017/573}}, +} + +@article{10.1145/571637.571640, +author = {Castro, Miguel and Liskov, Barbara}, +title = {Practical {Byzantine} Fault Tolerance and Proactive Recovery}, +year = {2002}, +issue_date = {November 2002}, +publisher = {Association for Computing Machinery}, +address = {New York, NY, USA}, +volume = {20}, +number = {4}, +issn = {0734-2071}, +url = {https://doi.org/10.1145/571637.571640}, +doi = {10.1145/571637.571640}, +abstract = {Our growing reliance on online services accessible on the Internet demands highly available systems that provide correct service without interruptions. Software bugs, operator mistakes, and malicious attacks are a major cause of service interruptions and they can cause arbitrary behavior, that is, Byzantine faults. This article describes a new replication algorithm, BFT, that can be used to build highly available systems that tolerate Byzantine faults. BFT can be used in practice to implement real services: it performs well, it is safe in asynchronous environments such as the Internet, it incorporates mechanisms to defend against Byzantine-faulty clients, and it recovers replicas proactively. The recovery mechanism allows the algorithm to tolerate any number of faults over the lifetime of the system provided fewer than 1/3 of the replicas become faulty within a small window of vulnerability. BFT has been implemented as a generic program library with a simple interface. We used the library to implement the first Byzantine-fault-tolerant NFS file system, BFS. The BFT library and BFS perform well because the library incorporates several important optimizations, the most important of which is the use of symmetric cryptography to authenticate messages. The performance results show that BFS performs 2% faster to 24% slower than production implementations of the NFS protocol that are not replicated. This supports our claim that the BFT library can be used to build practical systems that tolerate Byzantine faults.}, +journal = {ACM Trans. Comput. 
Syst.}, +month = nov, +pages = {398–461}, +numpages = {64}, +keywords = {state machine replication, Byzantine fault tolerance, state transfer, asynchronous systems, proactive recovery} +} + +@techreport{byron-chain-spec, +author = {Marko Dimja\v{s}evi\'{c} and Nicholas Clark}, +title = {Specification of the Blockchain Layer}, +year = {2019}, +month = {May}, +note = {Part of the Byron specification, available from \url{https://github.com/input-output-hk/cardano-ledger-specs/}}, +institution = {IOHK}, +} + +@techreport{wallet-spec, +author = {Duncan Coutts and Edsko de Vries}, +title = {Formal specification for a {Cardano} wallet}, +year = {2018}, +month = {July}, +note = {Version 1.2}, +institution = {IOHK}, +} + +@techreport{network-spec, +author = {Duncan Coutts and Neil David and Marcin Szamotulski and Peter Thompson}, +title = {Introduction to the design of the Data Diffusion and Networking for {Cardano Shelley}}, +year = {2020}, +month = {August}, +note = {Version 1.9}, +institution = {IOHK}, +} + +@INPROCEEDINGS{6468485, + author={D. {Fiala} and F. {Mueller} and C. {Engelmann} and R. {Riesen} and K. {Ferreira} and R. {Brightwell}}, + booktitle={SC '12: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis}, + title={Detection and correction of silent data corruption for large-scale high-performance computing}, + year={2012}, + volume={}, + number={}, + pages={1-12}, + doi={10.1109/SC.2012.49}, +} + +@misc{chen2017algorand, + title={Algorand}, + author={Jing Chen and Silvio Micali}, + year={2017}, + eprint={1607.01341}, + archivePrefix={arXiv}, + primaryClass={cs.CR} +} + +@misc{cryptoeprint:2016:889, + author = {Aggelos Kiayias and Alexander Russell and Bernardo David and Roman Oliynykov}, + title = {Ouroboros: A Provably Secure Proof-of-Stake Blockchain Protocol}, + howpublished = {Cryptology ePrint Archive, Report 2016/889}, + year = {2016}, + note = {\url{https://eprint.iacr.org/2016/889}}, +} + +@misc{buterin2020combining, + title={Combining {GHOST} and {Casper}}, + author={Vitalik Buterin and Diego Hernandez and Thor Kamphefner and Khiem Pham and Zhi Qiao and Danny Ryan and Juhyeok Sin and Ying Wang and Yan X Zhang}, + year={2020}, + eprint={2003.03052}, + archivePrefix={arXiv}, + primaryClass={cs.CR} +} diff --git a/ouroboros-consensus/docs/report/report.dict b/ouroboros-consensus/docs/report/report.dict new file mode 100644 index 00000000000..85300fa6371 --- /dev/null +++ b/ouroboros-consensus/docs/report/report.dict @@ -0,0 +1,223 @@ +personal_ws-1.1 en 222 +Adots +Algorand +ApplyBlock +Astate +AstateN +Aview +AviewN +Badertscher +Bdots +BlockConfig +BlockNo +BlockProtocol +BlockSupportsProtocol +BlockchainTime +Bool +Bstate +Bview +BviewN +ByronConfig +CanBeLeader +Cardano +ChainDepState +ChainIndepState +ChainSelConfig +ChainSelection +Composable +ConsensusConfig +ConsensusProtocol +Conv +Coutts +DB's +DoS +EBBs +EQ +Edsko +EpochInfo +ExtLedgerState +ForestGreen +GC +GetTip +Goguen +Handwavy +IOHK +IsLeader +IsLedger +LedgerCfg +LedgerConfig +LedgerErr +LedgerError +LedgerState +LedgerSupportsProtocol +LedgerView +NodeKernel +Ord +Ouroboros +OutsideForecastRange +PBFT +PParams +ProtocolVersion +SecurityParam +SelectView +Serokell +SlotNo +TODO +TODOs +TTL +TickedLedgerState +UpdateLedger +VRF +ValidateView +ValidationErr +Vries +WithOrigin +acm +ada +addBlock +addblock +addtochain +adoptedProtocolVersion +antisymmetric +api +applyBlocks +applyChainTick +applyLedgerBlock +basicstyle +bft +bg +blk +blockNo +blockchain 
+blockchains +blockfetch +blocksubmission +boolean +breaklinks +bs +byron +byronProtocolVersion +cand +cd +cfg +chainSelConfig +chaindb +chainsel +chainselection +chainsyncclient +chaintop +checkIsLeader +clockchanges +combinator +commentstyle +compareCandidates +compareChains +composability +compositionality +crypto +cryptocurrency +cryptographic +de +dont +extledgerstate +focusonnewblock +forecastAt +forecastFor +fsync +gc +getTip +gray +hardcoded +hardfork +haskell +headerProtocolVersion +headerSoftwareVersion +headerbody +headerstate +hfc +hoc +iff +imm +immtip +infuture +init +inmemory +invalidblocks +isReachable +keywordstyle +leadershipcheck +ledgerViewForecastAt +ledgerdb +ledgerrestrictions +ledgerview +liveness +lllll +loc +lookahead +maxvalid +mempool +metatheory +microslot +microslots +midH +midL +midM +monotonicity +morekeywords +musthavenew +nonces +oldtip +openkinds +optimalamongstnew +param +pdfborder +pdftitle +plausibleCandidateChain +plausiblecandidates +pos +ppSoftforkRule +ppUpdateProposalTTL +praos +pre +pred +preferAnchoredCandidate +preferCandidate +preorder +previouslyoptimal +priori +protocolLedgerView +protocolSecurityParam +protocolVersion +reapplyLedgerBlock +reupdateChainDepState +runtime +safezone +safezones +sca +se +selectView +serendipitously +shelley +singlesignature +sk +sl +sortBy +srMinThd +sta +stabilityWindow +stakepools +superclass +switchtofork +testability +th +tickChainDepState +todo +toplevel +ttl +tx +txsubmission +unticked +updateChainDepState +validateView +viewN +wallclock diff --git a/ouroboros-consensus/docs/report/report.tex b/ouroboros-consensus/docs/report/report.tex new file mode 100644 index 00000000000..625736b3e0b --- /dev/null +++ b/ouroboros-consensus/docs/report/report.tex @@ -0,0 +1,161 @@ +\documentclass[11pt,a4paper]{report} +\usepackage{hyperref} +\usepackage[margin=2.5cm]{geometry} +\usepackage{amsmath, amsthm} +\usepackage{txfonts} +\usepackage{todonotes} +\usepackage{enumitem} +\usepackage{listings} +\usepackage[nameinlink]{cleveref} +\usepackage{microtype} + +\hypersetup{ + pdftitle={The Cardano Consensus and Storage Layer}, + pdfborder={0 0 0}, + breaklinks=true +} + +\usetikzlibrary{arrows.meta} +\usetikzlibrary{intersections} + +% https://tex.stackexchange.com/questions/229940/can-i-have-a-listing-with-fixed-column-code-and-full-flexible-comments +\makeatletter +\let\commentfullflexible\lst@column@fullflexible +\makeatother + +% Use continuous footnote numbering so we can refer to them +% https://tex.stackexchange.com/questions/10448/continuous-footnote-numbering +\counterwithout{footnote}{chapter} + +\lstset{ + language=haskell + , basicstyle=\small\ttfamily + , keywordstyle=\bfseries + , commentstyle=\normalsize\rmfamily\itshape\commentfullflexible + , columns=fixed + , morekeywords={ + family + , Type + } + } + +\theoremstyle{definition} +\newtheorem{property}{Property} +\newtheorem{definition}{Definition} +\newtheorem{lemma}{Lemma} +\newtheorem{assumption}{Assumption} +\newtheorem{corollary}{Corollary} +\newtheorem{proposal}{Proposal} +\newtheorem{failedattempt}{Failed attempt} +\numberwithin{property}{chapter} +\numberwithin{definition}{chapter} +\numberwithin{lemma}{chapter} +\numberwithin{assumption}{chapter} +\numberwithin{corollary}{chapter} +\numberwithin{proposal}{chapter} +\numberwithin{failedattempt}{chapter} + +\newenvironment{bug} + {\begin{quote} \textbf{Known bug}.} + {\end{quote}} + +\title{The Cardano Consensus and Storage Layer \\ + {\large \sc An IOHK technical report} + } +\author{Edsko de Vries 
\\ \href{mailto:edsko@well-typed.com}
+  {\small \texttt{edsko@well-typed.com}}
+  \and Thomas Winant \\ \href{mailto:thomas@well-typed.com}
+  {\small \texttt{thomas@well-typed.com}}
+  \and Duncan Coutts \\ \href{mailto:duncan@well-typed.com}
+  {\small \texttt{duncan@well-typed.com}}
+  \\ \href{mailto:duncan.coutts@iohk.io}
+  {\small \texttt{duncan.coutts@iohk.io}}
+  }
+
+\newcommand{\debugsep}[1]{
+  \vspace{2em}
+  \hrule
+  \vspace{0.5em}
+  \textbf{#1}
+  \vspace{0.5em}
+  \hrule
+  \vspace{2em}
+}
+
+% TODO
+%
+% * Incorporate
+%
+% - Previous blog posts
+% - Specifications currently stored as markdown files in the repo
+% - Any discussions in long comments in the code
+%
+% - choice of k: liveness versus safety
+% - make sure we talk about the fact that the ledger can be linear
+
+\newcommand{\duncan}{\todo{Duncan suitable section.}}
+
+\begin{document}
+
+\maketitle
+
+\tableofcontents
+
+\input{chapters/intro/intro.tex}
+\input{chapters/intro/overview.tex}
+\input{chapters/intro/nonfunctional.tex}
+
+\part{Consensus Layer}
+
+\input{chapters/consensus/protocol.tex}
+\input{chapters/consensus/ledger.tex}
+\input{chapters/consensus/serialisation.tex}
+
+\part{Storage Layer}
+
+\input{chapters/storage/overview.tex}
+\input{chapters/storage/immutabledb.tex}
+\input{chapters/storage/volatiledb.tex}
+\input{chapters/storage/ledgerdb.tex}
+\input{chapters/storage/chainselection.tex}
+\input{chapters/storage/chaindb.tex}
+\input{chapters/storage/mempool.tex}
+
+\part{Mini protocols}
+
+\input{chapters/miniprotocols/chainsyncclient.tex}
+\input{chapters/miniprotocols/servers.tex}
+
+\part{Hard Fork Combinator}
+
+\input{chapters/hfc/overview.tex}
+\input{chapters/hfc/time.tex}
+\input{chapters/hfc/misc.tex}
+
+\part{Testing}
+
+\input{chapters/testing/consensus.tex}
+\input{chapters/testing/storage.tex}
+
+\part{Future Work}
+
+\input{chapters/future/genesis.tex}
+\input{chapters/future/lowdensity.tex}
+\input{chapters/future/ebbs.tex}
+\input{chapters/future/misc.tex}
+
+\part{Conclusions}
+
+\input{chapters/conclusions/technical.tex}
+\input{chapters/conclusions/conclusions.tex}
+
+\part{Appendices}
+\appendix
+
+\input{chapters/appendix/byron.tex}
+\input{chapters/appendix/shelley.tex}
+
+\bibliographystyle{acm}
+\bibliography{references}
+
+\end{document}
diff --git a/ouroboros-consensus/docs/report/spellcheck.sh b/ouroboros-consensus/docs/report/spellcheck.sh
new file mode 100755
index 00000000000..7e34354551e
--- /dev/null
+++ b/ouroboros-consensus/docs/report/spellcheck.sh
@@ -0,0 +1,6 @@
+#!/bin/bash
+
+for i in `find . -name '*.tex'`
+do
+  aspell --dont-backup -l en_GB -p ./report.dict -c $i
+done
diff --git a/ouroboros-consensus/docs/report/unsoundswitch/UnsoundSwitch.hs b/ouroboros-consensus/docs/report/unsoundswitch/UnsoundSwitch.hs
new file mode 100644
index 00000000000..e623890cd5e
--- /dev/null
+++ b/ouroboros-consensus/docs/report/unsoundswitch/UnsoundSwitch.hs
@@ -0,0 +1,62 @@
+module Main (main) where
+
+import Control.Monad
+import System.IO
+import Text.Printf (printf)
+
+import Data.Number.LogFloat (LogFloat)
+import qualified Data.Number.LogFloat as LF
+import Statistics.Distribution
+import Statistics.Distribution.Binomial
+
+-- | Compute the probability of seeing more than @k@ blocks in @n@ slots
+--
+-- The individual probabilities are far too small for an ordinary 'Double',
+-- so we sum them in log space (using 'LogFloat') to avoid underflow.
+moreThanK ::
+     Double -- ^ Active slot coefficient
+  -> Int    -- ^ Security parameter (@k@)
+  -> Int    -- ^ Number of slots
+  -> LogFloat
+moreThanK f k n =
+    LF.sum [LF.logToLogFloat $ logProbability d i | i <- [k + 1 ..
n]] + where + d :: BinomialDistribution + d = binomial n f + +defaultS :: + Double -- ^ Active slot coefficient + -> Int -- ^ Security parameter (@k@) + -> Int +defaultS f k = floor (fromIntegral k / f) `div` 4 + +main :: IO () +main = do + forM_ [s .. 4 * s] $ \n -> do + putStrLn $ show n ++ "\t" ++ showLogFloat (moreThanK f k n) + hFlush stdout + where + f = 0.05 + k = 2160 + s = defaultS f k + +{------------------------------------------------------------------------------- + LogFloat util +-------------------------------------------------------------------------------} + +showLogFloat :: LogFloat -> String +showLogFloat lf = printf "%6.4f * 10 ^ %d" m e + where + (m, e) = logFloatToScientific lf + +logFloatToScientific :: LogFloat -> (Double, Int) +logFloatToScientific lf = (m, e) + where + l :: Double + l = LF.logFromLogFloat lf + + e :: Int + e = floor $ l / log10 + + m :: Double + m = exp $ l - log10 * fromIntegral e + + log10 :: Double + log10 = log 10 diff --git a/ouroboros-consensus/docs/report/unsoundswitch/cabal.project b/ouroboros-consensus/docs/report/unsoundswitch/cabal.project new file mode 100644 index 00000000000..a6d3ed955ed --- /dev/null +++ b/ouroboros-consensus/docs/report/unsoundswitch/cabal.project @@ -0,0 +1 @@ +packages: unsoundswitch.cabal diff --git a/ouroboros-consensus/docs/report/unsoundswitch/unsoundswitch.cabal b/ouroboros-consensus/docs/report/unsoundswitch/unsoundswitch.cabal new file mode 100644 index 00000000000..56d51c9e241 --- /dev/null +++ b/ouroboros-consensus/docs/report/unsoundswitch/unsoundswitch.cabal @@ -0,0 +1,13 @@ +cabal-version: 2.4 +name: unsoundswitch +version: 0.1.0.0 +author: IOHK Engineering Team +maintainer: operations@iohk.io + +executable unsoundswitch + main-is: UnsoundSwitch.hs + build-depends: base ^>=4.14.1.0 + , logfloat + , statistics + default-language: Haskell2010 + ghc-options: -Wall diff --git a/ouroboros-consensus/docs/report/watch.sh b/ouroboros-consensus/docs/report/watch.sh new file mode 100755 index 00000000000..5f12ed04cda --- /dev/null +++ b/ouroboros-consensus/docs/report/watch.sh @@ -0,0 +1,15 @@ +#!/bin/bash + +SOURCES=`find . -name '*.tex'` +MAIN=report.tex + +while inotifywait $SOURCES +do + echo "Building.." + pdflatex -halt-on-error $MAIN >/dev/null + bibtex report + pdflatex -halt-on-error $MAIN >/dev/null + pdflatex -halt-on-error $MAIN >pdflatex.log + grep "LaTeX Warning:" pdflatex.log + echo "OK" +done