docs: data square layout specs update part 1 #1905

Merged (9 commits, Jun 16, 2023)
@@ -9,9 +9,9 @@ import (
// FitsInSquare uses the non interactive default rules to see if blobs of
// some lengths will fit in a square of squareSize starting at share index
// cursor. Returns whether the blobs fit in the square and the number of
// shares used by blobs. See non-interactive default rules
// https://github.com/celestiaorg/celestia-specs/blob/master/src/rationale/message_block_layout.md#non-interactive-default-rules
// https://github.com/celestiaorg/celestia-app/blob/1b80b94a62c8c292f569e2fc576e26299985681a/docs/architecture/adr-009-non-interactive-default-rules-for-reduced-padding.md
// shares used by blobs. See blob share commitment rules
// ../../specs/src/specs/data_square_layout.md#blob-share-commitment-rules
// ../../docs/architecture/adr-013-non-interactive-default-rules-for-reduced-padding.md
func FitsInSquare(cursor, squareSize, subtreeRootThreshold int, blobShareLens ...int) (bool, int) {
if len(blobShareLens) == 0 {
if cursor <= squareSize*squareSize {
@@ -30,7 +30,7 @@ func FitsInSquare(cursor, squareSize, subtreeRootThreshold int, blobShareLens ...int) (bool, int) {
}

// BlobSharesUsedNonInteractiveDefaults returns the number of shares used by a given set
// of blobs share lengths. It follows the non-interactive default rules and
// of blobs share lengths. It follows the blob share commitment rules and
// returns the share indexes for each blob.
func BlobSharesUsedNonInteractiveDefaults(cursor, squareSize, subtreeRootThreshold int, blobShareLens ...int) (sharesUsed int, indexes []uint32) {
start := cursor
@@ -44,7 +44,7 @@ func BlobSharesUsedNonInteractiveDefaults(cursor, squareSize, subtreeRootThreshold int, blobShareLens ...int) (sharesUsed int, indexes []uint32) {
}

// NextShareIndex determines the next index in a square that can be used. It
// follows the non-interactive default rules defined in ADR013. Assumes
// follows the blob share commitment rules defined in ADR013. Assumes
// that all args are non negative, and that squareSize is a power of two.
// https://github.com/celestiaorg/celestia-specs/blob/master/src/rationale/message_block_layout.md#non-interactive-default-rules
// https://github.com/celestiaorg/celestia-app/blob/0334749a9e9b989fa0a42b7f011f4a79af8f61aa/docs/architecture/adr-013-non-interactive-default-rules-for-zero-padding.md
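For reference, a minimal usage sketch of the two functions touched in this file (the import path and argument values are assumptions for illustration, not part of this diff):

```go
package main

import (
	"fmt"

	// Assumed import path, based on the repository layout (pkg/shares).
	"github.com/celestiaorg/celestia-app/pkg/shares"
)

func main() {
	// Illustrative values: an 8x8 square, a cursor just past the transaction
	// shares, and two blobs occupying 3 and 11 shares respectively.
	const (
		cursor               = 2
		squareSize           = 8
		subtreeRootThreshold = 64
	)

	fits, used := shares.FitsInSquare(cursor, squareSize, subtreeRootThreshold, 3, 11)
	fmt.Println(fits, used) // whether both blobs fit, and the shares they consume (padding included)

	_, indexes := shares.BlobSharesUsedNonInteractiveDefaults(cursor, squareSize, subtreeRootThreshold, 3, 11)
	fmt.Println(indexes) // start index of each blob under the blob share commitment rules
}
```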
@@ -222,7 +222,7 @@ func TestNextShareIndex(t *testing.T) {
expectedIndex: 11,
},
{
name: "non-interactive default rules for reduced padding diagram",
name: "blob share commitment rules for reduced padding diagram",
cursor: 11,
blobLen: 11,
squareSize: 8,
2 changes: 1 addition & 1 deletion pkg/shares/padding.go
@@ -10,7 +10,7 @@ import (

// NamespacePaddingShare returns a share that acts as padding. Namespace padding
// shares follow a blob so that the next blob may start at an index that
// conforms to non-interactive default rules. The ns parameter provided should
// conforms to blob share commitment rules. The ns parameter provided should
// be the namespace of the blob that precedes this padding in the data square.
func NamespacePaddingShare(ns appns.Namespace) (Share, error) {
b, err := NewBuilder(ns, appconsts.ShareVersionZero, true).Init()
3 changes: 1 addition & 2 deletions specs/src/README.md
@@ -7,8 +7,7 @@
- [Block Validity Rules](./specs/block_validity_rules.md)
- [Networking](./specs/networking.md)
- [Public-Key Cryptography](./specs/public_key_cryptography.md)
- [Rationale](./rationale/index.md)
- [Data Square Layout](./rationale/data_square_layout.md)
- [Data Square Layout](./specs/data_square_layout.md)
- [State Machine Modules](./specs/state_machine_modules.md)
- [blob](../../x/blob/README.md)
- [qgb](../../x/qgb/README.md)
3 changes: 1 addition & 2 deletions specs/src/SUMMARY.md
@@ -9,8 +9,7 @@
- [Block Validity Rules](./specs/block_validity_rules.md)
- [Networking](./specs/networking.md)
- [Public-Key Cryptography](./specs/public_key_cryptography.md)
- [Rationale](./rationale/index.md)
- [Data Square Layout](./rationale/data_square_layout.md)
- [Data Square Layout](./specs/data_square_layout.md)
- [State Machine Modules](./specs/state_machine_modules.md)
- [blob](../../x/blob/README.md)
- [qgb](../../x/qgb/README.md)
3 changes: 0 additions & 3 deletions specs/src/rationale/index.md

This file was deleted.

4 changes: 2 additions & 2 deletions specs/src/specs/block_proposer.md
@@ -18,7 +18,7 @@ With these restrictions in mind, the block proposer performs the following actions:
1. Collect as many transactions and blobs from the mempool as possible, such that the total number of shares is at most [`AVAILABLE_DATA_ORIGINAL_SQUARE_MAX`](./consensus.md#constants).
1. Compute the smallest square size that is a power of 2 that can fit the number of shares.
1. Attempt to lay out the collected transactions and blobs in the current square.
1. If the square is too small to fit all transactions and blobs (which may happen [due to needing to insert padding between blobs](../rationale/data_square_layout.md)) and the square size is smaller than [`AVAILABLE_DATA_ORIGINAL_SQUARE_MAX`](./consensus.md#constants), double the size of the square and repeat the above step.
1. If the square is too small to fit all transactions and blobs (which may happen [due to needing to insert padding between blobs](../rationale/data_square_layout.md)) and the square size is at [`AVAILABLE_DATA_ORIGINAL_SQUARE_MAX`](./consensus.md#constants), drop the transactions and blobs until the data fits within the square.
1. If the square is too small to fit all transactions and blobs (which may happen [due to needing to insert padding between blobs](../specs/data_square_layout.md)) and the square size is smaller than [`AVAILABLE_DATA_ORIGINAL_SQUARE_MAX`](./consensus.md#constants), double the size of the square and repeat the above step.
1. If the square is too small to fit all transactions and blobs (which may happen [due to needing to insert padding between blobs](../specs/data_square_layout.md)) and the square size is at [`AVAILABLE_DATA_ORIGINAL_SQUARE_MAX`](./consensus.md#constants), drop the transactions and blobs until the data fits within the square.
cmwaters marked this conversation as resolved.

Note: the maximum padding shares between blobs should be at most twice the number of blob shares. Doubling the square size (i.e. quadrupling the number of shares in the square) should thus only have to happen at most once.
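
A rough sketch of the sizing loop described in the steps above, under simplified assumptions (padding is folded into the share count, and the helper names are invented for illustration):

```go
package main

import "fmt"

// layoutFits is a stand-in for actually attempting the layout: it reports
// whether all shares, including padding required by the blob share commitment
// rules, fit into a k*k square. Here it is stubbed as a pure capacity check.
func layoutFits(k, sharesWithPadding int) bool {
	return sharesWithPadding <= k*k
}

// chooseSquareSize starts from the smallest power-of-two square that could hold
// the collected shares and doubles it until the layout fits or the maximum
// square size is reached. If the data still does not fit at the maximum size,
// the proposer drops transactions and blobs (not shown here).
func chooseSquareSize(sharesWithPadding, availableDataOriginalSquareMax int) int {
	k := 1
	for k*k < sharesWithPadding && k < availableDataOriginalSquareMax {
		k *= 2
	}
	for !layoutFits(k, sharesWithPadding) && k < availableDataOriginalSquareMax {
		k *= 2
	}
	return k
}

func main() {
	fmt.Println(chooseSquareSize(11, 128)) // 4, since 4*4 = 16 >= 11
}
```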
@@ -4,25 +4,24 @@

## Preamble

Celestia uses [a data availability scheme](https://arxiv.org/abs/1809.09044) that allows nodes to determine whether a block's data was published without downloading the whole block. The core of this scheme is arranging data in a two-dimensional matrix then applying erasure coding to each row and column. This document describes the rationale for how data—transactions, blobs, and other data—[is actually arranged](../specs/data_structures.md#arranging-available-data-into-shares). Familiarity with the [originally proposed data layout format](https://arxiv.org/abs/1809.09044) is assumed.
Celestia uses [a data availability scheme](https://arxiv.org/abs/1809.09044) that allows nodes to determine whether a block's data was published without downloading the whole block. The core of this scheme is arranging data in a two-dimensional matrix then applying erasure coding to each row and column. This document describes the rationale for how data—transactions, blobs, and other data—[is actually arranged](./data_structures.md#arranging-available-data-into-shares). Familiarity with the [originally proposed data layout format](https://arxiv.org/abs/1809.09044) is assumed.

## Rationale
## Layout Rationale

Block data consists of:

1. Cosmos SDK module transactions (e.g. [MsgSend](https://github.com/cosmos/cosmos-sdk/blob/f71df80e93bffbf7ce5fbd519c6154a2ee9f991b/proto/cosmos/bank/v1beta1/tx.proto#L21-L32)). These modify the Celestia chain's state.
1. Celestia-specific transactions (e.g. [PayForBlobs](../specs/data_structures.md#payforblobdata)). These modify the Celestia chain's state.
1. Intermediate state roots: required for fraud proofs of the aforementioned transactions.
1. Blobs: binary blobs which do not modify the Celestia state, but which are intended for a Celestia application identified with a provided namespace.
1. Standard cosmos-SDK transactions: (which are often represented internally as the [`sdk.Tx` interface](https://github.com/celestiaorg/cosmos-sdk/blob/v1.14.0-sdk-v0.46.11/types/tx_msg.go#L42-L50)) as described in [cosmos-sdk ADR020](https://github.com/celestiaorg/cosmos-sdk/blob/v1.14.0-sdk-v0.46.11/docs/architecture/adr-020-protobuf-transaction-encoding.md)
1. These transactions contain protobuf encoded [`sdk.Msg`](https://github.com/celestiaorg/cosmos-sdk/blob/v1.14.0-sdk-v0.46.11/types/tx_msg.go#L14-L26)s, which get executed atomically (if one fails they all fail) to update the Celestia state. The complete list of modules, which define the `sdk.Msg`s that the state machine is capable of handling, can be found in the [state machine modules spec](../specs/state_machine_modules.md). Examples include standard cosmos-sdk module messages such as [MsgSend](https://github.com/cosmos/cosmos-sdk/blob/f71df80e93bffbf7ce5fbd519c6154a2ee9f991b/proto/cosmos/bank/v1beta1/tx.proto#L21-L32), and Celestia-specific module messages such as [`MsgPayForBlobs`](https://github.com/celestiaorg/celestia-app/blob/v1.0.0-rc2/proto/celestia/blob/v1/tx.proto#L16-L31).
1. Blobs: binary large objects which do not modify the Celestia state, but which are intended for a Celestia application identified with a provided namespace.

We want to arrange this data into a `k * k` matrix of fixed-sized shares, which will later be committed to in [Namespace Merkle Trees (NMTs)](../specs/data_structures.md#namespace-merkle-tree) so that individual shares in this matrix can be proven to belong to a single data root.
We want to arrange this data into a `k * k` matrix of fixed-sized [shares](../specs/shares.md), which will later be committed to in [Namespace Merkle Trees (NMTs)](https://github.com/celestiaorg/nmt/blob/v0.16.0/docs/spec/nmt.md) so that individual shares in this matrix can be proven to belong to a single data root.
Collaborator:

[no change needed][question] for consistency, should we refer to it as "Namespace" or "Namespaced" Merkle Trees? The https://github.com/celestiaorg/nmt README says "Namespaced" but TBH I prefer "Namespace".

Member (Author):

I'm good with either, but the LL paper uses "namespaced".

The simplest way we can imagine arranging block data is to simply serialize it all in no particular order, split it into fixed-sized shares, then arrange those shares into the `k * k` matrix in row-major order. However, this naive scheme can be improved in a number of ways, described below.
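
For reference, a sketch of this naive layout (the share size and square size below are arbitrary illustrative values):

```go
package main

import "fmt"

// naiveLayout splits already-serialized data into fixed-size shares and fills a
// k*k square in row-major order, as described above.
func naiveLayout(data []byte, shareSize, k int) [][][]byte {
	square := make([][][]byte, k)
	for row := range square {
		square[row] = make([][]byte, k)
	}
	for i := 0; i*shareSize < len(data) && i < k*k; i++ {
		end := (i + 1) * shareSize
		if end > len(data) {
			end = len(data)
		}
		square[i/k][i%k] = data[i*shareSize : end] // row-major: row = i/k, column = i%k
	}
	return square
}

func main() {
	data := make([]byte, 10)
	square := naiveLayout(data, 4, 2)
	fmt.Println(len(square[0][0]), len(square[0][1]), len(square[1][0])) // 4 4 2
}
```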

First, we impose some ground rules:

1. Data must be ordered by namespace. This makes queries into a NMT commitment of that data more efficient.
1. Since non-blob data are not naturally intended for particular namespaces, we assign reserved namespaces for them. A range of namespaces is reserved for this purpose, starting from the lowest possible namespace.
1. Since non-blob data are not naturally intended for particular namespaces, we assign [reserved namespaces](./consensus.md#reserved-namespaces) for them. A range of namespaces is reserved for this purpose, starting from the lowest possible namespace.
1. By construction, the above two rules mean that non-blob data always precedes blob data in the row-major matrix, even when considering single rows or columns.
1. Data with different namespaces must not be in the same share. This might cause a small amount of wasted block space, but makes the NMT easier to reason about in general since leaves are guaranteed to belong to a single namespace.

@@ -37,16 +36,14 @@ Specifically, blobs must begin at a new share. We note a nice property from this

This, however, requires the block producer to interact with the transaction sender to provide them the starting location of their blob, so that the sender can sign over the commitment based on that starting location. This can be done selectively, but is not ideal as a default for e.g. end-user wallets.

### Non-Interactive Default Rules
### Blob Share Commitment Rules

As a non-consensus-critical default, we can impose one additional rule on blob placement to make the possible starting locations of blobs sufficiently predictable and constrained such that users can deterministically compute subtree roots without interaction:
To make the possible starting locations of blobs sufficiently predictable and constrained, so that users can deterministically compute the subtree roots needed for the `ShareCommitment`s within a PFB without interacting with the block proposer, we impose one additional rule:

> Blobs start at an index that is a multiple of the blob minimum square size. The blob minimum square size is the smallest square that can contain the blob in isolation (i.e. a square with only this blob and no other transactions or blobs).
> Blobs must start at an index that is a multiple of the `SubtreeWidth`. The `SubtreeWidth` is the length of the blob in shares, divided by the [`SubtreeRootThreshold`](https://github.com/celestiaorg/celestia-app/blob/v1.0.0-rc2/pkg/appconsts/v1/app_consts.go#L6) and rounded up to the nearest power of 2 ([implementation here](https://github.com/celestiaorg/celestia-app/blob/v1.0.0-rc2/pkg/shares/non_interactive_defaults.go#L94-L116)).

In the constraint mentioned above, the number of rows/columns in the minimum square size should be a power of 2.
With the above constraint, we can compute subtree roots deterministically. In order to compute the subtree roots, split the blob into chunks that are of maximum size: blob minimum square size. As an example, a blob of length `11` has a minimum square size of `4` because `11` is not greater than `4 * 4 = 16` total shares. Split the blob into chunks of length `4, 4, 2, 1` because each chunk must be a power of `2`. The resulting slices are the leaves of subtrees whose roots can be computed. These subtree roots will be present as internal nodes in the NMT of _some_ row(s).

This is similar to [Merkle Mountain Ranges](https://www.usenix.org/legacy/event/sec09/tech/full_papers/crosby.pdf), though with the largest subtree bounded by the blob minimum square size rather than being unbounded.
The `SubtreeRootThreshold` is an arbitrary versioned protocol constant that aims to put a soft limit on the number of subtree roots included in a blob inclusion proof, as described in [ADR013](../../../docs/architecture/adr-013-non-interactive-default-rules-for-zero-padding.md). A higher `SubtreeRootThreshold` means less padding and more tightly packed squares but also means greater proof sizes.
With the above constraint, we can compute subtree roots deterministically. For example, a blob of 128 shares with a `SubtreeRootThreshold` (SRT) of 64 must start on a share index that is a multiple of 2, because 128/64 = 2. In this case there will be at most 1 share of padding between blobs (more on padding below). The maximum subtree width in shares will also be 2, meaning that there will be 2 shares under each subtree root.
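
A simplified sketch of the quoted rule (the linked implementation additionally caps the width at the blob's minimum square size, which is omitted here):

```go
package main

import "fmt"

// roundUpPowerOfTwo returns the smallest power of two >= v.
func roundUpPowerOfTwo(v int) int {
	k := 1
	for k < v {
		k *= 2
	}
	return k
}

// subtreeWidth follows the rule above: the blob's length in shares, divided by
// SubtreeRootThreshold, rounded up, then rounded up to the nearest power of two.
func subtreeWidth(blobShares, subtreeRootThreshold int) int {
	s := blobShares / subtreeRootThreshold
	if blobShares%subtreeRootThreshold != 0 {
		s++
	}
	return roundUpPowerOfTwo(s)
}

func main() {
	// The example from the text: 128 shares with SubtreeRootThreshold = 64 gives
	// a SubtreeWidth of 2, so the blob must start at an even share index and
	// each subtree root covers 2 shares.
	fmt.Println(subtreeWidth(128, 64)) // 2
}
```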

The last piece of the puzzle is determining _which_ row the blob is placed at (or, more specifically, the starting location). This is needed to keep the block producer accountable. To this end, the block producer simply augments each fee-paying transaction with the starting locations of the blobs the transaction pays for.

8 changes: 4 additions & 4 deletions specs/src/specs/data_structures.md
@@ -406,13 +406,13 @@ For shares **with a namespace equal to [`PARITY_SHARE_NAMESPACE`](./consensus.md

#### Namespace Padding Share

A namespace padding share acts as padding between blobs so that the subsequent blob may begin at an index that conforms to the [non-interactive default rules](../rationale/data_square_layout.md#non-interactive-default-rules). A namespace padding share contains the namespace ID of the blob that precedes it in the data square so that the data square can retain the property that all shares are ordered by namespace.
A namespace padding share acts as padding between blobs so that the subsequent blob may begin at an index that conforms to the [blob share commitment rules](../specs/data_square_layout.md#blob-share-commitment-rules). A namespace padding share contains the namespace ID of the blob that precedes it in the data square so that the data square can retain the property that all shares are ordered by namespace.

The first [`NAMESPACE_SIZE`](./consensus.md#constants) of a share's raw data `rawData` is the namespace of the blob that precedes this padding share. The next [`SHARE_INFO_BYTES`](./consensus.md#constants) bytes are for share information. The sequence start indicator is always `1`. The version bits are filled with the share version. The sequence length is zeroed out. The remaining [`SHARE_SIZE`](./consensus.md#constants)`-`[`NAMESPACE_SIZE`](./consensus.md#constants)`-`[`SHARE_INFO_BYTES`](./consensus.md#constants) `-` [`SEQUENCE_BYTES`](./consensus.md#constants) bytes are filled with `0`.
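
A sketch of that byte layout (the constant values and the bit packing of the info byte are assumptions for illustration; the normative values are defined in the [consensus constants](./consensus.md#constants)):

```go
package main

import "fmt"

// Illustrative constants only; the real values live in the consensus constants.
const (
	namespaceSize  = 29
	sequenceBytes  = 4
	shareSize      = 512
)

// namespacePaddingShare sketches the layout described above: the preceding
// blob's namespace, one info byte carrying the share version and a sequence
// start indicator of 1, a zeroed sequence length, and zero padding to the end.
func namespacePaddingShare(namespace []byte, shareVersion byte) []byte {
	share := make([]byte, 0, shareSize)
	share = append(share, namespace...)                           // NAMESPACE_SIZE bytes
	infoByte := shareVersion<<1 | 1                               // version bits, sequence start indicator = 1 (bit order assumed)
	share = append(share, infoByte)                               // SHARE_INFO_BYTES
	share = append(share, make([]byte, sequenceBytes)...)         // sequence length zeroed out
	share = append(share, make([]byte, shareSize-len(share))...)  // remaining bytes filled with 0
	return share
}

func main() {
	ns := make([]byte, namespaceSize) // namespace of the preceding blob (zeroed here for illustration)
	fmt.Println(len(namespacePaddingShare(ns, 0))) // 512
}
```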

#### Reserved Padding Share

Reserved padding shares are placed after the last reserved namespace share in the data square so that the first blob can start at an index that conforms to non-interactive default rules. Clients can safely ignore the contents of these shares because they don't contain any significant data.
Reserved padding shares are placed after the last reserved namespace share in the data square so that the first blob can start at an index that conforms to blob share commitment rules. Clients can safely ignore the contents of these shares because they don't contain any significant data.

For shares **with a namespace ID equal to [`RESERVED_PADDING_NAMESPACE`](./consensus.md#constants)** (i.e. reserved padding shares):

@@ -457,7 +457,7 @@ For each blob, it is placed in the available data matrix, with row-major order,

1. Place the first share of the blob at the next unused location in the matrix, then place the remaining shares in the following locations.

Transactions [must commit to a Merkle root of a list of hashes](#transaction) that are each guaranteed (assuming the block is valid) to be subtree roots in one or more of the row NMTs. For additional info, see [the rationale document](../rationale/data_square_layout.md) for this section.
Transactions [must commit to a Merkle root of a list of hashes](#transaction) that are each guaranteed (assuming the block is valid) to be subtree roots in one or more of the row NMTs. For additional info, see [the rationale document](../specs/data_square_layout.md) for this section.

However, with only the rule above, interaction between the block producer and transaction sender may be required to compute a commitment to the blob the transaction sender can sign over. To remove interaction, blobs can optionally be laid out using a non-interactive default:

@@ -468,7 +468,7 @@ In the example below, two blobs (of lengths 2 and 1, respectively) are placed us

![fig: original data blob](./figures/rs2d_originaldata_blob.svg)

The non-interactive default rules may introduce empty shares that do not belong to any blob (in the example above, the top-right share is empty). These are zeroes with namespace ID equal to the either [`TAIL_TRANSACTION_PADDING_NAMESPACE_ID`](./consensus.md#constants) if between a request with a reserved namespace ID and a blob, or the namespace ID of the previous blob if succeeded by a blob. See the [rationale doc](../rationale/data_square_layout.md) for more info.
The blob share commitment rules may introduce empty shares that do not belong to any blob (in the example above, the top-right share is empty). These are zeroes with namespace ID equal to either [`TAIL_TRANSACTION_PADDING_NAMESPACE_ID`](./consensus.md#constants) if between a request with a reserved namespace ID and a blob, or the namespace ID of the previous blob if succeeded by a blob. See the [rationale doc](../specs/data_square_layout.md) for more info.
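
A sketch of where those empty (padding) shares come from: the next blob must start at the next multiple of its subtree width (see the earlier sketch), and the gap is filled with padding shares. The helper name is illustrative.

```go
package main

import "fmt"

// paddingSharesBefore returns how many padding shares are inserted so that the
// next blob starts at a multiple of its subtree width.
func paddingSharesBefore(cursor, nextBlobSubtreeWidth int) int {
	remainder := cursor % nextBlobSubtreeWidth
	if remainder == 0 {
		return 0
	}
	return nextBlobSubtreeWidth - remainder
}

func main() {
	// A blob whose subtree width is 2, with the cursor currently at share 11,
	// needs 1 padding share so that it can start at index 12.
	fmt.Println(paddingSharesBefore(11, 2)) // 1
}
```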

## Available Data
