ADR009: Non-Interactive Default Rules for Reduced Padding #1003

Merged
merged 12 commits into from
Dec 13, 2022
@@ -72,7 +72,7 @@ Implemented
### Negative

1. The number of subtree roots per commitment is O(sqrt(n)), where n is the number of message shares. The worst case for the number of subtree roots is depicted in the diagram below: an entire block missing one share.
![Interactive Commitment 2](./assets/complexity.png)
The worst case for the current implementation depends on the square size. If it is the worst square size, as in `msgMinSquareSize`, it is O(sqrt(n)) as well. On the other hand, if the message is only in one row, then it is O(log(n)).
Therefore, the height of the tree over the subtree roots is O(log(sqrt(n))) in this implementation, where n is the number of message shares. In the current implementation, it varies from O(log(sqrt(n))) to O(log(log(n))) depending on the square size.

@@ -114,13 +114,4 @@ We should note that Rollups can decide to do this scheme without changing core-a
- Currently, prepare proposal performs [`estimateSquareSize`](https://github.com/rootulp/celestia-app/blob/6f3b3ae437b2a70d72ff6be2741abb8b5378caa0/app/estimate_square_size.go#L98-L101) prior to splitting PFBs into shares because the square size is needed to malleate PFBs and extract the appropriate message share commitment for a particular square size. Since malleation no longer requires a square size, it may be possible to remove square size estimation which renders the following issues obsolete:
- <https://github.com/informalsystems/audit-celestia/issues/12>
- <https://github.com/informalsystems/audit-celestia/issues/24>
- Inter-message padding can be reduced because we can change the non-interactive default rules from this:

> - Messages that span multiple rows must begin at the start of a row (this can occur if a message is longer than k shares or if the block producer decides to start a message partway through a row and it cannot fit).
> - Messages begin at a location aligned with the largest power of 2 that is not larger than the message length or k.

To this: Messages start at an index that is a multiple of its `msgMinSquareSize`.

As an example, we have this diagram. Message 1 is three shares long and is followed by message 2, which is 11 shares long, so the `msgMinSquareSize` of the second message is equal to four. Therefore we have a padding of 5 shares shown in light blue. Furthermore, with the new non-interactive default rule set, a message of size 11 can start in this block at index zero and index three because they are multiples of four. Therefore, we save four shares of padding while retaining the same commitment.

![Padding Savings](./assets/padding-savings.png)
- Inter-message padding can be reduced in the worst case by 50% if we can change the non-interactive default rules. An in-depth analysis is performed at [ADR 009: New Non-Interactive Default Rules for Reduced Padding](./adr-009-non-interactive-default-rules-for-reduced-padding.md).
@@ -0,0 +1,211 @@
# ADR 009: Non-Interactive Default Rules for Reduced Padding

## Changelog

- 14.11.2022: Initial Draft

## Context

[ADR 008](./adr-008-square-size-independent-message-commitments.md) makes it possible for the current non-interactive default rules to be modified slightly:

Current

> - Messages must begin at a location aligned with the largest power of 2 that is not larger than the message length or k. If the messages are larger than k, then they must start on a new row.

Proposed

> - Messages start at an index that is a multiple of its `msgMinSquareSize`.

The upside of this proposal is that it reduces the inter-message padding. The downside is that a message inclusion proof will not be as efficient for large square sizes so the proof will be larger.

> **Note**
> This analysis assumes the implementation of [celestia-app#1004](https://github.com/celestiaorg/celestia-app/issues/1004). If the tree over the subtree roots is not a Namespace Merkle Tree then both methods have the same proof size.

As an example, take the diagram below. Message 1 is 3 shares long and message 2 is 11 shares long.

With the current non-interactive default rules, message 2 must start at a location aligned with the largest power of 2 that is not larger than 11 (message length) or 8 (square size). Therefore, message 2 must start at a location aligned with 8 which is index 16 in the example below. This arrangement results in 5 shares of padding.

With the proposed non-interactive default rules, message 2 must start at an index that is a multiple of `msgMinSquareSize`. A message of 11 shares can fit in a square size of 4 (since 4 * 4 = 16 available shares, which is more than the 11 shares needed) so `msgMinSquareSize` is 4. Therefore, message 2 can start at index 12. This arrangement results in 1 share of padding.

![Padding Savings](./assets/padding-savings.png)
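
The two placements in this example can be reproduced with a short sketch. The helper names are hypothetical and this is not the celestia-app implementation. The cursor value 11 (the first free share index after message 1) is an assumption inferred from the padding figures above, not read from the diagram.

```python
def largest_power_of_two_at_most(x):
    """Largest power of 2 that is not larger than x (x >= 1)."""
    return 1 << (x.bit_length() - 1)


def msg_min_square_size(msg_len):
    """Smallest power-of-two square size s such that s * s shares hold the message."""
    s = 1
    while s * s < msg_len:
        s *= 2
    return s


def next_aligned(cursor, alignment):
    """First index >= cursor that is a multiple of alignment."""
    return -(-cursor // alignment) * alignment


def start_index_current(cursor, msg_len, k):
    # Current rule: align with the largest power of 2 not larger than
    # the message length or the square size k.
    return next_aligned(cursor, largest_power_of_two_at_most(min(msg_len, k)))


def start_index_proposed(cursor, msg_len):
    # Proposed rule: start at a multiple of msgMinSquareSize.
    return next_aligned(cursor, msg_min_square_size(msg_len))


cursor = 11  # assumed first free index after message 1
print(start_index_current(cursor, 11, 8) - cursor)   # padding under the current rule
print(start_index_proposed(cursor, 11) - cursor)     # padding under the proposed rule
```

Under these assumptions the sketch yields start indices 16 and 12, which is the 5 shares versus 1 share of padding described above.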

### Defining variables for this analysis

- n := Message length in number of shares
- k := Square size
- r := Number of rows a message fills in the original square

### The following questions will be answered

- Given square size independent commitments, why does a message in a larger square size result in an O(log(n)) message inclusion proof size?
- Assuming message inclusion proof sizes change from O(log(n)) to O(sqrt(n)), what is the worst-case constructible message?
- Why can we not use the same trick that we used in Question 1 in a single row for more efficient proofs over the row roots?
- How big is the proof size for this message?
- What is the worst constructible block with the most amount of padding with old and new non-interactive defaults?
- What is the quantified padding and proof size cost?

## 1. Given square size independent commitments, why does a message in a larger square size result in an O(log(n)) message inclusion proof size?

If you use the current non-interactive default rules, then the message begins at a location aligned with the largest power of 2 that is not larger than the message length or k. Because the subtree roots are aligned, you can skip some subtree roots and calculate their parents.
In the example below, instead of proving H1, H2, H3, and H4 to the DataRoot, you can prove H10. **H10 is part of the commitment generation and part of the Merkle tree to the DataRoot.** That is why you can use it for more efficient proofs. In smaller square sizes, you cannot do this, because H10 does not exist. The nodes in **red** are the subtree nodes that you need to provide for the message inclusion proof. The nodes in **blue** are the additional nodes for the Merkle proof.

![Efficient Merkle Proofs with ADR008](./assets/effizicient-proof-size-ADR008.png)

So why can you not do it with the proposed non-interactive default rules? This is because H10 is not generated. In the diagram below the first 8 shares are in the row before and therefore the tree structure changes. The previous subtree root H10 is now H23 and cannot be used for the efficiency boost.
The commitment is still the same but we need to use the bottom subtree roots for the message inclusion proof.

![Shifted Message](./assets/new-ni-rules-message-shift.png)
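
The effect of alignment on the number of nodes can be illustrated with a generic sketch that covers a share range with the largest aligned power-of-two subtrees. This is not the celestia-app or NMT implementation, `subtree_cover` is a hypothetical helper, and the node labels in the diagrams above are not modeled. It only shows why a shifted range needs more nodes than an aligned one.

```python
def subtree_cover(start, length, max_width):
    """Cover shares [start, start + length) with aligned power-of-two subtrees.

    max_width is the largest subtree width allowed (a power of two, for
    example the row width). Returns a list of (first_index, width) pairs.
    """
    roots = []
    pos, end = start, start + length
    while pos < end:
        size = max_width
        # Shrink until the subtree starts on its own alignment boundary
        # and does not reach past the end of the range.
        while pos % size != 0 or pos + size > end:
            size //= 2
        roots.append((pos, size))
        pos += size
    return roots


print(subtree_cover(0, 8, 8))  # aligned range
print(subtree_cover(4, 8, 8))  # same length, shifted by 4 shares
```

In this sketch the aligned range of 8 shares is covered by a single node, while the same 8 shares shifted by 4 need two nodes, and a shift by 3 needs four.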

## 2. Assuming message inclusion proof sizes change from O(log(n)) to O(sqrt(n)), what is the worst-case constructible message?

Given a square size k, the biggest message that you can construct that is affected by the proposed non-interactive default rules has a size of (k/2)². If you construct a message that is bigger than (k/2)², the `msgMinSquareSize` will be k. If the `msgMinSquareSize` is k in a square of size k, then the current non-interactive default rules are equivalent to the proposed non-interactive default rules, because the message always starts at the beginning of a row. In other words, if a square holds k² shares, the worst constructible message is a quarter of that, k²/4, because that is the size of the next smaller square.

If you choose k²/4 as the worst constructible message, it would still have O(sqrt(n)) subtree roots. This is because the size of the message is k²/4 with a width of k and a length of k/4. This means the number of rows the message fills approaches O(sqrt(n)). Therefore, we need to find a message where the number of rows is log(n), with n being the size of the message.
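
As a quick arithmetic check of the statement above, a message of k²/4 shares that is k shares wide fills k/4 rows, which is exactly sqrt(n)/2. The helper name is hypothetical and only illustrates this relationship.

```python
import math


def quarter_square_message(k):
    """Length n and row count of a message filling a quarter of a k x k square."""
    n = k * k // 4
    rows = n // k  # the message is k shares wide
    return n, rows


for k in (16, 64, 256):
    n, rows = quarter_square_message(k)
    # rows grows like sqrt(n): rows == sqrt(n) / 2
    print(k, n, rows, math.isqrt(n) // 2)
```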

With k being the square size and n being the number of shares and r being the number of rows, we want to find a message so that:
k \* r = n and log(n) = r, which gives k = n/log(n)

By substituting in k, we can calculate n. To get r, we calculate n/k, rounding up to the next highest integer in the process.

| k | n | r |
|:----:|:-----:|:--:|
| 2 | 4 | 2 |
| 4 | 16 | 4 |
| 8 | 43 | 6 |
| 16 | 108 | 7 |
| 32 | 256 | 8 |
| 64 | 589 | 10 |
| 128 | 1328 | 11 |
| 256 | 2951 | 12 |
| 512 | 6483 | 13 |
| 1024 | 14116 | 14 |
| 2048 | 30509 | 15 |
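
The table can be cross-checked numerically. The sketch below assumes base-2 logarithms (consistent with the rows for k = 4 and k = 32) and solves k = n/log(n) for a real-valued n by bisection; the function names are hypothetical. The tabulated n values lie within one share of the real-valued solution, and r follows as n/k rounded up.

```python
import math


def solve_message_length(k):
    """Real-valued n >= 3 with n / log2(n) = k, found by bisection (k >= 2)."""

    def f(n):
        return n - k * math.log2(n)

    lo, hi = 3.0, 4.0 * k * k  # f(lo) < 0 and f(hi) > 0 for every k >= 2
    for _ in range(200):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return hi


def rows_filled(n, k):
    """Number of rows r = n / k, rounded up to the next integer."""
    return -(-n // k)


for k, n in ((8, 43), (64, 589), (2048, 30509)):
    print(k, round(solve_message_length(k), 2), rows_filled(n, k))
```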

The worst-case constructible message, the one most affected by the switch from O(log(n)) to O(sqrt(n)) with n being the size of the message, has r rows in a square of size k. If r is larger than k/4, we need to take k/4 as the number of rows instead, because of the first point in this section: a message larger than (k/2)² behaves the same under both rule sets. Adopting this rule, the messages look as follows:

| k | n | r |
|:----:|:-----:|:--:|
| 2 | **2** | **1** |
| 4 | **4** | **1** |
| 8 | **16** | **2** |
| 16 | **64** | **4** |
| 32 | 256 | 8 |
| 64 | 589 | 10 |
| 128 | 1328 | 11 |
| 256 | 2951 | 12 |
| 512 | 6483 | 13 |
| 1024 | 14116 | 14 |
| 2048 | 30509 | 15 |
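
The adjustment from the first table to the second can be sketched as follows. The helper is hypothetical, and the lower bound of one row for k = 2 (where k/4 is less than one) is an assumption read off the table rather than stated in the text.

```python
def cap_rows(k, n, r):
    """Limit the message to k/4 rows (at least one row) and shrink n accordingly."""
    max_rows = max(1, k // 4)
    if r > max_rows:
        return k * max_rows, max_rows
    return n, r


uncapped = [(2, 4, 2), (4, 16, 4), (8, 43, 6), (16, 108, 7), (32, 256, 8), (64, 589, 10)]
for k, n, r in uncapped:
    print(k, *cap_rows(k, n, r))
```

Only the rows for k up to 16 change; from k = 32 onward r is already at most k/4.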

Reminder: We did this calculation because we need O(log(n)) rows.

## 3. Why can we not use the same trick that we used in Question 1 in a single row for more efficient proofs over the row roots?

The node needs to be part of the commitment generation **and** part of the Merkle tree to the DataRoot for the trick to work. The diagram shows a Celestia square that is erasure coded and those parity shares are marked in green.
H12 is part of the commitment generation and part of the Merkle tree to the DataRoot.
It is only generated in the bigger square and not in the smaller square because in the smaller square you have to take into account the nodes over the parity shares.
As H12 only exists in the bigger square the more efficient proofs only work in those squares.

![Row root might not be subtree root](./assets/rowroots-might-not-be-subtreeroot.png)

## 4. How big is the proof size for this message?

We differentiate the size of the proof between the current non-interactive default rules and the proposed non-interactive default rules.
For completeness, we also included the scenario of k/4 to compare the proof size before and after, even though the % gain is not that high.

### Current Non-Interactive Default Rules

Each row consists of one subtree root, which means if you have log(n) rows you will have log(n) subtree roots in total. The last row has log(k) subtree roots. To get the row roots, we will need log(n) blue nodes from the parity shares. Blue nodes are additional nodes that you need for the Merkle proof, which have been used in the previous diagrams. Now that we have r row roots, we need a Merkle proof of them to the `DataRoot`. In the worst case, the message lies in the middle of the block. Therefore, we will need 2 \* log(k) blue nodes for the proof.

![Current ni rules proof size](./assets/current-ni-rules-proof-size.png)

NMT-Node size := 32 bytes + 2\*8 bytes = 48 bytes
MT-Node size := 32 bytes

Proof size = subtree roots (rows) + subtree roots (last row) + blue nodes (parity shares) + 2 \* blue nodes (`DataRoot`)
Proof size = (log(n) + log(k) + log(n)) \* NMT-Node size + 2\*log(k) \* MT-Node size
Proof size = 48 \* (2\*log(n) + log(k)) + 64 \*log(k)
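
The formula can be evaluated with a small sketch. Base-2 logarithms rounded up to whole nodes are an assumption, and the function names are hypothetical.

```python
NMT_NODE_SIZE = 32 + 2 * 8  # bytes: hash plus two 8-byte namespace IDs
MT_NODE_SIZE = 32           # bytes


def ceil_log2(x):
    """log2(x) rounded up to the next integer (x >= 1)."""
    return (x - 1).bit_length()


def proof_size_current(n, k):
    """Proof size in bytes under the current non-interactive default rules."""
    nmt_nodes = (
        ceil_log2(n)    # subtree roots, one per row
        + ceil_log2(k)  # subtree roots in the last row
        + ceil_log2(n)  # blue nodes from the parity shares
    )
    mt_nodes = 2 * ceil_log2(k)  # blue nodes up to the DataRoot
    return nmt_nodes * NMT_NODE_SIZE + mt_nodes * MT_NODE_SIZE


print(proof_size_current(256, 32))
```

For k = 32 and n = 256, where both logarithms are exact, this is 48 \* (2 \* 8 + 5) + 64 \* 5 = 1328 bytes.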

### Current Non-Interactive Default Rules for k/4

Proof size = subtree roots (rows) + subtree roots (last row) + blue nodes (parity shares) + 2 \* blue nodes (`DataRoot`)
Proof size = (k/4 + log(k) + k/4) \* NMT-Node size + 2\*log(k) \* MT-Node size
Proof size = 48 \* (k/2 + log(k)) + 64 \*log(k)

### Proposed Non-Interactive Default Rules

Each row consists of sqrt(n)/log(n) subtree roots, which makes sqrt(n) subtree roots in total. The rest is the same as before.

![Proposed ni rules proof size](./assets/proposed-ni-rules-proof-size.png)

Proof size = subtree roots (all rows) + subtree roots (last row) + blue nodes (parity shares) + 2 \* blue nodes (`DataRoot`)
Proof size = (sqrt(n) + log(k) + log(n)) \* NMT-Node size + 2\*log(k) \* MT-Node size
Proof size = 48 \* (sqrt(n) + log(k) + log(n)) + 64 \*log(k)
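
A sketch of the same calculation in code, with hypothetical function names. Base-2 logarithms and rounding every term up to whole nodes are assumptions. With them, the formula evaluates to 10352 bytes for k = 2048, n = 30509 and to 3088 bytes for k = 128, n = 1328, which are the proposed-rule figures quoted in the results section for 2 GB blocks and for the current `MaxSquareSize`.

```python
import math

NMT_NODE_SIZE = 32 + 2 * 8  # bytes: hash plus two 8-byte namespace IDs
MT_NODE_SIZE = 32           # bytes


def ceil_log2(x):
    """log2(x) rounded up to the next integer (x >= 1)."""
    return (x - 1).bit_length()


def ceil_sqrt(x):
    """sqrt(x) rounded up to the next integer (x >= 1)."""
    return math.isqrt(x - 1) + 1


def proof_size_proposed(n, k):
    """Proof size in bytes under the proposed non-interactive default rules."""
    nmt_nodes = (
        ceil_sqrt(n)    # subtree roots over all rows
        + ceil_log2(k)  # subtree roots in the last row
        + ceil_log2(n)  # blue nodes from the parity shares
    )
    mt_nodes = 2 * ceil_log2(k)  # blue nodes up to the DataRoot
    return nmt_nodes * NMT_NODE_SIZE + mt_nodes * MT_NODE_SIZE


print(proof_size_proposed(30509, 2048))
print(proof_size_proposed(1328, 128))
```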

### Proposed Non-Interactive Default Rules for k/4

Proof size = subtree roots (rows) + subtree roots (last row) + blue nodes (parity shares) + 2 \* blue nodes (`DataRoot`)
Proof size = (**k/2** + log(k) + k/4) \* NMT-Node size + 2\*log(k) \* MT-Node size
Proof size = 48 \* (3k/4 + log(k)) + 64 \*log(k)

## 5. What is the worst constructible block with the most amount of padding with old and new non-interactive default rules?

For the current non-interactive default rules, when you have a square size of k, the worst padding is to fill the square with messages of size k+1.
Padding = (k/2) \* (k - 1)

To have the most amount of padding for the proposed non-interactive default rules you use repeated messages of size 5 which will result in a padding of 3 in between.

Padding = 3 \* (k-1) \* k/8

![Worst Case Padding](./assets/worst-case-padding.png)
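
Both padding formulas can be evaluated directly. In this sketch the function names are hypothetical, and the 512-byte share size used for the byte conversion is an assumption that this document does not state.

```python
SHARE_SIZE = 512  # bytes per share; assumption, not stated in this ADR


def worst_padding_current(k):
    """Worst-case padding in shares: messages of size k + 1, current rules."""
    return (k // 2) * (k - 1)


def worst_padding_proposed(k):
    """Worst-case padding in shares: messages of size 5, proposed rules."""
    return 3 * (k - 1) * k // 8


for k in (128, 2048):
    cur, new = worst_padding_current(k), worst_padding_proposed(k)
    print(k, cur * SHARE_SIZE, new * SHARE_SIZE, cur / (k * k), new / (k * k))
```

For large k the padded fraction of the square approaches 1/2 under the current rules and 3/8 under the proposed rules, and the proposed worst case is always three quarters of the current one.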

## 6. What are the quantified padding and proof size costs?

### Proof Size for Super-Light-Nodes

Proof size increases from 2928 bytes to 10352 bytes in 2 GB blocks. In the current `MaxSquareSize` it's from 2096 to 3088 bytes. For bigger messages, the number of row roots will approach sqrt(n). Before that, we will get to k/4+1 roots which will make the message act the same before and after the proposed non-interactive default rules.

![Proof Size Result](./assets/proof-size-result.png)

### Proof Size for Light-Nodes

Light Nodes have additional access to row and column roots from the Data Availability header. Therefore, we can discard any blue nodes to the `DataRoot` from the analysis.

![Proof Size Result 2](./assets/proof-size-result2.png)

### Total Proof Size for Partial Nodes

Partial nodes in this context are light clients that may download all of the data in the reserved namespace. They check that the data behind the PFB was included in the `DataRoot`, via blob inclusion proofs.

For this analysis, we take the result from the light nodes and scale them up to fill the whole square. We ignore for now the reserved namespace and what space it might occupy.
For the proposed non-interactive default rules we are also creating 1 more message that could practically fit into a square. This is because the current non-interactive default rules fit one more message if we construct it this way and don't adjust the first and last messages.

![Proof Size Result 3](./assets/proof-size-result3.png)

### Padding

The worst-case padding decreases from 1.1 GB to 0.8 GB in 2 GB blocks. In the current `MaxSquareSize` it's from 4 MB to 3 MB. In general, the worst-case padding approaches 50% of the square under the current non-interactive default rules and 37.5% under the proposed non-interactive default rules. That is a maximum padding reduction of 25%.

![Padding Size Result](./assets/padding-size-result.png)

## Additional Optimizations

You can further optimize the proof size by using the fact that the namespace is known and is the same for all the subtree roots. You can do the same trick for parity shares, as the namespace is fixed for them too. Neither of these optimizations is included in the analysis; both would save the bytes that are used to store the namespace.

## Status

Accepted

## Consequences

### Positive

The worst-case padding decreases from about 50% to 37.5% of the square.

### Negative

The worst-case proof size increases, for example from 2928 bytes to 10352 bytes in 2 GB blocks.

## References

[Related Question](https://github.com/celestiaorg/celestia-app/blob/main/docs/architecture/adr-008-square-size-independent-message-commitments.md#positive-celestia-app-changes)
Binary file added docs/architecture/assets/proof-size-result2.png
Binary file added docs/architecture/assets/proof-size-result3.png
Binary file added docs/architecture/assets/worst-case-padding.png