Add reserved namespace ID for padding #733

Closed
Tracked by #650
adlerjohn opened this issue Jul 4, 2021 · 5 comments
Labels
specs directly relevant to the specs

Comments

@adlerjohn
Member

adlerjohn commented Jul 4, 2021

Taking some inspiration from this post's suggestion of

> Application ID 0 is interpreted as an empty entry and can be inserted anywhere by the block producers; it might be required for padding. Any application ID 0 fields should be ignored by clients.

We could use the application ID 0x0000000000000000 to denote any padding shares, then modify the rules of the NMT to simply ignore such shares when computing the min and max namespace IDs of a subtree. Note that the NMT already has special rules to handle tail padding, though since tail padding has the maximum namespace ID it is guaranteed to be lexicographically sorted. These padding shares with a zero namespace ID would not be.
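A rough sketch of the modified min/max rule, for illustration only. The names `PADDING_NS` and `subtree_ns_range` are hypothetical, the 8-byte width is taken from the ID above, and returning `None` for an all-padding subtree is an assumption; the real NMT computes ranges node by node and has its own handling for tail padding.

```python
# Proposed reserved namespace ID for padding (8 zero bytes).
PADDING_NS = bytes(8)


def subtree_ns_range(leaf_namespaces):
    """Return (min_ns, max_ns) over the leaves, ignoring padding shares.

    Returns None when the subtree holds only padding; how that case
    should be encoded is left open here (assumption).
    """
    real = [ns for ns in leaf_namespaces if ns != PADDING_NS]
    if not real:
        return None
    # Python compares bytes objects lexicographically.
    return min(real), max(real)
```

With this rule, a padding share placed between two messages leaves the namespace range of the enclosing subtree unchanged.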

The intuition for why this should be possible is that there is no need to use the lexicographic less-than relation specifically. The binary relation used for ordering namespace IDs does not need to be transitive or antisymmetric over its entire domain. As such, we can simply define a new one that ignores the zero namespace ID (i.e. it is true if one or more of its inputs is the zero namespace ID).
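One way such a relation might look, as a sketch under assumptions: `ns_leq`, `is_ordered`, and `nid` are hypothetical names, and a non-strict comparison is assumed so that consecutive shares may carry the same namespace ID. Because the relation is not transitive, the ordering check below compares each share against the last non-padding share rather than against its immediate neighbour.

```python
PADDING_NS = bytes(8)  # proposed reserved namespace ID for padding


def ns_leq(a: bytes, b: bytes) -> bool:
    """Relaxed ordering: true if either input is the zero namespace ID,
    otherwise plain lexicographic comparison."""
    return a == PADDING_NS or b == PADDING_NS or a <= b


def is_ordered(namespaces) -> bool:
    """Check that non-padding namespace IDs are sorted, skipping padding."""
    last_real = None
    for ns in namespaces:
        if ns == PADDING_NS:
            continue  # padding may appear anywhere
        if last_real is not None and not ns_leq(last_real, ns):
            return False
        last_real = ns
    return True


def nid(n: int) -> bytes:
    """Helper for building an 8-byte namespace ID from an integer."""
    return n.to_bytes(8, "big")
```

For example, `ns_leq(nid(2), PADDING_NS)` and `ns_leq(PADDING_NS, nid(1))` both hold while `ns_leq(nid(2), nid(1))` does not, which is the loss of transitivity described above.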

The reason this would be useful is that currently padding between messages cannot be distinguished from messages. This means applications must potentially download a bunch of extra data that isn't useful for them. If padding has a distinct namespace ID, then it can be distinguished and simply not downloaded.

@musalbas
Member

musalbas commented Jul 4, 2021

> The reason this would be useful is that currently padding between messages cannot be distinguished from messages. This means applications must potentially download a bunch of extra data that isn't useful for them. If padding has a distinct namespace ID, then it can be distinguished and simply not downloaded.

Isn't that already possible, simply by compressing the padding data when sending it over the network?

Or would the issue be that clients would still request certain leaves over Bitswap and Tendermint block gossiping that they don't need, thus creating extra overhead? (This might be less of an issue with GraphSync-style networking if the entire subtree data is downloaded, however.)

@adlerjohn
Member Author

> Isn't that already possible, simply by compressing the padding data when sending it over the network?

Technically yes. Since padding shares have a value of zero, a client that knows the namespace ID can compute the hash of a padding share for that namespace ID and simply not download leaves with that hash. So the OP is a bit imprecise in saying that downloading is the issue.
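A minimal sketch of that idea. Everything about the hashing here is an assumption made for illustration: the leaf hash is taken to be SHA-256 over a `0x00` prefix, the namespace ID, and the share bytes, and `SHARE_SIZE` is a placeholder value. The actual NMT leaf format should be taken from the spec, not from this snippet.

```python
import hashlib

SHARE_SIZE = 256  # placeholder share size in bytes (assumption)


def leaf_hash(namespace: bytes, share: bytes) -> bytes:
    """Hypothetical leaf hash: SHA-256(0x00 || namespace || share)."""
    return hashlib.sha256(b"\x00" + namespace + share).digest()


def padding_leaf_hash(namespace: bytes) -> bytes:
    """Hash of an all-zero share under the given namespace ID."""
    return leaf_hash(namespace, bytes(SHARE_SIZE))


def leaves_to_fetch(namespace: bytes, leaf_hashes):
    """Drop leaves whose hash matches the precomputed padding hash."""
    skip = padding_leaf_hash(namespace)
    return [h for h in leaf_hashes if h != skip]
```

Note that this filter also skips a genuine share whose bytes happen to be all zero, which is the ambiguity discussed next.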

The big issue is that clients have no way of distinguishing between padding shares and genuine shares where the bytes are simply all zero. So even if you avoid downloading those shares, you'd still have to at least process a bunch of padding shares that you then discard (e.g. you'd have to deserialize assuming the padding is real data, which might be invalid protobuf since there's a bunch of extra zeroes). It's extra complexity. If clients can distinguish that some shares are padding they can simply discard padding shares.
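With a reserved padding namespace ID, the discard step could be as simple as the sketch below. `PADDING_NS` and `strip_padding` are hypothetical names, and shares are modelled as `(namespace, data)` pairs purely for illustration.

```python
PADDING_NS = bytes(8)  # proposed reserved namespace ID for padding


def strip_padding(shares):
    """Keep only shares whose namespace ID is not the padding ID.

    The decision uses the namespace ID alone, so a genuine share
    whose data is all zero is kept.
    """
    return [(ns, data) for ns, data in shares if ns != PADDING_NS]
```

A client would then deserialize only what `strip_padding` returns, without having to guess whether a run of zero bytes is real data.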

Allowing padding shares with a non-zero namespace ID also enables perverse things, such as storing (quite a bit of) data in the namespace ID that isn't paid for by anyone.

@adlerjohn adlerjohn transferred this issue from celestiaorg/celestia-specs Sep 19, 2022
@adlerjohn adlerjohn added the specs directly relevant to the specs label Sep 19, 2022
@evan-forbes
Member

see #577 (comment)

@evan-forbes
Member

> Allowing padding shares with a non-zero namespace ID also enables perverse things, such as storing (quite a bit of) data in the namespace ID that isn't paid for by anyone.

This is currently an implicit block validity rule, as honest nodes will recompute the DAH and fill in empty shares with tail padding. It could still be done with a dishonest majority currently. Do we need a fraud proof for this? @adlerjohn

@adlerjohn
Member Author

Can you clarify the attack? I'm not understanding it.

3 participants