Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adjust share size to match the spec and add namespaces to each Message #53

Closed
evan-forbes opened this issue Mar 19, 2021 · 11 comments
Closed

Comments

@evan-forbes
Copy link
Member

Namespaces are now kept in the actual message to ensure that they are recoverable, see the spec PR. This should also be accounted for in the app, specifically SHARE_SIZE - NAMESPACE_ID_BYTES - SHARE_RESERVED_BYTES

@adlerjohn
Copy link
Member

adlerjohn commented Mar 19, 2021

To clarify:

  1. the share size that is erasure coded is still SHARE_SIZE
  2. SHARE_SIZE - NAMESPACE_ID_BYTES - SHARE_RESERVED_BYTES is how requests with a reserved namespace ID are chunked (txs, intermediate state roots, evidence)
  3. SHARE_SIZE - NAMESPACE_ID_BYTES is how messages are chunked

@evan-forbes
Copy link
Member Author

Yeah, that should definitely be clarified.

When generating the commitments for SignedTransactionDataPayForMessage, namespaces are not added to each share after chunking the message. This is what specifically needs to be fixed.

The (WIP) ll-core PR #235 is using a const AdjustedMessageSize, and I think the app should import this.

@liamsi
Copy link
Member

liamsi commented Mar 19, 2021

There is also something else to consider here: the spec currently assumes SHARE_RESERVED_BYTES to be one byte. That almost works because it is an index pointing to some starting point in a 256 bytes long slice. "almost" because if the index is > 127, its encoding currently will actually be 2 bytes long: https://play.golang.org/p/C_wN1E-lKRY

That is bc varint encoding encodes 7 bits at a time. See https://golang.org/pkg/encoding/binary/ for details on varint encoding. The implementation uses this encoding for unsigned ints (lengths/indexes), as almost all protobuf implementations will support this: https://developers.google.com/protocol-buffers/docs/techniques#streaming

Also, note that independent of the above issue, the current implementation does not use this index but rather length prefixed all messages (see celestiaorg/celestia-core#234)

@adlerjohn
Copy link
Member

adlerjohn commented Mar 19, 2021

So...in my head canon I assumed it would just be 1 byte, but you're correct that canonical serialization is prescribed

(the * in the example layout figure below) is the starting byte of the length of the canonically serialized first request that starts in the share, or 0 if there is none, as a canonically serialized big-endian unsigned integer

Now we have a problem: if we use canonical protobuf, this could be 1 or 2 bytes.

  1. We can't always use 2 bytes because that's invalid canonical protobuf.
  2. We don't really want to use a variable number of bytes because it makes the chunking logic more complex...but we could.
  3. We could just not use canonical protobuf and instead just use the uint8 byte directly.

@liamsi
Copy link
Member

liamsi commented Mar 19, 2021

We could just not use canonical protobuf and instead just use the uint8 byte directly.

I think this will be our best choice. It would work for the index that points to the start of a message. So it could always fit in one byte. IMO, the other thing, the length prefixing, should stay varint encoded. Simply because we don not really want to cap how long messages can get, right?

@adlerjohn
Copy link
Member

adlerjohn commented Mar 19, 2021

Yes, the length can still be varint encoded because it doesn't affect chunking logic (there's only a single length).

@liamsi
Copy link
Member

liamsi commented Dec 5, 2021

Can this be closed?

@adlerjohn
Copy link
Member

I'm not sure, I asked about this during the off-site and didn't get a conclusive answer on whether it was implemented. cc @evan-forbes

@evan-forbes
Copy link
Member Author

Sorry, I should have been clearer. The commitments that are generated do include the namespace of the message, but they do not add the length delimiter.

We also need more testing to compare the generated commitments with the subtree roots of an actual square, to make sure that we are doing this important step correctly in the future. This would also help test #49

tree := nmt.New(sha256.New(), nmt.NamespaceIDSize(NamespaceIDSize))
for _, leaf := range set {
nsLeaf := append(make([]byte, 0), append(namespace, leaf...)...)
err := tree.Push(nsLeaf)
if err != nil {
return nil, err
}
}

@evan-forbes
Copy link
Member Author

evan-forbes commented Jan 19, 2022

Was thinking about his more, and while we could use code from core that already exists (done on the evan/user-core-code-to-split-shares branch)

// CreateCommitment generates the commit bytes for a given message, namespace, and
// squaresize using a namespace merkle tree and the rules described at
// https://github.com/celestiaorg/celestia-specs/blob/master/src/rationale/message_block_layout.md#message-layout-rationale
func CreateCommitment(k uint64, namespace, message []byte) ([]byte, error) {
// add padding to the message if necessary
msg := coretypes.Messages{
MessagesList: []coretypes.Message{
{
NamespaceID: namespace,
Data: message,
},
},
}
// split into shares that are length delimited and include the namespace in
// each share
shares := msg.SplitIntoShares().RawShares()

we could also move the commitment creation code to celestia-core, that way it would easier to roundtrip test

@evan-forbes
Copy link
Member Author

I don't think we need this any longer, and am unsure of the context

@evan-forbes evan-forbes closed this as not planned Won't fix, can't repro, duplicate, stale Jan 31, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants