Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor DAH creation to better accommodate celestia-node use case #539

Merged

Conversation

evan-forbes
Copy link
Member

@evan-forbes evan-forbes commented Sep 21, 2021

Description

This PR is a quick refactor of #524 so that the user has more control over what happens to the extended data square after it is used to generate a data availability header. Note that this PR was built on top of the branch used for #537, so that should probably merged first.

Closes: #538
blocked by #536 #537

@evan-forbes evan-forbes self-assigned this Sep 21, 2021
@evan-forbes evan-forbes requested review from Wondertan and removed request for adlerjohn, musalbas and Wondertan September 21, 2021 15:18
@evan-forbes evan-forbes changed the base branch from master to evan/merge-theirs September 21, 2021 15:19
@evan-forbes evan-forbes changed the base branch from evan/merge-theirs to evan/cherry-pick-share-merging-splitting September 21, 2021 15:21
Comment on lines +41 to +52
func NewDataAvailabilityHeader(eds *rsmt2d.ExtendedDataSquare) DataAvailabilityHeader {
// generate the row and col roots using the EDS
dah := DataAvailabilityHeader{
RowsRoots: eds.RowRoots(),
ColumnRoots: eds.ColRoots(),
}

// generate the hash of the data using the new roots
dah.Hash()

return dah
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of generating the extended data square for the user, accept it as input to create a data availability header

Comment on lines +54 to 56
func ExtendShares(squareSize uint64, shares [][]byte) (*rsmt2d.ExtendedDataSquare, error) {
// Check that square size is with range
if squareSize < consts.MinSquareSize || squareSize > consts.MaxSquareSize {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the diff make it's difficult to see the differences... I reccomend simply looking at the file. The previous NewDataAvailabilityHeader function has been split in two. This provides the user with direct access to the extended data square.

// NewDataAvailabilityHeader generates a DataAvailability header using the provided square size and shares
func NewDataAvailabilityHeader(eds *rsmt2d.ExtendedDataSquare) DataAvailabilityHeader {
// generate the row and col roots using the EDS
dah := DataAvailabilityHeader{
RowsRoots: eds.RowRoots(),
ColumnRoots: eds.ColRoots(),
}
// generate the hash of the data using the new roots
dah.Hash()
return dah
}
func ExtendShares(squareSize uint64, shares [][]byte) (*rsmt2d.ExtendedDataSquare, error) {
// Check that square size is with range
if squareSize < consts.MinSquareSize || squareSize > consts.MaxSquareSize {
return nil, fmt.Errorf(
"invalid square size: min %d max %d provided %d",
consts.MinSquareSize,
consts.MaxSquareSize,
squareSize,
)
}
// check that valid number of shares have been provided
if squareSize*squareSize != uint64(len(shares)) {
return nil, fmt.Errorf(
"must provide valid number of shares for square size: got %d wanted %d",
len(shares),
squareSize*squareSize,
)
}
tree := wrapper.NewErasuredNamespacedMerkleTree(squareSize)
return rsmt2d.ComputeExtendedDataSquare(shares, consts.DefaultCodec(), tree.Constructor)
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this is the main change and the somewhere else mentioned celestia-node use-case is to re-use this function, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, this is the main change, and celestia-node would use both of the new functions outlined here. This way they would have access to the rsmt2d.DataSquare

Comment on lines +54 to 56
func ExtendShares(squareSize uint64, shares [][]byte) (*rsmt2d.ExtendedDataSquare, error) {
// Check that square size is with range
if squareSize < consts.MinSquareSize || squareSize > consts.MaxSquareSize {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the diff make it's difficult to see the differences... I reccomend simply looking at the file. The previous NewDataAvailabilityHeader function has been split in two. This provides the user with direct access to the extended data square.

// NewDataAvailabilityHeader generates a DataAvailability header using the provided square size and shares
func NewDataAvailabilityHeader(eds *rsmt2d.ExtendedDataSquare) DataAvailabilityHeader {
// generate the row and col roots using the EDS
dah := DataAvailabilityHeader{
RowsRoots: eds.RowRoots(),
ColumnRoots: eds.ColRoots(),
}
// generate the hash of the data using the new roots
dah.Hash()
return dah
}
func ExtendShares(squareSize uint64, shares [][]byte) (*rsmt2d.ExtendedDataSquare, error) {
// Check that square size is with range
if squareSize < consts.MinSquareSize || squareSize > consts.MaxSquareSize {
return nil, fmt.Errorf(
"invalid square size: min %d max %d provided %d",
consts.MinSquareSize,
consts.MaxSquareSize,
squareSize,
)
}
// check that valid number of shares have been provided
if squareSize*squareSize != uint64(len(shares)) {
return nil, fmt.Errorf(
"must provide valid number of shares for square size: got %d wanted %d",
len(shares),
squareSize*squareSize,
)
}
tree := wrapper.NewErasuredNamespacedMerkleTree(squareSize)
return rsmt2d.ComputeExtendedDataSquare(shares, consts.DefaultCodec(), tree.Constructor)
}

Copy link
Member

@renaynay renaynay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM @evan-forbes ! Thank you so much for the quick PR, this will be very helpful for celestia node.

@evan-forbes evan-forbes force-pushed the evan/cherry-pick-share-merging-splitting branch from 4beddda to 66f720d Compare September 21, 2021 17:33
@liamsi liamsi force-pushed the evan/cherry-pick-share-merging-splitting branch from 66f720d to 0ec12a3 Compare September 22, 2021 16:18
@liamsi liamsi changed the base branch from evan/cherry-pick-share-merging-splitting to evan/merge-theirs September 22, 2021 16:28
liamsi and others added 9 commits September 22, 2021 11:44
* move Messages field to the end of Block.Data

* Add some constants for share computation and the NMT:

 - also a bunch of todos regarding shares computation

* First (compiling) stab on creating shares

* Test with Evidence and fix bug discovered by test

* remove resolved todos

* introduce split method

* Introduce LenDelimitedMarshaler interface and some reformatting

* Introduce TxLenDelimitedMarshaler

* add some test cases

* fix some comments

* fix some comments & linter

* Add reserved namespaces to params

* Move ll-specific consts into a separate file (consts.go)

* Add MarshalDelimited to HexBytes

* Add tail-padding shares

* Add ComputeShares method on Data to compute all shares

* Fix compute the next square num and not the next power of two

* lints

* Unexport MakeShares function:

- it's likely to change and it doesn't have to be part of the public API

* lints 2

* First stab on computing row/column roots

* fix rebase glitches:
 - move DA related constants out of params.go

* refactor MakeBlock to take in interm. state roots and messages

* refactor state.MakeBlock too

* Add todos LenDelimitedMarshaler and extract appendShares logic

* Simplify shares computation: remove LenDelimitedMarshaler abstraction

* actually use DA header to compute the DataRoot everywhere (will lead to failing tests for sure)

* WIP: Update block related core data structures in protobuf too

* WIP: fix zero shares edge-case and get rid of Block.Data.hash (use dataAvailabilityHeader.Hash() instead)

* Fixed tests, only 3 failing tests to go: TestReapMaxBytesMaxGas, TestTxFilter, TestMempoolFilters

* Fix TestTxFilter:

 - the size of the wrapping Data{} proto message increased a few bytes

* Fix Message proto and `DataFromProto`

* Fix last 2 remaining tests related to the increased block/block.Data size

* Use infectious lib instead of leopard

* proto-lint: snake_case

* some lints and minor changes

* linter

* panic if pushing to tree fails, extend Data.ToProto()

* revert renaming in comment

* add todo about refactoring as soon as the rsmt2d allows the user to choose the merkle tree
* Export block data compute shares.
* Refactor to use ShareSize constant directly.
* Change message splitting to prefix namespace ID.
* Implement chunking for contiguous.
* Add termination condition.
* Rename append contiguous to split contiguous.
* Update test for small tx.
* Add test for two contiguous.
* Make tx and msg adjusted share sizes exported constants.
* Panic on hopefully-unreachable condition instead of silently skipping.
* Update hardcoded response for block format.

Co-authored-by: Ismail Khoffi <Ismail.Khoffi@gmail.com>
* fix overwrite bug and stop splitting shares of size MsgShareSize

* remove ineffectual code

* review feedback: better docs

Co-authored-by: Ismail Khoffi <Ismail.Khoffi@gmail.com>

* remove uneeded copy and only fix the source of the bug

Co-authored-by: Ismail Khoffi <Ismail.Khoffi@gmail.com>

* fix overwrite bug while also being consistent with using NamespacedShares

* update to the latest rsmt2d for the nmt wrapper

Co-authored-by: Ismail Khoffi <Ismail.Khoffi@gmail.com>
* start spec compliant share merging

* refactor and finish unit testing

* whoops

* linter gods

* fix initial changes and use constants

* use constant

* more polish

* docs fix* review feedback: docs and out of range panic protection

* review feedback: add panic protection from empty input

* use constant instead of recalculating `ShareSize`* don't redeclare existing var* be more explicit with returned nil* use constant instead of recalculating `ShareSize`* review feedback: use consistent capitalization

* stop accepting reserved namespaces as normal messages

* use a descriptive var name for message length

* linter and comparison fix

* reorg tests, add test for parse delimiter, DataFromBlock and fix evidence marshal bug

* catch error for linter

* update test MakeShares to include length delimiters for the SHARE_RESERVED_BYTE

* minor iteration change

* refactor share splitting to fix bug

* fix all bugs with third and final refactor

* fix conflict

* revert unnecessary changes

* review feedback: better docs* reivew feedback: add comment for safeLen

* review feedback: remove unnecessay comments

* review feedback: split up share merging and splitting into their own files

* review feedback: more descriptive var names

* fix accidental change

* add some constant docs

* spelling error

Co-authored-by: Hlib Kanunnikov <hlibwondertab@gmail.com>
Co-authored-by: John Adler <adlerjohn@users.noreply.github.com>
Co-authored-by: Ismail Khoffi <Ismail.Khoffi@gmail.com>
Co-authored-by: rene <41963722+renaynay@users.noreply.github.com>
@evan-forbes evan-forbes force-pushed the evan/erasured-data-hash-and-dah-creation-refactor branch from 422eb4d to ee4a20b Compare September 22, 2021 16:45
Copy link
Member

@liamsi liamsi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍🏼

@liamsi liamsi merged commit d110b6f into evan/merge-theirs Sep 22, 2021
@liamsi liamsi deleted the evan/erasured-data-hash-and-dah-creation-refactor branch September 22, 2021 17:16
liamsi added a commit that referenced this pull request Sep 23, 2021
)

* Basic DA functionality  (#83)

* move Messages field to the end of Block.Data

* Add some constants for share computation and the NMT:

 - also a bunch of todos regarding shares computation

* First (compiling) stab on creating shares

* Test with Evidence and fix bug discovered by test

* remove resolved todos

* introduce split method

* Introduce LenDelimitedMarshaler interface and some reformatting

* Introduce TxLenDelimitedMarshaler

* add some test cases

* fix some comments

* fix some comments & linter

* Add reserved namespaces to params

* Move ll-specific consts into a separate file (consts.go)

* Add MarshalDelimited to HexBytes

* Add tail-padding shares

* Add ComputeShares method on Data to compute all shares

* Fix compute the next square num and not the next power of two

* lints

* Unexport MakeShares function:

- it's likely to change and it doesn't have to be part of the public API

* lints 2

* First stab on computing row/column roots

* fix rebase glitches:
 - move DA related constants out of params.go

* refactor MakeBlock to take in interm. state roots and messages

* refactor state.MakeBlock too

* Add todos LenDelimitedMarshaler and extract appendShares logic

* Simplify shares computation: remove LenDelimitedMarshaler abstraction

* actually use DA header to compute the DataRoot everywhere (will lead to failing tests for sure)

* WIP: Update block related core data structures in protobuf too

* WIP: fix zero shares edge-case and get rid of Block.Data.hash (use dataAvailabilityHeader.Hash() instead)

* Fixed tests, only 3 failing tests to go: TestReapMaxBytesMaxGas, TestTxFilter, TestMempoolFilters

* Fix TestTxFilter:

 - the size of the wrapping Data{} proto message increased a few bytes

* Fix Message proto and `DataFromProto`

* Fix last 2 remaining tests related to the increased block/block.Data size

* Use infectious lib instead of leopard

* proto-lint: snake_case

* some lints and minor changes

* linter

* panic if pushing to tree fails, extend Data.ToProto()

* revert renaming in comment

* add todo about refactoring as soon as the rsmt2d allows the user to choose the merkle tree

* clean up some unused test helper functions

* linter

* still debugging the exact right number of bytes for max data...

* Implement spec-compliant share splitting (#246)

* Export block data compute shares.
* Refactor to use ShareSize constant directly.
* Change message splitting to prefix namespace ID.
* Implement chunking for contiguous.
* Add termination condition.
* Rename append contiguous to split contiguous.
* Update test for small tx.
* Add test for two contiguous.
* Make tx and msg adjusted share sizes exported constants.
* Panic on hopefully-unreachable condition instead of silently skipping.
* Update hardcoded response for block format.

Co-authored-by: Ismail Khoffi <Ismail.Khoffi@gmail.com>

* fix overwrite bug (#251)

* fix overwrite bug and stop splitting shares of size MsgShareSize

* remove ineffectual code

* review feedback: better docs

Co-authored-by: Ismail Khoffi <Ismail.Khoffi@gmail.com>

* remove uneeded copy and only fix the source of the bug

Co-authored-by: Ismail Khoffi <Ismail.Khoffi@gmail.com>

* fix overwrite bug while also being consistent with using NamespacedShares

* update to the latest rsmt2d for the nmt wrapper

Co-authored-by: Ismail Khoffi <Ismail.Khoffi@gmail.com>

* Spec compliant merge shares (#261)

* start spec compliant share merging

* refactor and finish unit testing

* whoops

* linter gods

* fix initial changes and use constants

* use constant

* more polish

* docs fix* review feedback: docs and out of range panic protection

* review feedback: add panic protection from empty input

* use constant instead of recalculating `ShareSize`* don't redeclare existing var* be more explicit with returned nil* use constant instead of recalculating `ShareSize`* review feedback: use consistent capitalization

* stop accepting reserved namespaces as normal messages

* use a descriptive var name for message length

* linter and comparison fix

* reorg tests, add test for parse delimiter, DataFromBlock and fix evidence marshal bug

* catch error for linter

* update test MakeShares to include length delimiters for the SHARE_RESERVED_BYTE

* minor iteration change

* refactor share splitting to fix bug

* fix all bugs with third and final refactor

* fix conflict

* revert unnecessary changes

* review feedback: better docs* reivew feedback: add comment for safeLen

* review feedback: remove unnecessay comments

* review feedback: split up share merging and splitting into their own files

* review feedback: more descriptive var names

* fix accidental change

* add some constant docs

* spelling error

Co-authored-by: Hlib Kanunnikov <hlibwondertab@gmail.com>
Co-authored-by: John Adler <adlerjohn@users.noreply.github.com>
Co-authored-by: Ismail Khoffi <Ismail.Khoffi@gmail.com>

* refactor to better accomodate real world use cases (celestia node)

Co-authored-by: rene <41963722+renaynay@users.noreply.github.com>

* thank you linter

Co-authored-by: Ismail Khoffi <Ismail.Khoffi@gmail.com>
Co-authored-by: John Adler <adlerjohn@users.noreply.github.com>
Co-authored-by: Hlib Kanunnikov <hlibwondertab@gmail.com>
Co-authored-by: rene <41963722+renaynay@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Refactor DAH generation to give the user more control over what happens to the EDS
4 participants