enable the user to choose a concrete tree implementation #8 #9

liamsi · 2020-10-30T15:35:24Z

ref #8

liamsi · 2020-10-30T15:44:52Z

tree.go

+type Tree interface {
+	Push(data []byte)
+	// TODO(ismail): is this general enough?
+	Prove(idx int) (merkleRoot []byte, proofSet [][]byte, proofIndex uint64, numLeaves uint64)
+	Root() []byte
+}


Please don't merge yet. Trying to distill a minimal interface. This one is basically nebolouslabs' API (and these are the methods used).

tree.go

liamsi · 2020-10-30T21:02:26Z

tree.go

+type Tree interface {
+	Push(data []byte)
+	// TODO(ismail): is this general enough?
+	Prove(idx int) (merkleRoot []byte, proofSet [][]byte, proofIndex uint64, numLeaves uint64)


Compare: https://github.com/lazyledger/nmt/blob/1a5d9124ad10e40d0d743d7695e1983992089a2f/nmt.go#L110
https://github.com/lazyledger/nmt/blob/1a5d9124ad10e40d0d743d7695e1983992089a2f/proof.go#L19

liamsi · 2020-11-26T13:56:56Z

ref: evan-forbes@2ea46c3

@evan-forbes started an alternative approach which abstracts away the Merkle tree altogether in favour of a "Prover" - it combines the aggregation with generating the proof and the relevant functions take a func() Prover as a parameter. Hence a tree creator / factory becomes obsolete. Need to think a bit further about this and if there are any undesired side-effects but so far I like Evan's approach better as it would mean a single simple interface.

evan-forbes · 2020-12-01T20:38:47Z

In my second attempt, with guidance from @liamsi, I've attempted to further isolate data erasure and commitment generation. Instead of configuring, generating, and keeping track of row and column roots using the dataSquare struct, this experiment delgates that functionality to the VectorCommitmentProver.

// VectorCommitmentProver describes the expected methods for issuing and proving
// vector commitments over portions of a data square
type VectorCommitmentProver interface {
	// Commitment provides the commitment root for the selected row or column.
	// Returns a commitment to a zeroed slice if the index is too high.
	Commitment(a Axis, idx uint) []byte
	// Prove issues a proof that can verify the inclusion of a selected row or column.
	Prove(a Axis, idx uint) (Proof, error)
}

The VectorCommitmentProver is meant to work with the following types. Proof was taken from this PR draft.

// Axis indicates to the VectorCommitmentProver whether to use columns or rows
type Axis bool

const (
	// Column indicates use of the x axis of a data square
	Column Axis = true
	// Row indicates use of the y axis of a data square
	Row Axis = false
)

// Proof describes the data needed to verify a rsmt2d compatible merkle tree
type Proof struct {
	Root   []byte
	Set    [][]byte
	Index  uint64
	Leaves uint64
}

Making VCP implementations will likely consist of wrapping an extended data square. For example, if we want to make a name spaced merkle tree version, we configure the types of nmts we wish to create, and define the expected methods (leaving Prove out for brevity)

// VCP uses configured name spaced merkle trees to adhere to the
// rsmt2d.VectorCommiterProver interface
type VCP struct {
	setters   []nmt.Option
	freshHash func() hash.Hash
	eds       *rsmt2d.ExtendedDataSquare
	opts      *nmt.Options
}

// Commitment returns the root of a selected row or column using a configured
// name space merkle tree. It also fullfills it's portion of the
// rsmt2d.VectorCommiterProver interface.
func (v VCP) Commitment(a rsmt2d.Axis, idx uint) []byte {
	// push all the data onto an nmt
	tree := New(v.freshHash(), v.setters...)
	// use data from the selected row or column of the extended data square 
	leaves := v.fetchLeaves(a, idx)
	// push data to the tree
	for _, leaf := range leaves {
		err := tree.Push(namespace.NewPrefixedData(tree.NamespaceSize(), leaf))
		if err != nil {
			panic(
				fmt.Sprintf("invalid data; could not push share to tree: %#v, err: %v",
					leaf,
					err,
				),
			)
		}
	}
	return tree.Root().Bytes()
}

It should be noted that this experimental version requires that the extended data square be prefixed with namespace.IDs. (which was done here)

At this point, we can define some quality of life functions that now work with all VCP implementations

// Commitments uses the provided VectorCommitmentProver to collect commitments
// from a selected axis
func Commitments(a Axis, vcp VectorCommitmentProver, width uint) [][]byte {
	commits := make([][]byte, width)
	for i := uint(0); i < width; i++ {
		commits[i] = vcp.Commitment(a, i)
	}
	return commits
}

and finally we can get a decent idea of how these changes would look like in lazyledger-core (current method).

// fillDataAvailabilityHeader fills in any remaining DataAvailabilityHeader fields
// that are a function of the block data.
func (b *Block) fillDataAvailabilityHeader() {
	namespacedShares := b.Data.computeShares()
	if len(namespacedShares) == 0 {
		// no shares -> no row/colum roots -> hash(empty)
		b.DataHash = b.DataAvailabilityHeader.Hash()
		return
	}
	rawData, names := namespacedShares.RawShares(), namespacedShares.IDs()
	// TODO(ismail): for better efficiency and a
	// larger number shares we should switch to the rsmt2d.LeopardFF16 codec:
	// erasure the data and assign appropriate namespace IDs after
	extendedDataSquare, err := rsmt2d.ComputeNamedExtendedDataSquare(rawData, names, rsmt2d.RSGF8)
	if err != nil {
		panic(fmt.Sprintf("unexpected error: %v", err))
	}
	// create an nmt VectorCommitmentProver and use the size of the first namespaced share
	vcp := nmt.NewVCP(extendedDataSquare, sha256.New, nmt.NamespaceIDSize(int(namespacedShares[0].ID.Size())))

	// build the data availability header
	width := extendedDataSquare.Width()
	colRoots := rsmt2d.Commitments(rsmt2d.Column, vcp, width)
	rowRoots := rsmt2d.Commitments(rsmt2d.Row, vcp, width)

	// set the dah roots
	b.DataAvailabilityHeader.ColumnRoots = NmtRootsFromBytes(colRoots)
	b.DataAvailabilityHeader.RowsRoots = NmtRootsFromBytes(rowRoots)
	//  create and set the root hash
	b.DataHash = merkle.HashFromByteSlices(append(rowRoots, colRoots...))
}

Overall, these experimental changes are purely organizational, and only achieve opinionated aesthetic changes... so I'm not sure they're really needed. Especially considering that, as far as I know, there are no plans to have multiple implementations of name spaced merkle trees. There's also a small amount of overhead with O(n) complexity introduced due to the conversion of namespace.IDs to []byte and then back to namespace.ID when passing data through the nmt VectorCommitmentProver implementation. If there's continued interest, I'd be more than happy to keep investigating, and see what can be done to more closely weave the nmt and rsmt2d packages into lazyledger-core.

Commits:
New interface:
rsmt2d: evan-forbes@02d7966
Experimental nmt VCP:
nmt: evan-forbes/nmt@1ca1d87
lazyledger-core: evan-forbes/totally-not-a-fork-of-celestia-core@03398e3

musalbas · 2020-12-09T13:58:47Z

Thanks for this @evan-forbes. Requiring a wrapper around EDS seems like an overly complex pattern. Why not simply create an interface for a merkle tree, and allow the EDS to be initialised with any merkle tree object that implements that interface?

evan-forbes · 2020-12-09T20:44:03Z

The VCP interfaces allows for more freedom in aggregating data for a proof or commitment, specifically for cached trees. The ExtendedDataSquare would have to assume that each implementation is not cached, so it would create a new tree and push data each time a proof needs to be generated, where VCP does not make that assumption.

You're right, though, if cached trees are not going to be useful or if they use a different mechanism entirely, then the added friction is not needed. If that's the case, passing a Tree interface to rsmt2d.ComputeExtendedDataSquare would probably be best.

musalbas · 2020-12-09T21:24:27Z

I think the caching mechanism should probably be implemented on the EDS, rather in the tree library itself.

evan-forbes · 2020-12-10T03:16:45Z

I can see that working well. It might be a tad confusing to the user, as any caching that requires access to the tree's internals, such as the caching that occurs in the current version of nmt, would obviously be separate from the type that would be occurring in the EDS. Then again, it might be desirable to keep both types of caching separate.

There's also still the option to keep what is currently written, and not include an interface for the time being. We might be risking a premature abstraction, as the entire use of rsmt2d and nmt in lazyledger-core is currently confined to the fillDataAvailabilityHeader method of block, and there is only a single nmt implementation to use.

liamsi · 2020-12-14T19:39:49Z

Very good discussion! I like the idea of generalizing this to Evan's proposed VCP interface but it's another very good point that @evan-forbes brought up himself: this might indeed be a premature absraction. For now, I just removed the other single-function interface the TreeCreator tentatively to be renamed sth-sth-factory (and kept the tree interface).
It's now a function that returns a Tree. While it does the same thing, I did not really feel comfortable introducing another interface.

LMK what you think. If the current approach seems OK, I'd merge this and let's continue (and link) above's valuable discussion in a separate issue.

musalbas · 2020-12-15T10:18:06Z

tree.go

-	New() Tree
-}
-
-var _ TreeCreator = &defaultTreeCreator{}
 var _ Tree = &DefaultTree{}


How would developers use their own tree implementation for the library? Create a struct that implements Tree, then supply a function that creates a new tree?

Create a struct that implements Tree, then supply a function that creates a new tree?

Yes, exactly.

tree.go

first stab on making the user choose a concrete tree #8

54642a7

liamsi commented Oct 30, 2020

View reviewed changes

musalbas reviewed Oct 30, 2020

View reviewed changes

tree.go Outdated Show resolved Hide resolved

liamsi changed the title ~~first stab on making the user choose a concrete tree #8~~ enable the user to choose a concrete tree implementation #8 Oct 30, 2020

liamsi commented Oct 30, 2020

View reviewed changes

Remove TreeCreator interface in favour of a simple constructor function

51a97d9

liamsi requested a review from musalbas December 14, 2020 19:43

update to ll-fork for merkletree

34ac38d

musalbas approved these changes Dec 15, 2020

View reviewed changes

Rename TreeFn to TreeConstructorFn

589b0b3

liamsi marked this pull request as ready for review December 15, 2020 20:28

liamsi merged commit e5ec791 into master Dec 15, 2020

liamsi deleted the ismail/pluggable_tree branch December 15, 2020 20:31

This was referenced Dec 15, 2020

Allow user to choose underlying Merkle tree implementation #8

Closed

Generalize & improve underlying interfaces #10

Open

evan-forbes mentioned this pull request Mar 26, 2021

Include namespaces in erasured data celestiaorg/celestia-core#230

Closed

evan-forbes mentioned this pull request May 20, 2021

Remove flattenNamespacedEDS celestiaorg/celestia-core#338

Merged

evan-forbes mentioned this pull request Aug 16, 2022

fix: use wrapped nmt when generating commitments celestiaorg/celestia-app#618

Merged

evan-forbes mentioned this pull request Jun 27, 2023

feat: create foundation for fraudulent block production celestiaorg/celestia-app#1992

Merged

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

enable the user to choose a concrete tree implementation #8 #9

enable the user to choose a concrete tree implementation #8 #9

liamsi commented Oct 30, 2020

liamsi Oct 30, 2020

liamsi Oct 30, 2020

liamsi commented Nov 26, 2020 •

edited

Loading

evan-forbes commented Dec 1, 2020 •

edited

Loading

musalbas commented Dec 9, 2020

evan-forbes commented Dec 9, 2020 •

edited

Loading

musalbas commented Dec 9, 2020

evan-forbes commented Dec 10, 2020

liamsi commented Dec 14, 2020 •

edited

Loading

musalbas Dec 15, 2020

liamsi Dec 15, 2020

enable the user to choose a concrete tree implementation #8 #9

enable the user to choose a concrete tree implementation #8 #9

Conversation

liamsi commented Oct 30, 2020

liamsi Oct 30, 2020

Choose a reason for hiding this comment

liamsi Oct 30, 2020

Choose a reason for hiding this comment

liamsi commented Nov 26, 2020 • edited Loading

evan-forbes commented Dec 1, 2020 • edited Loading

musalbas commented Dec 9, 2020

evan-forbes commented Dec 9, 2020 • edited Loading

musalbas commented Dec 9, 2020

evan-forbes commented Dec 10, 2020

liamsi commented Dec 14, 2020 • edited Loading

musalbas Dec 15, 2020

Choose a reason for hiding this comment

liamsi Dec 15, 2020

Choose a reason for hiding this comment

liamsi commented Nov 26, 2020 •

edited

Loading

evan-forbes commented Dec 1, 2020 •

edited

Loading

evan-forbes commented Dec 9, 2020 •

edited

Loading

liamsi commented Dec 14, 2020 •

edited

Loading