Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

enable the user to choose a concrete tree implementation #8 #9

Merged
merged 4 commits into from
Dec 15, 2020

Conversation

liamsi
Copy link
Member

@liamsi liamsi commented Oct 30, 2020

ref #8

Comment on lines +12 to +17
type Tree interface {
Push(data []byte)
// TODO(ismail): is this general enough?
Prove(idx int) (merkleRoot []byte, proofSet [][]byte, proofIndex uint64, numLeaves uint64)
Root() []byte
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't merge yet. Trying to distill a minimal interface. This one is basically nebolouslabs' API (and these are the methods used).

tree.go Outdated Show resolved Hide resolved
@liamsi liamsi changed the title first stab on making the user choose a concrete tree #8 enable the user to choose a concrete tree implementation #8 Oct 30, 2020
type Tree interface {
Push(data []byte)
// TODO(ismail): is this general enough?
Prove(idx int) (merkleRoot []byte, proofSet [][]byte, proofIndex uint64, numLeaves uint64)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@liamsi
Copy link
Member Author

liamsi commented Nov 26, 2020

ref: evan-forbes@2ea46c3

@evan-forbes started an alternative approach which abstracts away the Merkle tree altogether in favour of a "Prover" - it combines the aggregation with generating the proof and the relevant functions take a func() Prover as a parameter. Hence a tree creator / factory becomes obsolete. Need to think a bit further about this and if there are any undesired side-effects but so far I like Evan's approach better as it would mean a single simple interface.

@evan-forbes
Copy link
Member

evan-forbes commented Dec 1, 2020

In my second attempt, with guidance from @liamsi, I've attempted to further isolate data erasure and commitment generation. Instead of configuring, generating, and keeping track of row and column roots using the dataSquare struct, this experiment delgates that functionality to the VectorCommitmentProver.

// VectorCommitmentProver describes the expected methods for issuing and proving
// vector commitments over portions of a data square
type VectorCommitmentProver interface {
	// Commitment provides the commitment root for the selected row or column.
	// Returns a commitment to a zeroed slice if the index is too high.
	Commitment(a Axis, idx uint) []byte
	// Prove issues a proof that can verify the inclusion of a selected row or column.
	Prove(a Axis, idx uint) (Proof, error)
}

The VectorCommitmentProver is meant to work with the following types. Proof was taken from this PR draft.

// Axis indicates to the VectorCommitmentProver whether to use columns or rows
type Axis bool

const (
	// Column indicates use of the x axis of a data square
	Column Axis = true
	// Row indicates use of the y axis of a data square
	Row Axis = false
)

// Proof describes the data needed to verify a rsmt2d compatible merkle tree
type Proof struct {
	Root   []byte
	Set    [][]byte
	Index  uint64
	Leaves uint64
}

Making VCP implementations will likely consist of wrapping an extended data square. For example, if we want to make a name spaced merkle tree version, we configure the types of nmts we wish to create, and define the expected methods (leaving Prove out for brevity)

// VCP uses configured name spaced merkle trees to adhere to the
// rsmt2d.VectorCommiterProver interface
type VCP struct {
	setters   []nmt.Option
	freshHash func() hash.Hash
	eds       *rsmt2d.ExtendedDataSquare
	opts      *nmt.Options
}

// Commitment returns the root of a selected row or column using a configured
// name space merkle tree. It also fullfills it's portion of the
// rsmt2d.VectorCommiterProver interface.
func (v VCP) Commitment(a rsmt2d.Axis, idx uint) []byte {
	// push all the data onto an nmt
	tree := New(v.freshHash(), v.setters...)
	// use data from the selected row or column of the extended data square 
	leaves := v.fetchLeaves(a, idx)
	// push data to the tree
	for _, leaf := range leaves {
		err := tree.Push(namespace.NewPrefixedData(tree.NamespaceSize(), leaf))
		if err != nil {
			panic(
				fmt.Sprintf("invalid data; could not push share to tree: %#v, err: %v",
					leaf,
					err,
				),
			)
		}
	}
	return tree.Root().Bytes()
}

It should be noted that this experimental version requires that the extended data square be prefixed with namespace.IDs. (which was done here)

At this point, we can define some quality of life functions that now work with all VCP implementations

// Commitments uses the provided VectorCommitmentProver to collect commitments
// from a selected axis
func Commitments(a Axis, vcp VectorCommitmentProver, width uint) [][]byte {
	commits := make([][]byte, width)
	for i := uint(0); i < width; i++ {
		commits[i] = vcp.Commitment(a, i)
	}
	return commits
}

and finally we can get a decent idea of how these changes would look like in lazyledger-core (current method).

// fillDataAvailabilityHeader fills in any remaining DataAvailabilityHeader fields
// that are a function of the block data.
func (b *Block) fillDataAvailabilityHeader() {
	namespacedShares := b.Data.computeShares()
	if len(namespacedShares) == 0 {
		// no shares -> no row/colum roots -> hash(empty)
		b.DataHash = b.DataAvailabilityHeader.Hash()
		return
	}
	rawData, names := namespacedShares.RawShares(), namespacedShares.IDs()
	// TODO(ismail): for better efficiency and a
	// larger number shares we should switch to the rsmt2d.LeopardFF16 codec:
	// erasure the data and assign appropriate namespace IDs after
	extendedDataSquare, err := rsmt2d.ComputeNamedExtendedDataSquare(rawData, names, rsmt2d.RSGF8)
	if err != nil {
		panic(fmt.Sprintf("unexpected error: %v", err))
	}
	// create an nmt VectorCommitmentProver and use the size of the first namespaced share
	vcp := nmt.NewVCP(extendedDataSquare, sha256.New, nmt.NamespaceIDSize(int(namespacedShares[0].ID.Size())))

	// build the data availability header
	width := extendedDataSquare.Width()
	colRoots := rsmt2d.Commitments(rsmt2d.Column, vcp, width)
	rowRoots := rsmt2d.Commitments(rsmt2d.Row, vcp, width)

	// set the dah roots
	b.DataAvailabilityHeader.ColumnRoots = NmtRootsFromBytes(colRoots)
	b.DataAvailabilityHeader.RowsRoots = NmtRootsFromBytes(rowRoots)
	//  create and set the root hash
	b.DataHash = merkle.HashFromByteSlices(append(rowRoots, colRoots...))
}

Overall, these experimental changes are purely organizational, and only achieve opinionated aesthetic changes... so I'm not sure they're really needed. Especially considering that, as far as I know, there are no plans to have multiple implementations of name spaced merkle trees. There's also a small amount of overhead with O(n) complexity introduced due to the conversion of namespace.IDs to []byte and then back to namespace.ID when passing data through the nmt VectorCommitmentProver implementation. If there's continued interest, I'd be more than happy to keep investigating, and see what can be done to more closely weave the nmt and rsmt2d packages into lazyledger-core.

Commits:
New interface:
rsmt2d: evan-forbes@02d7966
Experimental nmt VCP:
nmt: evan-forbes/nmt@1ca1d87
lazyledger-core: evan-forbes/totally-not-a-fork-of-celestia-core@03398e3

@musalbas
Copy link
Member

musalbas commented Dec 9, 2020

Thanks for this @evan-forbes. Requiring a wrapper around EDS seems like an overly complex pattern. Why not simply create an interface for a merkle tree, and allow the EDS to be initialised with any merkle tree object that implements that interface?

@evan-forbes
Copy link
Member

evan-forbes commented Dec 9, 2020

The VCP interfaces allows for more freedom in aggregating data for a proof or commitment, specifically for cached trees. The ExtendedDataSquare would have to assume that each implementation is not cached, so it would create a new tree and push data each time a proof needs to be generated, where VCP does not make that assumption.

You're right, though, if cached trees are not going to be useful or if they use a different mechanism entirely, then the added friction is not needed. If that's the case, passing a Tree interface to rsmt2d.ComputeExtendedDataSquare would probably be best.

@musalbas
Copy link
Member

musalbas commented Dec 9, 2020

I think the caching mechanism should probably be implemented on the EDS, rather in the tree library itself.

@evan-forbes
Copy link
Member

I can see that working well. It might be a tad confusing to the user, as any caching that requires access to the tree's internals, such as the caching that occurs in the current version of nmt, would obviously be separate from the type that would be occurring in the EDS. Then again, it might be desirable to keep both types of caching separate.

There's also still the option to keep what is currently written, and not include an interface for the time being. We might be risking a premature abstraction, as the entire use of rsmt2d and nmt in lazyledger-core is currently confined to the fillDataAvailabilityHeader method of block, and there is only a single nmt implementation to use.

@liamsi
Copy link
Member Author

liamsi commented Dec 14, 2020

Very good discussion! I like the idea of generalizing this to Evan's proposed VCP interface but it's another very good point that @evan-forbes brought up himself: this might indeed be a premature absraction. For now, I just removed the other single-function interface the TreeCreator tentatively to be renamed sth-sth-factory (and kept the tree interface).
It's now a function that returns a Tree. While it does the same thing, I did not really feel comfortable introducing another interface.

LMK what you think. If the current approach seems OK, I'd merge this and let's continue (and link) above's valuable discussion in a separate issue.

@liamsi liamsi requested a review from musalbas December 14, 2020 19:43
New() Tree
}

var _ TreeCreator = &defaultTreeCreator{}
var _ Tree = &DefaultTree{}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would developers use their own tree implementation for the library? Create a struct that implements Tree, then supply a function that creates a new tree?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Create a struct that implements Tree, then supply a function that creates a new tree?

Yes, exactly.

tree.go Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants