Add hooks for storage backend #4

masomel · 2016-06-12T21:46:46Z

Right now, trees are kept in memory, and we're assuming that the user will set up her own storage backend and save the tree. To improve scalability, we should provide hooks for a few popular storage backend. We may also want to provide a mechanism for serializing trees into files on disk for this purpose.

vqhuy · 2016-06-15T22:11:41Z

As stated in Marcela's email:

the persistent storage has multiple purposes. For reliability in case of a server failure. For scalability, which solves the issue of multiple large trees not fitting into memory (eventually, I assume the server will be memory bound), and allows for multiple distributed servers as well. And possibly even for security.

My thought on this issue is I would like to implement a db interface (like the kv interface of coname) for supporting key-value backends (e.g., leveldb), as I did before. All the old trees would be stored in the db. There also has some predefined parameters (maybe from the tree construction arguments), e.g., the number of tree would be stored in the db, presumably older trees could be safely removed. Then we would provide a mechanism for looking up in these old trees by reconstructing the trees from db. This way, it also raises another question: how well it will scale if we store the tree entirely to the db?

What do you think? @masomel @arlolra @liamsi

masomel · 2016-06-16T19:48:37Z

My thought on this issue is I would like to implement a db interface (like the kv interface of coname) for supporting key-value backends (e.g., leveldb), as I did before.

I think this could work. Then the idea would be that the user would implement the interface for their specific key-value backend? Doesn't this conflict with the notion that we would provide a mechanism for reconstructing the tree from the KV store? The reason being that in order to reconstruct the tree from a KV store you would have to make certain assumptions about how the tree data is stored in it, or am I missing something?

This way, it also raises another question: how well it will scale if we store the tree entirely to the db?

You could consider storing the diffs of trees only to save space and not save redundant copies of data that hasn't changed across epochs. The reconstruction might be a bit trickier then because you have to reassign the right node pointers.

Another option that may save more space would be to only store the leaf nodes (and you could even go as far as only storing the changed leaf nodes). But then you have to reconstruct the entire tree from the leaves doing a bunch of insertions. The question then becomes: is it more efficient to do many pointer reassignments or many insert operations?

vqhuy · 2016-06-17T15:36:35Z

I think this could work. Then the idea would be that the user would implement the interface for their specific key-value backend? Doesn't this conflict with the notion that we would provide a mechanism for reconstructing the tree from the KV store? The reason being that in order to reconstruct the tree from a KV store you would have to make certain assumptions about how the tree data is stored in it, or am I missing something?

Actually, the data storing mechanism is implemented in the lib. The user would need to implement the db interface only (i.e., implement how to store a pair (key, value) to the db). Is it right?

Another option that may save more space would be to only store the leaf nodes (and you could even go as far as only storing the changed leaf nodes). But then you have to reconstruct the entire tree from the leaves doing a bunch of insertions. The question then becomes: is it more efficient to do many pointer reassignments or many insert operations?

This way looks mostly the same as how we create the tree currently [1], it might also be a bit easier to implement, compare to the first one. So I think I will do some benchmarks to compare these 2 approaches.

[1] https://github.com/coniks-sys/libmerkleprefixtree-go/blob/92f23eaf94d939cfd603d8431ef861f8d0c38931/README.md

masomel · 2016-06-20T15:36:34Z

Actually, the data storing mechanism is implemented in the lib. The user would need to implement the db interface only (i.e., implement how to store a pair (key, value) to the db). Is it right?

I understand that the data storing mechanism would be implemented in the lib. What doesn't make sense to me is that the user needs to implement how to store a kv-pair when the lib provides the storing mechanism. The fact that the lib specifies what/how the data is stored means that we have already made a decision for the user how the data will be stored, so what else does the user still have to do? Since we're already specifying the storing mechanism, wouldn't it make more sense to provide hooks for specific KV stores (i.e. LevelDB) and the user then only needs to pass in their specific db configuration?

This way looks mostly the same as how we create the tree currently [1], it might also be a bit easier to implement, compare to the first one. So I think I will do some benchmarks to compare these 2 approaches.

I would suggest starting with reconstructing from the leaves (the second method), since, like you said, this should be easier to implement. Once this is working, and if there is time, we can look into reconstructing from tree diffs to see if this method improves the performance of reconstructing from leaves in the db.

vqhuy · 2016-07-13T10:18:41Z

The backend storage is implemented in 15606c5d64fa15683c7ec0e2229ecfdf80128e09.
Below is detailed implementation:

In each update, the tree is entirely stored to the db. (this idea is from Coname authors)
The key of each item in the db is a byte array encoded as follows:
[NodeKeyIdentifier, epoch, len(prefixBits), index]

The developers now can construct the tree from db or do lookup in the STR which is stored in the persistent storage.

OpenMerkleTree loads the tree of a specific epoch from the db. The epoch can be read from the db as well (it's always the latest epoch).
ReconstructBranch reconstructs a part of tree of a specific epoch which is stored in db. The reconstructing path is controlled by the lookup index.

masomel · 2016-07-13T19:28:27Z

This is a good start! I'm going to make some specific comments in the commit, but here are some other comments:

In each update, the tree is entirely stored to the db. (this idea is from Coname authors)

The key of each item in the db is a byte array encoded as follows:
[NodeKeyIdentifier, epoch, len(prefixBits), index]

This seems fine with me since this lets us reconstruct individual paths from the DB.

OpenMerkleTree loads the tree of a specific epoch from the db.

I'm assuming the application for this function is when the server has been restarted and we want to initialize the in-memory PAD with an existing tree?

ReconstructBranch reconstructs a part of tree of a specific epoch which is stored in db. The reconstructing path is controlled by the lookup index.

Is the idea that this would be called whenever the developers needs to do a lookup from the db instead of the in-memory STR? I think it would be nice to abstract this away by calling ReconstructBranch directly in LookupInEpoch when the epoch is older than everything in loadedEpochs. What do you think?

masomel · 2016-07-13T19:38:40Z

In terms of code organization, I think we should separate the MerkleTree, node and PAD related DB hooks and tests into their own modules. I can give this a shot, if you'd like.

vqhuy · 2016-07-14T03:02:07Z

I'm assuming the application for this function is when the server has been restarted and we want to initialize the in-memory PAD with an existing tree?

That's right. This is what NewPADFromDB do right now.

I think it would be nice to abstract this away by calling ReconstructBranch directly in LookupInEpoch when the epoch is older than everything in loadedEpochs.

It's also right. It's should be called in GetSTR method. I'm working on it.

In terms of code organization, I think we should separate the MerkleTree, node and PAD related DB hooks and tests into their own modules. I can give this a shot, if you'd like.

Great!

masomel · 2016-07-14T23:31:31Z

@c633 What do you think of d3a2388?

In addition to refactoring, I also renamed flush() to storeToKV() everywhere. We can rename the KV-related modules (e.g. merkletreekv) if you don't like the naming scheme.

The other thing is tests: node_test and padkv_test are empty right now, but we will probably want to add test cases there in the future.

vqhuy · 2016-07-15T03:31:50Z

Yes. I understand that the reason why you separated these kv-related code to other source files because we could add other db engines (e.g., relational db) in the future, right? If so, I think it makes sense!

masomel · 2016-07-15T16:13:08Z

Yes, that's the main reason why I wanted to separate them. I'm glad you think it makes sense!

vqhuy · 2016-07-15T18:38:00Z

I merged the changes in db-reorg to db branch, and also deleted db-reorg branch.

masomel · 2016-07-15T18:38:43Z

You read my mind! I was just writing to ask if I should handle it. Thanks for taking care of it!

* Part of #4

vqhuy · 2016-10-06T11:39:06Z

TODO

DB hooks for merkletree package Key-value DB using gob for encoding/decoding #88
DB hooks for protocol package
DB hooks for TemporaryBinding - maybe after finishing Create a dedicated protocol/extensions directory #89
Integrate DB hooks into key server implementation

masomel added the storage-backend label Jun 12, 2016

vqhuy self-assigned this Jun 29, 2016

vqhuy mentioned this issue Jul 4, 2016

STR for review #8

Closed

vqhuy mentioned this issue Jul 22, 2016

Separate crypto subpackage into its own repo? #26

Closed

masomel mentioned this issue Jul 22, 2016

Implement registration function #29

Closed

vqhuy added a commit that referenced this issue Jul 26, 2016

Implement db storage

47f3999

* Part of #4

vqhuy pushed a commit that referenced this issue Jul 26, 2016

Reorganize KV-related code

48ecb95

* Part of #4

vqhuy mentioned this issue Jul 27, 2016

[WIP] DB backend hooks #37

Closed

arlolra added this to the 0.1.0 milestone Sep 2, 2016

vqhuy removed this from the 0.1.0 milestone Nov 2, 2016

vqhuy modified the milestones: 0.2.0, 0.1.0 Nov 2, 2016

vqhuy modified the milestones: 0.3.0, 0.2.0 Apr 12, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add hooks for storage backend #4

Add hooks for storage backend #4

masomel commented Jun 12, 2016

vqhuy commented Jun 15, 2016

masomel commented Jun 16, 2016

vqhuy commented Jun 17, 2016

masomel commented Jun 20, 2016

vqhuy commented Jul 13, 2016

masomel commented Jul 13, 2016

masomel commented Jul 13, 2016 •

edited

Loading

vqhuy commented Jul 14, 2016 •

edited

Loading

masomel commented Jul 14, 2016 •

edited

Loading

vqhuy commented Jul 15, 2016

masomel commented Jul 15, 2016

vqhuy commented Jul 15, 2016

masomel commented Jul 15, 2016

vqhuy commented Oct 6, 2016 •

edited

Loading

Add hooks for storage backend #4

Add hooks for storage backend #4

Comments

masomel commented Jun 12, 2016

vqhuy commented Jun 15, 2016

masomel commented Jun 16, 2016

vqhuy commented Jun 17, 2016

masomel commented Jun 20, 2016

vqhuy commented Jul 13, 2016

masomel commented Jul 13, 2016

masomel commented Jul 13, 2016 • edited Loading

vqhuy commented Jul 14, 2016 • edited Loading

masomel commented Jul 14, 2016 • edited Loading

vqhuy commented Jul 15, 2016

masomel commented Jul 15, 2016

vqhuy commented Jul 15, 2016

masomel commented Jul 15, 2016

vqhuy commented Oct 6, 2016 • edited Loading

masomel commented Jul 13, 2016 •

edited

Loading

vqhuy commented Jul 14, 2016 •

edited

Loading

masomel commented Jul 14, 2016 •

edited

Loading

vqhuy commented Oct 6, 2016 •

edited

Loading