Merkle Patricia Trie #10

AlanVerbner · 2017-01-11T13:49:42Z

Description

This branch has the MPT implementation. It also has a DataSource implementation based on IODB.

We tried to achieve as much test coverage as possible for this code because of its critical nature, so please pay special attention to it while reviewing.

Design choices

No particular hash function is fixed; it can be configured. The only restriction is that its output length must match the keySize configured for IODB's LSMStore.
Ideally the MerklePatriciaTrie class would be immutable, but since it needs to access a mutable DataSource, it is mutable as well.
To support key/value pairs of different types, we created a ByteArraySerializable trait that is responsible for their byte array encoding. There is no specific guideline on how the encoding should be done, as long as the Merkle root returns the same hash.
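A minimal sketch of the two configurable pieces described above, assuming the hash function is passed around as a plain `Array[Byte] => Array[Byte]` and that ByteArraySerializable pairs a `toBytes` with the `fromBytes` visible in the review excerpts below (both signatures are assumptions). JDK SHA-256 stands in for the hash only because it needs no extra dependency; Ethereum itself uses Keccak-256. `HashFn`, `checkHashMatchesKeySize`, `IntSerializer` and `StringSerializer` are hypothetical names.

```scala
import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets
import java.security.MessageDigest

object DesignChoicesSketch {
  // Hypothetical: the trie receives its hash function as a plain value.
  type HashFn = Array[Byte] => Array[Byte]

  // SHA-256 is only a stand-in here (Ethereum uses Keccak-256).
  val sha256: HashFn = bytes => MessageDigest.getInstance("SHA-256").digest(bytes)

  // The digest length must equal the keySize the LSMStore was configured with.
  def checkHashMatchesKeySize(hashFn: HashFn, keySize: Int): Boolean =
    hashFn(Array.emptyByteArray).length == keySize

  // Sketch of the serializer trait: one instance per key or value type.
  trait ByteArraySerializable[T] {
    def toBytes(input: T): Array[Byte]
    def fromBytes(bytes: Array[Byte]): T
  }

  // Any deterministic encoding works, as long as equal inputs give equal bytes.
  object IntSerializer extends ByteArraySerializable[Int] {
    def toBytes(input: Int): Array[Byte] = ByteBuffer.allocate(4).putInt(input).array()
    def fromBytes(bytes: Array[Byte]): Int = ByteBuffer.wrap(bytes).getInt
  }

  object StringSerializer extends ByteArraySerializable[String] {
    def toBytes(input: String): Array[Byte] = input.getBytes(StandardCharsets.UTF_8)
    def fromBytes(bytes: Array[Byte]): String = new String(bytes, StandardCharsets.UTF_8)
  }
}
```

Because SHA-256 always produces 32-byte digests, `checkHashMatchesKeySize(sha256, 32)` holds, so such a hash would fit a store configured with a 32-byte keySize.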


adamsmo · 2017-01-16T17:44:41Z

src/main/scala/io/iohk/ethereum/merklepatriciatrie/MerklePatriciaTrie.scala

+        case (Nil, Some(value)) => LeafNode(Array.emptyByteArray, value, hashFn)
+        case _ => node
+      }
+    case extensionNode@ExtensionNode(sharedKey, next, _) =>


next is unused
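To make the remark concrete: in a pattern, a binder that the case body never uses can be replaced with `_`, which documents the intent and keeps unused-variable linting quiet. The node shapes below are a simplified, hypothetical model (the real nodes carry more fields, such as the hash function); `sharedKeyLength` is an illustrative helper, not code from the PR.

```scala
object NodeSketch {
  sealed trait Node
  // Simplified, hypothetical node shapes.
  final case class LeafNode(key: Seq[Byte], value: Array[Byte]) extends Node
  final case class ExtensionNode(sharedKey: Seq[Byte], next: Node) extends Node
  final case class BranchNode(children: Vector[Option[Node]], value: Option[Array[Byte]]) extends Node

  // Only the shared key is needed, so the child is matched with `_` instead of `next`.
  def sharedKeyLength(node: Node): Int = node match {
    case ExtensionNode(sharedKey, _) => sharedKey.length
    case LeafNode(key, _)            => key.length
    case _: BranchNode               => 0
  }
}
```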

rtkaczyk · 2017-01-16T17:47:14Z

src/main/scala/io/iohk/ethereum/merklepatriciatrie/MerklePatriciaTrie.scala

+                                    toDeleteFromStorage: Seq[Node] = Seq(),
+                                    toUpdateInStorage: Seq[Node] = Seq())
+
+private case class NodeRemoveResult(hasChanged: Boolean, maybeNewChild: Option[Node],


NodeInsertResult and NodeRemoveResult are used internally in MerklePatriciaTree, right? So maybe it would be better to place them in the companion object?

Yes, they are only used for the output of the remove and put functions. I agree, I'm currently fixing it.
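A minimal sketch of that suggestion with hypothetical, simplified types (`TrieSketch`, `Node`, `insert` and `pendingDeletes` are illustrative names): the result class becomes a `private` member of the companion object, which the class itself can still use because a class and its companion share access to private members. Methods that return it must stay private too; otherwise the compiler reports that the private class escapes its defining scope.

```scala
class TrieSketch {
  import TrieSketch._

  // Public API: callers only see an Int, never the internal result type.
  def pendingDeletes(node: Node): Int = insert(node).toDeleteFromStorage.size

  // Private helper; it may return the companion's private case class.
  private def insert(node: Node): NodeInsertResult =
    NodeInsertResult(newNode = node, toDeleteFromStorage = Seq(Node("old-root")))
}

object TrieSketch {
  // Stand-in for the real node type.
  final case class Node(id: String)

  // Internal bookkeeping for insertions; not part of the public API.
  private case class NodeInsertResult(newNode: Node,
                                      toDeleteFromStorage: Seq[Node] = Seq(),
                                      toUpdateInStorage: Seq[Node] = Seq())
}
```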

rtkaczyk · 2017-01-16T17:48:15Z

src/main/scala/io/iohk/ethereum/merklepatriciatrie/package.scala

+    def fromBytes(bytes: Array[Byte]): T
+  }
+
+  trait DataSource {


Why are those traits in a package object?

@rtkaczyk Because they are part of the MPT public API

Yes, we decided to put them there because they must be implemented by anyone who wants to use the trie (if our implementations of them are not used). Do you think it would be better to put them elsewhere?

Sure, but why package object? Package objects are usually used for things that cannot be placed at a top-level of a package, like vals and defs. But traits can :)

This was proposed by @mcsherrylabs


rtkaczyk · 2017-01-16T18:10:53Z

src/test/scala/io/iohk/ethereum/ObjectGenerators.scala

@@ -1,7 +1,7 @@
 package io.iohk.ethereum

 import java.math.BigInteger
-import org.scalacheck.{Arbitrary, Gen}
+import org.scalacheck.{Arbitrary, Gen, _}


This is equivalent to import org.scalacheck._, isn't it?

You are right. As we are only using org.scalacheck.{Arbitrary, Gen} I'll change it to that.


adamsmo · 2017-01-16T18:28:33Z

src/main/scala/io/iohk/ethereum/merklepatriciatrie/MerklePatriciaTrie.scala

+      val keyNibbles = HexPrefix.bytesToNibbles(bytes = kSerializer.toBytes(key))
+      val root = getNode(rootId, dataSource)
+      remove(root, keyNibbles) match {
+        case NodeRemoveResult(true, Some(newRoot), nodesToRemoveFromStorage, nodesToUpdateInStorage) =>


About the second parameter: here it is called newRoot, but in the case class this field is named maybeNewChild:
https://github.com/input-output-hk/etc-client/pull/10/files#diff-2c1aab8e8ca156fbe430bbb586ecfc01R17
Should it be that way?

In NodeInsertResult and NodeRemoveResult we named it that way because it is more general: maybeNewChild could be an inner node of the trie, as is the case where both case classes are used in the put and remove functions. Maybe we should rename maybeNewChild to maybeNewNode, as the word "child" could be confused with its meaning in BranchNode.

Maybe just newNode, as it is in NodeInsertResult?
I think the "maybe" part is implied by the Option type.
What do you think?

I agree, I'll change the variable's name to newNode.
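To illustrate why the "maybe" prefix is redundant: the `Option` type already tells the caller that the node may be absent, and pattern matching makes both cases explicit. This is a hypothetical sketch with simplified fields (the real class also carries the storage lists seen in the excerpt above); `RemoveSketch`, `Node` and `rootAfterRemove` are illustrative names, and reading `None` as "the trie is now empty" is an assumption.

```scala
object RemoveSketch {
  final case class Node(id: String)

  // `newNode` rather than `maybeNewNode`: Option already says "maybe".
  final case class NodeRemoveResult(hasChanged: Boolean, newNode: Option[Node])

  // Describes the root after a removal, given the id of the previous root.
  def rootAfterRemove(previousRoot: String, result: NodeRemoveResult): String =
    result match {
      case NodeRemoveResult(true, Some(newRoot)) => newRoot.id    // trie changed, new root
      case NodeRemoveResult(true, None)          => "<empty>"     // last key removed (assumed)
      case NodeRemoveResult(false, _)            => previousRoot  // key was not present
    }
}
```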


rtkaczyk · 2017-01-17T12:56:48Z

~~LGTM!~~ The tests

rtkaczyk · 2017-01-17T15:11:50Z

src/test/scala/io/iohk/ethereum/mpt/MerklePatriciaTrieSuite.scala

+    assert(obtainedAfterDelete.isEmpty)
+  }
+
+  ignore("IODB test - Insert of the first 5000 numbers hashed and then remove half of them"){


Why are these tests ignored?

We set some of the IODB and EthereumJ Compatibility tests to be ignored so that they are not run every time, as they take a while to run.

I have several remarks about those tests:

1. If we're ignoring tests in such a manner, it deserves at least a comment. Otherwise an unaware developer might come across those tests and wonder what they have to do to fix them.

2. For such long-running tests, I think a better solution would be to create them in a separate test configuration. Then in Circle CI we could easily run them conditionally with sbt long-running:test, e.g. only when merging to master.

   Another reason for a separate config is that, unless I'm missing something, they don't look like regular unit tests. Are they stress tests? Does running them with 40000 numbers give us a big boost of confidence wrt correctness compared to running with 4 numbers?

   Lastly, if we don't run those tests automatically, how are we going to maintain them? Are we going to nominate a person who's going to run them occasionally to see if they're not broken? I certainly do not volunteer 😁

3. I ran one of those tests, "IODB Test - PatriciaTrie insert and get", and it takes ~600 ms. I wouldn't say that's terribly slow.

4. Could we test IodbDataSource in a separate suite, without a dependency on MerklePatriciaTrie? It is my understanding that MPT depends on a DataSource, not vice versa. More importantly though, what if we find out that IODB doesn't meet our expectations? We shouldn't have to change MPT tests when we replace the DataSource implementation.

@rtkaczyk

1. I do agree it's confusing. That being said, I don't like (at least in general terms) pushing a broken test marked as ignore. All tests should be working or get fixed.

2. Lol. Yes, we can create a separate config; that's a good idea. The goal of some of them is to maintain a certain level of compatibility with other implementations and to allow others to compare our client against the Ethereum wiki ones.

3. No, it's not too much atm; we can un-ignore it.

4. Yes, we can, but based on item (2), there is a benchmark test that requires a non-ephemeral trie, so it does make sense to use them there.

Good, then I would propose to name the configuration benchmark, and later we can configure Circle to run it when merging to master or by using nightly builds.

Regarding IodbDataSource it's fine if the tests utilise MPT when advantageous, especially in benchmark. However, a separate IodbDataSourceSuite testing get/update in isolation could be useful, even if it's really simple, wdyt?
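A rough sketch of what such an isolated check could look like, with no trie involved. The interface shape (a `get` plus a single batched, versioned `update`) is an assumption loosely inspired by the commits about storing changes in a single update and storing the version in IodbDataSource; `SimpleDataSource`, `InMemoryDataSource` and `DataSourceChecks` are hypothetical names, and plain boolean checks replace ScalaTest so the sketch runs on its own.

```scala
import scala.collection.mutable

// Hypothetical interface; the real DataSource trait's signatures may differ.
trait SimpleDataSource {
  type Key = Seq[Byte]
  type Value = Array[Byte]

  def get(key: Key): Option[Value]
  // Applies removals and upserts as one batch, tagged with a new version.
  def update(newVersion: Long, toRemove: Seq[Key], toUpsert: Seq[(Key, Value)]): Unit
  def currentVersion: Long
}

// In-memory stand-in, in the spirit of an ephemeral data source.
class InMemoryDataSource extends SimpleDataSource {
  private val store = mutable.Map.empty[Key, Value]
  private var version = 0L

  def get(key: Key): Option[Value] = store.get(key)

  def update(newVersion: Long, toRemove: Seq[Key], toUpsert: Seq[(Key, Value)]): Unit = {
    require(newVersion > version, "versions must increase")
    toRemove.foreach(key => store.remove(key))
    store ++= toUpsert
    version = newVersion
  }

  def currentVersion: Long = version
}

// get/update checks that need no trie at all; they only talk to the data source.
object DataSourceChecks {
  def run(ds: SimpleDataSource): Boolean = {
    val key = Seq[Byte](1, 2, 3)
    ds.update(1L, Nil, Seq(key -> Array[Byte](42)))
    // Arrays compare by reference, so compare their contents as Seqs.
    val inserted = ds.get(key).map(_.toSeq).contains(Seq[Byte](42))
    ds.update(2L, Seq(key), Nil)
    inserted && ds.get(key).isEmpty && ds.currentVersion == 2L
  }
}
```

Passing an IODB-backed implementation to `DataSourceChecks.run` instead would exercise the same contract, which is what would let the DataSource be swapped without touching the MPT tests.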

90 sec does not look scary to me ;)


So I'll have to wait 90 secs every time I save a file in a project ;)

@AlanVerbner let's use custom configs. IMHO it's the most convenient approach from the user's point of view. Think of running sbt it:test in the SBT console instead of dealing with Tests.Argument("-l", "LongRunningTest") settings.

@whysoserious deal. I will create a custom config called benchmark (as @rtkaczyk suggested) and will move these tests over there.

90 sec does not look scary to me ;)

It may be scary once we accumulate more of such tests. I wouldn't want to wait over an hour for a PR to go green.

In order to finish this PR, CircleCI will have to successfully run an additional step: sbt it:test

@whysoserious it looks like we need a proper circle.yml file first (I thought Circle was smart enough to pick up travis.yml, but apparently it's just using some defaults for Scala). So I'd say that's a follow-up PR.

Anyway, if those tests currently take just a few minutes, then it's OK to require them for every PR, but ultimately I'd like us to run them conditionally: either when merging to master or during nightly builds.
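For reference, a custom configuration like the `benchmark` one discussed above is commonly declared in `build.sbt` roughly as follows. This is a sketch in sbt 1.x syntax, not the configuration this PR ended up with (the thread settles on running `sbt it:test`); the name `Benchmark` is an assumption.

```scala
// build.sbt (sketch): a separate "benchmark" test configuration
lazy val Benchmark = config("benchmark") extend Test

lazy val root = (project in file("."))
  .configs(Benchmark)
  .settings(
    // Reuse the standard test settings inside the new configuration;
    // its sources are then picked up from src/benchmark/scala by default.
    inConfig(Benchmark)(Defaults.testSettings)
  )
```

Long-running suites placed under `src/benchmark/scala` would then run only via `sbt benchmark:test` (or `Benchmark / test` with the slash syntax), leaving `sbt test` fast, and CI could invoke that step only when merging to master or in a nightly build.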


rtkaczyk · 2017-01-19T23:06:19Z

One final remark about the tests: putting them in the it config is not 100% accurate, because they're not integration tests. I can imagine a scenario where we build a test net and test our client as a black box; that would constitute a true integration test (though using Hive is probably another valid option).

Anyway, for now I'm glad we have a separation of non-unit tests. Once we have more such tests and we can clearly categorise them we can think about further separation.

rtkaczyk · 2017-01-19T23:06:59Z

LGTM 👍

@adamsmo wanna give a second approval?

adamsmo · 2017-01-20T09:36:50Z

LGTM as well 👍

AlanVerbner · 2017-01-20T12:21:43Z

@rtkaczyk I definitely agree with your comment, but what I like most is

Once we have more such tests and we can clearly categorise them we can think about further separation.

ntallar and others added 30 commits January 2, 2017 14:48

[MPT] Merkle Patricia Trie first version

6437e20

Merge branch 'phase/0/handshake' into feature/patriciaTrie

b5c262c

[MPT] Fix - Each node isn't encoded twice anymore & fixed bugs on creation of nodes

9009233

Merge branch 'phase/0/handshake' of github.com:input-output-hk/etc-client into feature/patriciaTrie

3ee73ca

[MPT] Refactoring due to changes in rlp package structure

d7c6c67

[MPT] Refactoring: changed package name

aa5fcd1

[MPT] Refactoring: changed visibility of methods and started using hash of empty trie

ea3c48f

[MPT] Private get now operates with nodes and not node ids

226d9a1

[MPT] getNode returns node and throws MPTException if node was not found

b659e54

[MPT] Removed functions to get id of nodes inside other nodes to directly access them

a9216e2

[MPT] More EthereumJ compatibility functions

5355ae0

[MPT] Transformed MerklePatriciaTree and IodbDataSource into normal classes

01c665b

[MPT] Removed unused functions from DataSource trait and its implementations

4f5562b

[MPT] Test reorganization and added performance test

3f8eeac

[MPT] Store changes in db in a single update

4fab49e

Merge branch 'feature/patriciaTrie' of github.com:input-output-hk/etc-client into feature/patriciaTrie

d24e76b

[MPT] It was trying to insert the same values multiple times and IODB was rejecting the update

f1a20d6

[MPT] Fixed how nodes are inserted into toUpdateInStorage and toDeleteFromStorage lists in put

99330e3

[MPT] When next node is required on fix function, we first search on the node list on memory

b8d2af9

[MPT] Fixed scalastyle warnings

5f9f795

[MPT] Version is now stored in IodbDataSource instead of MerklePatriciaTree

7fdde94

Merge branch 'feature/patriciaTrie' of github.com:input-output-hk/etc-client into feature/patriciaTrie

0f2c8e0

[MPT] Refactoring: removed unused DataSource function and added comments

45520b7

[MPT] Renamed trie package, class and object

2ac4c75

[MPT] Renamed MPT Suite

1d8d248

[MPT] Fixed bug that caused previous trie root to not be removed from dataSource

af743d7

[MPT] Removed unused imports

cb1ebe3

[MPT] Set long tests to ignore and added FIXME regarding the hash function used

63ee543

Small example changes around map

358967f

Test email address correction

acdb09e

adamsmo reviewed Jan 16, 2017


rtkaczyk reviewed Jan 16, 2017

[MPT] Removed unused variables and moved NodeInsertResult and NodeRemoveResult to companion object

c863cce

rtkaczyk reviewed Jan 16, 2017


AlanVerbner and others added 4 commits January 16, 2017 15:12

[MPT] hashFn removed from fix parameters

417792f

[MPT] hashFn used from TrieInstance when fixing instead of node one

62f6f3b

[MPT] Removed unused library from ObjectGenerators

2bc5800

Merge branch 'feature/patriciaTrie' of github.com:input-output-hk/etc-client into feature/patriciaTrie

80cda19

adamsmo reviewed Jan 16, 2017


AlanVerbner and others added 5 commits January 16, 2017 16:06

[MPT] package io.iohk.ethereum.merklepatriciatrie renamed to package io.iohk.ethereum.mpt

c50bf5b

Merge branch 'feature/patriciaTrie' of github.com:input-output-hk/etc-client into feature/patriciaTrie

5ad56b5

Merge branch 'phase/2/txHashValidation' of github.com:input-output-hk/etc-client into feature/patriciaTrie

c848180

[MPT] Renamed NodeRemoveResult class variable to newNode

88d1524

Merge branch 'feature/patriciaTrie' of github.com:input-output-hk/etc-client into feature/patriciaTrie

4a78a5b

rtkaczyk reviewed Jan 17, 2017


Nicolas Tallar and others added 7 commits January 18, 2017 08:58

Merge branch 'phase/2/txHashValidation' of github.com:input-output-hk/etc-client into feature/patriciaTrie

a8bf783

[MPT] Fixed scalastyle errors

d1099ec

[MPT] Test partitioning in order not to ignore long running tests

f0f4a5b

Merge pull request #18 from input-output-hk/feature/patriciaTrieTestPartitioning

[MPT] Test partitioning in order not to ignore long running tests

601b2de

[MPT] Scalastyle fix in EphemDataSource

cb120e6

[MPT] EphemDataSource and IodbDataSource tests

3a1e46b

[MPT] Remove unused test code

568b7f8

rtkaczyk merged commit 576ed02 into phase/2/txHashValidation Jan 20, 2017

rtkaczyk deleted the feature/patriciaTrie branch January 20, 2017 10:49
