Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merkle Patricia Trie #10

Merged
merged 63 commits into from
Jan 20, 2017
Merged

Conversation

AlanVerbner
Copy link
Contributor

Description

This branch has the MPT implementation. It also has a DataSource implementation based on IODB.

We tried to have as most coverage as we could for this code because of it's critical nature so please keep special attention at it while reviewing.

Design choices

  • There is no any defined hash function to be used, it can be configured. The only restriction there is that it should match the iodb's LSMStore keySize config.
  • Ideally MerklePatriciaTrie class could be immutable but as long as it needs to access a mutable DataSource it's also mutable.
  • In order to be used with key/value pairs of different types we have created a ByteArraySerializable that will be responsible for their byte array encoding. There is no specific guideline regarding how that should be done as long as the merkle root returns the same hash

ntallar and others added 30 commits January 2, 2017 14:48
case (Nil, Some(value)) => LeafNode(Array.emptyByteArray, value, hashFn)
case _ => node
}
case extensionNode@ExtensionNode(sharedKey, next, _) =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

next is unused

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed!

toDeleteFromStorage: Seq[Node] = Seq(),
toUpdateInStorage: Seq[Node] = Seq())

private case class NodeRemoveResult(hasChanged: Boolean, maybeNewChild: Option[Node],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NodeInsertResult and NodeRemoveResult are use internally in MerklePatriciaTree, right? So maybe it would better to place them in the companion object?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, they are only used for the output of the remove and put functions. I agree, I'm currently fixing it.

def fromBytes(bytes: Array[Byte]): T
}

trait DataSource {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are those traits in a package object?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rtkaczyk Because they are part of the MPT public API

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we decided to put them there as they should be commonly implemented in order to use the trie (if any our implementations of them are not used). Do you think it would be better to put them elsewhere?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, but why package object? Package objects are usually used for things that cannot be placed at a top-level of a package, like vals and defs. But traits can :)

Copy link
Contributor Author

@AlanVerbner AlanVerbner Jan 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was proposed by @mcsherrylabs

@@ -1,7 +1,7 @@
package io.iohk.ethereum

import java.math.BigInteger
import org.scalacheck.{Arbitrary, Gen}
import org.scalacheck.{Arbitrary, Gen, _}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is equivalent to import org.scalacheck._. Isn't it?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. As we are only using org.scalacheck.{Arbitrary, Gen} I'll change it to that.

val keyNibbles = HexPrefix.bytesToNibbles(bytes = kSerializer.toBytes(key))
val root = getNode(rootId, dataSource)
remove(root, keyNibbles) match {
case NodeRemoveResult(true, Some(newRoot), nodesToRemoveFromStorage, nodesToUpdateInStorage) =>
Copy link
Contributor

@adamsmo adamsmo Jan 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

about second parameter: it is called here newRoot but in case class this field is named maybeNewChild
https://github.com/input-output-hk/etc-client/pull/10/files#diff-2c1aab8e8ca156fbe430bbb586ecfc01R17
should it be that way?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In NodeInsertResult and NodeRemoveResult we named it that way as it is more general, the maybeNewChild could be the an inner node of the trie, as is the case in the uses of both case classes in the put and remove function. Maybe we should rename maybeNewChild as maybeNewNode as the use of a the child word can be mixed with it meaning in the BranchNode.

Copy link
Contributor

@adamsmo adamsmo Jan 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe just newNode, as it is in NodeInsertResult?
I think maybe part is implied by Option type
What do you think?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, I'll change the variable's name to newNode.

@rtkaczyk
Copy link
Contributor

rtkaczyk commented Jan 17, 2017

LGTM! The tests

assert(obtainedAfterDelete.isEmpty)
}

ignore("IODB test - Insert of the first 5000 numbers hashed and then remove half of them"){
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are these tests ignored?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We set some of the IODB and EthereumJ Compatibility tests to be ignored so that they are not run every time, as they take a while to run.

Copy link
Contributor

@rtkaczyk rtkaczyk Jan 17, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have several remarks about those tests:

  1. If we're ignoring tests in such manner it deserves at least a comment. Otherwise an unknowing developer might come at those tests and wonder what he has to do to fix them.
  2. For such long running tests I think a better solution would be to create them in a separate test configuration. Then in Circle CI we could easily run them conditionally with sbt long-running:test, e.g. only when merging to master.
    Another reason for a separate config is that, unless I'm missing something, they don't look like regular unit tests. Are they stress tests? Does running them with 40000 numbers give us a big boost of confidence wrt correctness as compared to running with 4 numbers?
    Lastly, if we don't run those tests automatically how are we going maintain them? Are we going to nominate a person who's going to run them occasionally to see if they're not broken? I certainly do not volunteer 😁
  3. I ran one of those tests: "IODB Test - PatriciaTrie insert and get" and it takes ~600 ms. I wouldn't say that's terribly slow.
  4. Could we test IodbDataSource in a separate suite and without dependency on MerklePatriciaTrie? It is my understanding MPT depends on a DataSource, not vice versa. More importantly though, what if we find out that IODB doesn't meet our expectations? We shouldn't have to change MPT tests when we replace the DataSource implementation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rtkaczyk

  1. I do agree it's confusing. That being said, I don't like (at least in general terms) to push a broken test marked as ignore. All tests should be working or get fixed.
  2. Lol. yes, we can create a separate config that's a good idea. The goal of some of them is to maintain certain level of compatibility against other implementations and allow others to compare our client against ethereum wiki ones.
  3. No it's not too much atm, we can uningnore it
  4. Yes we can, but, based on item (2) there is a benchmark test that requires a non ephemeral trie so it does make sense to use them there.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good, then I would propose to name the configuration benchmark, and later we can configure Circle to run it when merging to master or by using nightly builds.

Regarding IodbDataSource it's fine if the tests utilise MPT when advantageous, especially in benchmark. However, a separate IodbDataSourceSuite testing get/update in isolation could be useful, even if it's really simple, wdyt?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

90 sec does not looks scary to me ;)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

90 sec does not looks scary to me ;)

So I'll have to wait 90 secs every time I save a file i a project ;)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AlanVerbner let's use custom configs. Imho it's most convenient approach from user's point of view. Think of running sbt it:test in SBT console instead of dealing with Tests.Argument("-l", "LongRunningTest") settings.

Copy link
Contributor Author

@AlanVerbner AlanVerbner Jan 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@whysoserious deal. Will create a custom config called benchmark (as @rtkaczyk suggested) and will move this tests over there.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

90 sec does not looks scary to me ;)

It may be scary once we accumulate more of such tests. I wouldn't want to wait over an hour for a PR to go green.

in order to finish this PR CircleCI will have to successfully run an additional step - sbt it:test

@whysoserious it looks that we need a proper circle.yml file first (I thought Circle was smart enough to pick up travis.yml but apparently it's just using some defaults for Scala). So I'd say that's a follow-up PR.

Anyway if those tests currently take just a few minutes then it's OK to require them for every PR, but ultimately I'd like that we run them conditionally: either when merging to master or during nightly builds.

@rtkaczyk
Copy link
Contributor

One final remarks about the tests is that to putting those tests it config is not 100% accurate because they're not integration tests. I can imagine a scenario where we build a test net and test our client as a blackbox - that would constitute a true integration test (though using Hive is probably another valid option).

Anyway, for now I'm glad we have a separation of non-unit tests. Once we have more such tests and we can clearly categorise them we can think about further separation.

@rtkaczyk
Copy link
Contributor

LGTM 👍

@adamsmo wanna give a second approval?

@adamsmo
Copy link
Contributor

adamsmo commented Jan 20, 2017

LGTM as well 👍

@rtkaczyk rtkaczyk merged commit 576ed02 into phase/2/txHashValidation Jan 20, 2017
@rtkaczyk rtkaczyk deleted the feature/patriciaTrie branch January 20, 2017 10:49
@AlanVerbner
Copy link
Contributor Author

@rtkaczyk I definetly agree with your comment but what I do like most is

Once we have more such tests and we can clearly categorise them we can think about further separation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants