[ETCM-73] Transactional blockchain #673
Conversation
Force-pushed e7a15e1 to ab83593
> removed NodeStorage interface (NodeStorage and CachedNodeStorage are independent now)

Was this needed? Our MPT node handling is already far too complicated, and I don't see the benefit of this change. Should we maybe keep it as is until we fully commit to a better design of the whole thing?
> Storing the best-known block number now uses an AtomicReference plus the DB. Is it worth creating a ticket to simplify that?

For me, yes! But if it's not a blocker for any task, we'll probably defer working on it until later.
```diff
@@ -98,7 +97,7 @@ class ReferenceCountedStateStorage(private val nodeStorage: NodeStorage,
   }

   override def saveNode(nodeHash: NodeHash, nodeEncoded: NodeEncoded, bn: BigInt): Unit = {
-    new FastSyncNodeStorage(nodeStorage, bn).update(Nil, Seq(nodeHash -> nodeEncoded))
+    new FastSyncNodeStorage(cachedNodeStorage, bn).updateInStorage(Nil, Seq(nodeHash -> nodeEncoded))
```
Why was the nodeStorage replaced here by the cachedNodeStorage?
```diff
-  override def persist(): Unit = {}
+  override def persist(): Unit = {
+    cachedNodeStorage.forcePersist()
```
Why not cachedNodeStorage.persist()?
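For context, a minimal sketch of the distinction being asked about, assuming `persist()` is a threshold-gated flush while `forcePersist()` always writes through; every interface here is a simplified stand-in, not the actual one from this PR:

```scala
import akka.util.ByteString

import scala.collection.mutable

// Minimal stand-in so the sketch is self-contained; the real interface differs.
trait DataSource {
  def update(toRemove: Seq[ByteString], toUpsert: Seq[(ByteString, Array[Byte])]): Unit
}

// Assumed behavior: persist() flushes only once a size threshold is crossed,
// while forcePersist() writes everything out unconditionally. Only the names
// persist/forcePersist come from this PR; the threshold logic is illustrative.
class CachedNodeStorageSketch(dataSource: DataSource, maxCachedNodes: Int) {
  private val cache = mutable.Map.empty[ByteString, Array[Byte]]

  def put(hash: ByteString, encoded: Array[Byte]): Unit = cache += (hash -> encoded)

  // Conditional flush: a cheap no-op while the cache is below its limit.
  def persist(): Boolean =
    if (cache.size >= maxCachedNodes) { forcePersist(); true } else false

  // Unconditional flush: for shutdown, or when the caller must guarantee that
  // every cached node has reached the underlying data source.
  def forcePersist(): Unit = {
    dataSource.update(toRemove = Nil, toUpsert = cache.toSeq)
    cache.clear()
  }
}
```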
|
||
object CachedNodeStorage { | ||
// special storage where we want to minimize cache layer because datasource is in memory | ||
def withMinimalCache(dataSource: EphemDataSource): CachedNodeStorage = |
Doesn't the 0 below mean that it has no caching, rather than minimal caching?
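To make the question concrete, a toy sketch (every name here is hypothetical, not from this PR): with a maximum size of 0 the flush condition always holds, so the layer degenerates to write-through, i.e. no caching at all.

```scala
// Illustrative only: with maxSize = 0, shouldFlush is true even for an empty
// cache, so every batch of writes goes straight to the data source and the
// "cache" never retains anything.
final case class CacheConfig(maxSize: Int)

class SimpleWriteCache[K, V](config: CacheConfig) {
  private var entries = Map.empty[K, V]

  def put(k: K, v: V): Unit = entries += (k -> v)

  // With config.maxSize == 0 this check trivially passes on every call.
  def shouldFlush: Boolean = entries.size >= config.maxSize

  // Hands everything to the caller for writing and empties the cache.
  def drain(): Seq[(K, V)] = {
    val out = entries.toSeq
    entries = Map.empty
    out
  }
}
```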
Force-pushed ab83593 to ad927d5
A few comments for now.
General remarks:
> Our MPT node handling is already far too complicated, and I don't see the benefit of this change. Should we maybe keep it as is until we fully commit to a better design of the whole thing?

I agree that a better design is needed, but it would have to take into account all the constraints that grew organically, i.e. the different modes of pruning and the fact that handling state nodes is one of the main bottlenecks of sync, so performance here is critical.

> Storing the best-known block number now uses an AtomicReference plus the DB. Is it worth creating a ticket to simplify that?

What do you have in mind by this simplification? The main constraint here is that it needs to be quite performant, as it is used frequently, and we cannot update the best block in the DB on every best-block update because we do not know whether the node cache has been flushed. So the idea is to update the block in the DB only when flushing the cache, and otherwise operate on this counter.
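Roughly, the mechanism described above, as a sketch; only the AtomicReference-plus-DB split comes from this thread, and the read/write hooks are assumed stand-ins:

```scala
import java.util.concurrent.atomic.AtomicReference

// The AtomicReference absorbs frequent best-block updates in memory, and the
// DB copy is refreshed only when the node cache is flushed, so the persisted
// number never runs ahead of the persisted state nodes.
class BestBlockTracker(readFromDb: () => BigInt, writeToDb: BigInt => Unit) {
  private val bestKnownBlock = new AtomicReference[BigInt](readFromDb())

  // Hot path: called on every best-block update, touches memory only.
  def saveBestKnownBlock(number: BigInt): Unit = bestKnownBlock.set(number)

  def getBestKnownBlock: BigInt = bestKnownBlock.get()

  // Cold path: called only right after the node cache has been flushed.
  def persistBestBlock(): Unit = writeToDb(bestKnownBlock.get())
}
```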
```diff
@@ -9,38 +9,37 @@ class EphemDataSource(var storage: Map[ByteBuffer, Array[Byte]]) extends DataSou
    * key.drop to remove namespace prefix from the key
    * @return key value pairs from this storage
    */
-  def getAll(namespace: Namespace): Seq[(IndexedSeq[Byte], IndexedSeq[Byte])] =
-    storage.toSeq.map { case (key, value) => (key.array().drop(namespace.length).toIndexedSeq, value.toIndexedSeq) }
+  def getAll(namespace: Namespace): Seq[(IndexedSeq[Byte], IndexedSeq[Byte])] = synchronized {
```
Are those synchronized blocks necessary, or did you just add them to stay consistent with the data source docs ("Implementations should guarantee that the whole operation is atomic")?

I am asking because we use this storage during fast sync in a single-threaded scenario, and synchronized blocks will probably add some unnecessary overhead.
The same change was introduced in our first template PR ;)
```scala
   * If a key is already in the DataSource its value will be updated.
   * @return the new DataSource after the removals and insertions were done.
   */
  def updateOptimized(toRemove: Seq[Array[Byte]], toUpsert: Seq[(Array[Byte], Array[Byte])]): DataSource
```
Is it necessary to remove these methods? The main reason for them was that:

- in the case of NodeStorage our key is `type NodeHash = ByteString` and our value is `type NodeEncoded = Array[Byte]`
- when going through the generic serializers we convert those values into `IndexedSeq[Byte]`
- then when writing to storage we convert them both back to `Array[Byte]`

This process is really wasteful, and with the number of nodes in an MPT trie (like in ETH) those conversions become a huge bottleneck for syncing.
I created a new DataSourceUpdateOptimized type for optimized execution.
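To illustrate the conversion overhead discussed above, a sketch under assumed type names; only DataSourceUpdateOptimized is named in this thread, everything else is an illustrative stand-in:

```scala
import akka.util.ByteString

object ConversionOverheadSketch {
  // Generic path: keys/values are widened to IndexedSeq[Byte] by the
  // serializers, then converted back to Array[Byte] at the storage boundary.
  final case class DataSourceUpdateGeneric(
      toRemove: Seq[IndexedSeq[Byte]],
      toUpsert: Seq[(IndexedSeq[Byte], IndexedSeq[Byte])]
  )

  // Optimized path: raw bytes end to end, no per-node copies on the sync hot path.
  final case class DataSourceUpdateOptimized(
      toRemove: Seq[Array[Byte]],
      toUpsert: Seq[(Array[Byte], Array[Byte])]
  )

  // The generic path copies every node's bytes twice before they hit disk:
  def genericRoundTrip(hash: ByteString, encoded: Array[Byte]): (Array[Byte], Array[Byte]) = {
    val k: IndexedSeq[Byte] = hash.toIndexedSeq    // copy 1: into the serializer representation
    val v: IndexedSeq[Byte] = encoded.toIndexedSeq // copy 1: into the serializer representation
    (k.toArray, v.toArray)                         // copy 2: back to raw arrays at the storage boundary
  }

  // The optimized path keeps Array[Byte] the whole way: no copies at all.
  def optimizedPassThrough(hash: Array[Byte], encoded: Array[Byte]): (Array[Byte], Array[Byte]) =
    (hash, encoded)
}
```

With millions of MPT nodes during sync, eliminating those two per-node copies is where the win comes from.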
```diff
@@ -11,21 +11,34 @@ import io.iohk.ethereum.db.storage.encoding._
    * node saved will have its reference count equal to 1.
    *
    * */
-class FastSyncNodeStorage(nodeStorage: NodesStorage, bn: BigInt) extends ReferenceCountNodeStorage(nodeStorage, bn) {
+class FastSyncNodeStorage(cachedNodeStorage: CachedNodeStorage, bn: BigInt) extends ReferenceCountNodeStorage(cachedNodeStorage, bn) {
```
FastSync has no concept of caching; it needs to save data into the database directly.
```scala
      this.dataSource eq that.dataSource,
      "Transactional storage updates must be performed on the same data source"
    )
    DataSourceBatchUpdate(dataSource, this.updates ++ that.updates)
```
I am wondering whether a queue with appending, or a list with prepending, would be better here. Otherwise, when batching a lot of updates from a list (like in FastSync), we get quite an overhead.
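A toy model of the asymptotics behind this comment, with the types reduced to essentials (names are illustrative, not the actual ones):

```scala
import scala.collection.immutable.Queue

object BatchCombineSketch {
  final case class Update(key: Array[Byte], value: Array[Byte])

  // Folding n single-element batches with `++` on a default Seq (a List)
  // re-copies the accumulated prefix on every combine: O(n^2) in total.
  def combineSeq(batches: Seq[Seq[Update]]): Seq[Update] =
    batches.foldLeft(Seq.empty[Update])(_ ++ _)

  // An immutable Queue appends in amortized O(1), so the fold stays linear;
  // a list built by prepending and reversed once at the end behaves the same.
  def combineQueue(batches: Seq[Seq[Update]]): Seq[Update] =
    batches.foldLeft(Queue.empty[Update])((acc, b) => acc.enqueueAll(b))
}
```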
Force-pushed 47907b6 to 0982653
> What do you have in mind by this simplification? The main constraint here is that it needs to be quite performant, as it is used frequently, and we cannot update the best block in the DB on every best-block update because we do not know whether the node cache has been flushed. So the idea is to update the block in the DB only when flushing the cache, and otherwise operate on this counter.

Hmm, given that, couldn't we hide the bestKnownBlock (which I would rename to inMemoryBestBlock or cachedBestBlock) in the blockchain interface? That is, instead of saveBestKnownBlock and saveBestBlock, have a single saveBestBlock, as in the sketch below. I see the constraint that this can't be updated before flushing the nodes cache, but not the performance constraint. Usually when updating the best block it's because we are removing or inserting whole blocks; aren't those operations much heavier than this?
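A sketch of what that could look like; the method names follow the renames suggested above, and everything else is assumed:

```scala
import java.util.concurrent.atomic.AtomicReference

// Callers only ever see saveBestBlock; the implementation decides when the
// cached value is written through to the DB (namely, right after the node
// cache is flushed).
trait BlockchainSketch {
  def saveBestBlock(number: BigInt): Unit
  def getBestBlockNumber: BigInt
}

class BlockchainSketchImpl(persistToDb: BigInt => Unit) extends BlockchainSketch {
  private val cachedBestBlock = new AtomicReference[BigInt](BigInt(0))

  override def saveBestBlock(number: BigInt): Unit = cachedBestBlock.set(number)
  override def getBestBlockNumber: BigInt = cachedBestBlock.get()

  // Invoked by whatever flushes the node cache; since it is not part of the
  // public trait, callers cannot persist a number ahead of the state nodes.
  def onNodeCacheFlushed(): Unit = persistToDb(cachedBestBlock.get())
}
```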
```scala
    storage = afterUpdate
  }

  def updateOptimized(toRemove: Seq[Array[Byte]], toUpsert: Seq[(Array[Byte], Array[Byte])]): Unit = synchronized {
```
Minor: should this and the update method (`def update(namespace: Namespace, toRemove: Seq[Key], toUpsert: Seq[(Key, Value)]): Unit`) be private?
src/test/scala/io/iohk/ethereum/db/storage/TransactionalKeyValueStorageSuite.scala (outdated; review thread resolved)
Force-pushed 0982653 to 5be5f3f
LGTM!
```
.\\ //.
. \ \ / /.
.\ ,\ /` /,.-
-. \ /'/ / .
` - `-' \ -
'. /.\`
- .-
:`//.'
.`.'
.' BP
```
Force-pushed 5be5f3f to 01484e1
Force-pushed 01484e1 to 3144d38
Description

The goal is to make part of the blockchain methods transactional: storage writes are accumulated as batch updates and committed atomically, as sketched below.
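A hedged sketch of the pattern, based on the DataSourceBatchUpdate code quoted in the review above; the storage shapes around it are assumptions:

```scala
object TransactionalUpdateSketch {
  final case class Update(
      toRemove: Seq[Array[Byte]],
      toUpsert: Seq[(Array[Byte], Array[Byte])]
  )

  trait DataSource {
    // Applies all updates in a single atomic write (assumed capability).
    def update(updates: Seq[Update]): Unit
  }

  final case class DataSourceBatchUpdate(dataSource: DataSource, updates: Seq[Update]) {
    // Combining requires the same underlying data source, as in the diff above.
    def and(that: DataSourceBatchUpdate): DataSourceBatchUpdate = {
      require(
        this.dataSource eq that.dataSource,
        "Transactional storage updates must be performed on the same data source"
      )
      DataSourceBatchUpdate(dataSource, this.updates ++ that.updates)
    }

    // Nothing touches the DB until commit(), so a failure while assembling
    // the batch leaves blockchain storage untouched.
    def commit(): Unit = dataSource.update(updates)
  }

  // Usage: two logical writes become one atomic commit.
  def saveBlockAtomically(
      blockUpdate: DataSourceBatchUpdate,
      receiptsUpdate: DataSourceBatchUpdate
  ): Unit =
    blockUpdate.and(receiptsUpdate).commit()
}
```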
Important Changes Introduced

- removed NodeStorage interface (NodeStorage and CachedNodeStorage are independent now)

Testing

- check syncing and pruning

MISC

- Storing the best-known block number now uses an AtomicReference plus the DB. Is it worth creating a ticket to simplify that?