[ETCM-73] Transactional blockchain #673
Conversation
Force-pushed e7a15e1 to ab83593
> removed NodeStorage interface (NodeStorage and CachedNodeStorage are independent now)

Was this needed? Our MPT node handling is already far too complicated, and I don't see the benefit of this change. Should we maybe keep it as is until we fully commit to a better design of the whole thing?
> Storing the best-known block number now uses an AtomicReference plus the DB. Is it worth creating a ticket to simplify that?

For me, yes! But if it's not a blocker for any task, we'll probably defer working on it until later.
```diff
@@ -98,7 +97,7 @@ class ReferenceCountedStateStorage(private val nodeStorage: NodeStorage,
   }

   override def saveNode(nodeHash: NodeHash, nodeEncoded: NodeEncoded, bn: BigInt): Unit = {
-    new FastSyncNodeStorage(nodeStorage, bn).update(Nil, Seq(nodeHash -> nodeEncoded))
+    new FastSyncNodeStorage(cachedNodeStorage, bn).updateInStorage(Nil, Seq(nodeHash -> nodeEncoded))
```
Why was the nodeStorage replaced here by the cachedNodeStorage?
```diff
-  override def persist(): Unit = {}
+  override def persist(): Unit = {
+    cachedNodeStorage.forcePersist()
```
Why not cachedNodeStorage.persist()?
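For context, a minimal sketch of the distinction being asked about, assuming `persist()` is a threshold-gated flush while `forcePersist()` always writes through; every interface here is a simplified stand-in, not the actual one from this PR:

```scala
import akka.util.ByteString

import scala.collection.mutable

// Minimal stand-in so the sketch is self-contained; the real interface differs.
trait DataSource {
  def update(toRemove: Seq[ByteString], toUpsert: Seq[(ByteString, Array[Byte])]): Unit
}

// Assumed behavior: persist() flushes only once a size threshold is crossed,
// while forcePersist() writes everything out unconditionally. Only the names
// persist/forcePersist come from this PR; the threshold logic is illustrative.
class CachedNodeStorageSketch(dataSource: DataSource, maxCachedNodes: Int) {
  private val cache = mutable.Map.empty[ByteString, Array[Byte]]

  def put(hash: ByteString, encoded: Array[Byte]): Unit = cache += (hash -> encoded)

  // Conditional flush: a cheap no-op while the cache is below its limit.
  def persist(): Boolean =
    if (cache.size >= maxCachedNodes) { forcePersist(); true } else false

  // Unconditional flush: for shutdown, or when the caller must guarantee that
  // every cached node has reached the underlying data source.
  def forcePersist(): Unit = {
    dataSource.update(toRemove = Nil, toUpsert = cache.toSeq)
    cache.clear()
  }
}
```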
|
||
object CachedNodeStorage { | ||
// special storage where we want to minimize cache layer because datasource is in memory | ||
def withMinimalCache(dataSource: EphemDataSource): CachedNodeStorage = |
Doesn't the 0 below mean that it has no caching, rather than minimal caching?
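To make the question concrete, a toy sketch (every name here is hypothetical, not from this PR): with a maximum size of 0 the flush condition always holds, so the layer degenerates to write-through, i.e. no caching at all.

```scala
// Illustrative only: with maxSize = 0, shouldFlush is true even for an empty
// cache, so every batch of writes goes straight to the data source and the
// "cache" never retains anything.
final case class CacheConfig(maxSize: Int)

class SimpleWriteCache[K, V](config: CacheConfig) {
  private var entries = Map.empty[K, V]

  def put(k: K, v: V): Unit = entries += (k -> v)

  // With config.maxSize == 0 this check trivially passes on every call.
  def shouldFlush: Boolean = entries.size >= config.maxSize

  // Hands everything to the caller for writing and empties the cache.
  def drain(): Seq[(K, V)] = {
    val out = entries.toSeq
    entries = Map.empty
    out
  }
}
```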
Force-pushed ab83593 to ad927d5
A few comments for now.
General remarks:
> Our MPT node handling is already far too complicated, and I don't see the benefit of this change. Should we maybe keep it as is until we fully commit to a better design of the whole thing?

I agree that a better design is needed, but it would have to take into account all the constraints that grew organically, i.e. the different modes of pruning and the fact that handling state nodes is one of the main bottlenecks of sync, so performance here is critical.

> Storing the best-known block number now uses an AtomicReference plus the DB. Is it worth creating a ticket to simplify that?

What do you have in mind by this simplification? The main constraint here is that it needs to be quite performant, as it is used frequently, and we cannot update the best block in the DB on every best-block update because we do not know whether the node cache has been flushed. So the idea is to update the block in the DB only when flushing the cache, and otherwise operate on this counter.
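Roughly, the mechanism described above, as a sketch; only the AtomicReference-plus-DB split comes from this thread, and the read/write hooks are assumed stand-ins:

```scala
import java.util.concurrent.atomic.AtomicReference

// The AtomicReference absorbs frequent best-block updates in memory, and the
// DB copy is refreshed only when the node cache is flushed, so the persisted
// number never runs ahead of the persisted state nodes.
class BestBlockTracker(readFromDb: () => BigInt, writeToDb: BigInt => Unit) {
  private val bestKnownBlock = new AtomicReference[BigInt](readFromDb())

  // Hot path: called on every best-block update, touches memory only.
  def saveBestKnownBlock(number: BigInt): Unit = bestKnownBlock.set(number)

  def getBestKnownBlock: BigInt = bestKnownBlock.get()

  // Cold path: called only right after the node cache has been flushed.
  def persistBestBlock(): Unit = writeToDb(bestKnownBlock.get())
}
```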
```diff
@@ -9,38 +9,37 @@ class EphemDataSource(var storage: Map[ByteBuffer, Array[Byte]]) extends DataSou
    * key.drop to remove namespace prefix from the key
    * @return key value pairs from this storage
    */
-  def getAll(namespace: Namespace): Seq[(IndexedSeq[Byte], IndexedSeq[Byte])] =
-    storage.toSeq.map { case (key, value) => (key.array().drop(namespace.length).toIndexedSeq, value.toIndexedSeq) }
+  def getAll(namespace: Namespace): Seq[(IndexedSeq[Byte], IndexedSeq[Byte])] = synchronized {
```
Are those synchronized blocks necessary, or did you just add them to stay consistent with the data source docs ("Implementations should guarantee that the whole operation is atomic")?

I am asking because we use this storage during fast sync in a single-threaded scenario, and synchronized blocks will probably add some unnecessary overhead.
The same change was introduced in our first template PR ;)
```scala
   * If a key is already in the DataSource its value will be updated.
   * @return the new DataSource after the removals and insertions were done.
   */
  def updateOptimized(toRemove: Seq[Array[Byte]], toUpsert: Seq[(Array[Byte], Array[Byte])]): DataSource
```
Is it necessary to remove these methods? The main reason for them was that:

- in the case of NodeStorage our key is `type NodeHash = ByteString` and our value is `type NodeEncoded = Array[Byte]`
- when going through the generic serializers we convert those values into `IndexedSeq[Byte]`
- then when writing to storage we convert them both back to `Array[Byte]`

This process is really wasteful, and with the number of nodes in an MPT trie (like in ETH) those conversions become a huge bottleneck for syncing.
I created a new DataSourceUpdateOptimized type for optimized execution.
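To illustrate the conversion overhead discussed above, a sketch under assumed type names; only DataSourceUpdateOptimized is named in this thread, everything else is an illustrative stand-in:

```scala
import akka.util.ByteString

object ConversionOverheadSketch {
  // Generic path: keys/values are widened to IndexedSeq[Byte] by the
  // serializers, then converted back to Array[Byte] at the storage boundary.
  final case class DataSourceUpdateGeneric(
      toRemove: Seq[IndexedSeq[Byte]],
      toUpsert: Seq[(IndexedSeq[Byte], IndexedSeq[Byte])]
  )

  // Optimized path: raw bytes end to end, no per-node copies on the sync hot path.
  final case class DataSourceUpdateOptimized(
      toRemove: Seq[Array[Byte]],
      toUpsert: Seq[(Array[Byte], Array[Byte])]
  )

  // The generic path copies every node's bytes twice before they hit disk:
  def genericRoundTrip(hash: ByteString, encoded: Array[Byte]): (Array[Byte], Array[Byte]) = {
    val k: IndexedSeq[Byte] = hash.toIndexedSeq    // copy 1: into the serializer representation
    val v: IndexedSeq[Byte] = encoded.toIndexedSeq // copy 1: into the serializer representation
    (k.toArray, v.toArray)                         // copy 2: back to raw arrays at the storage boundary
  }

  // The optimized path keeps Array[Byte] the whole way: no copies at all.
  def optimizedPassThrough(hash: Array[Byte], encoded: Array[Byte]): (Array[Byte], Array[Byte]) =
    (hash, encoded)
}
```

With millions of MPT nodes during sync, eliminating those two per-node copies is where the win comes from.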
```diff
@@ -11,21 +11,34 @@ import io.iohk.ethereum.db.storage.encoding._
    * node saved will have its reference count equal to 1.
    *
    * */
-class FastSyncNodeStorage(nodeStorage: NodesStorage, bn: BigInt) extends ReferenceCountNodeStorage(nodeStorage, bn) {
+class FastSyncNodeStorage(cachedNodeStorage: CachedNodeStorage, bn: BigInt) extends ReferenceCountNodeStorage(cachedNodeStorage, bn) {
```
FastSync has no concept of caching; it needs to save data into the database directly.
```scala
      this.dataSource eq that.dataSource,
      "Transactional storage updates must be performed on the same data source"
    )
    DataSourceBatchUpdate(dataSource, this.updates ++ that.updates)
```
I am wondering whether a queue with appending, or a list with prepending, would be better here. Otherwise, when batching a lot of updates from a list (like in FastSync), we get quite an overhead.
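A toy model of the asymptotics behind this comment, with the types reduced to essentials (names are illustrative, not the actual ones):

```scala
import scala.collection.immutable.Queue

object BatchCombineSketch {
  final case class Update(key: Array[Byte], value: Array[Byte])

  // Folding n single-element batches with `++` on a default Seq (a List)
  // re-copies the accumulated prefix on every combine: O(n^2) in total.
  def combineSeq(batches: Seq[Seq[Update]]): Seq[Update] =
    batches.foldLeft(Seq.empty[Update])(_ ++ _)

  // An immutable Queue appends in amortized O(1), so the fold stays linear;
  // a list built by prepending and reversed once at the end behaves the same.
  def combineQueue(batches: Seq[Seq[Update]]): Seq[Update] =
    batches.foldLeft(Queue.empty[Update])((acc, b) => acc.enqueueAll(b))
}
```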
Force-pushed 47907b6 to 0982653
> What do you have in mind by this simplification? The main constraint here is that it needs to be quite performant, as it is used frequently, and we cannot update the best block in the DB on every best-block update because we do not know whether the node cache has been flushed. So the idea is to update the block in the DB only when flushing the cache, and otherwise operate on this counter.

Hmm, given that, couldn't we hide the bestKnownBlock (which I would rename to inMemoryBestBlock or cachedBestBlock) in the blockchain interface? That is, instead of saveBestKnownBlock and saveBestBlock, have a single saveBestBlock, as in the sketch below. I see the constraint that this can't be updated before flushing the nodes cache, but not the performance constraint. Usually when updating the best block it's because we are removing or inserting whole blocks; aren't those operations much heavier than this?
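A sketch of what that could look like; the method names follow the renames suggested above, and everything else is assumed:

```scala
import java.util.concurrent.atomic.AtomicReference

// Callers only ever see saveBestBlock; the implementation decides when the
// cached value is written through to the DB (namely, right after the node
// cache is flushed).
trait BlockchainSketch {
  def saveBestBlock(number: BigInt): Unit
  def getBestBlockNumber: BigInt
}

class BlockchainSketchImpl(persistToDb: BigInt => Unit) extends BlockchainSketch {
  private val cachedBestBlock = new AtomicReference[BigInt](BigInt(0))

  override def saveBestBlock(number: BigInt): Unit = cachedBestBlock.set(number)
  override def getBestBlockNumber: BigInt = cachedBestBlock.get()

  // Invoked by whatever flushes the node cache; since it is not part of the
  // public trait, callers cannot persist a number ahead of the state nodes.
  def onNodeCacheFlushed(): Unit = persistToDb(cachedBestBlock.get())
}
```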
```scala
    storage = afterUpdate
  }

  def updateOptimized(toRemove: Seq[Array[Byte]], toUpsert: Seq[(Array[Byte], Array[Byte])]): Unit = synchronized {
```
Minor: should this and the update method (`def update(namespace: Namespace, toRemove: Seq[Key], toUpsert: Seq[(Key, Value)]): Unit`) be private?
src/test/scala/io/iohk/ethereum/db/storage/TransactionalKeyValueStorageSuite.scala (outdated; review thread resolved)
Force-pushed 0982653 to 5be5f3f
LGTM!
```
.\\ //.
. \ \ / /.
.\ ,\ /` /,.-
-. \ /'/ / .
` - `-' \ -
'. /.\`
- .-
:`//.'
.`.'
.' BP
```
Force-pushed 5be5f3f to 01484e1
Force-pushed 01484e1 to 3144d38
Description

The goal is to make part of the blockchain methods transactional: storage writes are accumulated as batch updates and committed atomically, as sketched below.
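A hedged sketch of the pattern, based on the DataSourceBatchUpdate code quoted in the review above; the storage shapes around it are assumptions:

```scala
object TransactionalUpdateSketch {
  final case class Update(
      toRemove: Seq[Array[Byte]],
      toUpsert: Seq[(Array[Byte], Array[Byte])]
  )

  trait DataSource {
    // Applies all updates in a single atomic write (assumed capability).
    def update(updates: Seq[Update]): Unit
  }

  final case class DataSourceBatchUpdate(dataSource: DataSource, updates: Seq[Update]) {
    // Combining requires the same underlying data source, as in the diff above.
    def and(that: DataSourceBatchUpdate): DataSourceBatchUpdate = {
      require(
        this.dataSource eq that.dataSource,
        "Transactional storage updates must be performed on the same data source"
      )
      DataSourceBatchUpdate(dataSource, this.updates ++ that.updates)
    }

    // Nothing touches the DB until commit(), so a failure while assembling
    // the batch leaves blockchain storage untouched.
    def commit(): Unit = dataSource.update(updates)
  }

  // Usage: two logical writes become one atomic commit.
  def saveBlockAtomically(
      blockUpdate: DataSourceBatchUpdate,
      receiptsUpdate: DataSourceBatchUpdate
  ): Unit =
    blockUpdate.and(receiptsUpdate).commit()
}
```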
Important Changes Introduced

- removed NodeStorage interface (NodeStorage and CachedNodeStorage are independent now)

Testing

- check syncing and pruning

MISC

- Storing the best-known block number now uses an AtomicReference plus the DB. Is it worth creating a ticket to simplify that?