Initial support for geth databases #47

vpulim · 2018-05-04T01:19:16Z

This is a major refactoring of the code to support reading from geth leveldb databases. One major benefit of these changes is to allow testing of ethereumjs-vm on a blockchain previously synched by geth, including all of the various hardforks (the current ethereumjs-vm only supports the byzantium fork, however).

Apologies ahead of time for the large code diff, but I couldn't come up with a way to make smaller incremental changes since the current architecture makes heavy use of the doubly linked list from detailsDB which had to be removed.

This is a list of the major changes required for geth db compatibility:

1. detailsDB and blockDB are replaced with a single db reference. Instead of relying on a doubly linked list (stored in detailsDB), geth relies on block numbers and number-to-hash mappings to iterate through the chain.
2. Related to the above, the getDetails method has been deprecated and now returns an empty object.
3. td and height are not stored in the db as meta info. Instead, they are computed as needed. The headerchain head and blockchain head are stored under separate keys. As a result, the meta field has been moved into a getter that generates the old meta info from other internal fields.
4. Block headers and body (transactions and uncle headers) are stored under two separate keys as per geth db design
5. Changes have been made to properly rebuild the chain and number/hash mappings as a result of forks and deletions.
6. A write-through cache has been added to reduce database reads
7. Similar to geth, we now defend against selfish mining vulnerability (https://github.com/ethereum/go-ethereum/blob/master/core/blockchain.go#L960)
8. Added many more tests to increase coverage to over 90%
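
To illustrate the first point, here is a minimal sketch of iterating a chain through number-to-hash mappings instead of a linked list. It uses an in-memory Map rather than leveldb. The key layout ('h' + 8-byte big-endian number + 'n' for the canonical hash, 'h' + number + hash for the header) is my understanding of the go-ethereum schema at the time and should be treated as an assumption. All function names are hypothetical and are not part of this PR.

```javascript
// In-memory stand-in for the key/value store; keys are hex-encoded for Map lookups.
const store = new Map()

// Block numbers are assumed to be encoded as 8-byte big-endian values.
function encodeNumber (n) {
  const buf = Buffer.alloc(8)
  buf.writeUInt32BE(Math.floor(n / 0x100000000), 0)
  buf.writeUInt32BE(n >>> 0, 4)
  return buf
}

// Assumed key layout: 'h' + number + 'n' -> canonical block hash
function numberToHashKey (n) {
  return Buffer.concat([Buffer.from('h'), encodeNumber(n), Buffer.from('n')]).toString('hex')
}

// Assumed key layout: 'h' + number + hash -> header
function headerKey (n, hash) {
  return Buffer.concat([Buffer.from('h'), encodeNumber(n), hash]).toString('hex')
}

// Store a header together with its number-to-hash mapping (hypothetical helper).
function putCanonicalHeader (n, hash, header) {
  store.set(numberToHashKey(n), hash)
  store.set(headerKey(n, hash), header)
}

// Walk forward by block number until no number-to-hash mapping is found.
function iterateFrom (start, onHeader) {
  for (let n = start; store.has(numberToHashKey(n)); n++) {
    const hash = store.get(numberToHashKey(n))
    onHeader(n, store.get(headerKey(n, hash)))
  }
}

putCanonicalHeader(0, Buffer.from('aa', 'hex'), 'genesis header')
putCanonicalHeader(1, Buffer.from('bb', 'hex'), 'header 1')
iterateFrom(0, (n, header) => console.log(n, header))
```

Because the lookup goes through the block number, a reorg only has to rewrite the number-to-hash entries for the affected heights; no per-block parent/child pointers need to be maintained.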

Finally, the ethereumjs-vm blockchain tests have been run on this PR and the number of passing tests remained the same as compared to the current HEAD (https://gist.github.com/vpulim/efbb864d5790643e06cf87b616036141)

coveralls · 2018-05-04T01:23:37Z

Coverage increased (+32.9%) to 96.93% when pulling 607d6cb on vpulim:geth-db-support into 262d906 on ethereumjs:master.

holgerd77 · 2018-05-04T07:31:15Z

Huh, what a PR! Really looking forward to having a look at this, thanks so much! 🤓 📚

holgerd77 · 2018-05-04T09:14:06Z

Just for my test preparation: this should also work with a fast-synced Geth DB to a post-Byzantium state, shouldn't it?

vpulim · 2018-05-04T13:35:14Z

Yes, it should be able to load all of the block headers, transactions and uncle headers from a fast-synced Geth DB, including post-Byzantium blocks.

Something like this should let you iterate through the chain:

const levelup = require('levelup')
const leveldown = require('leveldown')
const Blockchain = require('ethereumjs-blockchain')
const utils = require('ethereumjs-util')

var gethDbPath = './chaindata'
var db = levelup(gethDbPath, { db: leveldown })

new Blockchain({db: db}).iterator('i', (block, reorg, cb) => {
  const blockNumber = utils.bufferToInt(block.header.number)
  const blockHash = block.hash().toString('hex')
  console.log(`BLOCK ${blockNumber}: ${blockHash}`)
  cb()
}, (err) => console.log(err || 'Done.'))

Also, here is an example of running the VM on a full or fast sync geth db after a specific block number:

const levelup = require('levelup')
const leveldown = require('leveldown')
const Blockchain = require('ethereumjs-blockchain')
const Trie = require('merkle-patricia-tree/secure')
const VM = require('ethereumjs-vm')

const gethDbPath = '/Users/vpulim/Library/Ethereum/geth/chaindata'
const db = levelup(gethDbPath, { db: leveldown })

const vm = new VM({
  state: new Trie(db),
  blockchain: new Blockchain(db)
})
const sm = vm.stateManager

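sm.blockchain.getBlock(5572034, (err, block) => {
  if (err) return console.log(err)
  // resume from this block: point the 'vm' head and the state trie root at it
  sm.blockchain._heads['vm'] = block.header.hash()
  sm.trie.root = block.header.stateRoot
  vm.runBlockchain(err => console.log(err || 'Done.'))
})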

I get a "tx has a higher gas limit than the block" error when attempting to run the code above. I'm not sure if this is due to a problem with the VM or loading the geth db. Will need to look into this further...

holgerd77

Thanks again for this PR, I'm realizing more and more what is all inside it and how much work this had to be.

I'm now once through line-by-line and I think I am getting most of it structurally, not able yet to make comments on the detail level though.

One thing to be aware of: while this is supporting the old constructors, it won't be possible with this to use an already written DB any more (this is correct, isn't it?). I think that's worth it, also tried to re-cap and I don't think that there are many users of the library who use it on more than a simulation level. Nevertheless I think this should be stated once.

Will continue tomorrow, dig a bit deeper into the tests and also check out your fork locally. Also hope to have a geth fast sync ready, can't wait to try this out! 😄

jwasinger · 2018-05-10T09:37:05Z

@vpulim this is awesome! Thanks @holgerd77 for putting in the effort to review these changes. This is a big PR!

holgerd77 · 2018-05-10T10:05:52Z

index.js

+    return {
+      rawHead: this._headHeader,
+      heads: this._heads,
+      genesis: this._genesis


The meta getter is not really returning the meta like in the previous version, and not as expected after reading point 3) of your description:

> As a result, the meta field has been moved into a getter that generates the old meta info from other internal fields

Old format:

{ heads: {}, td: <BN: 400000000>, rawHead: 'd4e56740f876aef8c010b86a40d5f56745a118d0906a34e69aec8c0db1cb8fa3', height: 0, genesis: 'd4e56740f876aef8c010b86a40d5f56745a118d0906a34e69aec8c0db1cb8fa3' }

New format:

{ rawHead: <Buffer d4 e5 67 40 f8 76 ae f8 c0 10 b8 6a 40 d5 f5 67 45 a1 18 d0 90 6a 34 e6 9a ec 8c 0d b1 cb 8f a3>, heads: {}, genesis: <Buffer d4 e5 67 40 f8 76 ae f8 c0 10 b8 6a 40 d5 f5 67 45 a1 18 d0 90 6a 34 e6 9a ec 8c 0d b1 cb 8f a3> }

So types are different and td and height are missing. Is this intentional?

vpulim · 2018-05-10

> while this is supporting the old constructors, it won't be possible with this to use an already written DB any more (this is correct, isn't it?)

Yes, this PR is not compatible with existing DBs and we should definitely make that very clear. I could create a script to migrate old DBs into geth format if there is enough demand for it.

> So types are different and td and height are missing. Is this intentional?

The td and height were intentionally left out. The README doesn't mention a meta field and it refers to BlockChain Properties that don't exist so I was unsure whether removing meta would break the published interface. My preference would be to remove the meta field completely and explicitly expose certain properties (headHeader, headBlock, genesis) and async get methods such as getTd(cb) and getHeight(cb). The async methods are needed since computing td and height require db operations under the geth db design.

However, if we absolutely needed to keep the current meta interface and make these values available synchronously, it is possible but would require additional db calls to ensure these values are always up-to-date. My opinion is that the convenience of this doesn't outweigh the additional performance overhead of preemptively computing these values whenever there is a change to the blockchain (instead of computing them on-demand). As a compromise, I could fix the meta getters to return correct values (including pre-computing td and height), but also deprecate meta and add new properties and async get methods to the interface going forward.

Regarding the difference in types for meta.rawHead and meta.genesis, that was a mistake on my part! I can change the getter to return hex strings instead. Internally, I keep all hash values as Buffers until a conversion to String is absolutely necessary.
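
To illustrate the on-demand approach described above, here is a minimal sketch of what async getHeight(cb) and getTd(cb) methods could look like. Only the two method names come from this discussion. The mock db, the ChainSketch class and the 'number:' / 'td:' key names are hypothetical and do not reflect the real storage layout.

```javascript
// Hypothetical in-memory db exposing a levelup-like async get(key, cb).
function createMockDb (entries) {
  const map = new Map(entries)
  return {
    get (key, cb) {
      setImmediate(() => {
        if (!map.has(key)) return cb(new Error('Key not found: ' + key))
        cb(null, map.get(key))
      })
    }
  }
}

class ChainSketch {
  constructor (db, headHeader) {
    this.db = db
    this.headHeader = headHeader // hash of the current head header (a plain string here)
  }

  // Height is derived from the head hash via a hash-to-number lookup.
  getHeight (cb) {
    this.db.get('number:' + this.headHeader, cb)
  }

  // Total difficulty needs the block number first, then a second lookup.
  getTd (cb) {
    this.getHeight((err, number) => {
      if (err) return cb(err)
      this.db.get('td:' + number + ':' + this.headHeader, cb)
    })
  }
}

const db = createMockDb([
  ['number:abc', 42],
  ['td:42:abc', '17179869184']
])
const chain = new ChainSketch(db, 'abc')
chain.getTd((err, td) => console.log(err || 'td = ' + td))
```

Nothing is recomputed when the chain changes; the db reads only happen when a caller actually asks for the value, which is the trade-off argued for above.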

holgerd77 · 2018-05-11

DB compatibility
I would assume that it won't become necessary but it's good to know that there is this fallback solution with a migration script in case there are more people relying on this than realized. People can also still use the v2.1.0 version (for some time).

meta
I would also say that we can drop the meta "interface" completely. This was always something very implicit, just did a short GitHub search, within the ethereumjs ecosystem I found only two direct accesses on this from within the VM implementation which can be easily updated. I very much prefer your solution to expose these properties directly in the way you described above.

holgerd77 · 2018-05-11T11:50:10Z

index.js

@@ -303,40 +369,63 @@ Blockchain.prototype._putBlock = function (block, cb, isGenesis) {
 /**
 *Gets a block by its hash
 * @method getBlock
- * @param {String|Buffer|Number} hash - the sha256 hash of the rlp encoding of the block
+ * @param {Buffer|Number|BN} hash - the sha256 hash of the rlp encoding of the block


Hmm, I'm unsure how to proceed with API documentation. This is a bit of a mess anyhow atm and we should switch to API docs generated from the code. My tendency is to not update the README on this with this PR, then do the autogeneration in a directly subsequent one, switch to that, and remove the current (already incomplete) API docs from the README.

What do you think?

(if you want to take on the extra work you can also just add the documentation dependency to the dev dependencies, add an npm command like "build:docs": "documentation build ./index.js --format md --shallow > ./docs/index.md" or similar (would be cool to omit the _ functions, not sure if such a flag exists) and then do the documentation changes above)

vpulim · 2018-05-11

I really like the idea of autogenerating docs. Once this PR is accepted, I'm happy to do another one implementing the approach you describe.

holgerd77 · 2018-05-11

This is already done in several of the other ethereumjs libraries, e.g. in ethereumjs-block.

holgerd77

I'm now done going through the tests.

I have a strong tendency to give this a go, since test coverage (including testing of existing functionality) has increased significantly and the code is more readable/understandable than before.

I will leave this open over the weekend and then eventually approve on Tuesday or Wednesday next week. Everyone who wants to have another look at the code might do so in between.

We also might want to do at least some basic investigation/origin search about the VM "tx has a higher gas limit than the block" error (see this comment) and make sure this is actually originating in the VM code.

holgerd77 · 2018-05-11T12:59:49Z

I would then release this as a new major v3.0.0 release.

holgerd77 · 2018-05-15T09:12:05Z

Ok, have tested the iterator example, this works like a charm, will let this run through a bit... 🏇🏇🏇

Couple of minutes later, just passed the 100.000 mark and no signs of slowing down. Watched the memory a bit, stays relatively constantly around 4%.

Pretty cool. 😄

holgerd77 · 2018-05-15T12:06:49Z

Did a test-PR over on the VM running the tests with the changed ethereumjs-blockchain dependency, this is passing completely: ethereumjs/ethereumjs-monorepo#299
(Circle actually wrongly used a node_modules cache from an old build so no statement possible here, but Travis installed freshly and passed - urgh - always such a pain these things...).

I also tried to run the VM example, I came to the conclusion that this is a separate construction site which we can approach slowly/independently on top of this. Actually got the example running to some extent (I had to add skipBalance: true to the VM options) but it got stuck at some point. (One must say that I couldn't run this on a Byzantium chain because I didn't manage to do a Geth fast-sync in three (!!) over-the-night sessions, it always got stuck at some point). Nevertheless I think we are pretty close here, so cool.

Ok. I'll leave this open for another 24 hours for comments.

vpulim · 2018-05-15T13:06:30Z

@holgerd77 Awesome! Happy to hear that all tests passed :) Thanks again for all your work on reviewing/testing this PR.

holgerd77 · 2018-05-15T13:53:59Z

Could you do a review of ethereumjs/ethereumjs-block#44 since you have already looked into the commons library?

vpulim · 2018-05-15T13:59:44Z

Sure, I'll take a look at it today.

holgerd77 · 2018-05-16T09:07:13Z

Ok, will now merge this. Thanks once more @vpulim for this wonderful PR. Will do a subsequent PR with the docs changes and then maybe do a release tomorrow or the day after.

holgerd77

Looks good.

holgerd77 · 2018-05-16T09:27:39Z

Short note: documentationjs is currently not generating useful docs, we should move to an ES6 class structure with this - generally also for readability. I'm always a bit unsure if it is safe to distribute ES6 classes as a node package or if this should be converted to ES5 and we should update our build process here (probably still for some time).

So regarding documentation I'll stick to the conservative approach for now and just manually update the README, we can take on the above separately. Will also update the README with the first usage example you posted.


fjl · 2018-05-18T16:23:12Z

I would like to note that we do not guarantee stability of the go-ethereum database schema. It can change without notice. You have been warned ;).

holgerd77 · 2018-05-18T19:06:41Z

@fjl Hehe. Thanks for letting us know, we'll keep this in mind. 😄 Will be useful for us anyhow, minimally for VM testing and development purposes.

holgerd77 · 2018-05-23T08:57:50Z

Hi @vpulim, just discovered this: for _getBlock() is it intended that the callback is called as cb(null, blockTag, number) in one clause and as cb(null, hash, blockTag) in the other, so with the blockTag argument in a different position?

holgerd77 · 2018-05-23T09:18:38Z

And any reason you didn't put the height into the meta getter? Wouldn't this be easy to get from the block number from headHeader?

vpulim · 2018-05-23T10:35:25Z

@holgerd77 That bit of code is a little confusing to read unfortunately, but yes that is the intention. Both of those callbacks feed their return values (in that order) to the lookupByHashAndNumber function which takes a hash as the first value and number as the second. In the first cb() call, blockTag is a hash and in the second call, blockTag is a number. So in both cases, cb() is being called with hash and number, in that order.
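
A simplified sketch of the pattern described above, for readers following along. Only lookupByHashAndNumber is named in this thread. resolveBlockTag, the two lookup tables, and the use of plain strings instead of Buffers for hashes are hypothetical simplifications, not the actual implementation.

```javascript
// Hypothetical lookup tables standing in for the hash<->number mappings in the db.
const numberByHash = new Map([['aa', 1], ['bb', 2]])
const hashByNumber = new Map([[1, 'aa'], [2, 'bb']])

// Whatever the type of blockTag, the callback always receives (err, hash, number).
function resolveBlockTag (blockTag, cb) {
  if (typeof blockTag === 'string') {
    // blockTag is a hash: look up the number, so blockTag goes first
    cb(null, blockTag, numberByHash.get(blockTag))
  } else {
    // blockTag is a number: look up the hash, so blockTag goes second
    cb(null, hashByNumber.get(blockTag), blockTag)
  }
}

// Takes the hash as the first value and the number as the second.
function lookupByHashAndNumber (hash, number) {
  return { hash, number }
}

function getBlock (blockTag, cb) {
  resolveBlockTag(blockTag, (err, hash, number) => {
    if (err) return cb(err)
    cb(null, lookupByHashAndNumber(hash, number))
  })
}

getBlock('aa', (err, block) => console.log(err || block))
getBlock(2, (err, block) => console.log(err || block))
```

The position of blockTag in the cb() call changes between the two branches, but the meaning of each position (hash first, number second) does not.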

vpulim · 2018-05-23T10:36:38Z

@holgerd77 headHeader is just a hash value, not a full header object. So a db get operation must be made in order to retrieve the height (from the block number).

vpulim force-pushed the geth-db-support branch from 1dd88dd to a935b92 Compare May 4, 2018 03:32

vpulim force-pushed the geth-db-support branch from a935b92 to dd33e33 Compare May 4, 2018 17:04

Initial support for geth databases

607d6cb

vpulim force-pushed the geth-db-support branch from dd33e33 to 607d6cb Compare May 4, 2018 18:06

holgerd77 reviewed May 10, 2018

holgerd77 reviewed May 11, 2018

holgerd77 mentioned this pull request May 15, 2018

[DO-NOT-MERGE] Testing geth-compatible ethereumjs-blockchain dependency ethereumjs/ethereumjs-monorepo#299

Closed

holgerd77 approved these changes May 16, 2018

holgerd77 merged commit 78e23e8 into ethereumjs:master May 16, 2018

holgerd77 added a commit that referenced this pull request May 16, 2018

Updated README API documentation according to PR #47 (Geth compatibility), added example code snippet

9ec717d

holgerd77 mentioned this pull request May 16, 2018

Updated API docs (Geth compatibility PR #47) #48

Merged

holgerd77 added a commit that referenced this pull request May 17, 2018

Updated README API documentation according to PR #47 (Geth compatibility), added example code snippet

cafa973

holgerd77 added a commit that referenced this pull request May 17, 2018

Merge pull request #48 from ethereumjs/update-docs

16cddd1

Updated API docs (Geth compatibility PR #47)

holgerd77 mentioned this pull request May 29, 2018

Battle-test VM against Geth-synced mainnet chain DB ethereumjs/ethereumjs-monorepo#300

Closed

vpulim mentioned this pull request Aug 8, 2018

bug - staleDetails undefined #10

Closed

vpulim deleted the geth-db-support branch October 11, 2018 02:34

holgerd77 mentioned this pull request Apr 19, 2019

Remove deprecated db options #100

Merged
