This repository has been archived by the owner on Apr 6, 2020. It is now read-only.

Initial support for geth databases #47

Merged
merged 1 commit into from
May 16, 2018

Conversation

vpulim
Contributor

@vpulim vpulim commented May 4, 2018

This is a major refactoring of the code to support reading from geth leveldb databases. One major benefit of these changes is that ethereumjs-vm can be tested on a blockchain previously synced by geth, including all of the various hard forks (although the current ethereumjs-vm only supports the Byzantium fork).

Apologies ahead of time for the large code diff, but I couldn't come up with a way to make smaller incremental changes since the current architecture makes heavy use of the doubly linked list from detailsDB which had to be removed.

This is a list of the major changes required for geth db compatibility:

  1. detailsDB and blockDB are replaced with a single db reference. Instead of relying on a doubly linked list (stored in detailsDB), geth relies on block numbers and number-to-hash mappings to iterate through the chain.
  2. Related to the above, the getDetails method has been deprecated and now returns an empty object.
  3. td and height are not stored in the db as meta info. Instead, they are computed as needed. The headerchain head and blockchain head are stored under separate keys. As a result, the meta field has been moved into a getter that generates the old meta info from other internal fields.
  4. Block headers and bodies (transactions and uncle headers) are stored under two separate keys, as per the geth db design.
  5. Changes have been made to properly rebuild the chain and number/hash mappings as a result of forks and deletions.
  6. A write-through cache has been added to reduce database reads.
  7. Similar to geth, we now defend against the selfish mining vulnerability (https://github.com/ethereum/go-ethereum/blob/master/core/blockchain.go#L960).
  8. Added many more tests to increase coverage to over 90%
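
To make item 1 concrete, here is a minimal sketch of walking a chain through number-to-hash mappings instead of a linked list. It uses an in-memory Map in place of leveldb, and the key names (`number->hash:`, `header:`) and helper functions are hypothetical, chosen for readability; they are not geth's actual key schema or this library's API.

```javascript
// In-memory stand-in for the database. Key names are hypothetical,
// chosen for readability; they are not geth's actual key schema.
const db = new Map()
const numberKey = (number) => `number->hash:${number}`
const headerKey = (hash) => `header:${hash}`

// Store a header and record it as the canonical block for its number.
// Writing a different header for the same number replaces the mapping,
// which is what a reorg has to do for every affected block number.
function putCanonicalHeader (header) {
  db.set(headerKey(header.hash), header)
  db.set(numberKey(header.number), header.hash)
}

// Visit canonical headers in ascending order, starting at `start` and
// stopping at the first block number that has no hash mapping.
function iterateCanonical (start, onHeader) {
  for (let number = start; db.has(numberKey(number)); number++) {
    const hash = db.get(numberKey(number))
    onHeader(db.get(headerKey(hash)))
  }
}

putCanonicalHeader({ number: 0, hash: 'aa', parentHash: null })
putCanonicalHeader({ number: 1, hash: 'bb', parentHash: 'aa' })
putCanonicalHeader({ number: 2, hash: 'cc', parentHash: 'bb' })

const visited = []
iterateCanonical(0, (header) => visited.push(header.hash))
console.log(visited.join(' -> '))
```

Because iteration only follows the number-to-hash mapping, a non-canonical header can stay in the store without being visited.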

Finally, the ethereumjs-vm blockchain tests have been run on this PR, and the number of passing tests is the same as on the current HEAD (https://gist.github.com/vpulim/efbb864d5790643e06cf87b616036141).
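
For readers unfamiliar with the term in item 6, a write-through cache can be sketched roughly as follows. This is an illustration of the general technique only, assuming a simple key-value store; the class name, the `storeReads` counter, and the Map standing in for the database are all hypothetical and do not reflect the cache implementation in this PR.

```javascript
// Minimal write-through cache over a key-value store. `backingStore` is a
// Map standing in for the database; all names here are hypothetical.
class WriteThroughCache {
  constructor (backingStore) {
    this.store = backingStore
    this.cache = new Map()
    this.storeReads = 0 // counts reads that had to reach the backing store
  }

  // Writes go to the cache and to the backing store in the same call.
  put (key, value) {
    this.cache.set(key, value)
    this.store.set(key, value)
  }

  // Reads are served from the cache when possible; misses fall through
  // to the backing store and populate the cache.
  get (key) {
    if (this.cache.has(key)) return this.cache.get(key)
    this.storeReads++
    const value = this.store.get(key)
    if (value !== undefined) this.cache.set(key, value)
    return value
  }

  // Deletions are applied to both layers so they never disagree.
  del (key) {
    this.cache.delete(key)
    this.store.delete(key)
  }
}

const store = new Map([['existing', 1]])
const cached = new WriteThroughCache(store)
cached.put('head', 'cc')
cached.get('head') // cache hit, no store read
cached.get('existing') // cache miss, one store read
cached.get('existing') // cache hit
console.log(`store reads: ${cached.storeReads}`)
```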

@coveralls

coveralls commented May 4, 2018

Coverage Status

Coverage increased (+32.9%) to 96.93% when pulling 607d6cb on vpulim:geth-db-support into 262d906 on ethereumjs:master.

@holgerd77
Member

Huh, what a PR! Really looking forward to having a look at this, thanks so much! 🤓 📚

@holgerd77
Member

Just for my test preparation: this should also work with a fast-synced Geth DB to a post-Byzantium state, shouldn't it?

@vpulim
Contributor Author

vpulim commented May 4, 2018

Yes, it should be able to load all of the block headers, transactions and uncle headers from a fast-synced Geth DB, including post-Byzantium blocks.

Something like this should let you iterate through the chain:

const levelup = require('levelup')
const leveldown = require('leveldown')
const Blockchain = require('ethereumjs-blockchain')
const utils = require('ethereumjs-util')

// Path to a geth chaindata directory (adjust for your setup)
const gethDbPath = './chaindata'
const db = levelup(gethDbPath, { db: leveldown })

// Walk the chain from the 'i' head, logging each block's number and hash
new Blockchain({db: db}).iterator('i', (block, reorg, cb) => {
  const blockNumber = utils.bufferToInt(block.header.number)
  const blockHash = block.hash().toString('hex')
  console.log(`BLOCK ${blockNumber}: ${blockHash}`)
  cb()
}, (err) => console.log(err || 'Done.'))

Also, here is an example of running the VM on a full or fast sync geth db after a specific block number:

const levelup = require('levelup')
const leveldown = require('leveldown')
const Blockchain = require('ethereumjs-blockchain')
const Trie = require('merkle-patricia-tree/secure')
const VM = require('ethereumjs-vm')

const gethDbPath = '/Users/vpulim/Library/Ethereum/geth/chaindata'
const db = levelup(gethDbPath, { db: leveldown })

const vm = new VM({
  state: new Trie(db),
  blockchain: new Blockchain(db)
})
const sm = vm.stateManager

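// Start the VM at a specific block: point the 'vm' head and the state trie root at it
sm.blockchain.getBlock(5572034, (err, block) => {
  if (err) return console.log(err)
  sm.blockchain._heads['vm'] = block.header.hash()
  sm.trie.root = block.header.stateRoot
  vm.runBlockchain(err => console.log(err || 'Done.'))
})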

I get a "tx has a higher gas limit than the block" error when attempting to run the code above. I'm not sure if this is due to a problem with the VM or loading the geth db. Will need to look into this further...
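
For context, the error message comes from a check that compares a transaction's gas limit against the gas limit in the block header. The following is a minimal sketch of that kind of comparison on Buffer-encoded values; the helper names `bufferToBigInt` and `checkGasLimit` are hypothetical and this is not the VM's actual code. The second call only illustrates that a header field decoding to an empty value would trip the check; it is not a diagnosis of the error above.

```javascript
// Interpret a big-endian Buffer as an unsigned integer (empty Buffer -> 0).
function bufferToBigInt (buf) {
  return buf.length === 0 ? 0n : BigInt('0x' + buf.toString('hex'))
}

// Hypothetical helper mirroring the comparison: returns an Error when the
// transaction's gas limit exceeds the block's gas limit, otherwise null.
function checkGasLimit (blockHeader, tx) {
  if (bufferToBigInt(blockHeader.gasLimit) < bufferToBigInt(tx.gasLimit)) {
    return new Error('tx has a higher gas limit than the block')
  }
  return null
}

const tx = { gasLimit: Buffer.from('5208', 'hex') } // 21000
const goodHeader = { gasLimit: Buffer.from('7a1200', 'hex') } // 8000000
const emptyHeader = { gasLimit: Buffer.alloc(0) } // decodes to 0

console.log(checkGasLimit(goodHeader, tx))
console.log(checkGasLimit(emptyHeader, tx))
```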

Member

@holgerd77 holgerd77 left a comment

Thanks again for this PR. I'm realizing more and more how much is in it and how much work this must have been.

I've now been through it once line by line and I think I understand most of it structurally, though I'm not yet able to comment on the details.

One thing to be aware of: while this supports the old constructors, it will no longer be possible to use an already written DB with it (this is correct, isn't it?). I think that's worth it. I also tried to recap, and I don't think there are many users of the library who use it on more than a simulation level. Nevertheless I think this should be stated once.

Will continue tomorrow, dig a bit deeper into the tests and also check out your fork locally. Also hope to have a geth fast sync ready, can't wait to try this out! 😄

@jwasinger
Contributor

jwasinger commented May 10, 2018

@vpulim this is awesome! Thanks @holgerd77 for putting in the effort to review these changes. This is a big PR!

return {
rawHead: this._headHeader,
heads: this._heads,
genesis: this._genesis
Member

The meta getter does not really return the meta as in the previous version, nor as I expected after reading point 3 of your description:

As a result, the meta field has been moved into a getter that generates the old meta info from other internal fields

Old format:

{ heads: {},
  td: <BN: 400000000>,
  rawHead: 'd4e56740f876aef8c010b86a40d5f56745a118d0906a34e69aec8c0db1cb8fa3',
  height: 0,
  genesis: 'd4e56740f876aef8c010b86a40d5f56745a118d0906a34e69aec8c0db1cb8fa3' }

New format:

{ rawHead: <Buffer d4 e5 67 40 f8 76 ae f8 c0 10 b8 6a 40 d5 f5 67 45 a1 18 d0 90 6a 34 e6 9a ec 8c 0d b1 cb 8f a3>,
  heads: {},
  genesis: <Buffer d4 e5 67 40 f8 76 ae f8 c0 10 b8 6a 40 d5 f5 67 45 a1 18 d0 90 6a 34 e6 9a ec 8c 0d b1 cb 8f a3> }

So types are different and td and height are missing. Is this intentional?

Contributor Author

@vpulim vpulim May 10, 2018

while this is supporting the old constructors, it won't be possible with this to use an already written DB any more (this is correct, isn't it?)

Yes, this PR is not compatible with existing DBs and we should definitely make that very clear. I could create a script to migrate old DBs into geth format if there is enough demand for it.

So types are different and td and height are missing. Is this intentional?

The td and height were intentionally left out. The README doesn't mention a meta field and it refers to BlockChain Properties that don't exist, so I was unsure whether removing meta would break the published interface. My preference would be to remove the meta field completely and explicitly expose certain properties (headHeader, headBlock, genesis) and async get methods such as getTd(cb) and getHeight(cb). The async methods are needed since computing td and height requires db operations under the geth db design.

However, if we absolutely needed to keep the current meta interface and make these values available synchronously, it is possible but would require additional db calls to ensure these values are always up-to-date. My opinion is that the convenience of this doesn't outweigh the additional performance overhead of preemptively computing these values whenever there is a change to the blockchain (instead of computing them on-demand). As a compromise, I could fix the meta getters to return correct values (including pre-computing td and height), but also deprecate meta and add new properties and async get methods to the interface going forward.
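
A rough sketch of what such callback-style getters could look like is shown below. Only the names getTd(cb) and getHeight(cb) come from the proposal above; the class, the `_get` helper, the key names, and the assumption that a total difficulty is stored per block hash are hypothetical, with a Map and setImmediate standing in for asynchronous leveldb reads.

```javascript
// Hypothetical sketch: callback-style getters that derive values from the
// stored head hash instead of keeping them in a synchronous meta object.
class ChainSketch {
  constructor () {
    this.headHeader = null // hash of the current head, kept in memory
    this._db = new Map() // stand-in for the database
  }

  // Simulated asynchronous database read.
  _get (key, cb) {
    setImmediate(() => {
      if (!this._db.has(key)) return cb(new Error(`key not found: ${key}`))
      cb(null, this._db.get(key))
    })
  }

  // Store a header plus its total difficulty (assumed to be kept per hash)
  // and make it the head.
  putHead (header, td) {
    this._db.set(`header:${header.hash}`, header)
    this._db.set(`td:${header.hash}`, td)
    this.headHeader = header.hash
  }

  // Height is the block number of the head header, which needs a db read
  // because only the head's hash is held in memory.
  getHeight (cb) {
    this._get(`header:${this.headHeader}`, (err, header) => {
      if (err) return cb(err)
      cb(null, header.number)
    })
  }

  // Total difficulty is looked up for the head hash on demand.
  getTd (cb) {
    this._get(`td:${this.headHeader}`, cb)
  }
}

const chain = new ChainSketch()
chain.putHead({ hash: 'bb', number: 1 }, 400000000n)
chain.getHeight((err, height) => console.log(err || `height: ${height}`))
chain.getTd((err, td) => console.log(err || `td: ${td}`))
```

The trade-off is the one described above: callers pay for a db read when they ask, rather than the chain paying for one on every update.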

Regarding the difference in types for meta.rawHead and meta.genesis, that was a mistake on my part! I can change the getter to return hex strings instead. Internally, I keep all hash values as Buffers until a conversion to String is absolutely necessary.
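
As a sketch of that fix, the getter can keep Buffers internally and convert only at the boundary. The field names _headHeader, _heads and _genesis are taken from the diff under review; the class itself is a hypothetical stand-in, and passing heads through unchanged is an assumption, since the old format shown above only has an empty heads object.

```javascript
// Hypothetical stand-in class: hashes are held as Buffers internally and
// converted to hex strings only when exposed through the meta getter.
class MetaSketch {
  constructor (genesisHash) {
    this._headHeader = genesisHash
    this._genesis = genesisHash
    this._heads = {}
  }

  get meta () {
    return {
      rawHead: this._headHeader.toString('hex'),
      heads: this._heads, // passed through unchanged (assumption)
      genesis: this._genesis.toString('hex')
    }
  }
}

const genesisHash = Buffer.from(
  'd4e56740f876aef8c010b86a40d5f56745a118d0906a34e69aec8c0db1cb8fa3',
  'hex'
)
const sketch = new MetaSketch(genesisHash)
console.log(sketch.meta)
```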

Member

@holgerd77 holgerd77 May 11, 2018

  1. DB compatibility
    I would assume that it won't become necessary, but it's good to know that there is this fallback solution with a migration script in case more people rely on this than realized. People can also still use the v2.1.0 version (for some time).
  2. meta
    I would also say that we can drop the meta "interface" completely. This was always something very implicit. I just did a short GitHub search, and within the ethereumjs ecosystem I found only two direct accesses to it, both in the VM implementation, which can easily be updated. I very much prefer your solution of exposing these properties directly in the way you described above.

@@ -303,40 +369,63 @@ Blockchain.prototype._putBlock = function (block, cb, isGenesis) {
/**
*Gets a block by its hash
* @method getBlock
* @param {String|Buffer|Number} hash - the sha256 hash of the rlp encoding of the block
* @param {Buffer|Number|BN} hash - the sha256 hash of the rlp encoding of the block
Member

Hmm, I'm unsure how to proceed with the API documentation. This is a bit of a mess anyhow atm, and we should switch to API docs generated from the code. My tendency is to not update the README in this PR, do the autogeneration in a directly subsequent one, and then switch to that and remove the current (already incomplete) API docs from the README.

What do you think?

Member

(if you want to take on the extra work you can also just add the documentation dependency to the dev dependencies, add an npm command like "build:docs": "documentation build ./index.js --format md --shallow > ./docs/index.md" or similar (would be cool to omit the _ functions, not sure if such a flag exists) and then do the documentation changes above)

Contributor Author

I really like the idea of autogenerating docs. Once this PR is accepted, I'm happy to do another one implementing the approach you describe.

Member

@holgerd77 holgerd77 May 11, 2018

This is already done in several of the other ethereumjs libraries, e.g. in ethereumjs-block.

Member

@holgerd77 holgerd77 left a comment

I'm now done going through the tests.

I have a strong tendency to give this a go, since test coverage (including testing of existing functionality) has increased significantly and the code is more readable and understandable than before.

I will leave this open over the weekend and then eventually approve on Tuesday or Wednesday next week. Everyone who wants to have another look at the code might do so in between.

We also might want to do at least some basic investigation/origin search about the VM "tx has a higher gas limit than the block" error (see this comment) and make sure this is actually originating in the VM code.

@holgerd77
Member

I would then release this as a new major v3.0.0 release.

@holgerd77
Member

Ok, have tested the iterator example, this works like a charm, will let this run through a bit... 🏇🏇🏇

A couple of minutes later: just passed the 100,000 mark and no signs of slowing down. Watched the memory a bit, it stays fairly constant at around 4%.

Pretty cool. 😄

@holgerd77
Member

Did a test-PR over on the VM running the tests with the changed ethereumjs-blockchain dependency, this is passing completely: ethereumjs/ethereumjs-monorepo#299
(Circle actually wrongly used a node_modules cache from an old build so no statement possible here, but Travis installed freshly and passed - urgh - always such a pain these things...).

I also tried to run the VM example and came to the conclusion that this is a separate construction site which we can approach slowly and independently on top of this. I actually got the example running to some extent (I had to add skipBalance: true to the VM options), but it got stuck at some point. (I should say that I couldn't run this on a Byzantium chain because I didn't manage to complete a Geth fast sync in three (!!) overnight sessions; it always got stuck at some point.) Nevertheless I think we are pretty close here, so cool.

Ok. I'll leave this open for another 24 hours for comments.

@vpulim
Contributor Author

vpulim commented May 15, 2018

@holgerd77 Awesome! Happy to hear that all tests passed :) Thanks again for all your work on reviewing/testing this PR.

@holgerd77
Member

Could you do a review of ethereumjs/ethereumjs-block#44 since you have already looked into the commons library?

@vpulim
Contributor Author

vpulim commented May 15, 2018

Sure, I'll take a look at it today.

@holgerd77
Member

Ok, will now merge this. Thanks once more @vpulim for this wonderful PR. Will do a subsequent PR with the docs changes and then maybe do a release tomorrow or the day after.

Member

@holgerd77 holgerd77 left a comment

Looks good.

@holgerd77 holgerd77 merged commit 78e23e8 into ethereumjs:master May 16, 2018
@holgerd77
Member

Short note: documentationjs is currently not generating useful docs, we should move to an ES6 class structure with this - generally also for readability. I'm always a bit unsure if it is safe to distribute ES6 classes as a node package or if this should be converted to ES5 and we should update our build process here (probably still for some time).

So regarding documentation I'll stick to the conservative approach for now and just manually update the README, we can take on the above separately. Will also update the README with the first usage example you posted.

holgerd77 added a commit that referenced this pull request May 16, 2018
holgerd77 added a commit that referenced this pull request May 17, 2018
holgerd77 added a commit that referenced this pull request May 17, 2018
Updated API docs (Geth compatibility PR #47)
@fjl

fjl commented May 18, 2018

I would like to note that we do not guarantee stability of the go-ethereum database schema. It can change without notice. You have been warned ;).

@holgerd77
Member

@fjl Hehe. Thanks for letting us know, we'll keep this in mind. 😄 It will be useful for us anyhow, at least for VM testing and development purposes.

@holgerd77
Member

Hi @vpulim, just discovered this: for _getBlock(), is it intended that the callback is called once like cb(null, blockTag, number) and in the other clause like cb(null, hash, blockTag), so with the blockTag argument in a different position?

@holgerd77
Member

And any reason you didn't put the height into the meta getter? Wouldn't this be easy to get from the block number from headHeader?

@vpulim
Contributor Author

vpulim commented May 23, 2018

@holgerd77 That bit of code is a little confusing to read unfortunately, but yes that is the intention. Both of those callbacks feed their return values (in that order) to the lookupByHashAndNumber function which takes a hash as the first value and number as the second. In the first cb() call, blockTag is a hash and in the second call, blockTag is a number. So in both cases, cb() is being called with hash and number, in that order.
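
The pattern can be sketched like this. Only the name lookupByHashAndNumber and the two callback shapes come from the discussion above; `resolveBlockTag`, the two Maps, and the stub body of lookupByHashAndNumber are hypothetical stand-ins for the real db lookups.

```javascript
// Hypothetical in-memory indexes standing in for the db lookups.
const numberToHash = new Map([[0, 'aa'], [1, 'bb']])
const hashToNumber = new Map([['aa', 0], ['bb', 1]])

// Resolve a block tag (a hash string or a block number) into a hash and a
// number, always calling back with (err, hash, number) in that order.
function resolveBlockTag (blockTag, cb) {
  if (typeof blockTag === 'number') {
    const hash = numberToHash.get(blockTag)
    if (hash === undefined) return cb(new Error('unknown block number'))
    // blockTag is the number here, so it goes in the second position
    return cb(null, hash, blockTag)
  }
  const number = hashToNumber.get(blockTag)
  if (number === undefined) return cb(new Error('unknown block hash'))
  // blockTag is the hash here, so it goes in the first position
  cb(null, blockTag, number)
}

// Stub consumer that takes the hash first and the number second.
function lookupByHashAndNumber (hash, number) {
  return `block ${number} (${hash})`
}

const results = []
for (const tag of [1, 'bb']) {
  resolveBlockTag(tag, (err, hash, number) => {
    results.push(err ? err.message : lookupByHashAndNumber(hash, number))
  })
}
console.log(results)
```

Both calls end up passing the same (hash, number) pair, even though blockTag sits in a different argument position in each branch.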

@vpulim
Contributor Author

vpulim commented May 23, 2018

@holgerd77 headHeader is just a hash value, not a full header object. So a db get operation must be made in order to retrieve the height (from the block number).
