Disk I/O efficiency improvements #585

Closed
braydonf opened this issue Aug 23, 2018 · 11 comments · Fixed by #758
Labels
blockstore enhancement Improving a current feature indexer stability / efficiency Denial of service, better resource usage

Comments

@braydonf
Contributor

braydonf commented Aug 23, 2018

Versions
bcoin v1.0.2


Disk Usage

2x reduction of disk usage possible with txindex enabled

When the transaction index is enabled (to be able to query transactions by txid), an entire copy of each transaction is made (see lib/primitives/txmeta.js), instead of a pointer to the block data. This means duplicate data is being stored on disk. By deduplicating this data, there would be a 2x reduction in disk usage.
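For illustration, a deduplicated index entry could store just a pointer into the block data instead of a second serialized copy of the transaction. The record layout and names below are hypothetical, not bcoin's actual format:

```js
// Hypothetical deduplicated txindex record: instead of storing a second
// serialized copy of the transaction (as lib/primitives/txmeta.js does
// today), store only where the transaction lives inside its block.
const assert = require('assert');

function encodeTxPointer(blockHash, offset, length) {
  assert(Buffer.isBuffer(blockHash) && blockHash.length === 32);
  const record = Buffer.allocUnsafe(32 + 4 + 4);
  blockHash.copy(record, 0);          // which block contains the tx
  record.writeUInt32LE(offset, 32);   // byte position within the block
  record.writeUInt32LE(length, 36);   // serialized size of the tx
  return record;
}

function decodeTxPointer(record) {
  return {
    blockHash: record.slice(0, 32),
    offset: record.readUInt32LE(32),
    length: record.readUInt32LE(36)
  };
}
```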

Disk Read/Writes

5x reduction in the number of disk reads/writes for blocks possible

LevelDB uses a log-structured merge-tree (LSM) starting at level 0, and moves data downwards in levels (see https://github.com/google/leveldb/blob/master/doc/impl.md#sorted-tables). The same key/value pair can be read and written several times as it's sorted into tables. This comes with the benefit of query performance, as the keys are sorted on-disk and sequential reads of similar keys are fast. However, it's rare to query all transactions sorted by txid, so there isn't much gain from the on-disk sorting. Furthermore, read performance can be achieved via small indexes that point to larger portions of data on disk.

The use of db.compactRange() in lib/blockchain/chaindb.js reduces the performance impact of compaction at query time by eagerly sorting. I've noticed that this can take many hours after the initial block download.
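For reference, a minimal sketch of what that eager compaction looks like with the plain leveldown callback API; bcoin's bdb wrapper may expose a slightly different signature, so this is illustrative only:

```js
// Minimal sketch of eagerly compacting the whole keyspace after the
// initial block download, using leveldown's callback API directly.
// bcoin's bdb wrapper may differ; this is illustrative only.
const leveldown = require('leveldown');

const db = leveldown('/home/bcoin/.bcoin/chain');

db.open((err) => {
  if (err)
    throw err;

  // Compact from the lowest to a very high key. On a multi-hundred-GB
  // database this can take many hours, as noted above.
  const start = Buffer.from([0x00]);
  const end = Buffer.alloc(255, 0xff);

  db.compactRange(start, end, (err) => {
    if (err)
      throw err;
    console.log('compaction complete');
  });
});
```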


Are there other areas that could be optimized?

@braydonf braydonf mentioned this issue Aug 23, 2018
@pinheadmz
Member

Worth mentioning here that the disk space can also be essentially reclaimed by running a pruned node with indexes on. I have such a configuration and the disk usage is currently ~288GB. So I dunno how this works at the read/write efficiency level: is the indexed database faster than re-reading blocks from disk?

@braydonf
Contributor Author

A Proposal

Blocks are stored in a directory structure, using the last two bytes (four hex characters) of the block hash, similar to git objects. The structure is /e2/6f/000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f.block for each block. This reduces the writes for each block to only once. Serving blocks to new nodes joining the network becomes a direct file read.
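A minimal sketch of deriving such a path from a block hash, assuming the last two bytes form the directory prefix as in the example above (the prefix directory and function name are illustrative):

```js
// Illustrative: derive a git-object-style path for a block, using the
// last two bytes (four hex characters) of the block hash as directories.
const path = require('path');

function blockPath(prefix, hash) {
  // hash is the block hash as a lowercase hex string, e.g.
  // '000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f'
  const d1 = hash.slice(-4, -2); // 'e2'
  const d2 = hash.slice(-2);     // '6f'
  return path.join(prefix, d1, d2, `${hash}.block`);
}

// blockPath('/home/bcoin/.bcoin/blocks',
//   '000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f');
// => '/home/bcoin/.bcoin/blocks/e2/6f/000000000019d6....block'
```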

The txindex, for archival cases, uses a LevelDB key/value store and maps a txid to a block hash (possibly compacted or truncated), byte position, and size. A query for a transaction reads the index, ideally from memory, and then reads the transaction from disk at the block file position. Other indexes would be similar or point to other indexes.
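A sketch of that query path under the proposal, assuming the hypothetical pointer record and directory scheme described above:

```js
// Sketch of the proposed query path: after looking up the index record
// in LevelDB (ideally cached in memory), read only the transaction's
// bytes from its block file on disk. The pointer layout and directory
// scheme are the hypothetical ones from the proposal above.
const fs = require('fs');
const path = require('path');

function readTx(prefix, ptr) {
  // ptr = { blockHash: Buffer, offset: Number, length: Number }
  const hash = ptr.blockHash.toString('hex');
  const file = path.join(prefix, hash.slice(-4, -2), hash.slice(-2),
                         `${hash}.block`);

  const buf = Buffer.allocUnsafe(ptr.length);
  const fd = fs.openSync(file, 'r');
  try {
    // Because the index stores the size, we can read exactly the bytes
    // of the transaction without parsing the rest of the block.
    fs.readSync(fd, buf, 0, ptr.length, ptr.offset);
  } finally {
    fs.closeSync(fd);
  }
  return buf; // raw serialized transaction
}
```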

For the browser, IndexedDB can be used; blocks would be queried by hash, and the API would be shared between the two storage backends.

@braydonf
Contributor Author

Thanks @pinheadmz. I hadn't realized it was possible to set up that configuration, and it helped in realizing the tx data was duplicated. It could be interesting to consider indexing while pruned, and that it would still work for coins. Currently, it may also have the unintentional effect of a node not being able to serve blocks to peers, even though the data is stored.

@tuxcanfly
Member

Agreed, the other aspect of optimization would be to speed up the initial block sync, and explore how to optimize disk layout. For example, btcd writes in chunks of 512MB files and includes a checksum to prevent file corruption.
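For illustration only, appending a block to a fixed-size chunk file with a checksum might look like the sketch below; the 512MB cap matches the figure mentioned above, but the record framing is an assumption, not btcd's actual on-disk format:

```js
// Illustrative: append a block to the current chunk file, framed with
// its length and a truncated SHA-256 checksum so corruption can be
// detected on read. The framing is an assumption, not btcd's format.
const fs = require('fs');
const crypto = require('crypto');

const MAX_FILE_SIZE = 512 * 1024 * 1024; // 512MB per chunk file

function appendBlock(fd, fileSize, rawBlock) {
  const checksum =
    crypto.createHash('sha256').update(rawBlock).digest().slice(0, 4);

  const header = Buffer.allocUnsafe(4);
  header.writeUInt32LE(rawBlock.length, 0);

  const record = Buffer.concat([header, checksum, rawBlock]);
  if (fileSize + record.length > MAX_FILE_SIZE)
    return -1; // caller should roll over to a new chunk file

  fs.writeSync(fd, record, 0, record.length, fileSize);
  return fileSize; // offset where the record was written
}
```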

@braydonf
Contributor Author

Yeah, bitcoind does something similar, writing multiple blocks per file and then using LevelDB to index a block hash to a file and position.

There is/was (I haven't checked recently) an issue where the size isn't included in one of the indexes, which makes for some unnecessary parsing; that may have been for tx queries. The size is something that should be included in the index, so that the data can be read directly and possibly not parsed.

@braydonf braydonf changed the title from "Disk I/O Efficiency Improvements" to "Disk I/O efficiency improvements" Aug 25, 2018
@braydonf
Contributor Author

Existing work on using files to store blocks (thanks @tuxcanfly):

Long-standing issue #107 is also related.

@braydonf
Contributor Author

Max open files could be an issue with storing one block per file, as in the earlier proposal, so keeping an index of block hash to file and position would help there. This is how bitcoind stores blocks, and I think btcd does as well.

@braydonf
Contributor Author

braydonf commented Aug 28, 2018

I had originally based the 7x disk read/write improvement on there being 7 levels. The actual improvement should be around 5x, as there are 5 levels in bcoin usage.

When level-0 grows to more than 4 files, those files are merged together with level-1 files; this process is repeated for multiple levels. When the combined size of files in level-L exceeds (10^L) MB, one file in level-L and all of the overlapping files in level-(L+1) are merged to form a set of new files for level-(L+1) (see https://github.com/google/leveldb/blob/master/doc/impl.md#sorted-tables). Here is a table with the levels calculated:

level-1 = 10MB
level-2 = 100MB
level-3 = 1GB
level-4 = 10GB
level-5 = 100GB
level-6 = 1TB

Here is the current size of the chain database with indexes enabled at 462GB:

$ du -h /home/bcoin/.bcoin/
20M	/home/bcoin/.bcoin/wallet
462G	/home/bcoin/.bcoin/chain
462G	/home/bcoin/.bcoin/

And another look by separating the indexes from the chain db, as in PR #424:

du -h /home/bcoin/.bcoin/
20M	/home/bcoin/.bcoin/wallet
77G	/home/bcoin/.bcoin/index/addr
174G	/home/bcoin/.bcoin/index/tx
250G	/home/bcoin/.bcoin/index
160G	/home/bcoin/.bcoin/chain
409G	/home/bcoin/.bcoin/

Note: The chain is smaller than above because it's not yet completely synced.

In both of these cases there are 5 levels.

@braydonf
Contributor Author

braydonf commented Aug 29, 2018

For comparison, here is Bitcoin Core v0.16.2 with indexing enabled:

du -h /home/bitcoin-core/.bitcoin/
17G	/home/bitcoin-core/.bitcoin/blocks/index
208G	/home/bitcoin-core/.bitcoin/blocks
2.7G	/home/bitcoin-core/.bitcoin/chainstate
211G	/home/bitcoin-core/.bitcoin/

Note: This is without an address index, as one is not available.

The largest LevelDB database here is 17GB, so it would be at around 4 levels, giving a rough estimate of 68GB of reads/writes over the compaction lifetime of the index database, compared with an estimated 2.3TB of reads/writes for a 462GB database at 5 levels.
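As a back-of-the-envelope check of those figures, assume a database of S MB settles at around floor(log10(S)) levels (level L holds up to 10^L MB, per the table above) and that each byte is rewritten roughly once per level during compaction. This ignores level-0 and overlap effects, so it's an estimate only:

```js
// Rough estimate only: levels ~ floor(log10(size in MB)), and each byte
// is rewritten about once per level during compaction.
function approxLevels(sizeMB) {
  return Math.floor(Math.log10(sizeMB));
}

function approxCompaction(sizeGB) {
  const levels = approxLevels(sizeGB * 1000);
  return { levels, rewrittenGB: sizeGB * levels };
}

// approxCompaction(462) => { levels: 5, rewrittenGB: 2310 }  // ~2.3TB
// approxCompaction(17)  => { levels: 4, rewrittenGB: 68 }    // ~68GB
```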

@braydonf
Contributor Author

As background: I followed commits back to find the origin of the txindex. The txindex originated as the primary way to store blocks, hence the current format.

It wasn't until undo coins were introduced that the layout was changed to store blocks "as-is" instead of as a compact block with txid references. Presumably this was to improve the performance of reading blocks from disk. You can see the change at commit 845a987.

@braydonf
Contributor Author

This is mostly resolved now with #703.

There will be a follow-up PR to resolve the duplication of the txindex, currently on the indexer branch.
