Disk I/O efficiency improvements #585

Closed
braydonf opened this issue Aug 23, 2018 · 11 comments · Fixed by #758
Labels
blockstore enhancement Improving a current feature indexer stability / efficiency Denial of service, better resource usage

Comments

@braydonf
Contributor

braydonf commented Aug 23, 2018

Versions
bcoin v1.0.2


Disk Usage

2x reduction of disk usage possible with txindex enabled

When the transaction index is enabled (to be able to query transactions by txid), an entire copy of each transaction is made (see lib/primitives/txmeta.js), instead of a pointer to the block data. This means duplicate data is being stored on disk. By deduplicating this data, there would be a 2x reduction in disk usage.
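For illustration, a deduplicated index entry could store just a pointer into the block data instead of a second serialized copy of the transaction. The record layout and names below are hypothetical, not bcoin's actual format:

```js
// Hypothetical deduplicated txindex record: instead of storing a second
// serialized copy of the transaction (as lib/primitives/txmeta.js does
// today), store only where the transaction lives inside its block.
const assert = require('assert');

function encodeTxPointer(blockHash, offset, length) {
  assert(Buffer.isBuffer(blockHash) && blockHash.length === 32);
  const record = Buffer.allocUnsafe(32 + 4 + 4);
  blockHash.copy(record, 0);          // which block contains the tx
  record.writeUInt32LE(offset, 32);   // byte position within the block
  record.writeUInt32LE(length, 36);   // serialized size of the tx
  return record;
}

function decodeTxPointer(record) {
  return {
    blockHash: record.slice(0, 32),
    offset: record.readUInt32LE(32),
    length: record.readUInt32LE(36)
  };
}
```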

Disk Read/Writes

5x reduction in the number of disk reads/writes for blocks possible

LevelDB uses a log-structured merge-tree (LSM) starting at level 0, and moves data downwards in levels (see https://github.com/google/leveldb/blob/master/doc/impl.md#sorted-tables). The same key/value pair can be read and written several times as it's sorted into tables. This comes with the benefit of query performance, as the keys are sorted on-disk and sequential reads of similar keys are fast. However, it's rare to query all transactions sorted by txid, so there isn't much gain from the on-disk sorting. Furthermore, read performance can be achieved via small indexes that point to larger portions of data on disk.

The use of db.compactRange() in lib/blockchain/chaindb.js reduces the performance impact of compaction at query time by eagerly sorting. I've noticed that this can take many hours after the initial block download.
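For reference, a minimal sketch of what that eager compaction looks like with the plain leveldown callback API; bcoin's bdb wrapper may expose a slightly different signature, so this is illustrative only:

```js
// Minimal sketch of eagerly compacting the whole keyspace after the
// initial block download, using leveldown's callback API directly.
// bcoin's bdb wrapper may differ; this is illustrative only.
const leveldown = require('leveldown');

const db = leveldown('/home/bcoin/.bcoin/chain');

db.open((err) => {
  if (err)
    throw err;

  // Compact from the lowest to a very high key. On a multi-hundred-GB
  // database this can take many hours, as noted above.
  const start = Buffer.from([0x00]);
  const end = Buffer.alloc(255, 0xff);

  db.compactRange(start, end, (err) => {
    if (err)
      throw err;
    console.log('compaction complete');
  });
});
```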


Are there other areas that could be optimized?

@braydonf braydonf mentioned this issue Aug 23, 2018
@pinheadmz
Member

Worth mentioning here that the disk space can also be essentially reclaimed by running a pruned node with indexes on. I have such a configuration and the disk usage is currently ~288GB. So I dunno how this works at the read/write efficiency level: is the indexed database faster than re-reading blocks from disk?

@braydonf
Contributor Author

A Proposal

Blocks are stored in a directory structure, using the last two bytes (four hex characters) of the block hash, similar to git objects. The structure is /e2/6f/000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f.block for each block. This reduces the writes for each block to only once. Serving blocks to new nodes joining the network becomes a direct file read.
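A minimal sketch of deriving such a path from a block hash, assuming the last two bytes form the directory prefix as in the example above (the prefix directory and function name are illustrative):

```js
// Illustrative: derive a git-object-style path for a block, using the
// last two bytes (four hex characters) of the block hash as directories.
const path = require('path');

function blockPath(prefix, hash) {
  // hash is the block hash as a lowercase hex string, e.g.
  // '000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f'
  const d1 = hash.slice(-4, -2); // 'e2'
  const d2 = hash.slice(-2);     // '6f'
  return path.join(prefix, d1, d2, `${hash}.block`);
}

// blockPath('/home/bcoin/.bcoin/blocks',
//   '000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f');
// => '/home/bcoin/.bcoin/blocks/e2/6f/000000000019d6....block'
```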

The txindex, for archival cases, uses a LevelDB key/value store and maps a txid to a block hash (possibly compacted or truncated), byte position, and size. A query for a transaction reads the index, ideally from memory, and then reads the transaction from disk at the block file position. Other indexes would be similar or point to other indexes.
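A sketch of that query path under the proposal, assuming the hypothetical pointer record and directory scheme described above:

```js
// Sketch of the proposed query path: after looking up the index record
// in LevelDB (ideally cached in memory), read only the transaction's
// bytes from its block file on disk. The pointer layout and directory
// scheme are the hypothetical ones from the proposal above.
const fs = require('fs');
const path = require('path');

function readTx(prefix, ptr) {
  // ptr = { blockHash: Buffer, offset: Number, length: Number }
  const hash = ptr.blockHash.toString('hex');
  const file = path.join(prefix, hash.slice(-4, -2), hash.slice(-2),
                         `${hash}.block`);

  const buf = Buffer.allocUnsafe(ptr.length);
  const fd = fs.openSync(file, 'r');
  try {
    // Because the index stores the size, we can read exactly the bytes
    // of the transaction without parsing the rest of the block.
    fs.readSync(fd, buf, 0, ptr.length, ptr.offset);
  } finally {
    fs.closeSync(fd);
  }
  return buf; // raw serialized transaction
}
```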

For the browser, IndexedDB can be used; blocks would be queried by hash, and the API would be shared between the two storage backends.

@braydonf
Contributor Author

Thanks @pinheadmz. I hadn't realized it was possible to set up that configuration, and it helped in realizing the tx data was duplicated. It could be interesting to consider indexing while pruned, and that it would still work for coins. Currently, it may also have the unintentional effect of a node not being able to serve blocks to peers, even though the data is stored.

@tuxcanfly
Member

Agreed, the other aspect of optimization would be to speed up the initial block sync, and explore how to optimize disk layout. For example, btcd writes in chunks of 512MB files and includes a checksum to prevent file corruption.
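For illustration only, appending a block to a fixed-size chunk file with a checksum might look like the sketch below; the 512MB cap matches the figure mentioned above, but the record framing is an assumption, not btcd's actual on-disk format:

```js
// Illustrative: append a block to the current chunk file, framed with
// its length and a truncated SHA-256 checksum so corruption can be
// detected on read. The framing is an assumption, not btcd's format.
const fs = require('fs');
const crypto = require('crypto');

const MAX_FILE_SIZE = 512 * 1024 * 1024; // 512MB per chunk file

function appendBlock(fd, fileSize, rawBlock) {
  const checksum =
    crypto.createHash('sha256').update(rawBlock).digest().slice(0, 4);

  const header = Buffer.allocUnsafe(4);
  header.writeUInt32LE(rawBlock.length, 0);

  const record = Buffer.concat([header, checksum, rawBlock]);
  if (fileSize + record.length > MAX_FILE_SIZE)
    return -1; // caller should roll over to a new chunk file

  fs.writeSync(fd, record, 0, record.length, fileSize);
  return fileSize; // offset where the record was written
}
```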

@braydonf
Contributor Author

Yeah, bitcoind does something similar, writing multiple blocks per file and then using LevelDB to index a block hash to a file and position.

There is/was (I haven't checked recently) an issue where the size isn't included in one of the indexes, which makes for some unnecessary parsing; that may have been for tx queries. The size is something that should be included in the index, so that the data can be read directly and possibly not parsed.

@braydonf braydonf changed the title from "Disk I/O Efficiency Improvements" to "Disk I/O efficiency improvements" Aug 25, 2018
@braydonf
Contributor Author

Existing work on using files to store blocks (thanks @tuxcanfly):

Long-standing issue #107 is also related.

@braydonf
Contributor Author

Max open files could be an issue with storing one block per file, as in the earlier proposal, so keeping an index of block hash to file and position would help there. This is how bitcoind stores blocks, and I think btcd does as well.

@braydonf
Contributor Author

braydonf commented Aug 28, 2018

I had originally based the 7x disk read/write improvement on there being 7 levels. The actual improvement should be around 5x, as there are 5 levels in bcoin usage.

When level-0 grows to more than 4 files, those files are merged together with level-1 files; this process is repeated for multiple levels. When the combined size of files in level-L exceeds (10^L) MB, one file in level-L and all of the overlapping files in level-(L+1) are merged to form a set of new files for level-(L+1) (see https://github.com/google/leveldb/blob/master/doc/impl.md#sorted-tables). Here is a table with the levels calculated:

level-1 = 10MB
level-2 = 100MB
level-3 = 1GB
level-4 = 10GB
level-5 = 100GB
level-6 = 1TB

Here is the current size of the chain database with indexes enabled at 462GB:

$ du -h /home/bcoin/.bcoin/
20M	/home/bcoin/.bcoin/wallet
462G	/home/bcoin/.bcoin/chain
462G	/home/bcoin/.bcoin/

And another look by separating the indexes from the chain db, as in PR #424:

du -h /home/bcoin/.bcoin/
20M	/home/bcoin/.bcoin/wallet
77G	/home/bcoin/.bcoin/index/addr
174G	/home/bcoin/.bcoin/index/tx
250G	/home/bcoin/.bcoin/index
160G	/home/bcoin/.bcoin/chain
409G	/home/bcoin/.bcoin/

Note: The chain is smaller than above because it's not yet completely synced.

In both of these cases there are 5 levels.

@braydonf
Contributor Author

braydonf commented Aug 29, 2018

For comparison, here is Bitcoin Core v0.16.2 with indexing enabled:

du -h /home/bitcoin-core/.bitcoin/
17G	/home/bitcoin-core/.bitcoin/blocks/index
208G	/home/bitcoin-core/.bitcoin/blocks
2.7G	/home/bitcoin-core/.bitcoin/chainstate
211G	/home/bitcoin-core/.bitcoin/

Note: This is without an address index, as one is not available.

The largest LevelDB database here is 17GB, so it would be at around 4 levels, giving a rough estimate of 68GB of reads/writes over the compaction lifetime of the index database, compared with an estimated 2.3TB of reads/writes for a 462GB database at 5 levels.
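As a back-of-the-envelope check of those figures, assume a database of S MB settles at around floor(log10(S)) levels (level L holds up to 10^L MB, per the table above) and that each byte is rewritten roughly once per level during compaction. This ignores level-0 and overlap effects, so it's an estimate only:

```js
// Rough estimate only: levels ~ floor(log10(size in MB)), and each byte
// is rewritten about once per level during compaction.
function approxLevels(sizeMB) {
  return Math.floor(Math.log10(sizeMB));
}

function approxCompaction(sizeGB) {
  const levels = approxLevels(sizeGB * 1000);
  return { levels, rewrittenGB: sizeGB * levels };
}

// approxCompaction(462) => { levels: 5, rewrittenGB: 2310 }  // ~2.3TB
// approxCompaction(17)  => { levels: 4, rewrittenGB: 68 }    // ~68GB
```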

@braydonf
Contributor Author

As background: I followed commits back to find the origin of the txindex. The txindex originated as the primary way to store blocks, hence the current format.

It wasn't until undo coins were introduced that the layout was changed to store blocks "as-is" instead of as a compact block with txid references. Presumably this was to improve the performance of reading blocks from disk. You can see the change at commit 845a987.

@braydonf
Contributor Author

This is mostly resolved now with #703.

There will be a follow-up PR to resolve the duplication of the txindex, currently on the indexer branch.
