Disk I/O efficiency improvements #585
Comments
Worth mentioning here that the disk space can also be essentially reclaimed by running a pruned node with indexes on. I have such a configuration and the disk usage is currently ~288GB. I don't know how this works at the read/write efficiency level, though: is the indexed database faster than re-reading blocks from disk?
A Proposal

Blocks are stored in a directory structure, using the last 4 bytes of the block hash, similar to git objects. The structure is

The txindex, for archival cases, uses a LevelDB key/value store and maps a txid to a blockhash (possibly compacted or truncated), byte position, and size. A query for a transaction reads the index, ideally from memory, and then reads the transaction from disk at the block file position. Other indexes would be similar or point to other indexes.

For the browser, IndexedDB can be used: blocks would be queried by hash, and the API would be shared between the two storage backends.
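To make the txindex idea above more concrete, here is a minimal sketch of a pointer-style index record and a lookup through it. Everything named here is an assumption for illustration: the 40-byte record layout, the `encodeRecord`/`decodeRecord`/`readTX` helpers, and the use of plain `Map` objects as stand-ins for the LevelDB index and the block files. It is not bcoin's actual format.

```javascript
'use strict';

// Hypothetical record layout (an assumption, not bcoin's format):
// 32-byte block hash | 4-byte LE byte offset | 4-byte LE size.
const HASH_SIZE = 32;
const RECORD_SIZE = HASH_SIZE + 4 + 4;

// Serialize a pointer to a transaction inside a stored block.
function encodeRecord(blockHash, offset, size) {
  if (!Buffer.isBuffer(blockHash) || blockHash.length !== HASH_SIZE)
    throw new TypeError('blockHash must be a 32-byte Buffer.');

  const record = Buffer.alloc(RECORD_SIZE);
  blockHash.copy(record, 0);
  record.writeUInt32LE(offset, HASH_SIZE);
  record.writeUInt32LE(size, HASH_SIZE + 4);
  return record;
}

// Parse a record back into its three fields.
function decodeRecord(record) {
  if (record.length !== RECORD_SIZE)
    throw new RangeError('Unexpected record size.');

  return {
    blockHash: record.subarray(0, HASH_SIZE),
    offset: record.readUInt32LE(HASH_SIZE),
    size: record.readUInt32LE(HASH_SIZE + 4)
  };
}

// Look up a transaction: read the small index entry first, then
// slice the raw transaction bytes out of the block data.
// `index` and `blocks` are Maps keyed by hex strings, standing in
// for the key/value store and the on-disk block files.
function readTX(index, blocks, txid) {
  const record = index.get(txid.toString('hex'));

  if (!record)
    return null;

  const {blockHash, offset, size} = decodeRecord(record);
  const block = blocks.get(blockHash.toString('hex'));

  if (!block)
    return null;

  return block.subarray(offset, offset + size);
}
```

With a layout along these lines, the index stores a fixed 40 bytes per transaction rather than a full copy of it, and the transaction bytes exist only once, inside the block data.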
Thanks @pinheadmz. I hadn't realized it was possible to set up that configuration, and it helped me realize that the tx was duplicated. It could be interesting to consider indexing while pruned, and that it will still work for coins. Currently, it may also have the unintended effect of a node not being able to serve blocks to peers, even though the data is stored.
Agreed, the other aspect of optimization would be to speed up the initial block sync, and to explore how to optimize the disk layout. For example, btcd writes blocks in files of 512MB and includes a checksum to guard against file corruption.
Yeah, There is/was (haven't checked recently) an issue where the |
Existing work with using files to store blocks (thanks @tuxcanfly):
Long-standing issue #107 is also related.
Max open files could be an issue with storing a block per file, from the earlier proposal, so keeping an index of block hash to file and position, would help there. This is how |
I had originally based the 7x disk read/write improvement on there being 7 levels. The actual improvement should be around 5x, as there are 5 levels in bcoin usage. When level-0 grows to more than 4 files, those files are merged together with level-1 files, and this process is repeated for multiple levels. When the combined size of files in level-L exceeds (10^L) MB, one file in level-L and all of the overlapping files in level-(L+1) are merged to form a set of new files for level-(L+1) (see https://github.com/google/leveldb/blob/master/doc/impl.md#sorted-tables). Here is a table with the levels calculated:
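As a rough illustration of the (10^L) MB rule quoted above, the per-level size thresholds can be computed as follows. The helper names `levelLimitMB` and `levelTable` are hypothetical, and the sketch only reproduces the threshold formula from the LevelDB implementation notes. It does not attempt to recreate the original table or to derive how many levels a given database actually uses.

```javascript
'use strict';

// Size threshold for level L (L >= 1) in MB, per the (10^L) MB rule.
// Level 0 is triggered by file count instead, so it is excluded.
function levelLimitMB(level) {
  if (!Number.isInteger(level) || level < 1)
    throw new RangeError('level must be an integer >= 1.');

  return Math.pow(10, level);
}

// Build a list of {level, limitMB} rows for levels 1..maxLevel.
function levelTable(maxLevel) {
  const rows = [];

  for (let level = 1; level <= maxLevel; level++)
    rows.push({ level, limitMB: levelLimitMB(level) });

  return rows;
}

// Print one row per level, for example "level-1: 10 MB".
for (const {level, limitMB} of levelTable(6))
  console.log(`level-${level}: ${limitMB} MB`);
```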
Here is the current size of the chain database with indexes enabled at 462GB:
And another look by separating the indexes from the chain db, as in PR #424:
Note: The chain is generally smaller than above because it's not completely synced. In both of these cases there are 5 levels.
For comparison, here is Bitcoin Core v0.16.2 with indexing enabled:
Note: This is without an address index, as one isn't available. The largest LevelDB database here is 17GB, so it would be at around 4 levels, giving a rough estimate of 68GB of reads/writes from compaction over the lifetime of the index database, compared with an estimated 2.3TB of reads/writes for a 462GB database.
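The two estimates above appear to follow from multiplying the database size by the number of levels, on the simplifying assumption that each byte is rewritten roughly once per level. A small sketch of that arithmetic, with `estimateCompactionGB` as a hypothetical helper name:

```javascript
'use strict';

// Rough compaction I/O estimate in GB, assuming every byte is
// rewritten about once per level (a simplification, not a measurement).
function estimateCompactionGB(sizeGB, levels) {
  if (sizeGB < 0 || !Number.isInteger(levels) || levels < 0)
    throw new RangeError('Invalid size or level count.');

  return sizeGB * levels;
}

// 17GB index database at around 4 levels.
console.log(estimateCompactionGB(17, 4)); // 68

// 462GB chain database at 5 levels, about 2.3TB.
console.log(estimateCompactionGB(462, 5)); // 2310
```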
As background: I followed commits back to the origin of txindex. The txindex originated as the primary way to store blocks, hence the current format. It wasn't until undo coins were introduced that the layout was changed to store blocks "as-is" instead of as a compact block with txid references. Presumably this was to improve the performance of reading blocks from disk. You can see the change at commit 845a987.
Versions
bcoin v1.0.2
Disk Usage
2x reduction of disk usage possible with txindex enabled
When the transaction index is enabled (to be able to query transactions by txid), an entire copy of each transaction is made (see lib/primitives/txmeta.js) instead of a pointer to the block data. This means duplicate data is being stored on disk. By deduplicating this data, there would be a 2x reduction of disk usage.
Disk Read/Writes
5x reduction in the number of disk reads/writes for blocks possible
LevelDB uses a log-structured merge-tree (LSM) starting at level 0, and moves data downwards in levels (see https://github.com/google/leveldb/blob/master/doc/impl.md#sorted-tables). The same key/value pair can be read and written several times as it's sorted into tables. This comes with the benefit of query performance, as the keys are sorted on disk and sequential reads of similar keys are fast. However, it's rare to query all transactions sorted by txid, so there isn't much gain from the on-disk sorting. Furthermore, read performance can be achieved via small indexes that point to larger portions of data on disk.
The use of db.compactRange() in lib/blockchain/chaindb.js reduces the performance impact of that block compaction at query time by eagerly sorting. I've noticed that this can take many hours after the initial block download.
Are there other areas that could be optimized?