This project provides a simple and easy way to download NEAR blockchain archival data from NEAR Lake using BitTorrent.
NEAR Lake on S3 is organized into folders by block height only. Each folder name is the zero-padded block height, like 000042007123/, and the folder contains all the data for that block. Inside the folder, there are multiple files:
- block.json - block header
- shard_0.json - shard 0 data
- shard_*.json - other shards data
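As a small illustration of the naming scheme above, the sketch below builds the folder name for a given block height. The function name `lake_block_dir` is hypothetical, and the 12-digit padding width is an assumption inferred from the 000042007123/ example.

```shell
# Hypothetical helper: print the NEAR Lake folder name for a block height.
# Assumption: the folder name is the height zero-padded to 12 digits,
# as in the 000042007123/ example.
# Pass the height as a plain decimal number without leading zeros.
lake_block_dir() {
  printf '%012d/\n' "$1"
}

lake_block_dir 42007123   # prints 000042007123/
```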
This layout doesn't work well on a local file system (and, as a result, for torrents) because there are too many folders at the top level. The data also isn't compressed, which makes data transfers costlier than needed.
To solve this, we have a script that takes NEAR Lake data and reorganizes it into a more efficient structure. It also compresses the data to save on transfer costs.
Check out load-raw-near-lake for more details.
File structure generated by the script:
Top level:
- block - block header
- 0 - data for shard 0
- * - data for other shards
Inside each shard folder:
- 000042/007/000042007120.tgz - data for blocks 42007120-42007124 (assuming 5 blocks per archive)
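The path layout above can be sketched as a function from block height to archive path. This is only an illustration: the name `archive_path` is hypothetical, and it assumes archives start at heights that are multiples of the archive size (5 here, matching the example) and that the two directory levels are the first six and next three digits of the 12-digit padded start height.

```shell
# Hypothetical helper: map a block height to the archive path inside a shard folder.
# Assumptions: archives begin at multiples of the archive size (default 5),
# and the path is <digits 1-6>/<digits 7-9>/<padded start height>.tgz.
archive_path() {
  height=$1
  per_archive=${2:-5}
  # Round down to the first block stored in the archive.
  start=$(( height - height % per_archive ))
  padded=$(printf '%012d' "$start")
  top=$(printf '%s' "$padded" | cut -c1-6)
  mid=$(printf '%s' "$padded" | cut -c7-9)
  printf '%s/%s/%s.tgz\n' "$top" "$mid" "$padded"
}

archive_path 42007123   # prints 000042/007/000042007120.tgz
```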
Inside of .tgz archive:
- 000042007120.json - data for block 42007120
- 000042007121.json - data for block 42007121
- 000042007122.json - data for block 42007122
- 000042007123.json - data for block 42007123
- 000042007124.json - data for block 42007124
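Once an archive is downloaded, its contents can be inspected with standard tar, without unpacking to disk. The sketch below is a minimal example; the function names `list_blocks` and `read_block` are hypothetical, and it assumes archive members are stored under the bare name `<padded height>.json` with no leading directory.

```shell
# List the per-block JSON files stored in one archive.
list_blocks() {
  tar -tzf "$1"
}

# Print the JSON for a single block to stdout without extracting to disk.
# Assumption: members are named <12-digit padded height>.json at the archive root.
read_block() {
  archive=$1
  height=$2
  tar -xzOf "$archive" "$(printf '%012d.json' "$height")"
}

# Example usage (paths are placeholders):
#   list_blocks 000042/007/000042007120.tgz
#   read_block 000042/007/000042007120.tgz 42007123
```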
Right now each range of one million blocks is published as a separate torrent, so that it's easier to download only the data you need.
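To work out which million-block range a given block falls into, a calculation like the following may help. It is a sketch only: the name `torrent_range` is hypothetical, and it assumes ranges are aligned to multiples of one million. How each range maps to a specific magnet link is not covered here.

```shell
# Hypothetical helper: print the million-block range that contains a block height.
# Assumption: ranges are aligned to multiples of 1,000,000.
torrent_range() {
  height=$1
  start=$(( height / 1000000 * 1000000 ))
  end=$(( start + 999999 ))
  printf '%d-%d\n' "$start" "$end"
}

torrent_range 42007123   # prints 42000000-42999999
```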
- Download magnet link for relevant data (structured in this repo like network_id/shard_id.csv, e.g. mainnet/0.csv)
- Add desired magnet link to your BitTorrent client for download. For example, for Transmission using transmission-remote:

    # Download all data for shard 0
    for magnet in $(cat mainnet/0.csv); do
      transmission-remote -a "$magnet" --download-dir /global/path/to/download/0
    done
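A slightly more defensive variant of the same loop reads the file line by line, so blank lines are skipped and each link is passed through without shell word splitting or glob expansion. This is a sketch under assumptions: it assumes each line of the CSV holds exactly one magnet link and nothing else, and both the function name `add_magnets` and the `TRANSMISSION_REMOTE` override variable are hypothetical conveniences for dry runs, not part of this repo or of Transmission.

```shell
# Hypothetical helper: add every magnet link from a CSV file to Transmission.
# Assumption: one magnet link per line, no extra columns.
# Set TRANSMISSION_REMOTE=echo to print the commands instead of running them.
add_magnets() {
  csv=$1
  dir=$2
  # The "|| [ -n ... ]" part also handles a last line without a trailing newline.
  while IFS= read -r magnet || [ -n "$magnet" ]; do
    # Skip blank lines.
    [ -n "$magnet" ] || continue
    # Redirect stdin so the command cannot consume the rest of the CSV.
    ${TRANSMISSION_REMOTE:-transmission-remote} -a "$magnet" --download-dir "$dir" < /dev/null
  done < "$csv"
}

# Example usage:
#   add_magnets mainnet/0.csv /global/path/to/download/0
```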