Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,226 @@ | ||
--- | ||
id: run-lake-indexer | ||
sidebar_label: Running Lake Indexer | ||
--- | ||
|
||
# Running NEAR Lake Indexer | ||
|
||
:::info | ||
|
||
NEAR Lake is a blockchain indexer built on top of the [NEAR Indexer microframework](https://github.com/nearprotocol/nearcore/tree/master/chain/indexer) | ||
that watches the network and stores all the events as JSON files on AWS S3. | ||
|
||
::: | ||
|
||
## How to start | ||
|
||
The Lake Indexer setup consists of the following components: | ||
* An AWS S3 bucket for storage | ||
* The NEAR Lake binary, which operates as a regular NEAR Protocol peer-to-peer node, so you run it like | ||
any other [Regular/RPC Node in NEAR](https://near-nodes.io/rpc/hardware-rpc) | ||
|
||
### Prepare Development Environment | ||
|
||
Before you proceed, make sure you have the following set up: | ||
* The [Rust compiler](https://rustup.rs/) at the version specified in the `rust-toolchain` file in the root of the | ||
[nearcore](https://github.com/nearprotocol/nearcore) project. | ||
* [AWS credentials configured](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-files.html). | ||
From the AWS docs: | ||
|
||
> For example, the files generated by the AWS CLI for a default profile configured with aws configure looks similar to the following. | ||
> | ||
> ~/.aws/credentials | ||
> ``` | ||
> [default] | ||
> aws_access_key_id=AKIAIOSFODNN7EXAMPLE | ||
> aws_secret_access_key=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY | ||
> ``` | ||
### Compile NEAR Lake | ||
|
||
```bash | ||
$ cargo build --release | ||
``` | ||
|
||
### Configure NEAR Lake | ||
|
||
To connect NEAR Lake to a specific chain, you need the necessary configs. You can generate them as follows: | ||
|
||
```bash | ||
$ ./target/release/near-lake --home ~/.near/testnet init --chain-id testnet --download-config --download-genesis | ||
``` | ||
|
||
The command above downloads the official genesis config and generates the necessary configs. You can replace `testnet` in the command with a different network ID (`betanet`, `mainnet`). | ||
|
||
:::info nearcore configuration | ||
|
||
Due to changes in `nearcore` config generation, not all the necessary fields in the config file are filled in. | ||
While [this issue is open](https://github.com/nearprotocol/nearcore/issues/3156), you need to download the config you want and replace the generated one manually. | ||
- [testnet config.json](https://s3-us-west-1.amazonaws.com/build.nearprotocol.com/nearcore-deploy/testnet/config.json) | ||
- [betanet config.json](https://s3-us-west-1.amazonaws.com/build.nearprotocol.com/nearcore-deploy/betanet/config.json) | ||
- [mainnet config.json](https://s3-us-west-1.amazonaws.com/build.nearprotocol.com/nearcore-deploy/mainnet/config.json) | ||
|
||
::: | ||
|
||
Configs for the specified network are in the folder provided via `--home`. We need to ensure that NEAR Lake follows | ||
all the necessary shards, so the `"tracked_shards"` parameter in `~/.near/testnet/config.json` needs to be configured properly. | ||
Currently, `nearcore` treats an empty value for `"tracked_shards"` as "do not track any shard" and **any value** as "track all shards". | ||
For example, in order to track all shards, you just add shard #0 to the list: | ||
|
||
``` | ||
... | ||
"tracked_shards": [0], | ||
... | ||
``` | ||
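Before starting the node, it can help to double-check this setting programmatically. The following Python sketch is not part of NEAR Lake: the names `tracks_all_shards` and `check_config_file` are hypothetical, and treating a missing `"tracked_shards"` key like an empty list is an assumption of this sketch. It only encodes the rule described above.

```python
import json
from pathlib import Path


def tracks_all_shards(config: dict) -> bool:
    """Return True if this config makes the node track all shards.

    Rule from the text above: an empty "tracked_shards" list means
    "track no shards", any non-empty list means "track all shards".
    Assumption: a missing key is treated like an empty list.
    """
    return len(config.get("tracked_shards") or []) > 0


def check_config_file(path: str) -> bool:
    """Load a nearcore config.json from `path` and apply the check."""
    config = json.loads(Path(path).expanduser().read_text())
    return tracks_all_shards(config)
```

For example, `check_config_file("~/.near/testnet/config.json")` should return `True` once the list contains shard `0`.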
|
||
### Run NEAR Lake | ||
|
||
Commands and options for running NEAR Lake, passed after `./target/release/near-lake`: | ||
|
||
| Command | Key/Subcommand | Required/Default | Responsible for | | ||
|--------- |-------------------------- |------------------------------------------------------------------ |------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | ||
| | `--home` | Default <br />`~/.near` | Tells the node where to look for the necessary files: `config.json`, `genesis.json`, `node_key.json`, and the `data` folder | | ||
| `init` | | | Tells the node to generate config files in the `--home` directory | | ||
| | `--chain-id` | Required<br /><br /> * `localnet`<br /> * `testnet`<br /> * `mainnet` | Defines the chain to generate config files for | | ||
| | `--download-config` | Optional | If provided tells the node to download `config.json` from the public URL. You can download them manually<br /><br /> - [testnet config.json](https://s3-us-west-1.amazonaws.com/build.nearprotocol.com/nearcore-deploy/testnet/config.json)<br /> - [mainnet config.json](https://s3-us-west-1.amazonaws.com/build.nearprotocol.com/nearcore-deploy/mainnet/config.json) | | ||
| | `--download-genesis` | Optional | If provided tells the node to download `genesis.json` from the public URL. You can download them manually<br /><br /> - [testnet genesis.json](https://s3-us-west-1.amazonaws.com/build.nearprotocol.com/nearcore-deploy/testnet/genesis.json)<br /> - [mainnet genesis.json](https://s3-us-west-1.amazonaws.com/build.nearprotocol.com/nearcore-deploy/mainnet/genesis.json) | | ||
| | TODO:<br />Other `neard` keys | | | | ||
| `run` | | | Runs the node | | ||
| | `--bucket` | Required | AWS S3 Bucket name | | ||
| | `--region` | Required | AWS S3 Bucket region | | ||
| | `--fallback-region` | Default eu-central-1 | AWS S3 Fallback region | | ||
| | `--endpoint` | Optional | AWS S3 compatible API endpoint | | ||
| | `--stream-while-syncing` | Optional | If provided, the Indexer streams blocks as they appear on the node instead of waiting for the node to be fully synced | | ||
| | `--concurrency` | Default 1 | Defines the concurrency for the process of saving block data to AWS S3 | | ||
| | `sync-from-latest` | One of the `sync-` subcommands is required | Tells the node to start indexing from the latest block in the network | | ||
| | `sync-from-interruption` | One of the `sync-` subcommands is required | Tells the node to start indexing from the block the node was interrupted on (on the first start it falls back to `sync-from-latest`) | | ||
| | `sync-from-block --height N` | One of the `sync-` subcommands is required | Tells the node to start indexing from the specified block height `N` (**ensure** your node data has the block you want to start from) | | ||
|
||
```bash | ||
$ ./target/release/near-lake --home ~/.near/testnet run --stream-while-syncing --concurrency 50 sync-from-latest | ||
``` | ||
|
||
After the network is synced, you should see logs of every block height currently received by NEAR Lake. | ||
|
||
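When the node is launched from a script, the options from the table can be assembled in one place. The following Python sketch is illustrative only: the helper `near_lake_run_args` is hypothetical, it uses only the flags and subcommands listed in the table above, and it assumes the ordering shown in the example command (`--home` before `run`, the `sync-` subcommand last).

```python
from typing import List, Optional

# The three sync subcommands listed in the table above.
SYNC_MODES = ("sync-from-latest", "sync-from-interruption", "sync-from-block")


def near_lake_run_args(
    home: str,
    bucket: str,
    region: str,
    sync_mode: str,
    height: Optional[int] = None,
    concurrency: int = 1,
    stream_while_syncing: bool = False,
) -> List[str]:
    """Assemble the argument list for `near-lake --home ... run ...`."""
    if sync_mode not in SYNC_MODES:
        raise ValueError(f"unknown sync mode: {sync_mode}")
    # --height is needed for sync-from-block and meaningless otherwise.
    if (sync_mode == "sync-from-block") != (height is not None):
        raise ValueError("--height goes with sync-from-block, and only with it")
    args = [
        "./target/release/near-lake", "--home", home, "run",
        "--bucket", bucket, "--region", region,
        "--concurrency", str(concurrency),
    ]
    if stream_while_syncing:
        args.append("--stream-while-syncing")
    args.append(sync_mode)
    if height is not None:
        args += ["--height", str(height)]
    return args
```

The resulting list can be passed to `subprocess.run(args, check=True)`. Note that `~` is not expanded without a shell, so pass an absolute `home` path or expand it with `os.path.expanduser` first.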
|
||
## Syncing | ||
|
||
Whenever you run NEAR Lake for any network except localnet, you'll need to sync with the network. | ||
This is natural behavior for a `nearcore` node, and NEAR Lake is a wrapper | ||
around the regular `nearcore` node. In order to work and index the data, your node must be synced | ||
with the network. This process can take a while, so we suggest downloading a fresh backup of | ||
the `data` folder and putting it in the `--home` directory of your choice (`~/.near` by default). | ||
|
||
:::tip | ||
Running your NEAR Lake node on top of backup data reduces the syncing time | ||
because your node only downloads the data produced after the backup was taken, which takes a reasonable amount of time. | ||
::: | ||
|
||
All the backups can be downloaded from the public S3 bucket, which contains the latest daily snapshots. | ||
|
||
You will need the [AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-welcome.html) installed in order to download the backups. | ||
|
||
### Mainnet | ||
|
||
``` | ||
$ aws s3 --no-sign-request cp s3://near-protocol-public/backups/mainnet/rpc/latest . | ||
$ LATEST=$(cat latest) | ||
$ aws s3 --no-sign-request cp --recursive s3://near-protocol-public/backups/mainnet/rpc/$LATEST ~/.near/data | ||
``` | ||
|
||
### Testnet | ||
|
||
``` | ||
$ aws s3 --no-sign-request cp s3://near-protocol-public/backups/testnet/rpc/latest . | ||
$ LATEST=$(cat latest) | ||
$ aws s3 --no-sign-request cp --recursive s3://near-protocol-public/backups/testnet/rpc/$LATEST ~/.near/data | ||
``` | ||
|
||
|
||
|
||
## Running NEAR Lake as an archival node | ||
|
||
It's not required, but in order to index everything in the network it is better to start from the genesis. | ||
A `nearcore` node runs in non-archival mode by default. That means that the node keeps data only | ||
for the [last 5 epochs](https://docs.near.org/concepts/basics/epoch). In order to index data from the genesis, | ||
you need to switch the node to archival mode. | ||
|
||
To do so, update the `config.json` located in your `--home` directory (`~/.near` by default). | ||
|
||
Find the following keys in the config and update them as follows: | ||
|
||
```json | ||
{ | ||
... | ||
"archive": true, | ||
"tracked_shards": [0], | ||
... | ||
} | ||
``` | ||
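If you prefer to apply this change with a script rather than by hand, a minimal Python sketch could look like the following. The function names `enable_archival` and `update_config_file` are hypothetical and not part of NEAR Lake or `nearcore`; the sketch only sets the two keys shown above and leaves every other key untouched.

```python
import json
from pathlib import Path


def enable_archival(config: dict) -> dict:
    """Return a copy of the config with archival mode switched on."""
    updated = dict(config)
    updated["archive"] = True
    # Any non-empty list makes nearcore track all shards.
    updated["tracked_shards"] = [0]
    return updated


def update_config_file(path: str) -> None:
    """Rewrite config.json in place, keeping all other keys as they were."""
    config_path = Path(path).expanduser()
    config = json.loads(config_path.read_text())
    config_path.write_text(json.dumps(enable_archival(config), indent=2) + "\n")
```

For example, `update_config_file("~/.near/config.json")` rewrites the default config. Consider keeping a copy of the original file before running it.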
|
||
The syncing process in archival mode can take a lot of time, so it's better to download a backup provided by NEAR | ||
and put it in your `data` folder. After that, your node only downloads the data produced after the backup was taken, which | ||
takes a reasonable amount of time. | ||
|
||
|
||
All the backups can be downloaded from the public S3 bucket which contains the latest daily snapshots: | ||
|
||
* [Archival Mainnet data folder](https://near-protocol-public.s3-accelerate.amazonaws.com/backups/mainnet/archive/data.tar) | ||
* [Archival Testnet data folder](https://near-protocol-public.s3-accelerate.amazonaws.com/backups/testnet/archive/data.tar) | ||
|
||
See [this link](https://near-nodes.io/archival/run-archival-node-with-nearup) for reference | ||
|
||
## Using the data | ||
|
||
We write all the data to AWS S3 buckets: | ||
|
||
- `near-lake-data-testnet` (`eu-central-1` region) for testnet | ||
- `near-lake-data-mainnet` (`eu-central-1` region) for mainnet | ||
|
||
## Custom S3 storage | ||
|
||
If you want to run your own `near-lake` instance and store data in some S3-compatible storage ([Minio](https://min.io/) or [Localstack](https://localstack.cloud/), for example), | ||
you can override the default S3 API endpoint with the `--endpoint` option. | ||
|
||
- run `minio` | ||
|
||
```bash | ||
$ mkdir -p /data/near-lake-custom && minio server /data | ||
``` | ||
|
||
- run `near-lake` | ||
|
||
```bash | ||
$ ./target/release/near-lake --home ~/.near/testnet run --endpoint http://127.0.0.1:9000 --bucket near-lake-custom sync-from-latest | ||
``` | ||
|
||
### Data structure | ||
|
||
The data structure we use is the following: | ||
|
||
``` | ||
<block_height>/ | ||
block.json | ||
shard_0.json | ||
shard_1.json | ||
... | ||
shard_N.json | ||
``` | ||
|
||
- `<block_height>` is a 12-character-long `u64` string with leading zeros (e.g. `000042839521`). [See this issue for the reasoning](https://github.com/near/near-lake/issues/23) | ||
- `block.json` contains the JSON-serialized [`BlockView`](https://github.com/near/nearcore/blob/e9a28c46c2bea505b817abf484e6015a61ea7d01/core/primitives/src/views.rs#L711-L716) struct. **Note:** this struct might change in the future; we will announce it | ||
- `shard_N.json`, where `N` is a `u64` starting from `0`, represents the index number of the shard. To find out the expected number of shards in the block, look at `.header.chunks_included` in `block.json` | ||
|
||
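The layout above can be turned into object keys with a few lines of code. This is a minimal Python sketch under the assumptions stated in the list: the helper names `block_prefix` and `expected_keys` are hypothetical, and the shard count is taken from `.header.chunks_included` exactly as described above.

```python
import json
from typing import List


def block_prefix(height: int) -> str:
    """Format a block height as the 12-character, zero-padded folder name."""
    if height < 0:
        raise ValueError("block height must be non-negative")
    return f"{height:012d}"


def expected_keys(height: int, block_json: str) -> List[str]:
    """List the object keys expected for one block.

    `block_json` is the raw content of that block's block.json; the number
    of shard files is read from `.header.chunks_included`.
    """
    block = json.loads(block_json)
    shard_count = block["header"]["chunks_included"]
    prefix = block_prefix(height)
    keys = [f"{prefix}/block.json"]
    keys += [f"{prefix}/shard_{i}.json" for i in range(shard_count)]
    return keys
```

Zero-padding keeps the keys in lexicographic order by height, so a plain S3 listing returns blocks in order.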
### Access the data | ||
|
||
All NEAR Lake AWS S3 buckets have [Requester Pays](https://docs.aws.amazon.com/AmazonS3/latest/userguide/RequesterPaysBuckets.html) enabled. This means that anyone with their own AWS credentials can list and read the bucket's content and **be charged for it by AWS**. Connections to the bucket must be made with AWS credentials provided. See [NEAR Lake Framework](https://github.com/near/near-lake-framework) for a reference. | ||
|
||
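As an illustration of what a Requester Pays read looks like from client code, here is a hedged Python sketch. It assumes an S3 client with the `boto3` interface (`get_object` with the `RequestPayer` parameter); the helpers `get_object_request` and `fetch_block` are hypothetical names and are not taken from NEAR Lake Framework. The bucket names follow the `near-lake-data-<network>` pattern listed under "Using the data".

```python
def get_object_request(network: str, height: int, name: str = "block.json") -> dict:
    """Build keyword arguments for an S3 GetObject call on a NEAR Lake bucket."""
    return {
        "Bucket": f"near-lake-data-{network}",
        # Keys use the 12-character, zero-padded block height as a prefix.
        "Key": f"{height:012d}/{name}",
        # Required on Requester Pays buckets: the caller accepts the charges.
        "RequestPayer": "requester",
    }


def fetch_block(s3_client, network: str, height: int) -> bytes:
    """Download block.json with an S3 client such as boto3.client("s3")."""
    response = s3_client.get_object(**get_object_request(network, height))
    return response["Body"].read()


# Usage (assumes boto3 is installed and AWS credentials are configured):
#   import boto3
#   s3 = boto3.client("s3", region_name="eu-central-1")
#   raw = fetch_block(s3, "mainnet", 42839521)
```

Keeping the request parameters in a separate function makes it easy to verify that every call carries `RequestPayer`, without touching the network.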
### NEAR Lake Framework | ||
|
||
Once we [set up public access to the buckets](https://github.com/near/near-lake/issues/22), anyone will be able to write their own code to read the data. | ||
|
||
For our own needs we are working on the [NEAR Lake Framework](https://github.com/near/near-lake-framework) to have a simple way to create an indexer on top of the data stored by NEAR Lake itself. | ||
|
||
:::note | ||
See the official NEAR Lake Framework [announcement on the NEAR Gov Forum](https://gov.near.org/t/announcement-near-lake-framework-brand-new-word-in-indexer-building-approach/17668). | ||
::: |