Chainsformer is an Apache Arrow Flight service built on top of ChainStorage as a stateless adaptor service. It currently supports batch data processing and micro-batch data streaming from the ChainStorage service to the Spark data processing platform.
It aims to provide a set of easy-to-use interfaces that let Spark consumers read and process ChainStorage data on the Spark platform:
- It defines a set of standardized block and transaction data schemas for each asset class (e.g., EVM assets or Bitcoin).
- It provides data transformation from protobuf to Arrow format.
- It can be easily scaled up to support higher data throughput.
- It can be easily integrated via the Chainsformer Spark Connector (LINK TO BE ADDED LATER) for structured data streaming.
Make sure your local Go version is 1.18 and your protobuf version is 3.21.12 by running the following commands:
brew install go@1.18
brew unlink go
brew link go@1.18
brew install protobuf@3.21.12
brew unlink protobuf
brew link protobuf@3.21.12
To set up for the first time (only done once):
make bootstrap
Rebuild everything:
make build
Chainsformer depends on the following environment variables to resolve the path of its configuration. The directory structure is config/chainsformer/{blockchain}/{network}/{environment}.yml.
- CHAINSFORMER_CONFIG: This env var, in the format of {blockchain}-{network}, determines the blockchain and network managed by the service. The naming is defined in chainstorage/protos/coinbase/c3/common/common.proto.
- CHAINSFORMER_ENVIRONMENT: This env var controls the {environment} in which the service is deployed. Possible values include production, development, and local (which is also the default value).
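To illustrate the lookup described above, here is a minimal Go sketch that turns the two environment variables into a config path. The helper configPath is hypothetical (it is not taken from the Chainsformer code). It assumes the blockchain name itself contains no hyphen, so CHAINSFORMER_CONFIG is split at its first "-", and it falls back to ethereum-mainnet only because that is the default used by make server.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// configPath is a hypothetical helper that mirrors the documented layout
// config/chainsformer/{blockchain}/{network}/{environment}.yml.
func configPath(config, environment string) (string, error) {
	if config == "" {
		config = "ethereum-mainnet" // assumption: same default as `make server`
	}
	if environment == "" {
		environment = "local" // documented default
	}
	switch environment {
	case "production", "development", "local":
		// documented values
	default:
		return "", fmt.Errorf("unsupported environment %q", environment)
	}
	// Assumption: the blockchain name has no hyphen, so split on the first one.
	blockchain, network, ok := strings.Cut(config, "-")
	if !ok || blockchain == "" || network == "" {
		return "", fmt.Errorf("CHAINSFORMER_CONFIG %q is not in {blockchain}-{network} form", config)
	}
	return filepath.Join("config", "chainsformer", blockchain, network, environment+".yml"), nil
}

func main() {
	p, err := configPath(os.Getenv("CHAINSFORMER_CONFIG"), os.Getenv("CHAINSFORMER_ENVIRONMENT"))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(p)
}
```

For example, with CHAINSFORMER_CONFIG=bitcoin-mainnet and CHAINSFORMER_ENVIRONMENT=development set, it prints config/chainsformer/bitcoin/mainnet/development.yml on a Unix-like system.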
Asset-specific configurations are stored in the config directory of the Chainsformer service repo, following the structure ./config/chainsformer/{blockchain}/{network}/base.yml. To add a new blockchain, or a new network of an existing blockchain:
- Follow the config folder structure to add the new configurations.
- Add new tests in config_test.go.
- Add new test configs in testapp.go.
Clone the Chainsformer service repo:
git clone https://github.com/coinbase/chainsformer.git
Change directory to the Chainsformer service repo:
cd chainsformer
Set up the ChainStorage SDK credentials:
export CHAINSTORAGE_SDK_AUTH_HEADER=cb-nft-api-token
export CHAINSTORAGE_SDK_AUTH_TOKEN=****
To set up Chainsformer for the first time (only done once):
make bootstrap
Rebuild Chainsformer:
make build
Start the Chainsformer service with the default CHAINSFORMER_CONFIG=ethereum-mainnet:
make server
Query Chainsformer for a range of blocks:
go run ./cmd/client --env local --blockchain ethereum --network mainnet --start 0 --end 10 --table blocks
Query Chainsformer for a range of block events:
go run ./cmd/client --env local --blockchain ethereum --network mainnet --start 0 --end 10 --table streamed_blocks
Calling the GetSchema API:
cmd=$(echo -n '{"table": "blocks"}' | base64)
grpcurl --plaintext -d '{"cmd":'"\"$cmd\""',"type":2}' localhost:9090 arrow.flight.protocol.FlightService.GetSchema
Calling the GetFlightInfo API to partition the data:
cmd=$(echo -n '{"batch_query": {"start_height": 0, "end_height": 10, "table": "blocks"}}' | base64)
grpcurl --plaintext -d '{"cmd":'"\"$cmd\""',"type":2}' localhost:9090 arrow.flight.protocol.FlightService.GetFlightInfo
Take one of the tickets returned by the above command:
...
"endpoint": [
{
"ticket": {
"ticket": "eyJiYXRjaF9xdWVyeSI6eyJlbmRfaGVpZ2h0IjoiMTAiLCJ0YWJsZSI6ImJsb2NrcyJ9fQ=="
}
}
]
...
Calling the DoGet API to get data for one of the partitions:
grpcurl --plaintext -d '{"ticket": "eyJiYXRjaF9xdWVyeSI6eyJlbmRfaGVpZ2h0IjoiMTAiLCJ0YWJsZSI6ImJsb2NrcyJ9fQ=="}' localhost:9090 arrow.flight.protocol.FlightService.DoGet
Calling the DoGet API to get data for a specific partition:
cmd=$(echo -n '{"batch_query":{"start_height":"1", "end_height":"2", "table":"blocks"}}' | base64)
grpcurl --plaintext -d '{"ticket": '"\"$cmd\""'}' localhost:9090 arrow.flight.protocol.FlightService.DoGet
Calling the DoAction API to get the tip in ChainStorage via Chainsformer:
grpcurl --plaintext -d '{"type": "TIP"}' localhost:9090 arrow.flight.protocol.FlightService.DoAction | jq '.body | @base64d'
Calling the GetSchema API:
cmd=$(echo -n '{"table": "streamed_blocks"}' | base64)
grpcurl --plaintext -d '{"cmd":'"\"$cmd\""',"type":2}' localhost:9090 arrow.flight.protocol.FlightService.GetSchema
Calling the GetFlightInfo API to partition the data:
cmd=$(echo -n '{"stream_query": {"start_sequence": 0, "end_sequence": 10, "table": "streamed_blocks"}}' | base64)
grpcurl --plaintext -d '{"cmd":'"\"$cmd\""',"type":2}' localhost:9090 arrow.flight.protocol.FlightService.GetFlightInfo
Take one of the tickets returned by the above command:
...
"endpoint": [
{
"ticket": {
"ticket": "eyJzdHJlYW1fcXVlcnkiOnsic3RhcnRfc2VxdWVuY2UiOiIxIiwiZW5kX3NlcXVlbmNlIjoiMTAiLCJ0YWJsZSI6InN0cmVhbWVkX2Jsb2NrcyJ9fQ=="
}
}
]
...
Calling the DoGet API to get data for one of the partitions:
grpcurl --plaintext -d '{"ticket": "eyJzdHJlYW1fcXVlcnkiOnsic3RhcnRfc2VxdWVuY2UiOiIxIiwiZW5kX3NlcXVlbmNlIjoiMTAiLCJ0YWJsZSI6InN0cmVhbWVkX2Jsb2NrcyJ9fQ=="}' localhost:9090 arrow.flight.protocol.FlightService.DoGet
Calling the DoGet API to get data for a specific partition:
cmd=$(echo -n '{"stream_query":{"start_sequence":"1", "end_sequence":"2", "table":"streamed_blocks"}}' | base64)
grpcurl --plaintext -d '{"ticket": '"\"$cmd\""'}' localhost:9090 arrow.flight.protocol.FlightService.DoGet
Calling the DoAction API with the STREAM_TIP action to get the stream tip in ChainStorage via Chainsformer:
grpcurl --plaintext -d '{"type": "STREAM_TIP"}' localhost:9090 arrow.flight.protocol.FlightService.DoAction | jq '.body | @base64d'
# Run everything
make test
Under development