CAUTION: The code is now maintained in the graphsense-spark repository.
The GraphSense Transformation Pipeline reads raw block data, which is ingested into Apache Cassandra by the graphsense-blocksci / graphsense-bitcoin-etl component. The transformation pipeline computes de-normalized views using Apache Spark and stores them back in Cassandra.
Access to the computed de-normalized views is then provided by the GraphSense REST interface, which in turn is used by the graphsense-dashboard component.
This component is implemented in Scala using Apache Spark.
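To illustrate what a de-normalized view means in this context, here is a minimal, self-contained Scala sketch: raw transactions are pre-joined with daily exchange rates so that readers of the transformed keyspace need no join at query time. The names, fields, and rate handling below are assumptions for illustration, not the pipeline's actual schema.

```scala
// Illustrative only: assumed record shapes, not the real Cassandra schema.
case class Tx(txId: Long, date: String, valueSatoshi: Long)
case class Rate(date: String, usdPerBtc: Double)

object DenormalizeExample {
  // Attach a fiat value to every transaction in one pass, so that no
  // join against the exchange_rates table is needed at query time.
  def withUsd(txs: Seq[Tx], rates: Seq[Rate]): Seq[(Long, Double)] = {
    val rateByDate = rates.map(r => r.date -> r.usdPerBtc).toMap
    txs.map { tx =>
      // 1e8 satoshi per BTC; missing rates default to 0.0 in this sketch.
      val usd = tx.valueSatoshi / 1e8 * rateByDate.getOrElse(tx.date, 0.0)
      tx.txId -> usd
    }
  }
}
```

In the real pipeline this kind of join runs as a distributed Spark job over the raw Cassandra tables rather than over in-memory sequences.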
Make sure Java 8 and sbt >= 1.0 are installed:
java -version
sbt about
Download, install, and run Apache Spark (version 3.2.1) in $SPARK_HOME:

$SPARK_HOME/sbin/start-master.sh
Download, install, and run Apache Cassandra (version >= 3.11) in $CASSANDRA_HOME:
$CASSANDRA_HOME/bin/cassandra -f
Run the following script to ingest raw block test data:
./scripts/ingest_test_data.sh
This should create a keyspace btc_raw with the tables exchange_rates, transaction, block, and block_transactions. Check as follows:
cqlsh localhost
cqlsh> USE btc_raw;
cqlsh:btc_raw> DESCRIBE tables;
Create the target keyspace for transformed data:
cqlsh -f scripts/schema_transformed.cql
Compile and test the implementation:
sbt test
Package the transformation pipeline:
sbt package
Run the transformation pipeline on localhost:
./submit.sh
macOS only: make sure gnu-getopt is installed (brew install gnu-getopt).
Check the running job using the local Spark UI at http://localhost:4040/jobs
Use the submit.sh script and specify the Spark master node (e.g., -s spark://SPARK_MASTER_IP:7077) and other options:
./submit.sh -h
Usage: submit.sh [-h] [-m MEMORY_GB] [-c CASSANDRA_HOST] [-s SPARK_MASTER]
[--currency CURRENCY]
[--raw_keyspace RAW_KEYSPACE]
[--tgt_keyspace TGT_KEYSPACE]
[--bucket_size BUCKET_SIZE]
[--bech32-prefix BECH32_PREFIX]
[--checkpoint-dir CHECKPOINT_DIR]
[--coinjoin-filtering]
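For orientation, the --bucket_size option controls how many sequential IDs share a Cassandra partition. A minimal sketch of the grouping idea, assuming plain integer division (the names here are illustrative, not the pipeline's actual code):

```scala
// Illustrative sketch of ID bucketing (assumed naming): sequential IDs
// are grouped via integer division, and the resulting bucket number can
// serve as a partition key so rows spread evenly across partitions.
object BucketExample {
  // IDs 0..bucketSize-1 fall into bucket 0, the next bucketSize IDs
  // into bucket 1, and so on.
  def bucketOf(id: Long, bucketSize: Long): Long = id / bucketSize
}
```

With a bucket size of 25000, for example, IDs 0 through 24999 would land in bucket 0 and ID 25000 in bucket 1.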