CAUTION: The code is now maintained in the graphsense-spark repository.
The GraphSense Transformation Pipeline reads raw block data, which is ingested into Apache Cassandra by the graphsense-blocksci / graphsense-bitcoin-etl component. The transformation pipeline computes de-normalized views using Apache Spark and stores them back in Cassandra.
Access to the computed de-normalized views is then provided by the GraphSense REST interface, which in turn is used by the graphsense-dashboard component.
This component is implemented in Scala using Apache Spark.
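To illustrate what a de-normalized view means in this context, here is a minimal, self-contained Scala sketch: raw transactions are pre-joined with daily exchange rates so that readers of the transformed keyspace need no join at query time. The names, fields, and rate handling below are assumptions for illustration, not the pipeline's actual schema.

```scala
// Illustrative only: assumed record shapes, not the real Cassandra schema.
case class Tx(txId: Long, date: String, valueSatoshi: Long)
case class Rate(date: String, usdPerBtc: Double)

object DenormalizeExample {
  // Attach a fiat value to every transaction in one pass, so that no
  // join against the exchange_rates table is needed at query time.
  def withUsd(txs: Seq[Tx], rates: Seq[Rate]): Seq[(Long, Double)] = {
    val rateByDate = rates.map(r => r.date -> r.usdPerBtc).toMap
    txs.map { tx =>
      // 1e8 satoshi per BTC; missing rates default to 0.0 in this sketch.
      val usd = tx.valueSatoshi / 1e8 * rateByDate.getOrElse(tx.date, 0.0)
      tx.txId -> usd
    }
  }
}
```

In the real pipeline this kind of join runs as a distributed Spark job over the raw Cassandra tables rather than over in-memory sequences.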
Make sure Java 8 and sbt >= 1.0 are installed:
java -version
sbt about
Download, install, and run Apache Spark (version 3.2.1) in $SPARK_HOME:

$SPARK_HOME/sbin/start-master.sh
Download, install, and run Apache Cassandra (version >= 3.11) in $CASSANDRA_HOME:
$CASSANDRA_HOME/bin/cassandra -f
Run the following script to ingest raw block test data:
./scripts/ingest_test_data.sh
This should create a keyspace btc_raw with the tables exchange_rates, transaction, block, and block_transactions. Check as follows:
cqlsh localhost
cqlsh> USE btc_raw;
cqlsh:btc_raw> DESCRIBE tables;
Create the target keyspace for transformed data:
cqlsh -f scripts/schema_transformed.cql
Compile and test the implementation:
sbt test
Package the transformation pipeline:
sbt package
Run the transformation pipeline on localhost:
./submit.sh
macOS only: make sure gnu-getopt is installed (brew install gnu-getopt).
Check the running job using the local Spark UI at http://localhost:4040/jobs
Use the submit.sh script and specify the Spark master node (e.g., -s spark://SPARK_MASTER_IP:7077) and other options:
./submit.sh -h
Usage: submit.sh [-h] [-m MEMORY_GB] [-c CASSANDRA_HOST] [-s SPARK_MASTER]
[--currency CURRENCY]
[--raw_keyspace RAW_KEYSPACE]
[--tgt_keyspace TGT_KEYSPACE]
[--bucket_size BUCKET_SIZE]
[--bech32-prefix BECH32_PREFIX]
[--checkpoint-dir CHECKPOINT_DIR]
[--coinjoin-filtering]
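For orientation, the --bucket_size option controls how many sequential IDs share a Cassandra partition. A minimal sketch of the grouping idea, assuming plain integer division (the names here are illustrative, not the pipeline's actual code):

```scala
// Illustrative sketch of ID bucketing (assumed naming): sequential IDs
// are grouped via integer division, and the resulting bucket number can
// serve as a partition key so rows spread evenly across partitions.
object BucketExample {
  // IDs 0..bucketSize-1 fall into bucket 0, the next bucketSize IDs
  // into bucket 1, and so on.
  def bucketOf(id: Long, bucketSize: Long): Long = id / bucketSize
}
```

With a bucket size of 25000, for example, IDs 0 through 24999 would land in bucket 0 and ID 25000 in bucket 1.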