See accompanying blog post: https://anant.us/blog/visualize-data-from-cassandra-in-superset/
IMPORTANT: Remember to make the ports public when the dialog appears in the bottom right-hand corner!
docker-compose up -d
docker ps
Set a bash variable to make the following commands easier:
PRESTO_CTR=$(docker container ls | grep 'presto_1' | awk '{print $1}')
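The grep above depends on the container being named with a `presto_1` suffix. Newer Docker Compose releases join name parts with hyphens instead (`presto-1`), in which case the variable ends up empty. A minimal sketch of a lookup that does not depend on the separator, using the name filter of `docker ps`; the helper name `find_ctr` is hypothetical and not part of this repo:

```shell
# find_ctr NAME: print the ID of the first running container whose
# name contains NAME (hypothetical helper, not part of this repo).
find_ctr() {
  docker ps --quiet --filter "name=$1" | head -n 1
}
```

It could then be used as `PRESTO_CTR=$(find_ctr presto)`. Note that the filter is a substring match, so it assumes only one running container has "presto" in its name.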
docker cp cassandra.properties $PRESTO_CTR:/opt/presto-server/etc/catalog/cassandra.properties
docker exec -it $PRESTO_CTR sh -c "ls /opt/presto-server/etc/catalog"
docker exec -it $PRESTO_CTR presto-cli
show catalogs ;
If cassandra is not listed, restart the container so that Presto loads the new catalog file:
docker restart $PRESTO_CTR
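After a restart, Presto can take a little while before it answers queries again. The following is a rough sketch of a polling check that runs `SHOW CATALOGS` non-interactively through the CLI's `--execute` option and waits until a given catalog appears. The function name `wait_for_catalog`, the `WAIT_SECS` variable, and the retry defaults are assumptions made for illustration:

```shell
# wait_for_catalog CTR CATALOG [ATTEMPTS]
# Poll Presto inside container CTR until CATALOG shows up in SHOW CATALOGS.
# Returns 0 once the catalog is listed, 1 if ATTEMPTS (default 30) run out.
wait_for_catalog() {
  attempt=0
  while [ "$attempt" -lt "${3:-30}" ]; do
    # Batch output may quote each value, so strip double quotes
    # before doing an exact whole-line match.
    if docker exec "$1" presto-cli --execute 'SHOW CATALOGS' 2>/dev/null \
        | tr -d '"' | grep -qx "$2"; then
      return 0
    fi
    attempt=$((attempt + 1))
    sleep "${WAIT_SECS:-2}"   # pause between attempts (assumed default: 2s)
  done
  return 1
}
```

For example: `wait_for_catalog "$PRESTO_CTR" cassandra && echo "cassandra catalog is available"`.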
Set a bash variable to make the following commands easier:
CASSANDRA_CTR=$(docker container ls | grep 'cassandra_1' | awk '{print $1}')
docker cp setup.cql $CASSANDRA_CTR:/
docker cp sensor_data.cql $CASSANDRA_CTR:/
docker exec -it $CASSANDRA_CTR cqlsh -f /setup.cql
docker exec -it $CASSANDRA_CTR cqlsh -f /sensor_data.cql
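The copy-then-run steps above repeat for each CQL file, so they can be wrapped in a small loop. This is only a sketch: `load_cql` is a hypothetical helper name, and it assumes the files are copied to `/` in the container, as in the commands above.

```shell
# load_cql CTR FILE...
# Copy each local .cql file to / in container CTR and run it with cqlsh,
# in the order given. Stops at the first failure.
load_cql() {
  ctr=$1
  shift
  for cql_file in "$@"; do
    docker cp "$cql_file" "$ctr:/" || return 1
    # Refer to the copied file by absolute path inside the container.
    docker exec "$ctr" cqlsh -f "/$(basename "$cql_file")" || return 1
  done
}
```

For example: `load_cql "$CASSANDRA_CTR" setup.cql sensor_data.cql`. The order matters here, since setup.cql is run first in the steps above.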
You can test for successful ingestion using CQLSH:
docker exec -it $CASSANDRA_CTR cqlsh -e 'expand on; SELECT spacecraft_name,start,summary FROM demo.spacecraft_journey_catalog limit 30'
Or using Presto CLI:
docker exec -it $PRESTO_CTR presto-cli
Then within the CLI:
SELECT * FROM cassandra.demo.spacecraft_journey_catalog limit 30;
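The same check can also be run without entering the interactive CLI, which is handy in scripts. A minimal sketch, assuming the `presto-cli` in the container accepts the `--execute` option; the wrapper name `presto_query` is hypothetical:

```shell
# presto_query CTR SQL
# Run a single SQL statement through presto-cli inside container CTR
# and print the result to stdout.
presto_query() {
  docker exec "$1" presto-cli --execute "$2"
}
```

For example: `presto_query "$PRESTO_CTR" 'SELECT count(*) FROM cassandra.demo.spacecraft_journey_catalog'`.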
- For more information on connecting Superset and Presto, see this guide. Although the guide focuses on Trino rather than Presto, the two are similar enough that its steps apply largely unchanged to what we are doing here.
- For actually connecting Superset to Trino, see: https://trino.io/episodes/12.html
The docker-compose.yml and Dockerfile resources live in the Superset GitHub repo, so, following the example of other guides, we simply clone that repo:
git clone https://github.com/apache/superset.git
cd superset
docker-compose -f docker-compose-non-dev.yml pull
docker-compose -f docker-compose-non-dev.yml up
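Superset takes a while to initialize on first start. From a second terminal, a polling loop like the sketch below can tell you when the web UI is ready for login. It assumes Superset is published on `http://localhost:8088` and answers on a `/health` endpoint; both are assumptions to verify against your compose file, and `wait_for_superset` and `WAIT_SECS` are hypothetical names:

```shell
# wait_for_superset [BASE_URL] [ATTEMPTS]
# Poll BASE_URL/health until it returns a success status.
# Defaults (assumed): http://localhost:8088, 60 attempts.
wait_for_superset() {
  attempt=0
  while [ "$attempt" -lt "${2:-60}" ]; do
    # --fail makes curl exit non-zero on HTTP error statuses.
    if curl --fail --silent --output /dev/null "${1:-http://localhost:8088}/health"; then
      return 0
    fi
    attempt=$((attempt + 1))
    sleep "${WAIT_SECS:-5}"   # pause between attempts (assumed default: 5s)
  done
  return 1
}
```

For example: `wait_for_superset && echo "Superset is up"`.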
Login using:
- username: admin
- password: admin
To use charts that rely on Mapbox, you need a Mapbox API key. Create a token if you don't have one already; that said, the default public API key works just fine.
Then add it to the env file used by the Superset Docker images. Open the file in your favorite text editor, for example:
vim ../superset/docker/.env-non-dev
Then add this line, replacing the placeholder value with your own token:
MAPBOX_API_KEY='XXYYZZ'
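If you would rather script this step than edit the file by hand, the sketch below sets a key in an env file without leaving duplicate entries behind. `set_env_var` is a hypothetical helper, not something shipped with Superset, and it writes the value in single quotes to match the line above:

```shell
# set_env_var FILE KEY VALUE
# Replace any existing KEY=... line in FILE with KEY='VALUE',
# or append the line if KEY is not set yet.
set_env_var() {
  [ -f "$1" ] || return 1
  tmp=$(mktemp) || return 1
  # Keep every line except an existing assignment of KEY.
  # grep exits non-zero when nothing is left, which is fine here.
  grep -v "^$2=" "$1" > "$tmp" || true
  printf "%s='%s'\n" "$2" "$3" >> "$tmp"
  # Write back through the original path so file permissions are kept.
  cat "$tmp" > "$1"
  rm -f "$tmp"
}
```

For example: `set_env_var ../superset/docker/.env-non-dev MAPBOX_API_KEY 'XXYYZZ'`.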
Our spaceship dataset is based on the SparkSQL notebook from DataStax Studio. For the basic schema it was based on, see the examples provided here.
The data entries and schema were modified slightly by Arpan Patel and made available in his demo for Presto, Airflow, and Cassandra here. They were then modified again for this demo.
We also borrowed from the DataStax Academy sample sensor data, with slight modifications to make sure it runs in the correct keyspace. See https://github.com/DataStax-Academy/data-modeling-sensor-data/blob/main/step2.md