Anant/example-cassandra-presto-superset

Connect Superset and Cassandra using Presto

See accompanying blog post: https://anant.us/blog/visualize-data-from-cassandra-in-superset/

1. Start Presto and Cassandra Docker Containers

IMPORTANT: Remember to make the ports public when the dialog appears in the bottom right-hand corner!

1.1 Start the containers with Docker Compose

docker-compose up -d 

2. Open a new terminal and confirm services are running

2.1 Confirm Docker containers are running

docker ps

2.2 Open the Presto UI on port 8080
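
The same check can be scripted. The sketch below polls a URL until it responds; `wait_for_http` is a hypothetical helper name, and it assumes `curl` is installed and that Presto serves its `/v1/info` endpoint on the same port as the UI. If that endpoint is not available in your version, point the helper at the UI root instead.

```shell
# Poll a URL until it answers with a successful HTTP status, or give up.
# Usage: wait_for_http URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_http() {
  url=$1
  attempts=${2:-30}
  delay=${3:-2}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -f makes curl return non-zero on HTTP errors; -s and -o keep it quiet
    if curl -fs -o /dev/null "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

For example: `wait_for_http http://localhost:8080/v1/info && echo "Presto is up"`.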

3. Create Cassandra Catalog in Presto

3.1 Set a bash variable to make the following commands easier:

PRESTO_CTR=$(docker container ls | grep 'presto_1' | awk '{print $1}')
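
The `grep 'presto_1'` pattern relies on the underscore-separated container names produced by older Docker Compose releases; Compose v2 names containers with hyphens by default (for example `...-presto-1`), in which case the variable ends up empty. A possible alternative, sketched here with a hypothetical helper `find_ctr`, uses Docker's own name filter, which matches on a substring:

```shell
# Print the ID of the first running container whose name contains $1.
find_ctr() {
  docker ps --filter "name=$1" --format '{{.ID}}' | head -n 1
}
```

With that in place, `PRESTO_CTR=$(find_ctr presto)` should work with either naming scheme, assuming only one running container has "presto" in its name.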

3.2 Copy cassandra.properties to Presto container

docker cp cassandra.properties $PRESTO_CTR:/opt/presto-server/etc/catalog/cassandra.properties
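
The contents of cassandra.properties are not shown in this README. For orientation, a minimal catalog file for Presto's Cassandra connector usually needs only the connector name and the contact points. The sketch below writes such a file under a different name so it does not overwrite the one in the repo; the host name `cassandra` is an assumption and has to match the Cassandra service name in docker-compose.yml.

```shell
# Write an example catalog file next to the real one for comparison.
# The host name "cassandra" is an assumption: it must match the Cassandra
# service name in docker-compose.yml so Presto can resolve it.
cat > cassandra.properties.example <<'EOF'
connector.name=cassandra
cassandra.contact-points=cassandra
EOF
```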

3.3 Confirm cassandra.properties was copied to the Presto container

docker exec -it $PRESTO_CTR sh -c "ls /opt/presto-server/etc/catalog"

4. Confirm Presto CLI can see Cassandra catalog

4.1 Start Presto CLI

docker exec -it $PRESTO_CTR presto-cli

4.2 Run show command

show catalogs ;

If cassandra is not listed, restart the Presto container so that it picks up the new catalog file.

4.3 Restart Presto container

docker restart $PRESTO_CTR

4.4 Repeat 4.1 and 4.2 and confirm that the cassandra catalog is now listed
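
The check in this section can also be run without opening an interactive CLI session. The following is a minimal sketch, assuming the `presto-cli` in the container accepts `--execute` (it does in the standard Presto CLI); `has_catalog` is a hypothetical helper name.

```shell
# Succeed if the named catalog is listed by SHOW CATALOGS.
# Usage: has_catalog CONTAINER CATALOG
has_catalog() {
  # --execute runs one statement and exits; its default output is quoted CSV,
  # so strip the quotes before matching the whole line.
  docker exec "$1" presto-cli --execute 'SHOW CATALOGS;' \
    | tr -d '"\r' | grep -qx "$2"
}
```

For example: `has_catalog "$PRESTO_CTR" cassandra || docker restart "$PRESTO_CTR"`.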

5. Set up Cassandra data

5.1 Set a bash variable to make the following commands easier:

CASSANDRA_CTR=$(docker container ls | grep 'cassandra_1' | awk '{print $1}')

5.2 Copy the CQL files onto the Cassandra container

docker cp setup.cql $CASSANDRA_CTR:/
docker cp sensor_data.cql $CASSANDRA_CTR:/

5.3 Run the CQL files

docker exec -it $CASSANDRA_CTR cqlsh -f setup.cql
docker exec -it $CASSANDRA_CTR cqlsh -f sensor_data.cql

5.4 Confirm successful data ingestion

You can test for successful ingestion using cqlsh:

docker exec -it $CASSANDRA_CTR cqlsh -e 'expand on; SELECT spacecraft_name,start,summary FROM demo.spacecraft_journey_catalog limit 30'

Or using Presto CLI:

docker exec -it $PRESTO_CTR presto-cli

Then within the CLI:

SELECT * FROM cassandra.demo.spacecraft_journey_catalog limit 30;
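
For a quick scripted confirmation, a row count through Presto is enough to show that the data is visible end to end. This is a sketch only: `count_rows` is a hypothetical helper, and it assumes `presto-cli --execute` is available in the container.

```shell
# Print the number of rows Presto sees in a fully qualified table.
# Usage: count_rows CONTAINER CATALOG.SCHEMA.TABLE
count_rows() {
  # Default --execute output is quoted CSV, so strip the quotes.
  docker exec "$1" presto-cli --execute "SELECT count(*) FROM $2;" \
    | tr -d '"\r'
}
```

For example: `count_rows "$PRESTO_CTR" cassandra.demo.spacecraft_journey_catalog`. Any non-zero result means the ingested rows are reachable from Presto.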

Set up Superset

  • For more information on connecting Superset and Presto, see this guide. Although the guide focuses on Trino rather than Presto, the concepts are close enough to be interchangeable for our purposes.
  • For actually connecting Superset to Trino, see: https://trino.io/episodes/12.html
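
Superset connects to a database through a SQLAlchemy URI, and for Presto that URI takes the form `presto://host:port/catalog`. The sketch below only builds the string; `presto_uri` is a hypothetical helper. The right host depends on your Docker networking, so the `host.docker.internal` value in the usage note is an assumption that typically applies to Docker Desktop, and the connection also assumes a Presto driver is available in the Superset image.

```shell
# Build a SQLAlchemy URI for a Presto catalog.
# Usage: presto_uri HOST PORT CATALOG
presto_uri() {
  printf 'presto://%s:%s/%s\n' "$1" "$2" "$3"
}
```

For example, `presto_uri host.docker.internal 8080 cassandra` prints the URI to paste into Superset's database connection form.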

Start up Superset

The docker-compose.yml and Dockerfile resources live in the Superset GitHub repo, so, following the example of other guides, we will simply clone that repo:

git clone https://github.com/apache/superset.git
cd superset
docker-compose -f docker-compose-non-dev.yml pull
docker-compose -f docker-compose-non-dev.yml up

Once the containers are up, open Superset at http://localhost:8088

Log in using:

  • username: admin
  • password: admin

Using Mapbox Diagrams

To use charts that rely on Mapbox, you need a Mapbox API key. Create a token if you don't already have one; the default public token works fine.

Then add the key to the env file used by the Superset Docker images. Open the file (vim is shown here, but any text editor works):

vim ../superset/docker/.env-non-dev

and add this line:

MAPBOX_API_KEY='XXYYZZ'
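
If you prefer to script this step, the sketch below sets a key in an env file without leaving duplicate entries behind. `set_env_var` is a hypothetical helper, not part of Superset, and the token value in the usage note is a placeholder.

```shell
# Set KEY=VALUE in an env file, replacing any existing line for KEY.
# Usage: set_env_var FILE KEY VALUE
set_env_var() {
  file=$1
  key=$2
  value=$3
  # Create the file if it does not exist yet
  [ -f "$file" ] || : > "$file"
  tmp=$(mktemp)
  # Keep every line that does not already define the key
  # (grep exits non-zero when nothing is kept, which is fine here)
  grep -v "^${key}=" "$file" > "$tmp" || true
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  # Write back in place so the file keeps its permissions
  cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

For example: `set_env_var ../superset/docker/.env-non-dev MAPBOX_API_KEY XXYYZZ`, followed by a restart of the Superset containers so they read the new value.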

Credits

Our spaceship dataset is based on the SparkSQL notebook from DataStax Studio. For the basic schema it was based on, see the examples provided here.

Arpan Patel slightly modified the data entries and schema and made them available in his demo for Presto, Airflow, and Cassandra here. They were modified again for this demo.

We also borrowed from the DataStax Academy sample sensor data, with slight modifications to make sure it runs in the correct keyspace. See https://github.com/DataStax-Academy/data-modeling-sensor-data/blob/main/step2.md
