Skip to content

josephmachado/analytical_dp_with_sql

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

23 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Setup

Please install the following software:

  1. git version >= 2.37.1
  2. Docker version >= 20.10.17 and Docker compose v2 version >= v2.10.2. Make sure that docker is running using docker ps.

Windows users: please setup WSL and a local Ubuntu Virtual machine following the instructions here. Install the above prerequisites on your ubuntu terminal; if you have trouble installing docker, follow the steps here (only Step 1 is necessary). Please install the make command with sudo apt install make -y (if its not already present).

All the commands shown below are to be run via the terminal (use the Ubuntu terminal for WSL users). We will use docker to set up our containers. Clone and move into the lab repository, as shown below.

git clone \
https://github.com/josephmachado/analytical_dp_with_sql.git 
cd analytical_dp_with_sql

Note: If you are using Macbook M1, please follow the instructions here to use the appropriate docker image.

We have some helpful make commands to make working with our systems more accessible. Shown below are the make commands and their definitions

  1. make up: Spin up the docker containers.
  2. make trino: Open trino cli; Use exit to quit the cli. This is where you will type your SQL queries.
  3. make down: Stop the docker containers.

You can see the commands in this Makefile. If your terminal does not support make commands, please use the commands in the Makefile directly. All the commands in this book assume that you have the docker containers running.

In your terminal, do the following:

# Make sure docker is running using docker ps
make up # starts the docker containers
# If you are having issues with existing containers
# stop them all with the following command
# docker rm -f $(docker ps -a -q)

sleep 60 # wait 1 minute for all the containers to set up
make trino # opens the trino cli

In Trino, we can connect to multiple databases (called catalogs in Trino). TPC-H is a dataset used to benchmark analytical database performance. Trino's tpch catalog comes with preloaded tpch datasets of different sizes tiny, sf1, sf100, sf100, sf300, and so on, where sf = scaling factor.

-- run "make trino" or 
-- "docker container exec -it trino-coordinator trino" 
-- to open trino cli

USE tpch.tiny;
SHOW tables;
SELECT * FROM orders LIMIT 5;
-- shows five rows, press q to quit the interactive results screen
exit -- quit the cli

Note: Run make trino or docker container exec -it trino-coordinator trino on your terminal to open the trino cli. The SQL code shown throughout the book assumes you are running it in trino cli.

Starting the docker containers will also start Minio(S3 alternative); we will use Minio as our data store to explain efficient data storage.

UI: Open the Trino UI at http://localhost:8080 (username: any word) and Minio (S3 alternative) at http://localhost:9001 (username: minio, password: minio123) in a browser of your choice.

If you prefer to connect to Trino via a SQL IDE, download DBeaver (addresses issues mentioned here). Open DBeaver,

  1. Click on Database -> New Database Connection
  2. A Connect to a database box will open; search for, select Trino, and press Next.
  3. Do not change settings; use user as the user name. You will get a connected text box if you test the connection. Click Finish, and you will be able to explore our Trino database.

DBeaver

Data Model

The TPC-H data represents a car parts seller's data warehouse, where we record orders, items that make up that order (lineitem), supplier, customer, part(parts sold), region, nation, and partsupp.

Note: Have a copy of the data model as you follow along; this will help with the examples provided and answering exercise questions.

TPC-H data model

Acknowledgments

We use the TPC-H dataset and Trino as our OLAP DB.

About

Code for my "Efficient Data Processing in SQL" book.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published