RumbleDB/docker-presto

Simple Dockerized Presto

This project aims to make it easy to get started with Presto. It is based on Docker and Docker Compose. Currently, the following features are supported:

Starting Presto

The following should be enough to bring up all required services:

docker-compose up
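Run in the foreground, `docker-compose up` keeps streaming logs. If you prefer to start the services in the background and block until the coordinator is ready, a small polling helper can do that. This is a sketch: the name `wait_for_presto` and the 60 tries × 2 s timeout are arbitrary, and it assumes the coordinator's port is published on `localhost:8080` and that Presto's `/v1/info` endpoint reports `"starting": false` once startup has finished.

```shell
# Poll Presto's /v1/info endpoint until the server reports it has finished starting.
# Assumptions: port 8080 is published on localhost; the JSON contains "starting":false when ready.
wait_for_presto() {
  local url="${1:-http://localhost:8080/v1/info}"
  local tries="${2:-60}"   # number of attempts, 2 seconds apart
  while [ "$tries" -gt 0 ]; do
    # -f: fail on HTTP errors; -sS: no progress bar, but still report errors
    if curl -fsS "$url" 2>/dev/null | grep -Eq '"starting" *: *false'; then
      echo "Presto is up at $url"
      return 0
    fi
    tries=$((tries - 1))
    sleep 2
  done
  echo "Presto did not become ready at $url" >&2
  return 1
}

# Usage: start the services in the background (-d), then wait.
#   docker-compose up -d && wait_for_presto
```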

Varying the Number of Workers and Data Nodes

To change the number of Presto worker nodes or HDFS data nodes, use the --scale flag of docker-compose:

docker-compose up --scale datanode=3 --scale presto-worker=3
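To check that scaling took effect, you can count the running containers of each service. The helper below is a sketch (the name `count_service` is made up) that assumes Compose's default container naming: `<project>_<service>_<n>` with docker-compose v1, or `<project>-<service>-<n>` with Compose v2.

```shell
# Count running containers belonging to a Compose service, e.g. "presto-worker".
# Assumes default Compose naming: <project>_<service>_<n> (v1) or <project>-<service>-<n> (v2).
count_service() {
  local service="$1"
  # Anchoring on the trailing index keeps "presto" from also matching "presto-worker".
  docker ps --format '{{.Names}}' | grep -cE "[_-]${service}[_-][0-9]+\$"
}

# Usage (prints the number of running containers of each service):
#   count_service presto-worker
#   count_service datanode
```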

Building the Image Locally

The command above uses a pre-built Docker image. If you want to build the image locally instead, run the following:

docker-compose --file docker-compose-local.yml up

If you are behind a corporate firewall, you have to configure Maven (which is used to build part of Presto) as follows before running the command above:

export MAVEN_OPTS="-Dhttp.proxyHost=your.proxy.com -Dhttp.proxyPort=3128 -Dhttps.proxyHost=your.proxy.com -Dhttps.proxyPort=3128"
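If your proxy is already set in the conventional `http_proxy` environment variable, you could derive `MAVEN_OPTS` from it instead of typing host and port twice. This is a sketch: the function name `maven_proxy_opts` is hypothetical, and it assumes a URL of the form `http://host:port` with an explicit port and no embedded credentials.

```shell
# Turn a proxy URL such as http://your.proxy.com:3128 into the MAVEN_OPTS string above.
# Assumes the URL has an explicit port and no user:password@ part.
maven_proxy_opts() {
  local hostport="${1#*://}"     # drop the scheme, e.g. "http://"
  hostport="${hostport%%/*}"     # drop any trailing path such as "/"
  local host="${hostport%:*}"    # everything before the last colon
  local port="${hostport##*:}"   # everything after the last colon
  printf '%s\n' "-Dhttp.proxyHost=$host -Dhttp.proxyPort=$port -Dhttps.proxyHost=$host -Dhttps.proxyPort=$port"
}

# Usage:
#   export MAVEN_OPTS="$(maven_proxy_opts "$http_proxy")"
```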

Uploading Data to HDFS

The data/ folder is mounted into the HDFS namenode container at /data, from where you can upload its contents to HDFS using the HDFS client in that container (docker-presto_namenode_1 may have a different name on your machine; run docker ps to find out):

docker exec -it docker-presto_namenode_1 hadoop fs -mkdir /dataset
docker exec -it docker-presto_namenode_1 hadoop fs -put /data/file.parquet /dataset/
docker exec -it docker-presto_namenode_1 hadoop fs -ls /dataset
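The three commands above can be wrapped into one helper that creates the target directory if needed and overwrites an existing copy. This sketch drops `-it` so it also works from non-interactive scripts, and reads the container name from a `NAMENODE` variable (a hypothetical name) so you can point it at whatever `docker ps` shows. `hadoop fs -mkdir -p` and `hadoop fs -put -f` are standard HDFS shell options.

```shell
# Container running the HDFS client; override if `docker ps` shows a different name.
NAMENODE="${NAMENODE:-docker-presto_namenode_1}"

# Upload a file from the mounted data/ folder (/data inside the container) to an HDFS directory.
hdfs_upload() {
  local file="$1" dir="$2"
  # -p: create parent directories, no error if the directory already exists
  docker exec "$NAMENODE" hadoop fs -mkdir -p "$dir" || return 1
  # -f: overwrite the file if it is already present in HDFS
  docker exec "$NAMENODE" hadoop fs -put -f "/data/$file" "$dir/" || return 1
  # List the directory so you can confirm the upload
  docker exec "$NAMENODE" hadoop fs -ls "$dir"
}

# Usage:
#   hdfs_upload file.parquet /dataset
```

Stopping at the first failing step avoids, for example, running `-put` against a directory that could not be created.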

Running Queries

You can use the Presto CLI included in the Docker containers of this project (adapt container name if necessary):

docker exec -it docker-presto_presto_1 presto-cli --catalog hive --schema default

Alternatively, you can download the Presto CLI executable JAR, rename it to presto-cli, make it executable, and run the following:

./presto-cli --server localhost:8080 --catalog hive --schema default
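For scripts, the CLI can also run a single statement non-interactively with its `--execute` option. A minimal wrapper, assuming the container name used above (stored in a hypothetical `PRESTO` variable) and the `hive` catalog's `default` schema:

```shell
# Container that ships presto-cli; adjust to the name shown by `docker ps`.
PRESTO="${PRESTO:-docker-presto_presto_1}"

# Run one SQL statement against hive.default and print the result to stdout.
# No -it here, so the function also works in pipelines and cron jobs.
run_query() {
  docker exec "$PRESTO" presto-cli --catalog hive --schema default --execute "$1"
}

# Usage:
#   run_query "SHOW TABLES"
#   run_query "SELECT node_id, coordinator FROM system.runtime.nodes"
```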

Creating an External Table

Suppose you have the following file test.json:

{"s": "hello world", "i": 42}
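To reproduce this example, you can create the file in the data/ folder so that it appears under /data in the namenode container. Hive's JSON format reads one JSON object per line (newline-delimited JSON), so additional rows go on separate lines rather than into a JSON array. The second row below is made-up sample data added for illustration.

```shell
# Create data/test.json on the host; data/ is mounted into the namenode container at /data.
mkdir -p data
# One JSON object per line; each line becomes one table row.
cat > data/test.json <<'EOF'
{"s": "hello world", "i": 42}
{"s": "second row", "i": 7}
EOF
```

With both lines present, the table created from this directory should contain two rows.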

Upload it to /test/test.json on HDFS as described above. Then run the following in the Presto CLI:

CREATE TABLE test (s VARCHAR, i INTEGER) WITH (EXTERNAL_LOCATION = 'hdfs://namenode/test/', FORMAT = 'JSON');

For external tables on S3, run this service on an EC2 instance, attach an instance profile that grants access to the bucket, and use the s3a:// scheme instead of hdfs:// in EXTERNAL_LOCATION.

Administering the MySQL Databases

If you need to make manual changes to the MySQL databases or want to inspect them, you can connect to the MySQL server like this:

docker exec -it docker-presto_mysql_1 mysql -ppassword
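For one-off inspections, you can also pass a statement with the mysql client's `-e` option instead of opening an interactive session. The sketch below assumes the container name and password from the command above; the `MYSQL_CONTAINER` variable and the `mysql_exec` function are hypothetical names.

```shell
# Container running MySQL; adjust to the name shown by `docker ps`.
MYSQL_CONTAINER="${MYSQL_CONTAINER:-docker-presto_mysql_1}"

# Run a single SQL statement with the mysql client and exit.
# mysql prints a warning about passwords on the command line; acceptable for a local test setup.
mysql_exec() {
  docker exec "$MYSQL_CONTAINER" mysql -ppassword -e "$1"
}

# Usage:
#   mysql_exec "SHOW DATABASES;"
```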

About

Docker package for the Presto benchmark
