
overcoil/prestoX-cluster

 
 


prestoX-cluster

prestoX-cluster is a package for running a Presto cluster, either for local testing (using docker-compose) or for large-scale, realistic benchmarking (using Kubernetes). The package is usable with both forks of "Presto": the original PrestoDB and the newer PrestoSQL/Trino fork.

For disambiguation, I will write "Presto" to refer to either fork; I will use "PrestoDB" or "Trino" directly when I am referring to a particular fork.

This package is derived from Lewuathe/docker-trino-cluster and saj1th/docker-presto-cluster.

Prerequisite

The starting point for using this package is the set of build artifacts (e.g., presto-server-*.tar.gz/presto-cli-*-executable.jar or trino-server-*.tar.gz/trino-cli-*-executable.jar) output by either Presto build (PrestoDB or Trino). The package is also usable with custom builds of either (say, from a private/local fork). In either case, keep track of the fork you are working on, as it drives the names of downstream outputs (e.g., Docker containers, Kubernetes manifests, etc.). For Trino, this package is usable with version 351 and onward, which use the new name (Trino). (See Release 351 for details.)
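
Because the fork drives downstream naming, it can help to derive it from the artifact file name instead of tracking it by hand. The following is a minimal sketch; `artifact_fork` is a hypothetical helper that is not part of this package, and it assumes only the file-name patterns listed above.

```shell
#!/bin/sh
# Hypothetical helper (not part of this package): classify a build artifact
# by its file name. Prints "prestodb", "trino", or "unknown".
artifact_fork() {
    # Strip any leading directory so full paths work too.
    case "${1##*/}" in
        presto-server-*.tar.gz|presto-cli-*-executable.jar) echo "prestodb" ;;
        trino-server-*.tar.gz|trino-cli-*-executable.jar)   echo "trino" ;;
        *)                                                  echo "unknown" ;;
    esac
}

artifact_fork presto-server-0.266.tar.gz
artifact_fork trino-cli-359-executable.jar
```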

Usage

Use the Makefile to execute each step as required.

A typical flow goes as follows:

  • Source the Presto artifacts
  • Build and optionally push images from these
  • Set up the cluster host (Docker or Kubernetes)
  • Configure the cluster
  • Run the cluster

A set of demo images based on Presto 0.266 and a Trino 359-based fork is available for demo/exploration under docker.io/overcoil (presto-* and trino-*).

1. Specify the Version

$ vi Makefile
# specify the version you are working with
# fill in TRINO_VER & PRESTO_VER as required

2. Source the build artifacts

The Makefile assumes that you have the Presto artifacts installed/available in your local Maven repo. If they are elsewhere, skip the following and instead place the two binaries directly into presto-base. Both the server run-time package and the executable CLI are required. Remember to also set the permissions.

| Presto Variant | Server run-time | CLI |
|---|---|---|
| PrestoDB | presto-server-<version>.tar.gz | presto-cli-<version>-executable.jar |
| Trino | trino-server-<version>.tar.gz | trino-cli-<version>-executable.jar |

# to extract the PrestoDB .tar.gz from your Maven repo
$ make pcopy
# to extract the Trino .tar.gz from your Maven repo
$ make tcopy
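
If the artifacts are not in your Maven repo, the manual placement described above could look like the sketch below. `stage_artifacts` is a hypothetical helper, not a Makefile target. It assumes that "set the permissions" means a readable server tarball and an executable CLI jar, and that the target directory is presto-base relative to the repository root.

```shell
#!/bin/sh
# Hypothetical helper (not part of this package).
# Usage: stage_artifacts SRC_DIR VARIANT VERSION [DEST_DIR]
#   VARIANT is "presto" or "trino"; DEST_DIR defaults to presto-base.
stage_artifacts() {
    src=$1
    variant=$2
    version=$3
    dest=${4:-presto-base}
    server="$variant-server-$version.tar.gz"
    cli="$variant-cli-$version-executable.jar"

    # Both the server package and the CLI are required.
    for f in "$server" "$cli"; do
        if [ ! -f "$src/$f" ]; then
            echo "missing artifact: $src/$f" >&2
            return 1
        fi
    done

    mkdir -p "$dest" || return 1
    cp "$src/$server" "$src/$cli" "$dest/" || return 1

    # Assumption: readable tarball, executable CLI jar.
    chmod 644 "$dest/$server" && chmod 755 "$dest/$cli"
}

# Example (paths are placeholders):
#   stage_artifacts "$HOME/Downloads" trino 359
```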

3. Build the Docker images

To build the Docker images, decide on the container registry and user id you will use and set the value of DOCKERHUB_ID appropriately. This value must be supplied even if you plan to run a local cluster. (In that case, you can skip the push to the container registry.) If you are using Kubernetes, you must push your images to a container registry for your Kubernetes cluster to find your images. The example below is for my id (overcoil) in DockerHub (docker.io).

# to build PrestoDB images
$ DOCKERHUB_ID=docker.io/overcoil make pdev
# to build Trino images
$ DOCKERHUB_ID=docker.io/overcoil make tdev
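
Since DOCKERHUB_ID must be supplied even for a local cluster, a small pre-flight check can catch a missing value before a build starts. This is only a sketch; `check_dockerhub_id` is a hypothetical helper and the Makefile may well perform its own validation.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of this package's Makefile).
check_dockerhub_id() {
    if [ -z "${DOCKERHUB_ID:-}" ]; then
        echo "DOCKERHUB_ID is not set (expected something like docker.io/<your-id>)" >&2
        return 1
    fi
    echo "using registry prefix: $DOCKERHUB_ID"
}

# Example:
#   export DOCKERHUB_ID=docker.io/overcoil
#   check_dockerhub_id && make pdev
```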

The CLI executable is installed into the coordinator node's image for your convenience. You can docker exec into the node to use it.

4. Push your Docker images (optional)

If you plan to run your cluster on Kubernetes, you must push your images to a container registry for your Kubernetes cluster to pull from. You will also need to push your images if you wish to share them with others. Remember to set the permissions of your images and configure the authentication you require in your cluster.

# to push your PrestoDB images
$ DOCKERHUB_ID=docker.io/overcoil make ppush
# to push your Trino images
$ DOCKERHUB_ID=docker.io/overcoil make tpush

Each container started from a Docker image (corresponding to one node of your Presto cluster) takes up to six arguments:

| Index | Argument | Description | Default Value |
|---|---|---|---|
| 1 | discovery_uri | Required parameter to specify the URI to coordinator host | N/A |
| 2 | node_id | Optional parameter to specify the node identity | generated UUID |
| 3 | querymaxmemorypernode | Parameter to specify the node's query.max-memory-per-node setting inside its config.properties | 8GB REVISIT |
| 4 | querymaxtotalmemorypernode | Parameter to specify the node's query.max-total-memory-per-node setting inside its config.properties | 8GB |
| 5 | querymaxmemory | Parameter to specify the node's query.max-memory setting inside its config.properties | 8GB |
| 6 | querymaxtotalmemory | Parameter to specify the node's query.max-total-memory setting inside its config.properties | 8GB |

The four query.*memory* settings control the memory sizing of each node. Refer to the memory management properties documentation in PrestoDB and Trino for details.
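
To make the positional arguments concrete, here is a rough sketch of how six arguments with these defaults could be mapped to property lines. `render_node_settings` is hypothetical, and the images' actual entrypoint script may differ. The sketch assumes `uuidgen` or the Linux kernel UUID source is available for the default node identity. Note that in a stock Presto layout node.id lives in node.properties, not config.properties.

```shell
#!/bin/sh
# Hypothetical sketch; the images' real entrypoint may work differently.
# Usage: render_node_settings DISCOVERY_URI [NODE_ID] [MEM_PER_NODE]
#            [TOTAL_MEM_PER_NODE] [MEM] [TOTAL_MEM]
render_node_settings() {
    if [ -z "${1:-}" ]; then
        echo "discovery_uri is required" >&2
        return 1
    fi
    discovery_uri=$1
    # Default node id: a generated UUID (only computed when argument 2 is absent).
    node_id=${2:-$(uuidgen 2>/dev/null || cat /proc/sys/kernel/random/uuid)}
    mem_per_node=${3:-8GB}
    total_mem_per_node=${4:-8GB}
    mem=${5:-8GB}
    total_mem=${6:-8GB}

    cat <<EOF
node.id=$node_id
discovery.uri=$discovery_uri
query.max-memory-per-node=$mem_per_node
query.max-total-memory-per-node=$total_mem_per_node
query.max-memory=$mem
query.max-total-memory=$total_mem
EOF
}

# Coordinator with a 4GB per-node limit; the remaining settings use defaults.
render_node_settings http://coordinator:8080 coordinator 4GB
```

Arguments are positional, so overriding a later setting (say, querymaxmemory) means supplying every argument before it as well.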

5. Running a local cluster

docker-compose coordinates multiple containers from a single YAML file. The pre-built targets prun and trun use docker-compose.yml to start up a multi-node cluster.

# to start up a local PrestoDB cluster
$ PRESTVAR=presto DOCKERHUB_ID=docker.io/overcoil make prun
# to start up a local Trino cluster
$ PRESTVAR=trino DOCKERHUB_ID=docker.io/overcoil make trun

In the case of the demo images above (docker.io/overcoil/presto-*, and trino-*), a parameterized deltas3 catalog is included for convenient access to your S3 bucket. (You supply an AWS key pair with the read privilege for your bucket.)

Custom Catalogs

While the image provides several default connectors (i.e., JMX, Memory, TPC-H and TPC-DS), you may want to override the catalog properties with your own. You can mount a Docker volume to replace /usr/local/presto/etc/catalog on all nodes of your cluster. Refer to volumes for details.

services:
  coordinator:
    image: "saj1th/presto-dbx-coordinator:${PRESTO_VERSION}"
    ports:
      - "8080:8080"
    container_name: "coordinator"
    command: http://coordinator:8080 coordinator
    volumes:
      - ./example/etc/catalog:/usr/local/presto/etc/catalog
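
Mounting a volume over the catalog directory replaces its whole contents, so any default catalog you still want has to be present in the mounted directory. The sketch below writes minimal catalog files for the four default connectors; `write_catalogs` is a hypothetical helper, and the assumption is that each default catalog needs nothing beyond its connector.name line. Add your own `<catalog>.properties` files alongside them.

```shell
#!/bin/sh
# Hypothetical helper (not part of this package): populate a catalog
# directory to mount over /usr/local/presto/etc/catalog.
write_catalogs() {
    dir=${1:-./example/etc/catalog}
    mkdir -p "$dir" || return 1
    # One file per catalog; the file name becomes the catalog name.
    for connector in jmx memory tpch tpcds; do
        printf 'connector.name=%s\n' "$connector" > "$dir/$connector.properties" || return 1
    done
}

# Example:
#   write_catalogs ./example/etc/catalog
```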

6. Running on Kubernetes

See the k8s folder and its README.md.

Development

Build Image

$ make build

Snapshot Image

You may want to run Presto with a custom build:

$ make snapshot

LICENSE

Apache v2 License
