prestoX-cluster is a package for running a Presto cluster, either for local testing (using docker-compose) or for large-scale, realistic benchmarking (using Kubernetes). This package is usable with both forks of "Presto": the original PrestoDB and the newer PrestoSQL/Trino fork.
For disambiguation, I will write "Presto" to refer to either fork; I will use "PrestoDB" or "Trino" when referring to a particular fork.
This package is derived from Lewuathe/docker-trino-cluster and saj1th/docker-presto-cluster.
The starting point for using this package is the set of build artifacts (e.g., presto-server-*.tar.gz / presto-cli-*-executable.jar or trino-server-*.tar.gz / trino-cli-*-executable.jar) as output by either Presto build (PrestoDB or Trino). This package is also usable with custom builds of either (say, from a private/local fork). Either way, keep track of the fork you are working with, as it drives the names of downstream outputs (e.g., Docker containers, Kubernetes manifests, etc.). In the case of Trino, this package is usable for version 351 and onward, which uses the new name (Trino). (See the Release 351 notes for specifics.)
Use the Makefile to execute each step as required.
A typical flow goes as follows (a condensed command sketch follows the list):
- Source the Presto artifacts
- Build and optionally push images from these
- Set up the cluster host (Docker or Kubernetes)
- Configure the cluster
- Run the cluster
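For reference, the whole flow condenses to a handful of make invocations. This is a sketch using the Trino targets detailed below (use the p* targets for PrestoDB); docker.io/&lt;your-id&gt; stands in for your registry id:
# source the Trino artifacts (pcopy for PrestoDB)
$ make tcopy
# build the images (pdev for PrestoDB)
$ DOCKERHUB_ID=docker.io/<your-id> make tdev
# push the images -- needed for Kubernetes (ppush for PrestoDB)
$ DOCKERHUB_ID=docker.io/<your-id> make tpush
# start a local cluster via docker-compose (prun for PrestoDB)
$ PRESTVAR=trino DOCKERHUB_ID=docker.io/<your-id> make trun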
(I have a set of images based on Presto 0.266 and a Trino 359-based fork available for demo/exploration at:
- https://hub.docker.com/repository/docker/overcoil/presto-base
- https://hub.docker.com/repository/docker/overcoil/presto-dbx-coordinator
- https://hub.docker.com/repository/docker/overcoil/presto-dbx-worker
- https://hub.docker.com/repository/docker/overcoil/trino-base
- https://hub.docker.com/repository/docker/overcoil/trino-dbx-coordinator
- https://hub.docker.com/repository/docker/overcoil/trino-dbx-worker )
$ vi Makefile
# specify the version you are working with
# fill in TRINO_VER & PRESTO_VER as required
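For example (the version values here are illustrative; use the ones you built):
# inside the Makefile
PRESTO_VER=0.266
TRINO_VER=359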
The Makefile assumes that you have the Presto artifacts installed/available in your local Maven repository. If not, skip the following and instead place the two binaries directly into the presto-base directory. Both the server run-time package and the executable CLI are required. Remember to also set the permissions.
Presto Variant | Server run-time | CLI |
---|---|---|
PrestoDB | presto-server-<version>.tar.gz | presto-cli-<version>-executable.jar |
Trino | trino-server-<version>.tar.gz | trino-cli-<version>-executable.jar |
# to extract the PrestoDB .tar.gz from your Maven repo
$ make pcopy
# to extract the Trino .tar.gz from your Maven repo
$ make tcopy
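If your artifacts are not in the local Maven repository, here is a manual staging sketch; the paths assume the default ~/.m2 layout and a Trino 359 build (adjust for your fork and version):
# copy the server package and CLI into presto-base by hand
$ cp ~/.m2/repository/io/trino/trino-server/359/trino-server-359.tar.gz presto-base/
$ cp ~/.m2/repository/io/trino/trino-cli/359/trino-cli-359-executable.jar presto-base/
# the CLI jar must be executable
$ chmod +x presto-base/trino-cli-359-executable.jar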
To build the Docker images, decide on the container registry and user id you will use and set the value of DOCKERHUB_ID accordingly. This value must be supplied even if you plan to run only a local cluster (in that case, you can skip the push to the container registry). If you are using Kubernetes, you must push your images to a container registry so that your Kubernetes cluster can find them. The example below uses my id (overcoil) on Docker Hub (docker.io).
# to build PrestoDB images
$ DOCKERHUB_ID=docker.io/overcoil make pdev
# to build Trino images
$ DOCKERHUB_ID=docker.io/overcoil make tdev
The CLI executable is installed into the coordinator node's image for your convenience; you can docker exec into the node to use it, as sketched below.
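For example, once the cluster is up (this assumes the CLI is on the image's PATH under the fork's name, and uses the container name from the docker-compose example below):
# open an interactive session against the coordinator
$ docker exec -it coordinator presto --server localhost:8080
# or, for the Trino images
$ docker exec -it coordinator trino --server localhost:8080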
If you plan to run your cluster on Kubernetes, you must push your images to a container registry for your Kubernetes cluster to pull from. You will also need to push your images if you wish to share them with others. Remember to set the visibility/permissions of your images and configure whatever authentication your cluster requires.
# to push your PrestoDB images
$ DOCKERHUB_ID=docker.io/overcoil make ppush
# to push your Trino images
$ DOCKERHUB_ID=docker.io/overcoil make tpush
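Pushing assumes you are already logged in to the registry, e.g.:
# authenticate to Docker Hub before pushing
$ docker login docker.io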
Each Docker image (corresponding to one node of your Presto cluster) is invoked with up to six arguments:
Index | Argument | Description | Default Value |
---|---|---|---|
1 | discovery_uri | Required. The URI of the coordinator host | N/A |
2 | node_id | Optional. The node's identity | generated UUID |
3 | querymaxmemorypernode | Sets query.max-memory-per-node in the node's config.properties | 8GB |
4 | querymaxtotalmemorypernode | Sets query.max-total-memory-per-node in the node's config.properties | 8GB |
5 | querymaxmemory | Sets query.max-memory in the node's config.properties | 8GB |
6 | querymaxtotalmemory | Sets query.max-total-memory in the node's config.properties | 8GB |
The four query.*memory* settings control the memory footprint of each node. Refer to the memory-management properties documentation for PrestoDB and Trino for details on these.
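Putting the arguments together, a hypothetical invocation of a single worker node (the image tag and memory values are illustrative):
# args: discovery_uri, node_id, then the four memory settings in table order
$ docker run -d docker.io/overcoil/presto-dbx-worker:0.266 \
    http://coordinator:8080 worker-1 4GB 6GB 8GB 10GB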
docker-compose coordinates multiple containers from a single YAML file. The pre-built targets prun and trun use docker-compose.yml to start up a multi-node cluster.
# to start up a local PrestoDB cluster
$ PRESTVAR=presto DOCKERHUB_ID=docker.io/overcoil make prun
# to start up a local Trino cluster
$ PRESTVAR=trino DOCKERHUB_ID=docker.io/overcoil make trun
In the case of the demo images above (docker.io/overcoil/presto-* and trino-*), a parameterized deltas3 catalog is included for convenient access to your S3 bucket. (You supply an AWS key pair with read privileges for your bucket.)
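A sketch of supplying that key pair, assuming the catalog is parameterized via the standard AWS environment variables (this is an assumption; check the catalog template in the images):
# hypothetical: pass AWS credentials through when starting the cluster
$ AWS_ACCESS_KEY_ID=<your-key> AWS_SECRET_ACCESS_KEY=<your-secret> \
    PRESTVAR=trino DOCKERHUB_ID=docker.io/overcoil make trun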
While the image provides several default connectors (i.e., JMX, Memory, TPC-H, and TPC-DS), you may want to override the catalog properties with your own. You can mount a Docker volume to replace /usr/local/presto/etc/catalog on all nodes of your cluster, as in the fragment below. Refer to the docker-compose volumes documentation for details.
services:
coordinator:
image: "saj1th/presto-dbx-coordinator:${PRESTO_VERSION}"
ports:
- "8080:8080"
container_name: "coordinator"
command: http://coordinator:8080 coordinator
volumes:
- ./example/etc/catalog:/usr/local/presto/etc/catalog
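Each file dropped into that directory defines one catalog. A minimal sketch (connector and connection values are illustrative) that could live at example/etc/catalog/postgresql.properties:
# one catalog per .properties file; this one exposes a PostgreSQL database
connector.name=postgresql
connection-url=jdbc:postgresql://db.example.com:5432/mydb
connection-user=presto
connection-password=secret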
See the k8s folder and its README.md.
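In broad strokes (the manifest layout is an assumption; defer to that README):
# hypothetical: apply the manifests once your images are pushed
$ kubectl apply -f k8s/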
$ make build
You may want to run Presto with a custom build:
$ make snapshot