This is a Docker Compose environment to quickly get up and running with Spark, an external Hive catalog backed by Postgres, and MinIO as the storage backend.
Note: If you don't have Docker installed, head over to the Get Docker page for installation instructions.
On Windows platforms, change the line endings of spark/entrypoint.sh and postgres/init-db.sh to LF.
Start up the notebook server by running:
```sh
docker-compose up
```
The notebook server will then be available at http://localhost:8888
MinIO will be available at http://localhost:9001/browser
While the notebook server is running, you can use any of the following commands if you prefer spark-shell, spark-sql, or pyspark:
```sh
docker exec -it spark-iceberg spark-shell
docker exec -it spark-iceberg spark-sql
docker exec -it spark-iceberg pyspark
```
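As a quick smoke test once you're inside the pyspark shell, something like the following should work (this is just an illustration; the DataFrame contents are arbitrary):

```python
# Run inside `docker exec -it spark-iceberg pyspark`.
# The `spark` SparkSession is predefined in the shell.
spark.sql("SHOW DATABASES").show()  # list databases in the current catalog

# Build a tiny DataFrame locally and inspect it.
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "label"])
df.show()
```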
Delta Lake support is enabled in the default Spark catalog `spark_default`, while Iceberg uses the `iceberg` catalog.
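As a sketch of how the two catalogs can be addressed from the pyspark shell — the catalog names `spark_default` and `iceberg` come from the text above, while the `demo` namespace and table names are purely illustrative:

```python
# Run inside the pyspark shell. Catalog names follow the text above;
# the `demo` namespace and the tables are created here only for illustration.

# Iceberg table in the `iceberg` catalog.
spark.sql("CREATE DATABASE IF NOT EXISTS iceberg.demo")
spark.sql("""
    CREATE TABLE IF NOT EXISTS iceberg.demo.events_ice (id BIGINT, ts TIMESTAMP)
    USING iceberg
""")
spark.sql("INSERT INTO iceberg.demo.events_ice VALUES (1, current_timestamp())")
spark.sql("SELECT * FROM iceberg.demo.events_ice").show()

# Delta Lake table through the default catalog.
spark.sql("""
    CREATE TABLE IF NOT EXISTS spark_default.default.events_delta (id BIGINT, ts TIMESTAMP)
    USING delta
""")
spark.sql("INSERT INTO spark_default.default.events_delta VALUES (1, current_timestamp())")
spark.sql("SELECT * FROM spark_default.default.events_delta").show()
```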
To stop everything, just run `docker-compose down`. The data directories for the metastore and MinIO are mounted locally, so changes will be persisted even after shutdown.
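One way to confirm persistence, assuming the illustrative table from the sketch above was created in an earlier session:

```python
# After `docker-compose down` followed by a fresh `docker-compose up`,
# tables created earlier should still be queryable from a new pyspark shell.
spark.sql("SELECT * FROM iceberg.demo.events_ice").show()
```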
Use `cleanup.sh` to clean out the metastore and storage directories.
For more information on getting started with Iceberg, check out the Getting Started guide in the official docs.
The repository for the Docker image is located on Docker Hub.