
Docker Iceberg DeltaLake playground with Jupyter Notebook

udayaw/docker-spark-playground

 
 


Spark 3.5.0 + Iceberg 1.4.1 + Delta Lake 3.0.0 + Jupyter Notebook (Python, Scala)

This is a Docker Compose environment for quickly getting up and running with Spark, an external Hive metastore backed by PostgreSQL, and MinIO as the storage backend.

Note: if you don't have Docker installed, head over to the Get Docker page for installation instructions.

On Windows, change the line endings of spark/entrypoint.sh and postgres/init-db.sh to LF.
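Either dos2unix or sed can do the conversion. The snippet below is a sketch that uses a stand-in file, since the exact paths depend on where you checked out the repo:

```shell
# Stand-in file: a script saved with Windows (CRLF) line endings,
# standing in for spark/entrypoint.sh or postgres/init-db.sh
printf '#!/bin/bash\r\necho ready\r\n' > entrypoint.sh

# Strip the trailing carriage returns in place (GNU sed); dos2unix works too
sed -i 's/\r$//' entrypoint.sh
```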

Usage

Start the notebook server by running:

docker-compose up

The notebook server will then be available at http://localhost:8888

MinIO will be available at http://localhost:9001/browser
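The stack that comes up can be sketched roughly as below. The `spark-iceberg` service name is confirmed by the `docker exec` commands in this README; the other service names are assumptions, so consult the repository's docker-compose.yml for the real definitions:

```yaml
# Rough sketch only, not the actual compose file from this repo
services:
  spark-iceberg:   # Spark + Jupyter notebook server, exposed on 8888
    ports:
      - "8888:8888"
  postgres:        # hypothetical name: backs the external Hive metastore
    image: postgres
  minio:           # hypothetical name: S3-compatible storage, console on 9001
    image: minio/minio
    ports:
      - "9001:9001"
```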

While the notebook server is running, you can use any of the following commands if you prefer spark-shell, spark-sql, or pyspark.

docker exec -it spark-iceberg spark-shell
docker exec -it spark-iceberg spark-sql
docker exec -it spark-iceberg pyspark

Delta Lake support is enabled in the default Spark catalog, spark_default, while Iceberg tables live in the iceberg catalog.
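For example, inside spark-sql you could create a table in each format. This is an illustrative sketch only: the table and namespace names are made up, and the catalog names are taken from the description above.

```sql
-- Delta Lake table in the default Spark catalog
CREATE TABLE demo_delta (id INT, data STRING) USING delta;
INSERT INTO demo_delta VALUES (1, 'hello');

-- Iceberg table in the `iceberg` catalog
CREATE TABLE iceberg.default.demo_iceberg (id INT, data STRING) USING iceberg;
INSERT INTO iceberg.default.demo_iceberg VALUES (1, 'hello');
```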

To stop everything, just run docker-compose down. The data directories for the metastore and MinIO are mounted locally, so changes persist across shutdowns.

Cleanup

Use cleanup.sh to clean the metastore and storage directories.


For more information on getting started with Iceberg, check out the Getting Started guide in the official docs.

The repository for the Docker image is located on Docker Hub.
