Dockerized Environment for developing Geospatial applications in Python using Apache Spark, Apache Sedona and Delta Lake.

Raychani1/PySpark_Sedona_Delta_Docker

PySpark-Sedona-Delta Docker Environment

About The Project

This project provides an easy-to-deploy environment for running geospatial processing jobs using the Python programming language with the power of Apache Spark, enhanced by the Sedona library for geospatial analytics and Delta Lake for reliable data storage.

Built With

Docker

Dependencies

Getting Started

See the following project for a usage example.

Prerequisites

  • Docker - This environment requires a working Docker installation; Docker Engine or Docker Desktop is recommended. Either can be downloaded from the Docker website or installed through a package manager.
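Before building anything, it can help to confirm that both the Docker CLI and the Docker daemon are available. The following is a minimal sketch; the function name `check_docker` is hypothetical and not part of this repository.

```shell
# Hypothetical helper: verify that the docker CLI exists and the daemon responds.
check_docker() {
    # The CLI must be resolvable from the current shell.
    if ! command -v docker >/dev/null 2>&1; then
        echo "docker CLI not found" >&2
        return 1
    fi
    # `docker info` exits with a non-zero status when the daemon is unreachable.
    if ! docker info >/dev/null 2>&1; then
        echo "docker daemon not reachable" >&2
        return 2
    fi
    echo "docker is ready"
}
```

Running `check_docker` prints `docker is ready` when both checks pass; otherwise it reports which one failed on standard error.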

Building the image locally

  1. Clone the repo and navigate to the project folder

    git clone https://github.com/Raychani1/PySpark_Sedona_Delta_Docker
    cd PySpark_Sedona_Delta_Docker
  2. Build the Docker Image

    docker build -t pyspark_sedona_delta_docker .
  3. Navigate to your own project directory and create a project-specific Dockerfile based on the new image, with the following content:

    FROM pyspark_sedona_delta_docker:latest
    
    WORKDIR /app
    
    # Install Project related Python libraries
    COPY requirements.txt .
    RUN pip install -r requirements.txt --no-cache-dir
  4. Build the Project related Docker Image

    docker build -t my_project .
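  5. Create a shortcut for more convenient execution

    On Linux (the single quotes defer $(pwd) until the alias is used, so the directory that is current at that time gets mounted):

    alias My_Project='docker run --rm -it -v "$(pwd)":/app my_project:latest'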

    On Windows:

    1. Create a PowerShell script file, for example my_project.ps1, with the following content:

      function My_Project {
         docker run --rm -it -v ${pwd}:/app my_project:latest $args
      }
    2. Load the new function by dot-sourcing the script

      . .\my_project.ps1
  6. Run your project through the environment

    My_Project python main.py
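Aliases are normally expanded only in interactive shells, so a shell function can be a more portable alternative on Linux when the same shortcut is needed in scripts. The sketch below assumes the `my_project:latest` tag from step 4; the function name `my_project` is hypothetical.

```shell
# Hypothetical wrapper; assumes the image tag my_project:latest from step 4.
my_project() {
    # Request an interactive TTY only when one is attached, so the
    # function also works inside pipelines and CI jobs.
    local tty_flags=""
    if [ -t 0 ] && [ -t 1 ]; then
        tty_flags="-it"
    fi
    # $(pwd) is evaluated on every call, so the directory that is current
    # at call time is mounted at /app. "$@" forwards all arguments unchanged.
    docker run --rm $tty_flags -v "$(pwd)":/app my_project:latest "$@"
}
```

With this function defined, `my_project python main.py` behaves like the alias, except that the `-it` flags are dropped automatically when no terminal is attached.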

Pulling image from Docker Hub

  1. Navigate to your project directory and create a project-specific Dockerfile based on the Docker Hub image, with the following content:

    FROM rajcsanyiladislavit/local_geo_analysis:latest
    
    WORKDIR /app
    
    # Install Project related Python libraries
    COPY requirements.txt .
    RUN pip install -r requirements.txt --no-cache-dir
  2. Follow steps 4-6 of the "Building the image locally" section.

Developing your code in a Dev Container

  1. Navigate to your project directory and create a project-specific Dockerfile as described in the previous sections.

  2. Start your project container and connect to it by following the official documentation for VS Code or PyCharm.
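An IDE can only attach to a container that is still running, so one option is to start the project container in the background with a long-lived command. This is a sketch under assumptions, not the repository's documented workflow: the helper `start_dev_container` and the default name `my_project_dev` are hypothetical, and it assumes the image provides a `sleep` that accepts `infinity` (as GNU coreutils does).

```shell
# Hypothetical helper; assumes the my_project:latest image built earlier
# and a `sleep` binary in the image that accepts `infinity`.
start_dev_container() {
    # Container name; defaults to the illustrative name my_project_dev.
    local name="${1:-my_project_dev}"
    # -d runs the container in the background so an IDE can attach to it;
    # the current directory is mounted at /app, as in the run step.
    docker run -d --name "$name" -v "$(pwd)":/app my_project:latest sleep infinity
}
```

After `start_dev_container`, the container should show up in the IDE's list of running containers. It can be removed afterwards with `docker rm -f my_project_dev`.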

License

Distributed under the MIT License. See LICENSE for more information.

