A custom EMR-Serverless execution environment with GDAL

moradology/gdal-emr-serverless

GDAL on EMR-Serverless (gdal-emr-serverless)

Overview

gdal-emr-serverless is a project focused on deploying and managing custom EMR-Serverless applications for geospatial processing with GDAL. It includes a Terraform management script (tf) and a Python CLI script for job submission to EMR-Serverless. The tf script streamlines Terraform operations across multiple environments, while the Python CLI script demonstrates one approach for the submission of Spark jobs to EMR-Serverless.

Key Components

  • Terraform Script (tf): Manages the infrastructure for EMR-Serverless applications using Terraform.
  • Workspace Management: Utilizes Terraform workspaces for segregating environments like dev, prod, etc.
  • Docker Integration: Handles building and pushing Docker images required for serverless applications that depend on GDAL.
  • EMR-Serverless Job Submission Script: A Python CLI tool for submitting jobs to AWS EMR-Serverless.

Prerequisites

Usage

Terraform Script (tf)

Execute Terraform commands within the project's infrastructure context:

./tf [terraform_command] [options]

Examples:

./tf plan
./tf apply
./tf destroy

If you run into permission errors, make sure AWS credentials are available to the script. AWS supports several credential mechanisms, so review the AWS documentation as needed. The following example uses a pre-configured profile named "your-aws-profile":

AWS_PROFILE=your-aws-profile ./tf apply

Managing Workspaces

Manage different deployment environments using workspaces. The tf script automatically selects the variable file that matches the active workspace:

./tf workspace new [workspace_name]
./tf workspace select [workspace_name]

Docker Image Management

Build and push Docker images as part of the infrastructure setup:

./tf update_image

EMR-Serverless Job Submission Script

Submit jobs to EMR-Serverless using the Python CLI script:

python emr_job_cli.py \
  --application-id "app-id" \
  --execution-role-arn "arn:aws:iam::123456789012:role/MyRole" \
  --entry-point "s3://path/to/assembly.jar" \
  --entry-point-arguments "arg1 arg2 arg3" \
  --spark-submit-parameters "--executor-memory 1G --total-executor-cores 2" \
  --name "MySparkGDALJob"

Replace the placeholders with actual job details. The value passed to --entry-point-arguments should be a single space-separated list of arguments.
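For readers curious how such flags could map onto the EMR-Serverless API, here is a minimal sketch. It is not the contents of emr_job_cli.py, which is not shown here. The helper names build_job_request and submit_job are hypothetical, and the sketch assumes boto3 is installed and that credentials and a region are already configured. It splits the space-separated argument string into a list and passes the result to boto3's start_job_run call:

```python
import shlex


def build_job_request(application_id, execution_role_arn, entry_point,
                      entry_point_arguments="", spark_submit_parameters="",
                      name=None):
    """Assemble keyword arguments for an EMR-Serverless StartJobRun call.

    Hypothetical helper for illustration; not taken from emr_job_cli.py.
    """
    spark_submit = {"entryPoint": entry_point}
    if entry_point_arguments:
        # The CLI takes one space-separated string; the API expects a list.
        spark_submit["entryPointArguments"] = shlex.split(entry_point_arguments)
    if spark_submit_parameters:
        # Spark options are passed through to the API as a single string.
        spark_submit["sparkSubmitParameters"] = spark_submit_parameters
    request = {
        "applicationId": application_id,
        "executionRoleArn": execution_role_arn,
        "jobDriver": {"sparkSubmit": spark_submit},
    }
    if name:
        request["name"] = name
    return request


def submit_job(request):
    """Send the request with boto3 and return the new job run ID."""
    import boto3  # imported lazily so the builder works without boto3

    client = boto3.client("emr-serverless")
    response = client.start_job_run(**request)
    return response["jobRunId"]
```

Keeping the request builder separate from the network call makes the argument handling easy to check without AWS access. Calling submit_job(build_job_request(...)) with the values from the command above would start a job run, assuming the application and role exist.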

Terraform Workspace Usage and Requirements

Overview

Terraform workspaces are extensively used to manage and isolate configurations for different environments in gdal-emr-serverless.

Important Notes

  • Avoid Default Workspace: The project contains custom logic to prevent the use of Terraform's default workspace.
  • Workspace-Specific Configuration: Each workspace requires a terraform.[workspace].tfvars file for environment-specific configurations.
  • Credentials and Secrets: Handle AWS credentials and sensitive data securely, especially when using the job submission script.
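The first two notes above can be illustrated with a short sketch. The tf script's actual implementation is not shown here, so the function names var_file_for and plan_args are hypothetical; only the terraform.[workspace].tfvars naming pattern and the rejection of the default workspace come from this README. The -var-file option is a standard Terraform flag.

```python
from pathlib import Path


def var_file_for(workspace, root="."):
    """Return the tfvars path for a workspace, rejecting 'default'.

    Hypothetical helper; mirrors the rules described in this README.
    """
    if workspace == "default":
        raise ValueError(
            "The default workspace is not supported; "
            "create or select a named workspace such as 'dev'."
        )
    return Path(root) / f"terraform.{workspace}.tfvars"


def plan_args(workspace, root="."):
    """Build a terraform plan command line that uses the workspace's var file."""
    return ["terraform", "plan", f"-var-file={var_file_for(workspace, root)}"]
```

For a workspace named dev, plan_args("dev") yields a command that reads terraform.dev.tfvars, while passing "default" raises an error before Terraform is invoked.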
