
Introduction

This project provides a fully automated, one-click tool to create a Data Analytics platform in a Cloud and Kubernetes environment:

  1. A single script to deploy a full-stack data platform: Kafka, Hive Metastore, Spark, and a data ingestion job.

  2. A Spark API Gateway to run Spark as a service.

  3. An extensible design to support customization and deployment of new services.

Use Cases

Deploy Spark as a Service on EKS

Use a command like punch install SparkOnEks to get a ready-to-use Spark service within minutes. That single command will automatically do the following:

  1. Create an AWS EKS cluster and set up required IAM roles
  2. Deploy Nginx Ingress Controller and a Load Balancer
  3. Deploy Spark Operator and a REST API Gateway to accept application submission
  4. Deploy Spark History Server
  5. Enable Cluster AutoScaler

When the punch command finishes, the Spark service is ready to use. Users can submit Spark applications with curl or the command-line tool (sparkcli).
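
For example, a submission through the REST API gateway might look roughly like the following sketch. The gateway URL, endpoint path, and payload file are illustrative placeholders, not the actual API; see the User Guide for the real endpoint and request format.

# Illustrative placeholders only: replace <gateway-url>, <submission-endpoint>,
# and spark-application.json with the values described in the User Guide.
curl -X POST http://<gateway-url>/<submission-endpoint> \
  -H "Content-Type: application/json" \
  -d @spark-application.json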

Deploy a Data Ingestion Platform

punch also supports chaining multiple install commands to deploy a complex data platform.

For example, we could create a single script file with multiple commands:

punch install Eks
punch install KafkaBridge
punch install HiveMetastore
punch install SparkOnEks

The script will deploy a data ingestion platform with all the components in the green area in the following diagram:

[Diagram: Data Ingestion Platform]

After the platform is deployed, users can send data to the REST API. The data flows into Kafka and is automatically ingested into AWS S3 by the Spark Streaming application. Users can then write further Spark applications to query the Hive table or compute metrics and insights from it.
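
As a sketch, sending an event to the ingestion REST API could look like the following. The endpoint address, path, and JSON fields are illustrative placeholders, not the real KafkaBridge API; consult the User Guide for the actual interface.

# Illustrative placeholders only: the real endpoint and payload schema come from the User Guide.
curl -X POST http://<ingestion-endpoint>/<topic-path> \
  -H "Content-Type: application/json" \
  -d '{"exampleField": "exampleValue"}'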

How to build (on macOS)

The following command will create a dist folder and a dist.zip file for punch.

make release

Go to the dist folder and check the User Guide to see how to run the punch command.
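
For example, a typical build-and-run flow looks like this (assuming the punch binary is produced inside the dist folder and AWS credentials are already configured as described in the User Guide):

# Build the release artifacts
make release
# Run punch from the generated dist folder
# (assumes the punch binary lands directly in dist; check the User Guide for the exact layout)
cd dist
./punch install SparkOnEks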

User Guide - Run Spark on AWS EKS

Use a single command like punch install SparkOnEks to deploy a runnable Spark environment on EKS.

See the User Guide for more details in the section "Run punch on AWS".


Thanks to JetBrains for their support with great development tools and licenses.
