abhibond/spark-emr-example

Introduction

A simple tutorial for getting started with Spark on EMR. It can also be used as a template for setting up a local dev environment.

Dependencies

Building

Get started with Spark Dev Environment

git clone https://github.com/abhibond/spark-emr-example.git
cd spark-emr-example
sbt assembly

Running

Run it locally

YOUR_SPARK_HOME/bin/spark-submit \
--class "com.abhibon.spark.WordCount" \
--master local[4] \
/Path-To-Your-Fat-Jar Arguments
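The fat jar path in the command above depends on your build. As a minimal sketch, assuming `sbt assembly` writes the jar to its default location under `target/scala-*/` (this repo's build settings may override that), the helpers below locate the newest jar and print the matching `spark-submit` command without running it. The function names `find_fat_jar` and `local_submit_cmd` are hypothetical and not part of this repo.

```shell
#!/usr/bin/env bash

# Print the newest jar under <root>/target/scala-*/ (assumed sbt-assembly default location).
# Prints nothing if no jar is found.
find_fat_jar() {
  local root="${1:-.}"
  # ls -t lists newest first; errors are suppressed when the glob matches nothing
  ls -t "$root"/target/scala-*/*.jar 2>/dev/null | head -n 1
}

# Print (do not execute) the local spark-submit command for the WordCount class.
local_submit_cmd() {
  local spark_home="$1" jar="$2"
  shift 2
  echo "$spark_home/bin/spark-submit --class com.abhibon.spark.WordCount --master local[4] $jar $*"
}

JAR="$(find_fat_jar .)"
if [ -n "$JAR" ]; then
  # SPARK_HOME falls back to the placeholder used in this README
  local_submit_cmd "${SPARK_HOME:-YOUR_SPARK_HOME}" "$JAR" "$@"
else
  echo "No jar found under target/scala-*/; run 'sbt assembly' first" >&2
fi
```

Run the script from the project root after `sbt assembly`, passing the application arguments. Inspect the printed command before running it yourself.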

Run it on EMR

Launching EMR with Spark

Always try to launch the EMR cluster with the latest Spark version. For more details, see [here](https://github.com/awslabs/emr-bootstrap-actions/tree/master/spark).

aws emr create-cluster --name abond-spark --ami-version 3.3.1 --instance-type m3.xlarge --instance-count 1 \
--ec2-attributes KeyName=abond-kp2 \
--bootstrap-actions Path=s3://support.elasticmapreduce/spark/install-spark,Args=[-v,1.2,-g]

Executing app on EMR

We submit the job to Spark with spark-submit. On EMR, this is done by adding a step to the cluster through the [script-runner](http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-hadoop-script.html) interface.

aws emr add-steps --cluster-id j-345VNW4E90THA --steps \
Name=WordCount,Jar=s3://elasticmapreduce/libs/script-runner/script-runner.jar,Args=[/home/hadoop/spark/bin/spark-submit,--deploy-mode,cluster,--master,yarn-cluster,--class,com.abhibon.spark.WordCount,s3://abond-dev/spark-demo/bin/spark-emr-example.jar,s3://abond-dev/spark-demo/input/README.md,s3://abond-dev/spark-demo/output/],ActionOnFailure=CONTINUE

For more examples, see [here](https://github.com/awslabs/emr-bootstrap-actions/blob/master/spark/examples/spark-submit-via-step.md).
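The `--steps` value in the add-steps command is a single comma-separated string, which is easy to get wrong when edited by hand. The sketch below assembles it from its parts, using the same script-runner jar, spark-submit path, and example S3 locations shown above. The `build_step` helper is hypothetical, and the `aws` call is left as a comment so the script only prints the string.

```shell
#!/usr/bin/env bash

# Build the --steps value for a script-runner step that calls spark-submit.
# Usage: build_step <step-name> <main-class> <jar-s3-path> [app-args...]
build_step() {
  local name="$1" class="$2" jar="$3"
  shift 3
  # spark-submit and its options, joined with commas as the Args list requires
  local args="/home/hadoop/spark/bin/spark-submit,--deploy-mode,cluster,--master,yarn-cluster,--class,${class},${jar}"
  local a
  for a in "$@"; do
    args="${args},${a}"   # append each application argument
  done
  printf 'Name=%s,Jar=s3://elasticmapreduce/libs/script-runner/script-runner.jar,Args=[%s],ActionOnFailure=CONTINUE' \
    "$name" "$args"
}

STEP="$(build_step WordCount com.abhibon.spark.WordCount \
  s3://abond-dev/spark-demo/bin/spark-emr-example.jar \
  s3://abond-dev/spark-demo/input/README.md \
  s3://abond-dev/spark-demo/output/)"

echo "$STEP"
# To submit, replace the cluster id with your own:
# aws emr add-steps --cluster-id j-345VNW4E90THA --steps "$STEP"
```

Quoting `"$STEP"` when passing it to `aws` keeps the shell from interpreting the square brackets as a glob pattern.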

To-Do

  • Intro to SparkSQL
  • Tutorial on using Parquet and Avro with Spark
  • Using Parquet with SparkSQL

Resources
