Scalding on Amazon Elastic MapReduce
alexanderdean edited this page Aug 12, 2012 · 3 revisions
I copied this from the Google Group discussion about how to get Scalding running on EMR:
I was able to successfully execute Scalding's WordCountJob example on Amazon EMR. To recap, here are the steps I took:
- I excluded Hadoop from the Scalding assembly by changing the last lines in `build.sbt`: `excludedJars in assembly <<= (fullClasspath in assembly) map { cp => cp filter { Set("janino-2.5.16.jar", "hadoop-core-0.20.2.jar") contains _.data.getName} }`
- I uploaded the resulting Scalding jar and the `hello.txt` file to Amazon S3.
- I created an EMR job using a custom jar. From the command line, it looks like this: `elastic-mapreduce --create --name "Test Scalding" --jar s3n://<bucket-and-path-to-scalding-assembly-0.3.5.jar> --arg com.twitter.scalding.examples.WordCountJob --arg --hdfs --arg --input --arg s3n://<bucket-and-path-to-hello.txt> --arg --output --arg s3n://`
Voilà! :-)
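The job-creation step above can be wrapped in a small script so the long argument list is easier to read and to check before submitting. The following is only a sketch: the `example-bucket` S3 locations and the `submit_wordcount` function name are hypothetical placeholders, not values from the original discussion, and the flags are exactly those quoted in the steps above. By default it only prints the command instead of contacting EMR.

```shell
#!/bin/sh
# Sketch: assemble the elastic-mapreduce call described in the steps above.
# Every S3 location below is a hypothetical placeholder; substitute your own.
set -eu

JAR_URI="s3n://example-bucket/jars/scalding-assembly-0.3.5.jar"  # hypothetical
INPUT_URI="s3n://example-bucket/input/hello.txt"                 # hypothetical
OUTPUT_URI="s3n://example-bucket/output/wordcount"               # hypothetical

# submit_wordcount RUNNER...
# Appends the job arguments to whatever command is passed in, so the same
# argument list serves both a dry run and a real submission.
submit_wordcount() {
  "$@" \
    --create \
    --name "Test Scalding" \
    --jar "$JAR_URI" \
    --arg com.twitter.scalding.examples.WordCountJob \
    --arg --hdfs \
    --arg --input --arg "$INPUT_URI" \
    --arg --output --arg "$OUTPUT_URI"
}

# Dry run: print the command. Note that echo drops the quotes around the
# job name, so the printed line is for inspection, not for copy and paste.
submit_wordcount echo elastic-mapreduce

# Real submission (requires the elastic-mapreduce CLI to be installed):
# submit_wordcount elastic-mapreduce
```

Passing the runner as an argument keeps one argument list for both modes, so what is printed in the dry run is what gets submitted.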
For an example of a standalone Scalding job which can be run on Amazon EMR, please see: