Why pack/unpack and not toList
oleksii iepishkin edited this page Feb 6, 2014
In the fields-based API, toList should not be used in a groupBy when the lists can be very large or their size is not known in advance. toList does not significantly reduce the amount of data (every value of the group ends up in one in-memory list), so it is likely to cause out-of-memory (OOM) errors when the lists get too long. A good alternative to toList is to combine pack/unpack with reduce: use pack to convert the fields of each tuple into a single object, then do a groupBy with a reduce function inside it that holds your logic for processing and combining the grouped items, and finally use unpack to turn the combined object back into fields.
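Example 1: toList (can run out of memory on large groups)
// Assumes inputpipe is a Pipe with the fields 'firstname and 'lastname.
// toList buffers every last name of a group in one in-memory List.
val res_pipe = inputpipe.groupBy('firstname) {
  _.toList[String]('lastname -> 'lastnames)
}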
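Example 2: pack, reduce and unpack
import com.twitter.scalding._

case class Person(firstname: String = "", lastname: String = "")

// Assumes inputpipe is a Pipe with the fields 'firstname and 'lastname.
val res_pipe = inputpipe
  // pack: turn the 'firstname and 'lastname fields into one Person in 'person
  .pack[Person](('firstname, 'lastname) -> 'person)
  .groupBy('firstname) {
    // reduce keeps one accumulated Person per first name and merges each
    // new Person into it, instead of buffering all of them in a list
    _.reduce[Person]('person -> 'combinedperson) { (personAccumulated: Person, person: Person) =>
      Person(
        firstname = personAccumulated.firstname,
        lastname = personAccumulated.lastname + "," + person.lastname
      )
    }
  }
  // unpack: read 'lastname back out of the combined Person.
  // 'firstname is already present as the group key, so it is not unpacked again.
  .unpack[Person]('combinedperson -> 'lastname)
  .project('firstname, 'lastname) // 'lastname now holds comma-separated last names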
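To make the memory argument concrete, here is a minimal plain-Scala sketch that models what each approach keeps per key. It uses no Scalding or Hadoop, and the names ReduceVsToList, toListStyle, reduceStyle, combine and mergePartials are hypothetical, chosen only for illustration. The toList-style function buffers every value of a group, while the reduce-style function keeps exactly one accumulated Person per key and folds each record into it with the same combine logic as Example 2. It is a local model of the idea under those assumptions, not how Scalding or Cascading actually implement grouping.

```scala
// Plain-Scala model (no Scalding, no Hadoop); all names here are hypothetical.
object ReduceVsToList {

  final case class Person(firstname: String = "", lastname: String = "")

  // toList-style: every last name of a group is kept until the group is done,
  // so the state for one key grows with the number of records in that group.
  def toListStyle(rows: Seq[Person]): Map[String, List[String]] =
    rows.foldLeft(Map.empty[String, List[String]]) { (acc, p) =>
      acc.updated(p.firstname, acc.getOrElse(p.firstname, Nil) :+ p.lastname)
    }

  // Same associative combine logic as the reduce in Example 2.
  def combine(acc: Person, p: Person): Person =
    Person(firstname = acc.firstname, lastname = acc.lastname + "," + p.lastname)

  // reduce-style: exactly one Person per key; each record is merged into it
  // as soon as it arrives, so the individual records need not be kept.
  def reduceStyle(rows: Seq[Person]): Map[String, Person] =
    rows.foldLeft(Map.empty[String, Person]) { (acc, p) =>
      val merged = acc.get(p.firstname).map(prev => combine(prev, p)).getOrElse(p)
      acc.updated(p.firstname, merged)
    }

  // Because combine is associative, partial results computed on separate
  // chunks of the input can be merged afterwards and give the same answer.
  def mergePartials(a: Map[String, Person], b: Map[String, Person]): Map[String, Person] =
    b.foldLeft(a) { case (acc, (k, p)) =>
      acc.updated(k, acc.get(k).map(prev => combine(prev, p)).getOrElse(p))
    }

  def main(args: Array[String]): Unit = {
    val rows = Seq(Person("ann", "lee"), Person("bob", "kim"), Person("ann", "ray"))
    println(toListStyle(rows))
    println(reduceStyle(rows).map { case (k, p) => k -> p.lastname })
    val (left, right) = rows.splitAt(2)
    println(mergePartials(reduceStyle(left), reduceStyle(right)) == reduceStyle(rows))
  }
}
```

Note that reduce only bounds the number of objects held per key; the size of the accumulated value still depends on the combine function. A sum or a count stays constant-size, whereas a comma-joined string like the one built here and in Example 2 still grows with the group, so the approach pays off most when the combined value stays small.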