- Java 7
- sbt 0.13.8
- Scala 2.10.5
- Spark 1.3.1
spark-submit \
--class com.wordcount.example.WordCount \
--driver-memory "3g" \
target/scala-2.10/dfw-spark-meetup_2.10-0.0.1.jar \
/Users/shona/IdeaProjects/apache-spark-examples/data/pg4300.txt \
/Users/shona/IdeaProjects/apache-spark-examples/data/wordcount
spark-submit \
--class com.stackoverflow.example.UserCount \
--master "local[1]" \
--driver-memory "3g" \
target/scala-2.10/dfw-spark-meetup_2.10-0.0.1.jar \
/Users/shona/IdeaProjects/apache-spark-examples/data/Users.xml
spark-submit \
--class com.twitter.example.NaiveBayesClassifier \
--driver-memory "5g" \
target/scala-2.10/dfw-spark-meetup-assembly-0.0.1.jar
spark-submit \
--class org.apache.spark.examples.streaming.TwitterPopularTags \
--master "local[2]" \
--driver-memory "3g" \
lib/spark-examples-1.3.1-hadoop2.6.0.jar \
<consumerKey> <consumerSecret> <accessToken> <accessTokenSecret>
val numbers = sc.parallelize(List(1,2,3,4,5,6), 2)
numbers.aggregate(0)(math.max(_, _), _ + _)
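The `aggregate` call above runs in two phases: `math.max(_, _)` folds each partition down to its maximum, then `_ + _` combines the per-partition results on the driver. A plain-Scala sketch of the same computation, assuming the six elements split evenly across the two partitions:

```scala
// Two partitions, as parallelize(List(1,2,3,4,5,6), 2) would typically split them
val partitions = List(List(1, 2, 3), List(4, 5, 6))

// Phase 1: fold each partition with the seqOp, starting from the zero value 0
val perPartitionMax = partitions.map(_.foldLeft(0)(math.max(_, _))) // List(3, 6)

// Phase 2: combine the partition results with the combOp
val result = perPartitionMax.foldLeft(0)(_ + _) // 3 + 6 = 9
```

Note that the zero value is applied in both phases, which is why `aggregate` requires it to be an identity for both operators.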
val file = sc.textFile("README.md")
val containsSpark = file.filter(line => line.contains("Spark"))
val words = containsSpark.flatMap(line => line.split(" "))
val counts = words.map(word => (word, 1)).reduceByKey { case (x, y) => x + y }
counts.toDebugString
counts.collect()
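The RDD pipeline above has a direct Scala-collections analogue, which can help when reasoning about each transformation. A sketch using an in-memory `Seq` in place of `sc.textFile("README.md")` (the sample lines are made up):

```scala
// Stand-in for the lines of README.md
val lines = Seq("Apache Spark is fast", "Spark runs on the JVM", "nothing here")

// Same chain as the RDD version: filter, split into words, count per word
val containsSpark = lines.filter(_.contains("Spark"))
val words = containsSpark.flatMap(_.split(" "))
val counts = words.groupBy(identity).map { case (w, ws) => (w, ws.size) }
// counts("Spark") == 2
```

The difference is that the collection version evaluates eagerly, while the RDD chain is lazy: nothing runs until an action such as `collect()` is called, which is why `toDebugString` can print the full lineage beforehand.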
val accum = sc.accumulator(0, "Test Accumulator")
sc.parallelize(Array(7, 8, 9, 10)).foreach(x => accum += x)
accum.value
val broadcastVar = sc.broadcast(Array(1, 2, 3))
broadcastVar.value
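A broadcast variable ships a read-only value to each executor once, and tasks read it through `.value` instead of capturing it in every closure. A collections sketch of the usual pattern, a lookup table consulted inside a `map` (the table and inputs here are made up):

```scala
// Stand-in for the payload wrapped by sc.broadcast(...)
val lookup = Map(1 -> "one", 2 -> "two", 3 -> "three")

// Each element is resolved against the shared, read-only table,
// as rdd.map(i => broadcastVar.value(i)) would do on executors
val result = Seq(1, 2, 3).map(i => lookup(i))
```

In real Spark code the table would be wrapped with `sc.broadcast(lookup)` and accessed as `broadcastVar.value(i)` inside the transformation.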
- Stack Exchange - Stack Overflow data download.
- Sample Text File - sample large text file download.
- Sentiment Analysis Dataset - sample tweets for training the Naive Bayes model.