This is an example of using Spark MLlib's Naive Bayes model from R. I used it as a demo at the first meetup of Singapore's Spark user group: http://www.meetup.com/Spark-Singapore/events/218794905/
Slides: http://www.slideshare.net/KienDang5/introduction-to-sparkr
Data source: https://archive.ics.uci.edu/ml/datasets/Spambase
Access to MLlib from SparkR is still in development. Until MLlib is officially integrated into SparkR, use the method below to run MLlib from R.
-
Download SparkR:
$ git clone https://github.com/amplab-extras/SparkR-pkg.git
-
Add "org.apache.spark" % "spark-mllib_2.10" % "1.1.0", "org.scalanlp" % "breeze_2.10" % "0.10", "net.rforge" % "Rserve" % "0.6-8.1" to libraryDependencies in SparkR-pkg/pkg/src/build.sbt
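As a rough sketch, the three dependencies listed above could be added to build.sbt as shown here. This assumes the file uses standard sbt syntax; if build.sbt already defines libraryDependencies, merge these entries into the existing sequence instead of adding a second block.

```scala
// Sketch of the addition to SparkR-pkg/pkg/src/build.sbt.
// Assumption: standard sbt settings syntax; the coordinates and versions
// are the ones listed in this README.
libraryDependencies ++= Seq(
  "org.apache.spark" % "spark-mllib_2.10" % "1.1.0",
  "org.scalanlp" % "breeze_2.10" % "0.10",
  "net.rforge" % "Rserve" % "0.6-8.1"
)
```

The `_2.10` suffix pins each artifact to Scala 2.10, so this only makes sense if the rest of the SparkR build also targets Scala 2.10.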
-
Copy src/RToScalaRDD.scala from this repo to SparkR-pkg/pkg/src/src
-
Install SparkR from an R session:
R> devtools::install_local("path/to/SparkR-pkg/pkg")
-
Start the SparkR shell, then run the example scripts from inside it:
$ path/to/SparkR/sparkR
R> source("./R/1_naivebayes.R")
R> source("./R/2_spam_naivebayes.R")