[SPARK-5654] Integrate SparkR #5096

Closed
wants to merge 940 commits into from
Commits
940 commits
71d66a1
fix first(0
davies Mar 2, 2015
e998356
define generic for 'first' in RDD API
Mar 2, 2015
f585929
Fix brackets
Mar 2, 2015
1955a09
return object instead of a list of one object
Mar 2, 2015
76cf2e0
Merge pull request #192 from cafreeman/sparkr-sql
shivaram Mar 3, 2015
03402eb
Updates as per feedback on sparkR-submit
Mar 3, 2015
1d0f2ae
Update DataFrame.R
Mar 3, 2015
f798402
Update column.R
Mar 3, 2015
524c122
Merge branch 'sparkr-sql' of github.com:amplab-extras/SparkR-pkg into…
davies Mar 3, 2015
8a676b1
Merge pull request #188 from davies/column
shivaram Mar 3, 2015
06cbc2d
launch R worker by a daemon
davies Mar 3, 2015
3beadcf
Merge branch 'sparkr-sql' of github.com:amplab-extras/SparkR-pkg into…
davies Mar 3, 2015
e2d144a
Fixed small typos
Mar 3, 2015
98cc97a
fix test and docs
davies Mar 3, 2015
39c253d
Merge branch 'sparkr-sql' of github.com:amplab-extras/SparkR-pkg into…
davies Mar 3, 2015
03bcf20
Merge branch 'group' of github.com:davies/SparkR-pkg into group
davies Mar 3, 2015
ed9a89f
address comments
davies Mar 3, 2015
e8639c3
New 1.3 repo and updates to `column.R`
Mar 3, 2015
2b6f980
shutdown the JVM after R process die
davies Mar 3, 2015
3f22c8d
Merge pull request #195 from cafreeman/sparkr-sql
shivaram Mar 3, 2015
4fa6343
Refactor `join` generic for use with `DataFrame`
Mar 3, 2015
294ca4a
`join`, `sort`, and `filter`
Mar 3, 2015
12a6db2
Merge branch 'sparkr-sql' of github.com:amplab-extras/SparkR-pkg into…
davies Mar 3, 2015
8ff29d6
fix tests
davies Mar 3, 2015
2e7b190
small update on yarn deploy mode.
Mar 3, 2015
32b37d1
Fixed indent in `join` test.
Mar 3, 2015
e14c328
`selectExpr`
Mar 3, 2015
494a4dd
update export
Mar 3, 2015
cd7ac8a
Merge pull request #197 from cafreeman/sparkr-sql
shivaram Mar 3, 2015
74269f3
Merge branch 'dfMethods' into sparkr-sql
Mar 3, 2015
acea146
remove extra line
Mar 3, 2015
7918634
Fix test
Mar 3, 2015
5073e07
Merge branch 'sparkr-sql' of github.com:amplab-extras/SparkR-pkg into…
davies Mar 4, 2015
32aa01d
Merge pull request #191 from felixcheung/doc
shivaram Mar 4, 2015
197a79b
add HiveContext (commented)
davies Mar 4, 2015
4e0becc
Merge pull request #194 from davies/api
shivaram Mar 4, 2015
8de958d
Update SparkRBackend.scala
Mar 4, 2015
d18f9d3
Remove SparkR snapshot build
shivaram Mar 4, 2015
a37fd80
Update sparkR.R
Mar 4, 2015
3865f39
[SPARKR-156] phase 1: implement zipWithUniqueId() of the RDD class.
Mar 4, 2015
bc90115
Fixed docs
Mar 4, 2015
8b9a963
Merge branch 'sparkr-sql' of https://github.com/amplab-extras/SparkR-…
Mar 4, 2015
870acd4
Use rc2 explicitly
shivaram Mar 4, 2015
198c130
Merge pull request #200 from shivaram/sparkr-sql-build
shivaram Mar 4, 2015
6a1fe64
Merge pull request #198 from cafreeman/sparkr-sql
shivaram Mar 4, 2015
20242c4
clean up docs
Mar 4, 2015
0ac4abc
'explain`
Mar 4, 2015
68b11cf
`toJSON`
Mar 4, 2015
779c102
`isLocal`
Mar 4, 2015
3fab0f8
`showDF`
Mar 4, 2015
a5c2887
fix test
Mar 4, 2015
ff8b005
'saveAsParquetFile`
Mar 4, 2015
f10a24e
address comments
davies Mar 4, 2015
6fac596
support Column expression in agg()
davies Mar 4, 2015
5fd9575
Merge branch 'sparkr-sql' of github.com:amplab-extras/SparkR-pkg into…
davies Mar 4, 2015
3675fcf
Update `explain` and fixed doc for `toJSON`
Mar 4, 2015
de2abfa
Merge pull request #202 from cafreeman/sparkr-sql
shivaram Mar 5, 2015
b875b4f
fix style
davies Mar 5, 2015
e9e2a03
Merge branch 'sparkr-sql' of github.com:amplab-extras/SparkR-pkg into…
davies Mar 5, 2015
bb46832
Merge branch 'sparkr-sql' of github.com:amplab-extras/SparkR-pkg into…
davies Mar 5, 2015
8b7fb67
fix HiveContext
davies Mar 5, 2015
d0d4626
Merge pull request #199 from davies/load
shivaram Mar 5, 2015
fb3b139
fix tests
davies Mar 5, 2015
47a613f
Fix HiveContext package name
shivaram Mar 5, 2015
62b0760
Fix test hive context package name
shivaram Mar 5, 2015
4d0fb56
Merge pull request #203 from shivaram/sparkr-hive-fix
shivaram Mar 5, 2015
ac8a852
close monitor connection in sparkR.stop()
davies Mar 5, 2015
3c7674f
Merge branch 'die' of github.com:davies/SparkR-pkg into die
davies Mar 5, 2015
f06ccec
Use mapply() instead of for statement.
Mar 5, 2015
dfb399a
address comments
davies Mar 5, 2015
18c6004
Merge pull request #201 from sun-rui/SPARKR-156_1
shivaram Mar 5, 2015
d8c1c09
add test to start and stop context multiple times
davies Mar 5, 2015
9d01bcd
`dropTempTable`
Mar 5, 2015
befbd32
`insertInto`
Mar 5, 2015
fef99de
`intersect`, `subtract`, `unionAll`
Mar 5, 2015
428a99a
remove test, catch exception
davies Mar 5, 2015
e6fb8d8
improve logging
davies Mar 5, 2015
9dd6a5a
Update SparkRBackendHandler.scala
Mar 5, 2015
bcb0bf5
Merge pull request #180 from davies/group
shivaram Mar 5, 2015
c5fa3b9
New `select` method
Mar 6, 2015
7a5d6fd
`withColumn` and `withColumnRenamed`
Mar 6, 2015
a582810
Merge branch 'dfMethods' into sparkr-sql
Mar 6, 2015
f3d99a6
[SPARKR-156] phase 2: implement zipWithIndex() of the RDD class.
Mar 6, 2015
5e3a576
Fix indentation.
Mar 6, 2015
a8cebf0
Remove print statement in SparkRBackendHandler
shivaram Mar 6, 2015
3f7aed6
Fix minor typos in the function description.
Mar 6, 2015
5eec6fc
Merge pull request #206 from sun-rui/SPARKR-156_2
shivaram Mar 6, 2015
e60578a
update tests to guarantee row order
Mar 6, 2015
789be97
Merge pull request #207 from shivaram/err-remove
shivaram Mar 6, 2015
3db5649
Merge branch 'sparkr-sql' of https://github.com/amplab-extras/SparkR-…
Mar 6, 2015
b4c0b2e
use fork package
davies Mar 6, 2015
dc1291b
Add checks for namespace access operators in cleanClosure.
hlin09 Mar 8, 2015
15a713f
Fix example for `dropTempTable`
Mar 9, 2015
09ff163
Merge pull request #204 from cafreeman/sparkr-sql
shivaram Mar 9, 2015
97dde1a
Add a test for access operators.
hlin09 Mar 9, 2015
89b886d
Move setGeneric() to 00-generics.R.
hlin09 Mar 9, 2015
471c794
Move getJRDD and broadcast's value to 00-generic.R.
hlin09 Mar 9, 2015
ffd6e8e
Merge pull request #210 from hlin09/hlin09
shivaram Mar 9, 2015
6bccbbf
Move roxygen doc back to implementation.
hlin09 Mar 9, 2015
8f8813f
switch back to use parallel
davies Mar 9, 2015
411b751
make RStudio happy
davies Mar 9, 2015
ecdfda1
Remove duplication.
hlin09 Mar 10, 2015
ff948db
Remove missingOrInteger.
hlin09 Mar 10, 2015
01aa5ee
add config for using daemon, refactor
davies Mar 10, 2015
46cea3d
retry
davies Mar 10, 2015
8583968
readFully()
davies Mar 10, 2015
90f2692
Merge pull request #211 from hlin09/generics
shivaram Mar 10, 2015
5757b95
Merge pull request #196 from davies/die
shivaram Mar 11, 2015
4e4908a
createDataFrame from rdd
davies Mar 11, 2015
26a3621
support date.frame and Date/Time
davies Mar 12, 2015
a6dc435
remove dependency of jsonlite
davies Mar 12, 2015
e87bb98
improve comment and logging
davies Mar 12, 2015
9a6be74
include grouping columns in agg()
davies Mar 12, 2015
0467474
add more selecter for DataFrame
davies Mar 12, 2015
3e0555d
Merge pull request #193 from davies/daemon
shivaram Mar 13, 2015
55c38bc
Merge pull request #216 from davies/select2
shivaram Mar 13, 2015
66cc92a
address commets
davies Mar 13, 2015
72adb14
Update SQLContext.R
Mar 13, 2015
8e1497d
Update DataFrame.R
Mar 13, 2015
05b9126
Merge pull request #215 from davies/agg
shivaram Mar 13, 2015
e52258f
Merge branch 'sparkr-sql' of github.com:amplab-extras/SparkR-pkg into…
davies Mar 13, 2015
8bff523
Remove staging repo now that 1.3 is released
shivaram Mar 13, 2015
963c7ee
Merge branch 'master' into merge
davies Mar 13, 2015
d8c8fcc
Merge pull request #219 from shivaram/sparkr-build-final
shivaram Mar 13, 2015
8190127
fixed parquetFile signature
evertlammerts Feb 25, 2015
7695d36
added tests
evertlammerts Feb 25, 2015
1bc2998
Merge pull request #179 from evertlammerts/sparkr-sql
shivaram Mar 13, 2015
662938a
Include utils before SparkR for `head` to work
shivaram Mar 14, 2015
dd52cbc
Merge pull request #220 from shivaram/sparkr-utils-include
shivaram Mar 14, 2015
46454e4
address comments
davies Mar 14, 2015
7f5e70c
Update SerDe.scala
Mar 14, 2015
bc2ff38
handle NULL
davies Mar 14, 2015
6122e0e
handle NULL
davies Mar 14, 2015
3139325
Merge pull request #212 from davies/toDF
shivaram Mar 14, 2015
4b1628d
Merge branch 'sparkr-sql' of github.com:amplab-extras/SparkR-pkg into…
davies Mar 14, 2015
70f620c
address comments
davies Mar 14, 2015
f5d3355
Merge pull request #218 from davies/merge
shivaram Mar 16, 2015
3214c6d
Merge pull request #217 from hlin09/cleanClosureFix
shivaram Mar 16, 2015
6f95d49
Merge pull request #221 from shivaram/sparkr-stop-start
shivaram Mar 16, 2015
5e610cb
add more API for Column
davies Mar 16, 2015
b043876
fix test
davies Mar 16, 2015
2fc553f
Merge pull request #222 from davies/column2
shivaram Mar 17, 2015
44994c2
Moved files to R/
Mar 17, 2015
49a8133
Merge branch 'remote_r' into R
Mar 17, 2015
014d253
delete man pages
Mar 17, 2015
180fc9c
move scala
Mar 17, 2015
3415cc7
move Scala source into core/ and sql/
Mar 17, 2015
df3eeea
move R/examples into examples/src/main/r
Mar 17, 2015
a76472f
fix path of assembly jar
Mar 17, 2015
0a0e632
move sparkR into bin/
Mar 17, 2015
18e5eed
update docs
Mar 17, 2015
facb6e0
add .gitignore for .o, .so, .Rd
Mar 17, 2015
50bff63
add LICENSE header for R sources
Mar 17, 2015
35e5755
reduce size of example data
Mar 17, 2015
e8fc7ca
fix .gitignore
Mar 17, 2015
c4a5bdf
run sparkr tests in Spark
Mar 17, 2015
f403b4a
rm .travis.yml
Mar 17, 2015
ba53b09
support R in spark-submit
Mar 17, 2015
043959e
cleanup
Mar 17, 2015
f7b6936
remove Spark prefix for class
Mar 17, 2015
e4f1937
Remove DFC example
shivaram Mar 17, 2015
ff776aa
Fix style
shivaram Mar 17, 2015
aae881b
fix rat
Mar 17, 2015
2d235d4
Build SparkR with Maven profile
shivaram Mar 17, 2015
716b16f
Merge branch 'R' of https://github.com/amplab-extras/spark into R
shivaram Mar 17, 2015
52ca6e5
Add missing comma
shivaram Mar 17, 2015
479e3fe
change println() to logging
Mar 17, 2015
9f6aa1f
Merge branch 'R' of github.com:amplab-extras/spark into R
Mar 17, 2015
0e2412c
fix bin/sparkR
Mar 17, 2015
ea90fab
fix spark-submit with R path and sparkR -h
Mar 18, 2015
d6f2bdd
Fix run-tests path
shivaram Mar 18, 2015
baefd9e
Make bin/sparkR use spark-submit
shivaram Mar 18, 2015
95d2de3
fix spark-submit with R scripot
Mar 18, 2015
05afef0
Only stop backend JVM if R launched it
shivaram Mar 18, 2015
8030847
Set windows file separators, install dirs
shivaram Mar 18, 2015
cb6e5e3
Add scripts to start SparkR on windows
shivaram Mar 18, 2015
a1870e8
Merge pull request #214 from sun-rui/SPARKR-156_3
shivaram Mar 18, 2015
ef26015
Merge branch 'R' of github.com:amplab-extras/spark into R
Mar 18, 2015
42d8b4c
Merge branch 'R' of github.com:amplab-extras/spark into R
Mar 18, 2015
028cbfb
fix exit code of sparkr unit test
Mar 18, 2015
f04080c
Merge branch 'sparkr-sql' of github.com:amplab-extras/SparkR-pkg into…
Mar 18, 2015
ba4b80b
Merge branch 'remote_r' into R
Mar 18, 2015
56670ef
rm man page
Mar 18, 2015
1a16cd6
rm PROJECT_HOME
Mar 18, 2015
40d193a
Merge pull request #224 from sun-rui/SPARKR-224-new
shivaram Mar 18, 2015
d436f26
add missing files
Mar 18, 2015
756ece0
Update README remove outdated TODO
shivaram Mar 18, 2015
38cbf59
fix test of zipRDD()
Mar 18, 2015
ebd4d07
Merge branch 'R' of github.com:amplab-extras/spark into R
Mar 18, 2015
2892e29
support R in YARN cluster
Mar 18, 2015
7da0049
fix build
Mar 18, 2015
ce3ca62
fix license check
Mar 18, 2015
d8b24fc
disable spark and python tests temporary
Mar 18, 2015
410ec18
fix zipRDD() tests
Mar 18, 2015
974e4ea
fix flaky test
Mar 18, 2015
423ea3c
Ignore unknown jobj in cleanup
shivaram Mar 18, 2015
9fb6af3
mark R classes/objects are private
Mar 18, 2015
855537f
Merge branch 'R' of github.com:amplab-extras/spark into R
Mar 18, 2015
b44e371
Include RStudio instructions in README
shivaram Mar 18, 2015
05e7375
sort generics
Mar 18, 2015
bdf3a14
Merge branch 'R' of github.com:amplab-extras/spark into R
Mar 18, 2015
7100fb9
Fix libPaths in README
shivaram Mar 18, 2015
d87a181
fix flaky tests
Mar 19, 2015
afd8a77
Merge branch 'R' of github.com:amplab-extras/spark into R
Mar 19, 2015
0e5a83f
fix code style
Mar 19, 2015
d6d3729
enable spark and pyspark tests
Mar 19, 2015
02b4833
Add a script to generate R docs (Rd, html)
shivaram Mar 19, 2015
1f478c5
Merge branch 'R' of https://github.com/amplab-extras/spark into R
shivaram Mar 19, 2015
6ff5ea2
Add instructions to generate docs
shivaram Mar 19, 2015
52cc92d
Add license to create-docs.sh
shivaram Mar 19, 2015
58276f5
Merge branch 'R' of https://github.com/amplab-extras/spark into R
shivaram Mar 19, 2015
e1f83ab
Send Spark INFO logs to a file in SparkR tests
shivaram Mar 19, 2015
a1493d7
Address comments
shivaram Mar 19, 2015
b21a0da
Merge pull request #1 from shivaram/log4j-tests
Mar 19, 2015
733380d
update R examples (remove master from args)
Mar 19, 2015
3eacfc0
fix flaky test
Mar 19, 2015
85a50ec
Merge pull request #226 from RevolutionAnalytics/master
shivaram Mar 19, 2015
cf5cd99
Remove unused numCols argument
shivaram Mar 20, 2015
104ad4e
Check the right env in exists
shivaram Mar 20, 2015
1f1a7e0
Some fixes to DataFrame, RDD, SQLContext docs
shivaram Mar 20, 2015
d425363
Some doc fixes for column, generics, group
shivaram Mar 20, 2015
bc2d6d8
Remove arg from sparkR.stop and update docs
shivaram Mar 20, 2015
463e28c
Merge pull request #2 from shivaram/doc-fixes
Mar 20, 2015
e089151
Merge pull request #225 from sun-rui/SPARKR-154_2
Mar 19, 2015
a1cedad
Merge pull request #228 from felixcheung/doc
shivaram Mar 21, 2015
b433817
Merge branch 'master' of github.com:apache/spark into R
Mar 25, 2015
6e20e71
address comments
Mar 25, 2015
e88b649
Merge branch 'R' of github.com:amplab-extras/spark into R
Mar 25, 2015
a1777eb
move rules into R/.gitignore
Mar 25, 2015
e7104b6
remove ::: in SparkR
Mar 25, 2015
f8fa8af
mute logging when start/stop context
Mar 25, 2015
19c9368
Merge branch 'sparkr-sql' of github.com:amplab-extras/SparkR-pkg into…
Mar 25, 2015
b045701
Merge branch 'remote_r' into R
Mar 25, 2015
c300e08
remove duplicated file
Mar 25, 2015
11981b7
Update R to fail early if SparkR package is missing
felixcheung Mar 30, 2015
1d1802e
Merge pull request #4 from felixcheung/r-require
shivaram Mar 30, 2015
3487461
Add tests log in .gitignore.
hlin09 Mar 31, 2015
0e788c0
Merge pull request #5 from hlin09/doc-fix
shivaram Mar 31, 2015
940b631
[SPARKR-92] Phase 2: implement sum(rdd)
hqzizania Apr 4, 2015
5133f3a
Merge pull request #7 from hqzizania/R3
shivaram Apr 6, 2015
a18ff5c
Update sparkR.R
Apr 6, 2015
eb5da53
Merge pull request #3 from davies/R2
shivaram Apr 6, 2015
377151f
Merge remote-tracking branch 'apache/master' into R
shivaram Apr 7, 2015
d7c3f22
Address code review comments
shivaram Apr 7, 2015
64eda24
Merge branch 'R' of https://github.com/amplab-extras/spark into R
shivaram Apr 7, 2015
f731b48
Only run SparkR tests if R is installed
shivaram Apr 7, 2015
5581c75
update author of SparkR
Apr 7, 2015
55808e4
fix tests
Apr 7, 2015
59266d1
check exclusive of primary-py-file and primary-r-file
Apr 7, 2015
da64742
fix Date serialization
Apr 9, 2015
2 changes: 2 additions & 0 deletions .gitignore
@@ -63,6 +63,8 @@ ec2/lib/
rat-results.txt
scalastyle.txt
scalastyle-output.xml
R-unit-tests.log
R/unit-tests.out

# For Hive
metastore_db/
2 changes: 2 additions & 0 deletions .rat-excludes
@@ -67,3 +67,5 @@ logs
.*scalastyle-output.xml
.*dependency-reduced-pom.xml
known_translations
DESCRIPTION
NAMESPACE
6 changes: 6 additions & 0 deletions R/.gitignore
@@ -0,0 +1,6 @@
*.o
*.so
*.Rd
lib
pkg/man
pkg/html
12 changes: 12 additions & 0 deletions R/DOCUMENTATION.md
@@ -0,0 +1,12 @@
# SparkR Documentation

SparkR documentation is generated from in-source comments annotated with
`roxygen2`. After making changes to the documentation, you can regenerate the man pages
by running the following from an R console in the SparkR home directory:

library(devtools)
devtools::document(pkg="./pkg", roclets=c("rd"))

You can verify that your changes are good by running:

R CMD check pkg/
67 changes: 67 additions & 0 deletions R/README.md
@@ -0,0 +1,67 @@
# R on Spark

SparkR is an R package that provides a light-weight frontend to use Spark from R.
Review comment (Contributor): Can you add a TODO comment here to merge this into the existing Spark docs?


### SparkR development

#### Build Spark

Build Spark with [Maven](http://spark.apache.org/docs/latest/building-spark.html#building-with-buildmvn) and include the `-Psparkr` profile to build the R package. For example, to use the default Hadoop versions, you can run
```
build/mvn -DskipTests -Psparkr package
```

#### Running sparkR

You can start using SparkR by launching the SparkR shell with

./bin/sparkR

The `sparkR` script automatically creates a SparkContext, which runs Spark in
local mode by default. To specify the Spark master of a cluster for the automatically created
SparkContext, you can run

./bin/sparkR --master "local[2]"

To set other options, such as driver memory or executor memory, you can pass the [spark-submit](http://spark.apache.org/docs/latest/submitting-applications.html) arguments to `./bin/sparkR`.
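As a rough illustration of passing spark-submit options through `./bin/sparkR`, the sketch below assembles such a command line. The helper name `sparkr_cmd` is hypothetical and not part of this PR; `--master` and `--driver-memory` are standard spark-submit options.

```shell
# Hypothetical helper: print a ./bin/sparkR command line for a given
# master URL and an optional driver memory setting.
sparkr_cmd() {
  local master="${1:-local[2]}"   # default master, as in the example above
  local driver_mem="${2:-}"       # optional, e.g. 2g
  local cmd="./bin/sparkR --master \"$master\""
  if [ -n "$driver_mem" ]; then
    cmd="$cmd --driver-memory $driver_mem"
  fi
  printf '%s\n' "$cmd"
}
```

Calling `sparkr_cmd 'local[4]' 2g` prints the command rather than running it, which makes it easy to inspect the arguments before launching the shell.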

#### Using SparkR from RStudio

If you wish to use SparkR from RStudio or other R frontends, you will need to set some environment variables that point SparkR to your Spark installation. For example:
```
# Set this to where Spark is installed
Sys.setenv(SPARK_HOME="/Users/shivaram/spark")
# This line loads SparkR from the installed directory
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
library(SparkR)
sc <- sparkR.init(master="local")
```
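Before pointing `.libPaths()` at the Spark installation, it can help to confirm that the package was actually installed there. The following is a minimal sketch, assuming the package is installed under `$SPARK_HOME/R/lib/SparkR` (the location `R/install-dev.sh` targets); the function name `sparkr_lib_dir` is hypothetical.

```shell
# Hypothetical helper: print the SparkR library directory if the package
# is installed there, otherwise report the problem on stderr and fail.
sparkr_lib_dir() {
  local spark_home="${1:-${SPARK_HOME:-}}"
  if [ -z "$spark_home" ]; then
    echo "SPARK_HOME is not set" >&2
    return 1
  fi
  local lib_dir="$spark_home/R/lib"
  if [ ! -d "$lib_dir/SparkR" ]; then
    echo "SparkR is not installed under $lib_dir" >&2
    return 1
  fi
  printf '%s\n' "$lib_dir"
}
```

If the function fails, re-running the build or `R/install-dev.sh` is the likely fix.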

#### Making changes to SparkR

The [instructions](https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark) for making contributions to Spark also apply to SparkR.
Review comment (Contributor): Post-merge, we can update the wiki to include R-specific instructions.

If you only make R file changes (i.e. no Scala changes) then you can just re-install the R package using `R/install-dev.sh` and test your changes.
Once you have made your changes, please include unit tests for them and run existing unit tests using the `run-tests.sh` script as described below.

#### Generating documentation

The SparkR documentation (Rd files and HTML files) is not part of the source repository. To generate it, you can run the script `R/create-docs.sh`. This script uses `devtools` and `knitr` to generate the docs, so these packages need to be installed on the machine before using the script.

### Examples, Unit tests

SparkR comes with several sample programs in the `examples/src/main/r` directory.
To run one of them, use `./bin/sparkR <filename> <args>`. For example:

./bin/sparkR examples/src/main/r/pi.R local[2]

You can also run the unit-tests for SparkR by running (you need to install the [testthat](http://cran.r-project.org/web/packages/testthat/index.html) package first):

R -e 'install.packages("testthat", repos="http://cran.us.r-project.org")'
./R/run-tests.sh
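The commit list for this PR includes "Only run SparkR tests if R is installed". One way such a guard could look is sketched below; this is an illustration, not the code from that commit, and `r_tests_status` is a hypothetical name.

```shell
# Hypothetical helper: report whether the R unit tests should run,
# based on whether the given R executable is found on PATH.
r_tests_status() {
  local r_bin="${1:-R}"
  if command -v "$r_bin" >/dev/null 2>&1; then
    echo "run"
  else
    echo "skip"
  fi
}
```

A wrapper could then call `./R/run-tests.sh` only when `r_tests_status` prints `run`.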

### Running on YARN
`./bin/spark-submit` and `./bin/sparkR` can also be used to submit jobs to YARN clusters. You will need to set the YARN configuration directory before doing so. For example, on CDH you can run
```
export YARN_CONF_DIR=/etc/hadoop/conf
./bin/spark-submit --master yarn examples/src/main/r/pi.R 4
```
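Since submission to YARN depends on the configuration directory being set, a small pre-flight check can catch a missing or mistyped path early. This is a sketch only; `yarn_conf_ok` is a hypothetical helper, and it checks nothing beyond the directory existing.

```shell
# Hypothetical helper: succeed only if the YARN configuration directory
# (argument, or $YARN_CONF_DIR when no argument is given) exists.
yarn_conf_ok() {
  local conf_dir="${1:-${YARN_CONF_DIR:-}}"
  [ -n "$conf_dir" ] && [ -d "$conf_dir" ]
}
```

For example, `yarn_conf_ok && ./bin/spark-submit --master yarn examples/src/main/r/pi.R 4` would skip the submission when the directory is absent.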
13 changes: 13 additions & 0 deletions R/WINDOWS.md
@@ -0,0 +1,13 @@
## Building SparkR on Windows

To build SparkR on Windows, the following steps are required

1. Install R (>= 3.1) and [Rtools](http://cran.r-project.org/bin/windows/Rtools/). Make sure to
include Rtools and R in `PATH`.
2. Install
[JDK7](http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html) and set
`JAVA_HOME` in the system environment variables.
3. Download and install [Maven](http://maven.apache.org/download.html). Also include the `bin`
directory in Maven in `PATH`.
4. Set `MAVEN_OPTS` as described in [Building Spark](http://spark.apache.org/docs/latest/building-spark.html).
5. Open a command shell (`cmd`) in the Spark directory and run `mvn -DskipTests -Psparkr package`
46 changes: 46 additions & 0 deletions R/create-docs.sh
@@ -0,0 +1,46 @@
#!/bin/bash

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# Script to create API docs for SparkR
# This requires `devtools` and `knitr` to be installed on the machine.

# After running this script the html docs can be found in
# $SPARK_HOME/R/pkg/html

# Figure out where the script is
export FWDIR="$(cd "`dirname "$0"`"; pwd)"
Review comment (Contributor): Can this script be wired up so that our normal doc generation invokes it?

Reply (Contributor Author): Done

pushd "$FWDIR"

# Generate Rd file
Rscript -e 'library(devtools); devtools::document(pkg="./pkg", roclets=c("rd"))'

# Install the package
./install-dev.sh

# Now create HTML files

# knit_rd puts html in current working directory
mkdir -p pkg/html
pushd pkg/html

Rscript -e 'library(SparkR, lib.loc="../../lib"); library(knitr); knit_rd("SparkR")'

popd

popd
27 changes: 27 additions & 0 deletions R/install-dev.bat
@@ -0,0 +1,27 @@
@echo off

rem
rem Licensed to the Apache Software Foundation (ASF) under one or more
rem contributor license agreements. See the NOTICE file distributed with
rem this work for additional information regarding copyright ownership.
rem The ASF licenses this file to You under the Apache License, Version 2.0
rem (the "License"); you may not use this file except in compliance with
rem the License. You may obtain a copy of the License at
rem
rem http://www.apache.org/licenses/LICENSE-2.0
rem
rem Unless required by applicable law or agreed to in writing, software
rem distributed under the License is distributed on an "AS IS" BASIS,
rem WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
rem See the License for the specific language governing permissions and
rem limitations under the License.
rem

rem Install development version of SparkR
rem

set SPARK_HOME=%~dp0..

MKDIR %SPARK_HOME%\R\lib

R.exe CMD INSTALL --library="%SPARK_HOME%\R\lib" %SPARK_HOME%\R\pkg\
36 changes: 36 additions & 0 deletions R/install-dev.sh
@@ -0,0 +1,36 @@
#!/bin/bash

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# This script packages the SparkR source files (R and C files) and
# creates a package that can be loaded in R. The package is by default installed to
# $FWDIR/lib and the package can be loaded by using the following command in R:
#
# library(SparkR, lib.loc="$FWDIR/lib")
#
# NOTE(shivaram): Right now we use $SPARK_HOME/R/lib to be the installation directory
# to load the SparkR package on the worker nodes.


FWDIR="$(cd "`dirname "$0"`"; pwd)"
LIB_DIR="$FWDIR/lib"

mkdir -p "$LIB_DIR"

# Install the SparkR package into $LIB_DIR
R CMD INSTALL --library="$LIB_DIR" "$FWDIR/pkg/"
28 changes: 28 additions & 0 deletions R/log4j.properties
@@ -0,0 +1,28 @@
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# Set everything to be logged to the file R-unit-tests.log
log4j.rootCategory=INFO, file
log4j.appender.file=org.apache.log4j.FileAppender
log4j.appender.file.append=true
log4j.appender.file.file=R-unit-tests.log
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss.SSS} %t %p %c{1}: %m%n

# Ignore messages below warning level from Jetty, because it's a bit verbose
log4j.logger.org.eclipse.jetty=WARN
org.eclipse.jetty.LEVEL=WARN
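With the `ConversionPattern` above, each log line starts with a date and time, followed by the thread name, the level, and the short logger name. A rough sketch for counting lines at a given level in `R-unit-tests.log` follows; `count_log_level` is a hypothetical helper, and the regular expression is an assumption derived from that pattern, so multi-line messages (such as stack traces) are only counted once.

```shell
# Hypothetical helper: count log lines at a given level, assuming lines
# look like "15/03/19 10:00:01.000 main WARN Utils: message".
count_log_level() {
  local level="$1" log_file="$2"
  # grep -c exits non-zero when the count is 0, so ignore its status.
  grep -E -c "^[0-9/]{8} [0-9:.]{12} .* ${level} [^ ]+: " "$log_file" || true
}
```

For instance, `count_log_level ERROR R-unit-tests.log` gives a quick signal after a test run without opening the file.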
35 changes: 35 additions & 0 deletions R/pkg/DESCRIPTION
@@ -0,0 +1,35 @@
Package: SparkR
Type: Package
Title: R frontend for Spark
Version: 1.4.0
Date: 2013-09-09
Author: The Apache Software Foundation
Maintainer: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
Imports:
methods
Depends:
R (>= 3.0),
methods,
Suggests:
testthat
Description: R frontend for Spark
License: Apache License (== 2.0)
Collate:
'generics.R'
'jobj.R'
'SQLTypes.R'
'RDD.R'
'pairRDD.R'
'column.R'
'group.R'
'DataFrame.R'
'SQLContext.R'
'broadcast.R'
'context.R'
'deserialize.R'
'serialize.R'
'sparkR.R'
'backend.R'
'client.R'
'utils.R'
'zzz.R'
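The `Collate` field fixes the order in which R source files are loaded, so every file it names has to exist in the package. The sketch below lists `Collate` entries that have no matching file; it assumes the usual R package layout with sources in an `R/` directory next to `DESCRIPTION`, and `missing_collate_files` is a hypothetical helper.

```shell
# Hypothetical helper: print every quoted '*.R' entry in a DESCRIPTION
# file that does not exist in the given source directory.
missing_collate_files() {
  local desc_file="$1" r_dir="$2" name
  sed -n "s/^[[:space:]]*'\([^']*\.R\)'[[:space:]]*$/\1/p" "$desc_file" |
    while IFS= read -r name; do
      [ -f "$r_dir/$name" ] || printf '%s\n' "$name"
    done
}
```

Run from the Spark root, `missing_collate_files R/pkg/DESCRIPTION R/pkg/R` should print nothing when the field and the sources agree.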