Merge pull request apache#117 from mesosphere/sparkr-docs
Added documentation for SparkR
susanxhuynh authored Jan 30, 2017
2 parents 20a00f8 + 7757a70 commit e2e9c64
Showing 4 changed files with 18 additions and 7 deletions.
7 changes: 2 additions & 5 deletions docs/limitations.md
@@ -5,18 +5,15 @@ feature_maturity: stable
 enterprise: 'no'
 ---
 
-* DC/OS Spark only supports submitting jars and Python scripts. It
-  does not support R.
-
 * Mesosphere does not provide support for Spark app development,
-  such as writing a Python app to process data from Kafka or writing
+  such as writing a Python app to process data from Kafka or writing
   Scala code to process data from HDFS.
 
 * Spark jobs run in Docker containers. The first time you run a
   Spark job on a node, it might take longer than you expect because of
   the `docker pull`.
 
 * DC/OS Spark only supports running the Spark shell from within a
-  DC/OS cluster. See the Spark Shell section for more information.
+  DC/OS cluster. See the Spark Shell section for more information.
   For interactive analytics, we recommend Zeppelin, which supports visualizations and dynamic
   dependency management.
4 changes: 4 additions & 0 deletions docs/quick-start.md
@@ -17,6 +17,10 @@ enterprise: 'no'
 
     $ dcos spark run --submit-args="https://downloads.mesosphere.com/spark/examples/pi.py 30"
 
+1. Run an R Spark job:
+
+       $ dcos spark run --submit-args="https://downloads.mesosphere.com/spark/examples/dataframe.R"
+
 1. View your job:
 
     Visit the Spark cluster dispatcher at
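The `dataframe.R` example referenced above is hosted by Mesosphere; as a rough sketch of what a submitted SparkR job of that kind looks like (illustrative only, not the actual contents of `dataframe.R`):

    # Illustrative sketch only -- not the actual contents of dataframe.R.
    # Unlike the interactive shell, a submitted script creates its own session.
    library(SparkR)

    sparkR.session(appName = "dataframe-example")

    df <- as.DataFrame(faithful)          # R's built-in faithful dataset as a Spark DataFrame
    printSchema(df)                       # inspect the inferred schema
    head(filter(df, df$waiting > 70))     # a simple transformation plus an action

    sparkR.session.stop()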
3 changes: 2 additions & 1 deletion docs/run-job.md
@@ -12,9 +12,10 @@ more][13].
 
     $ dcos spark run --submit-args="--class MySampleClass http://external.website/mysparkapp.jar 30"
 
-
     $ dcos spark run --submit-args="--py-files mydependency.py http://external.website/mysparkapp.py 30"
 
+    $ dcos spark run --submit-args="http://external.website/mysparkapp.R"
+
 `dcos spark run` is a thin wrapper around the standard Spark
 `spark-submit` script. You can submit arbitrary pass-through options
 to this script via the `--submit-args` option.
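As with the jar example, anything placed after the script URL in `--submit-args` is passed through to the application itself. A hypothetical `mysparkapp.R` (the script, its argument handling, and the DataFrame below are all assumptions for illustration) could read such a trailing argument like this:

    # Hypothetical mysparkapp.R: trailing arguments from --submit-args reach
    # the script via R's standard command-line mechanism.
    library(SparkR)

    sparkR.session(appName = "mysparkapp")

    args <- commandArgs(trailingOnly = TRUE)             # e.g. a "30" appended after the URL
    n <- if (length(args) > 0) as.integer(args[1]) else 10

    df <- createDataFrame(data.frame(x = seq_len(n)))    # tiny DataFrame of 1..n
    print(count(df))                                     # count rows on the cluster

    sparkR.session.stop()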
11 changes: 10 additions & 1 deletion docs/spark-shell.md
@@ -7,7 +7,7 @@ enterprise: 'no'
 # Interactive Spark Shell
 
 You can run Spark commands interactively in the Spark shell. The Spark shell is available
-in either Scala or Python.
+in either Scala, Python, or R.
 
 1. SSH into a node in the DC/OS cluster. [Learn how to SSH into your cluster and get the agent node ID](https://dcos.io/docs/latest/administration/access-node/sshcluster/).
 
@@ -27,6 +27,10 @@ in either Scala or Python.
 
     $ ./bin/pyspark --master mesos://<internal-master-ip>:5050 --conf spark.mesos.executor.docker.image=mesosphere/spark:1.0.4-2.0.1 --conf spark.mesos.executor.home=/opt/spark/dist
 
+    Or, run the R Spark shell.
+
+        $ ./bin/sparkR --master mesos://<internal-master-ip>:5050 --conf spark.mesos.executor.docker.image=mesosphere/spark:1.0.7-2.1.0-hadoop-2.6 --conf spark.mesos.executor.home=/opt/spark/dist
+
 1. Run Spark commands interactively.
 
     In the Scala shell:
@@ -38,3 +42,8 @@ in either Scala or Python.
 
     $ textFile = sc.textFile("/opt/spark/dist/README.md")
     $ textFile.count()
+
+    In the R shell:
+
+        $ df <- as.DataFrame(faithful)
+        $ head(df)

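Assuming the sparkR shell shown above, which creates a SparkSession automatically, the same DataFrame can also be queried with Spark SQL; this is a sketch beyond the original walkthrough:

    # Continues from the sparkR shell example; the shell already provides a session.
    df <- as.DataFrame(faithful)
    createOrReplaceTempView(df, "faithful")                    # register for SQL access
    long <- sql("SELECT * FROM faithful WHERE eruptions > 4")  # query with Spark SQL
    head(collect(long))                                        # collect() returns an R data.frame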