Merge pull request apache#78 from mesosphere/add-pyspark-documentation
Documented Python support and Spark shell.
susanxhuynh authored Oct 26, 2016
2 parents 6433570 + 5e09c7c commit d54418e
Showing 1 changed file with 48 additions and 3 deletions.
51 changes: 48 additions & 3 deletions docs/user-docs.md
@@ -18,6 +18,7 @@ DC/OS Spark includes:
* [Mesos Cluster Dispatcher][2]
* [Spark History Server][3]
* DC/OS Spark CLI
* Interactive Spark shell

## Benefits

@@ -59,6 +60,10 @@ dispatcher and the history server

$ dcos spark run --submit-args="--class org.apache.spark.examples.SparkPi http://downloads.mesosphere.com.s3.amazonaws.com/assets/spark/spark-examples_2.10-1.4.0-SNAPSHOT.jar 30"

1. Run a Python Spark job:

$ dcos spark run --submit-args="https://downloads.mesosphere.com/spark/examples/pi.py 30"

1. View your job:

Visit the Spark cluster dispatcher at
@@ -508,6 +513,10 @@ more][13].

$ dcos spark run --submit-args="--class MySampleClass http://external.website/mysparkapp.jar 30"

Or, for a Python job:

$ dcos spark run --submit-args="http://external.website/mysparkapp.py 30"

`dcos spark run` is a thin wrapper around the standard Spark
`spark-submit` script. You can submit arbitrary pass-through options
to this script via the `--submit-args` option.
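
For example, a sketch of passing a Spark configuration property straight through to `spark-submit` (the `spark.executor.memory` value here is illustrative):

$ dcos spark run --submit-args="--conf spark.executor.memory=4g --class MySampleClass http://external.website/mysparkapp.jar 30"
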
@@ -555,6 +564,42 @@ To set Spark properties with a configuration file, create a
`spark-defaults.conf` file and set the environment variable
`SPARK_CONF_DIR` to the containing directory. [Learn more][15].
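
A minimal sketch of such a file, with illustrative property values (the directory path is an assumption):

spark.executor.memory   4g
spark.eventLog.enabled  false

Then point `SPARK_CONF_DIR` at that directory before submitting jobs:

$ export SPARK_CONF_DIR=/path/to/conf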

<a name="pysparkshell"></a>
# Interactive Spark Shell

You can run Spark commands interactively in the Spark shell, which is
available in either Scala or Python.

1. SSH into a node in the DC/OS cluster. [Learn how to SSH into your cluster and get the agent node ID](https://dcos.io/docs/latest/administration/access-node/sshcluster/).

$ dcos node ssh --master-proxy --mesos-id=<agent-node-id>

1. Pull and run the Spark Docker image.

$ docker pull mesosphere/spark:1.0.4-2.0.1

$ docker run -it --net=host mesosphere/spark:1.0.4-2.0.1 /bin/bash

1. Run the Scala Spark shell from within the Docker image.

$ ./bin/spark-shell --master mesos://<internal-master-ip>:5050 --conf spark.mesos.executor.docker.image=mesosphere/spark:1.0.4-2.0.1 --conf spark.mesos.executor.home=/opt/spark/dist

Or, run the Python Spark shell.

$ ./bin/pyspark --master mesos://<internal-master-ip>:5050 --conf spark.mesos.executor.docker.image=mesosphere/spark:1.0.4-2.0.1 --conf spark.mesos.executor.home=/opt/spark/dist

1. Run Spark commands interactively.

In the Scala shell:

scala> val textFile = sc.textFile("/opt/spark/dist/README.md")
scala> textFile.count()

In the Python shell:

>>> textFile = sc.textFile("/opt/spark/dist/README.md")
>>> textFile.count()
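
A slightly longer sketch in the Scala shell, building on the `textFile` defined above (the filter predicate is illustrative; the count depends on the file's contents):

scala> val sparkLines = textFile.filter(line => line.contains("Spark"))
scala> sparkLines.count()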

<a name="uninstall"></a>
# Uninstall

@@ -628,14 +673,14 @@ output:
<a name="limitations"></a>
# Limitations

* DC/OS Spark only supports submitting jars. It does not support
Python or R.
* DC/OS Spark only supports submitting jars and Python scripts. It
does not support R.

* Spark jobs run in Docker containers. The first time you run a
Spark job on a node, it might take longer than you expect because of
the `docker pull`.

* Spark shell is not supported. For interactive analytics, we
* For interactive analytics, we
recommend Zeppelin, which supports visualizations and dynamic
dependency management.
