SPARK-1173. Improve scala streaming docs.
Clarify imports to add implicit conversions to DStream and
fix other small typos in the streaming intro documentation.

Tested by inspecting output via a local jekyll server, c&p'ing the scala commands into a spark terminal.

Author: Aaron Kimball <aaron@magnify.io>

Closes #64 from kimballa/spark-1173-streaming-docs and squashes the following commits:

6fbff0e [Aaron Kimball] SPARK-1173. Improve scala streaming docs.
Aaron Kimball authored and tdas committed Mar 20, 2014
1 parent 8849a96 commit 156bcd7
Showing 1 changed file with 33 additions and 5 deletions.
38 changes: 33 additions & 5 deletions docs/streaming-programming-guide.md
@@ -58,11 +58,21 @@ do is as follows.

<div class="codetabs">
<div data-lang="scala" markdown="1" >
+First, we import the names of the Spark Streaming classes, and some implicit
+conversions from StreamingContext into our environment, to add useful methods to
+other classes we need (like DStream).

-First, we create a
-[StreamingContext](api/streaming/index.html#org.apache.spark.streaming.StreamingContext) object,
-which is the main entry point for all streaming
-functionality. Besides Spark's configuration, we specify that any DStream will be processed
+[StreamingContext](api/streaming/index.html#org.apache.spark.streaming.StreamingContext) is the
+main entry point for all streaming functionality.
+
+{% highlight scala %}
+import org.apache.spark.streaming._
+import org.apache.spark.streaming.StreamingContext._
+{% endhighlight %}
+
+Then we create a
+[StreamingContext](api/streaming/index.html#org.apache.spark.streaming.StreamingContext) object.
+Besides Spark's configuration, we specify that any DStream will be processed
in 1 second batches.

{% highlight scala %}
@@ -98,7 +108,7 @@ val pairs = words.map(word => (word, 1))
val wordCounts = pairs.reduceByKey(_ + _)

// Print a few of the counts to the console
-wordCount.print()
+wordCounts.print()
{% endhighlight %}

The `words` DStream is further mapped (one-to-one transformation) to a DStream of `(word,
@@ -262,6 +272,24 @@ Time: 1357008430000 ms
</td>
</table>

+If you plan to run the Scala code for Spark Streaming-based use cases in the Spark
+shell, you should start the shell with the SparkConfiguration pre-configured to
+discard old batches periodically:
+
+{% highlight bash %}
+$ SPARK_JAVA_OPTS=-Dspark.cleaner.ttl=10000 bin/spark-shell
+{% endhighlight %}
+
+... and create your StreamingContext by wrapping the existing interactive shell
+SparkContext object, `sc`:
+
+{% highlight scala %}
+val ssc = new StreamingContext(sc, Seconds(1))
+{% endhighlight %}
+
+When working with the shell, you may also need to send a `^D` to your netcat session
+to force the pipeline to print the word counts to the console at the sink.
+
***************************************************************************************************

# Basics
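
Note on the first hunk: the added `import org.apache.spark.streaming.StreamingContext._` line matters because the extra methods on pair streams (such as the `reduceByKey` call in the second hunk) appear to be supplied through implicit conversions that must be in scope. The sketch below illustrates that Scala mechanism in isolation, with no Spark dependency. `MiniStream`, `PairOps`, `Conversions`, and `ImplicitImportDemo` are hypothetical names invented for this illustration; they are not Spark classes, and the code is an assumption-laden analogy rather than how Spark implements it.

```scala
import scala.language.implicitConversions

// Hypothetical stand-in for a stream of elements; this is NOT Spark's DStream.
final case class MiniStream[T](items: Seq[T])

// Extra methods that only make sense for streams of key/value pairs.
final class PairOps[K, V](stream: MiniStream[(K, V)]) {
  // Combine all values that share a key using the supplied function.
  def reduceByKey(f: (V, V) => V): MiniStream[(K, V)] =
    MiniStream(
      stream.items
        .groupBy(_._1)
        .map { case (k, kvs) => (k, kvs.map(_._2).reduce(f)) }
        .toSeq
    )
}

// The conversion lives inside an object, so callers must import it explicitly.
object Conversions {
  implicit def toPairOps[K, V](s: MiniStream[(K, V)]): PairOps[K, V] =
    new PairOps(s)
}

object ImplicitImportDemo {
  // Without this import, the reduceByKey call below does not compile,
  // because MiniStream itself defines no such method.
  import Conversions._

  def counts(words: Seq[String]): Map[String, Int] =
    MiniStream(words.map(word => (word, 1))).reduceByKey(_ + _).items.toMap

  def main(args: Array[String]): Unit =
    println(counts(Seq("a", "b", "a")))
}
```

Removing the `import Conversions._` line makes the compiler reject `reduceByKey` as an unknown member of `MiniStream`, which is the kind of failure a reader pasting the guide's snippets into a shell would see without the import this commit documents.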