Running Apache Spark Structured Streaming job on the local machine with an HTTP web server as a streaming source.

cchandurkar/spark-http-streaming

Spark HTTP Streaming

This project demonstrates how to use a local HTTP server as a streaming source to debug a Structured Streaming job on a local machine. The Spark application starts a local HTTP server, puts the ingested data on a MemoryStream, and uses that MemoryStream as the streaming source.

Note that this is for local testing only. Because it uses MemoryStream underneath, it is not fault-tolerant. Refer to the fault-tolerance semantics in Structured Streaming.
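
As a rough illustration of the approach described above, the following is a minimal sketch and not the repository's actual HttpStreamApp code. It assumes Spark 2.x or 3.x on the classpath and uses the JDK's built-in com.sun.net.httpserver.HttpServer. The object name HttpMemoryStreamSketch, the "/" context path, and the console sink are assumptions made for this example.

```scala
import java.net.InetSocketAddress

import com.sun.net.httpserver.{HttpExchange, HttpHandler, HttpServer}
import org.apache.spark.sql.{SQLContext, SparkSession}
import org.apache.spark.sql.execution.streaming.MemoryStream

import scala.io.Source

// Hypothetical name; the repository's own entry point is HttpStreamApp.
object HttpMemoryStreamSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("http-memory-stream-sketch")
      .master("local[*]") // local mode only; MemoryStream is not fault-tolerant
      .getOrCreate()

    import spark.implicits._ // provides the Encoder[String] for MemoryStream
    implicit val sqlContext: SQLContext = spark.sqlContext

    // In-memory source that the HTTP handler appends to.
    val stream = MemoryStream[String]

    // Local HTTP server on port 9999 (the port used in "How to use").
    val server = HttpServer.create(new InetSocketAddress(9999), 0)
    server.createContext("/", new HttpHandler {
      override def handle(exchange: HttpExchange): Unit = {
        // Read the raw request body as one string (for example, a JSON document).
        val body = Source.fromInputStream(exchange.getRequestBody, "UTF-8").mkString
        stream.addData(body)
        exchange.sendResponseHeaders(200, -1) // 200 with no response body
        exchange.close()
      }
    })
    server.start()

    // Print each micro-batch of ingested records to the console.
    val query = stream.toDS()
      .writeStream
      .format("console")
      .option("truncate", "false")
      .start()

    query.awaitTermination()
  }
}
```

This sketch keeps each request body as a raw string. Parsing it into typed columns (for example with from_json and an explicit schema) would be a natural next step, but the schema depends on the data being posted.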

For more details please refer to the blog post:
Spark Streaming with HTTP REST endpoint serving JSON data

How to use

  1. Run the HttpStreamApp Spark application
  2. POST sample JSON data to http://localhost:9999
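
The POST in step 2 can be sent with any HTTP client. As one possible sketch, the small Scala program below posts a single JSON document using only the JDK. The object name PostSample and the payload fields are made up for illustration, and it assumes the Spark application is already listening on port 9999.

```scala
import java.net.{HttpURLConnection, URL}
import java.nio.charset.StandardCharsets

// Hypothetical helper; not part of the repository.
object PostSample {
  def main(args: Array[String]): Unit = {
    // Example payload; the field names are placeholders.
    val payload = """{"id": 1, "message": "hello"}"""

    val conn = new URL("http://localhost:9999")
      .openConnection()
      .asInstanceOf[HttpURLConnection]
    conn.setRequestMethod("POST")
    conn.setRequestProperty("Content-Type", "application/json")
    conn.setDoOutput(true) // required to send a request body

    val out = conn.getOutputStream
    try out.write(payload.getBytes(StandardCharsets.UTF_8))
    finally out.close()

    // Reading the response code triggers the request and waits for the reply.
    println(s"Response code: ${conn.getResponseCode}")
    conn.disconnect()
  }
}
```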

Demo

Watch: https://www.youtube.com/watch?v=Y9g4oj5GH5k
The Spark app ingests the posted data in Structured Streaming micro-batches and displays it.
