This project demonstrates how to use a local HTTP server as a streaming source for debugging a Structured Streaming job on a local machine. The Spark app starts a local HTTP server, puts the ingested data on a MemoryStream, and uses that MemoryStream as the streaming source.
Note that this is for local testing only. Because it uses MemoryStream underneath, it is not fault-tolerant. Refer to the fault-tolerance semantics in Structured Streaming.
For more details, please refer to the blog post "Spark Streaming with HTTP REST endpoint serving JSON data".
- Run the HttpStreamApp Spark application
- POST sample JSON data to http://localhost:9999
Watch: https://www.youtube.com/watch?v=Y9g4oj5GH5k
You will see the Spark app ingest that data in Structured Streaming micro-batches and display it.
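
The POST step above can also be scripted instead of done by hand. The following is a minimal sketch using only the Python standard library. It assumes the server accepts one JSON object per POST at the root of http://localhost:9999; the record fields and the helper names `build_request` and `post_event` are hypothetical and are not part of this project, so adjust the payload to whatever HttpStreamApp expects.

```python
import json
import urllib.request

# Address taken from the steps above; the root path is an assumption.
ENDPOINT = "http://localhost:9999"

# Hypothetical sample records. The field names are illustrative only and
# are not the schema required by HttpStreamApp.
SAMPLE_EVENTS = [
    {"id": 1, "device": "sensor-a", "temperature": 21.5},
    {"id": 2, "device": "sensor-b", "temperature": 19.0},
]


def build_request(event, url=ENDPOINT):
    """Build a POST request that carries one event as a JSON body."""
    body = json.dumps(event).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_event(event, url=ENDPOINT, timeout=5):
    """Send one event and return the HTTP status code of the response."""
    with urllib.request.urlopen(build_request(event, url), timeout=timeout) as resp:
        return resp.status


if __name__ == "__main__":
    for event in SAMPLE_EVENTS:
        try:
            print("sent", event["id"], "status", post_event(event))
        except OSError as exc:
            # Covers connection failures and HTTP error responses,
            # e.g. when the Spark app is not running yet.
            print("could not send", event["id"], "->", exc)
```

Run the script while HttpStreamApp is running. Each record is sent as a separate request, so the records may arrive in the same micro-batch or in different ones depending on timing.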