The purpose of this project is to develop an end-to-end solution, built from several tools, that streams stock market data in real time for reporting and analysis.
- Create a new EC2 instance to install Java & Kafka on
- Create a new Kafka topic, and start a producer and a consumer
- Develop Python scripts that load the data for each stock ticker into a pandas DataFrame and upload it to S3 in JSON format
- Set up an AWS Glue crawler to crawl the data lake bucket and pick up new data as it arrives
- Use Athena to confirm that table row counts increase as data streams into AWS S3 in real time and is cataloged by the Glue crawler
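
The upload step in the list above can be sketched roughly as follows. This is a minimal, standard-library-only illustration, not the project's actual script: it skips pandas and the Kafka client so that it stays self-contained, and the names `object_key`, `upload_records`, the `stock_market` prefix, and the sample values are all hypothetical. The idea is that each record is serialized to JSON and written under its own object key, so a crawler pointed at the bucket sees new objects as they arrive.

```python
import json
from typing import Callable, Dict, Iterable, List


def object_key(ticker: str, sequence: int, prefix: str = "stock_market") -> str:
    """Build a unique object key so every record lands in its own JSON file."""
    # Hypothetical layout: <prefix>/<ticker>/<zero-padded sequence>.json
    return f"{prefix}/{ticker}/{sequence:08d}.json"


def upload_records(
    records: Iterable[Dict],
    put_object: Callable[[str, str], None],
    prefix: str = "stock_market",
) -> List[str]:
    """Serialize each record to JSON and hand it to `put_object(key, body)`.

    `put_object` is an assumption: any callable that stores a body under a key.
    Returns the keys that were written, in order.
    """
    keys = []
    for sequence, record in enumerate(records):
        key = object_key(record["ticker"], sequence, prefix)
        body = json.dumps(record, sort_keys=True)
        put_object(key, body)
        keys.append(key)
    return keys


if __name__ == "__main__":
    # An in-memory dict stands in for the S3 bucket in this sketch.
    fake_bucket = {}
    sample = [
        {"ticker": "AAPL", "price": 189.5, "volume": 1200},  # made-up values
        {"ticker": "MSFT", "price": 410.25, "volume": 800},  # made-up values
    ]
    for key in upload_records(sample, fake_bucket.__setitem__):
        print(key, fake_bucket[key])
```

In a real deployment, `put_object` would presumably wrap an S3 client call (for example boto3's `put_object` with `Bucket`, `Key`, and `Body` arguments), and `records` would come from the Kafka consumer rather than a hard-coded list.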