Made by: Ines Achour / Safa Laabidi / Amal Sammari
In this project, we built a pipeline to process the Global Terrorism Database from Kaggle.
The pipeline combines batch and stream processing, which is why it follows the Lambda Architecture.
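
To make the Lambda Architecture idea concrete, here is a minimal, dependency-free sketch of how a batch view and a real-time view can be merged at query time. It is only an illustration: the `country` field, the record values, and the function names are assumptions made for the example, not the project's actual schema or code.

```python
from collections import Counter


def batch_view(master_records):
    """Batch layer: recompute the full view from the whole master dataset."""
    return Counter(record["country"] for record in master_records)


def update_speed_view(speed_view, record):
    """Speed layer: fold one newly arrived record into the real-time view."""
    speed_view[record["country"]] += 1
    return speed_view


def serve(batch, speed):
    """Serving layer: answer a query by merging the batch and real-time views."""
    return batch + speed


# Hypothetical records; the real dataset has many more fields.
master = [{"country": "A"}, {"country": "B"}, {"country": "A"}]
recent = [{"country": "B"}, {"country": "C"}]

batch = batch_view(master)
speed = Counter()
for rec in recent:
    update_speed_view(speed, rec)

merged = serve(batch, speed)
print(dict(merged))
```

In this project, Hadoop MapReduce plays the role of the batch layer and Spark Streaming the role of the speed layer; the sketch only mirrors that split in plain Python.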
- Kafka
- Stream processing: Spark Streaming
- Batch processing: Hadoop MapReduce
- Streaming storage: MongoDB
- Batch storage: HDFS (data before processing) and MongoDB (data after processing)
- Dashboarding: MongoDB Charts
- GlobalTerrorism_Stream
- GlobalTerrorism_Batch
- GlobalTerrorism_Kafka_Stream
- GlobalTerrorism_Kafka_Batch: appends the data received from Kafka to the database CSV file
- GlobalTerrorism_Batch_MongoDB: runs the batch job on the CSV database and saves the result to MongoDB
- GlobalTerrorism_Kafka_MongoDB: receives streaming data, processes it, and saves the result to MongoDB
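
The batch path described above (append incoming records to the CSV file, then run a batch job over it and store the result) can be sketched roughly as follows. This is a simplified stand-in, not the project's code: a plain Python list replaces the Kafka consumer, a per-country count replaces the real MapReduce job, and the column names, file name, and helper functions are hypothetical. The final list of dictionaries only shows the shape of documents that could then be written to MongoDB with a driver.

```python
import csv
import os
import tempfile
from collections import Counter


def append_records(csv_path, rows):
    """Append incoming rows to the CSV file (the role of GlobalTerrorism_Kafka_Batch)."""
    with open(csv_path, "a", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)


def count_by_column(csv_path, column):
    """Stand-in for the batch job: count rows per value of one column."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        return Counter(row[column] for row in csv.DictReader(f))


# Create a tiny CSV with hypothetical columns to play the role of the database file.
workdir = tempfile.mkdtemp()
csv_path = os.path.join(workdir, "database.csv")
with open(csv_path, "w", newline="", encoding="utf-8") as f:
    csv.writer(f).writerows([["eventid", "country"], ["1", "A"], ["2", "B"]])

# Stand-in for messages consumed from Kafka.
incoming = [["3", "A"], ["4", "C"]]
append_records(csv_path, incoming)

# Run the batch aggregation and shape the result as documents.
counts = count_by_column(csv_path, "country")
documents = [{"country": k, "count": v} for k, v in sorted(counts.items())]
print(documents)
```

Keeping the append step separate from the aggregation mirrors the split between GlobalTerrorism_Kafka_Batch and GlobalTerrorism_Batch_MongoDB: the CSV file grows continuously, while the batch job can be rerun over the whole file at any time.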
We used MongoDB Charts for visualization.