Go by Bus is an application for storing and analyzing public transport data from Warsaw's open data platform.
It uses the current GPS locations of trams, bus stop locations, and timetables to achieve the following goals:
- Store current & historical tram positions for analysis and ML [done]
- Store timetable data for analyzing line delays [done]
- Visualize current & historical locations of a queried tram using the Google Maps API [done]
- Provide current tram locations as a stream of data from Kafka [done]
- Find anomalies in traffic and calculate transport delays in Spark [TODO 1]
- Store nearest & historical weather info from the yr.no API and use it to enrich delay analysis [TODO 2]
- Visualize timetables for a selected line [TODO 3]
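The delay goal above comes down to comparing a timetable entry with the moment a tram was actually seen at a stop. Below is a minimal sketch of that comparison, assuming both times fall on the same day. The class and method names are hypothetical and are not taken from the project; the real analysis is meant to run in Spark.

```java
import java.time.Duration;
import java.time.LocalTime;

// Hypothetical helper, not part of the project: compares a timetable entry
// with the time a tram was actually observed at the stop.
final class DelaySketch {

    // Positive result = tram is late, negative = tram is early (whole seconds).
    // Assumption: both times belong to the same day (no midnight crossing).
    static long delaySeconds(LocalTime scheduled, LocalTime observed) {
        return Duration.between(scheduled, observed).getSeconds();
    }

    public static void main(String[] args) {
        LocalTime scheduled = LocalTime.of(8, 15);
        LocalTime observed = LocalTime.of(8, 19, 30);
        System.out.println(delaySeconds(scheduled, observed)); // prints 270
    }
}
```

Night lines that cross midnight would need full date-times instead of LocalTime.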
- Microservices built with Java 8 + Spring Cloud, running on Docker
- CQRS architecture is applied (the heart of the system is Apache Kafka)
- Apache Kafka for the real-time location stream
- Configuration is stored in a central Spring Config Service
- Service logs + GC logs are shipped to ELK (Elasticsearch + Logstash + Kibana), but there are no visualizations yet
- Data is stored in MongoDB
- Docker as the container service, and docker-compose for bringing up the environment for now
- Apache Spark as the main data analysis tool; the SparkPositionAnalyzer module still needs a lot of development, though
- A simple long-running master version is deployed to AWS using docker-machine
- Gradle as a build tool
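To illustrate the CQRS idea mentioned in the list above, here is a minimal in-memory sketch. The write side only appends position events to a log, and the read side builds its own view (latest position per tram) by consuming that log. The in-memory list is just a stand-in for a Kafka topic, and all class and field names are assumptions made for this example, not the project's actual code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the project's code.
final class CqrsSketch {

    // Event describing one reported tram position (field names are assumptions).
    static final class PositionReported {
        final String tramId;
        final double lat;
        final double lon;
        final long timestampMillis;

        PositionReported(String tramId, double lat, double lon, long timestampMillis) {
            this.tramId = tramId;
            this.lat = lat;
            this.lon = lon;
            this.timestampMillis = timestampMillis;
        }
    }

    // Write side: an append-only log standing in for a Kafka topic.
    static final class EventLog {
        private final List<PositionReported> events = new ArrayList<>();

        void append(PositionReported event) {
            events.add(event);
        }

        // Returns a copy of all events from the given offset onwards.
        List<PositionReported> readFrom(int offset) {
            return new ArrayList<>(events.subList(offset, events.size()));
        }
    }

    // Read side: keeps the latest known position per tram.
    static final class LatestPositionView {
        private final Map<String, PositionReported> latest = new HashMap<>();
        private int offset = 0; // how many events were already consumed

        void catchUp(EventLog log) {
            for (PositionReported event : log.readFrom(offset)) {
                PositionReported current = latest.get(event.tramId);
                if (current == null || event.timestampMillis >= current.timestampMillis) {
                    latest.put(event.tramId, event);
                }
                offset++;
            }
        }

        PositionReported latestFor(String tramId) {
            return latest.get(tramId);
        }
    }

    public static void main(String[] args) {
        EventLog log = new EventLog();
        log.append(new PositionReported("tram-17", 52.2297, 21.0122, 1000L));
        log.append(new PositionReported("tram-17", 52.2301, 21.0130, 2000L));

        LatestPositionView view = new LatestPositionView();
        view.catchUp(log);
        System.out.println(view.latestFor("tram-17").timestampMillis); // prints 2000
    }
}
```

Because the view is derived only from the log, it can be dropped and rebuilt from offset 0 at any time, which is the main reason to keep the log as the source of truth.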
- Refactor and develop more complete Spark queries
- Introduce a new data source: weather data from yr.no
- Create a separate service for timetable data based on GraphQL
- Introduce cross-service user tracking with Zipkin
- Prepare Kibana log visualizations
- Introduce a more advanced orchestration tool, e.g. Kubernetes
- Introduce node monitoring with Zabbix
- Install Docker and docker-compose
- Increase vm.max_map_count on your machine, as required by ELK
- Create an account and generate an API key on Warsaw's open data platform.
- Put your API key in the `secret-keys.properties` file in the main dir as a `WARSAW_API_KEY=` property
- Run `docker compose up -d` in the main dir
- Have fun :)
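The key file from the steps above is a plain Java properties file with a single WARSAW_API_KEY entry. The sketch below only illustrates that format and a basic sanity check on it. It does not claim to be how the services actually read the key; the class name, method name, and placeholder value are made up for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical sketch: parses the content of secret-keys.properties
// and returns the WARSAW_API_KEY value, failing fast if it is missing.
final class SecretKeysSketch {

    static String readApiKey(String fileContent) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(fileContent));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String key = props.getProperty("WARSAW_API_KEY");
        if (key == null || key.trim().isEmpty()) {
            throw new IllegalStateException("WARSAW_API_KEY is missing or empty");
        }
        return key.trim();
    }

    public static void main(String[] args) {
        // Placeholder value; a real key comes from the open data platform.
        String fileContent = "WARSAW_API_KEY=your-key-here\n";
        System.out.println(readApiKey(fileContent)); // prints your-key-here
    }
}
```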
Bear in mind that the solution is fairly complex and consumes a lot of resources. On an i7 with 16 GB RAM it runs fine. Some clustering will be introduced in the future.