The aim of this full-stack project is to predict and visualize crowdedness 1 week ahead for 3 metro stations in Amsterdam: Centraal Station, Station Zuid and Station Bijlmer ArenA. In addition to the number of check-ins & check-outs for each station, external factors are considered, such as weather, events, holidays, vacations and the COVID-19 pandemic.
The project consists of the following components:

- `instagram-event-scraper` → scraper for events from Instagram using Instagram's public URLs
- `ticketmaster-event-fetcher` → fetcher for events from the Ticketmaster API
- `model` → back-end and front-end for making predictions
  - `data_utils.py` → helper functions for data manipulation and logging
  - `model_utils.py` → functions for the model pipeline
  - `predictions.ipynb` → notebook for running the model pipeline
  - `predictions_server.py` → Flask server for running the model pipeline
  - `UI` → front-end for running the model pipeline
- Read and preprocess the data
- Merge data on external factors (e.g. weather) with check-ins & check-outs per hour
- Interpolate missing check-ins & check-outs using a Random Forest algorithm
- Split the dataset into training, validation and test sets
- Create a separate Random Forest model for each of the 3 metro stations
- Train each model on historical data (X)
- Predict the check-ins & check-outs for each hour 1 week ahead (Y)
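The per-station steps above can be sketched as follows. This is a minimal illustration, not the project's actual code: the feature columns and model hyperparameters are assumptions, and the real pipeline lives in `model_utils.py` and `predictions.ipynb`.

```python
# Sketch of the per-station train/predict steps (hypothetical features;
# the real feature configuration comes from config.ini).
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def train_station_model(X_train, y_train, seed=42):
    """Fit one Random Forest per station on historical hourly data."""
    model = RandomForestRegressor(n_estimators=100, random_state=seed)
    model.fit(X_train, y_train)
    return model

def predict_next_week(model, X_future):
    """Predict check-ins for each of the 7 * 24 = 168 hours ahead."""
    return model.predict(X_future)

# Toy data: e.g. hour-of-day, day-of-week, temperature as features.
rng = np.random.default_rng(0)
X_hist = rng.random((500, 3))
y_hist = rng.random(500) * 1000          # historical check-ins per hour
X_next_week = rng.random((168, 3))       # one row per hour, 1 week ahead

model = train_station_model(X_hist, y_hist)
preds = predict_next_week(model, X_next_week)
```

In the project, this train/predict cycle is repeated once per station, giving 3 independent models.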
- Python 3.7+
- All the libraries listed in `requirements.txt`
- Run `pip install -r requirements.txt`
- Datasets for check-ins & check-outs (`model/data/gvb/` & `model/data/gvb-herkomst/`), weather (`model/data/knmi/`) and events (`model/data/events/`) are expected to be in `model/`, as per this directory structure:
model
└───data
└───gvb
│ └───<year>
│ │ └───<month_number>
│ │ │ └───<day_number>
│ │ │ │ <csv_or_json.gz>
│ │ │ │ ...
│ │ │
│ │ └───...
│ └───...
└───gvb-herkomst
│ └───<year>
│ │ └───<month_number>
│ │ │ └───<day_number>
│ │ │ │ <csv_or_json.gz>
│ │ │ │ ...
│ │ │
│ │ └───...
│ └───...
└───knmi
│ └───knmi
│ │ └───<year>
│ │ │ └───<month_number>
│ │ │ │ └───<day_number>
│ │ │ │ │ <json>
│ │ │ │ │ ...
│ │ │ │
│ │ │ └───...
│ │ └───...
│ └───knmi-observations
│ └───<year>
│ │ └───<month_number>
│ │ │ └───<day_number>
│ │ │ │ <json>
│ │ │ │ ...
│ │ │
│ │ └───...
│ └───...
└───events
│ events_zuidoost.xlsx
│
└───instagram
│ │ <csv>
│ │ ...
│
└───ticketmaster
│ <csv>
│ ...
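A quick way to verify the layout above before running the pipeline is a small sanity check. This is a hypothetical helper (not part of the repository), assuming it is run from the repository root:

```python
# Check that the expected data directories from the tree above exist
# under model/data (hypothetical helper, not part of the project).
from pathlib import Path

EXPECTED_DIRS = [
    "gvb",
    "gvb-herkomst",
    "knmi/knmi",
    "knmi/knmi-observations",
    "events",
    "events/instagram",
    "events/ticketmaster",
]

def missing_data_dirs(root="model/data"):
    """Return the expected subdirectories that are absent under root."""
    base = Path(root)
    return [d for d in EXPECTED_DIRS if not (base / d).is_dir()]
```

If `missing_data_dirs()` returns a non-empty list, create the listed folders and add the corresponding datasets before running the model.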
- WARNING: For the model to produce valid predictions, the check-in & check-out data (`model/data/gvb/` & `model/data/gvb-herkomst/`) and weather data (`model/data/knmi/`) must be kept up-to-date manually
- Modify the `usernames` array in `scraper.py` to include the usernames of the accounts you want to scrape
- Go to `instagram-event-scraper/` and run `python scraper.py`
- After execution, `instagram-event-scraper/events.csv` will be updated with the scraped events
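The `usernames` array in `scraper.py` might look like the following. The handles shown are placeholders, not accounts from the project:

```python
# Hypothetical example of the usernames array in scraper.py:
# fill in the Instagram handles of the event venues you want to scrape.
usernames = [
    "example_venue_1",
    "example_venue_2",
]
```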
- Create `ticketmaster-event-fetcher/config.py` containing `api_key=EXAMPLE`, where `EXAMPLE` is a placeholder for your Ticketmaster API key
- Modify the `year_to_fetch` variable in `fetcher.py` to fetch events for the year of your choice
- Go to `ticketmaster-event-fetcher/` and run `python fetcher.py`
- After execution, a file with the format `ticketmaster-event-fetcher/events_amsterdam_center_DATE_TIME_UTC.csv` will be created with the fetched events
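For context, a fetch of one year of events from the Ticketmaster Discovery API can be sketched as below. This is an assumption about how `fetcher.py` might work, not its actual implementation; the query parameters (`city`, `size`, date bounds) are illustrative:

```python
# Hedged sketch of querying the Ticketmaster Discovery API for one year
# of events; api_key would come from ticketmaster-event-fetcher/config.py.
import json
import urllib.parse
import urllib.request

BASE_URL = "https://app.ticketmaster.com/discovery/v2/events.json"

def build_url(api_key, year_to_fetch, city="Amsterdam", size=200):
    """Build the Discovery API URL covering the whole requested year."""
    params = {
        "apikey": api_key,
        "city": city,
        "startDateTime": f"{year_to_fetch}-01-01T00:00:00Z",
        "endDateTime": f"{year_to_fetch}-12-31T23:59:59Z",
        "size": size,
    }
    return f"{BASE_URL}?{urllib.parse.urlencode(params)}"

def fetch_events(api_key, year_to_fetch):
    """Fetch one page of events; Discovery API nests them under _embedded."""
    with urllib.request.urlopen(build_url(api_key, year_to_fetch)) as resp:
        payload = json.load(resp)
    return payload.get("_embedded", {}).get("events", [])
```

A real fetcher would additionally walk the paginated results and write the rows to the `events_amsterdam_center_*.csv` output file.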
- Using `model/predictions.ipynb`:
  - Modify `config.ini` so that the model uses the feature configuration of your choice
  - Run `model/predictions.ipynb`
  - See the "After execution" bullet point below
- Using the front-end and back-end servers:
  - Go to `model/`, run `python predictions_server.py` and wait for the server output to show "Preprocessing finished" and for the server to be up
  - Go to `model/UI/`, run `python test.py` and wait for the front-end server to be up
  - Open the URL of the front-end server in a browser
  - Choose your desired parameters for the model and press "Submit"
- After execution:
  - If you click any of the 3 available metro stations on the map, the graph should be updated with the current predictions
  - Each station's folder in `model/output/` will be updated with a new file with the format `prediction_next_week_CURRENT-DATE.csv`, containing the current predictions
  - NOTE: `model/output/models_log.csv` will be updated with the model's parameters and metrics only if you ran the model using the `model/predictions.ipynb` notebook
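The per-station output files can be loaded for further analysis as sketched below. The column names in the example CSV are assumptions for illustration; inspect the real file's header first:

```python
# Sketch of loading a prediction file from model/output/
# (column names are hypothetical; check the real CSV header).
import csv
from datetime import date

def load_predictions(path):
    """Read a prediction CSV into a list of dicts, one per hour."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

# Example filename following the prediction_next_week_CURRENT-DATE.csv pattern:
filename = f"prediction_next_week_{date.today().isoformat()}.csv"
```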
- Go to