The aim of this full-stack project is to predict and visualize crowdedness 1 week ahead for 3 metro stations in Amsterdam: Centraal Station, Station Zuid and Station Bijlmer ArenA. In addition to the number of check-ins & check-outs for each station, external factors are considered, such as weather, events, holidays, vacations and the COVID-19 pandemic.
The project consists of the following components:

- `instagram-event-scraper` → scraper for events from Instagram using Instagram's public URLs
- `ticketmaster-event-fetcher` → fetcher for events from the Ticketmaster API
- `model` → back-end and front-end for making predictions
  - `data_utils.py` → helper functions for data manipulation and logging
  - `model_utils.py` → functions for the model pipeline
  - `predictions.ipynb` → notebook for running the model pipeline
  - `predictions_server.py` → Flask server for running the model pipeline
  - `UI` → front-end for running the model pipeline
- Read and preprocess the data
- Merge data on external factors (e.g. weather) with check-ins & check-outs per hour
- Interpolate missing check-ins & check-outs using a Random Forest algorithm
- Split the dataset into training, validation and test sets
- Create a separate Random Forest model for each of the 3 metro stations
- Train each model on historical data (X)
- Predict the check-ins & check-outs for each hour 1 week ahead (Y)
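The per-station steps above can be sketched as follows. This is a minimal illustration, not the project's actual code: the feature columns and model hyperparameters are assumptions, and the real pipeline lives in `model_utils.py` and `predictions.ipynb`.

```python
# Sketch of the per-station train/predict steps (hypothetical features;
# the real feature configuration comes from config.ini).
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def train_station_model(X_train, y_train, seed=42):
    """Fit one Random Forest per station on historical hourly data."""
    model = RandomForestRegressor(n_estimators=100, random_state=seed)
    model.fit(X_train, y_train)
    return model

def predict_next_week(model, X_future):
    """Predict check-ins for each of the 7 * 24 = 168 hours ahead."""
    return model.predict(X_future)

# Toy data: e.g. hour-of-day, day-of-week, temperature as features.
rng = np.random.default_rng(0)
X_hist = rng.random((500, 3))
y_hist = rng.random(500) * 1000          # historical check-ins per hour
X_next_week = rng.random((168, 3))       # one row per hour, 1 week ahead

model = train_station_model(X_hist, y_hist)
preds = predict_next_week(model, X_next_week)
```

In the project, this train/predict cycle is repeated once per station, giving 3 independent models.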
- Python 3.7+
- All the libraries listed in `requirements.txt`
- Run `pip install -r requirements.txt`
- Datasets for check-ins & check-outs (`model/data/gvb/` & `model/data/gvb-herkomst/`), weather (`model/data/knmi/`) and events (`model/data/events/`) are expected to be in `model/`, as per this directory structure:
model
└───data
└───gvb
│ └───<year>
│ │ └───<month_number>
│ │ │ └───<day_number>
│ │ │ │ <csv_or_json.gz>
│ │ │ │ ...
│ │ │
│ │ └───...
│ └───...
└───gvb-herkomst
│ └───<year>
│ │ └───<month_number>
│ │ │ └───<day_number>
│ │ │ │ <csv_or_json.gz>
│ │ │ │ ...
│ │ │
│ │ └───...
│ └───...
└───knmi
│ └───knmi
│ │ └───<year>
│ │ │ └───<month_number>
│ │ │ │ └───<day_number>
│ │ │ │ │ <json>
│ │ │ │ │ ...
│ │ │ │
│ │ │ └───...
│ │ └───...
│ └───knmi-observations
│ └───<year>
│ │ └───<month_number>
│ │ │ └───<day_number>
│ │ │ │ <json>
│ │ │ │ ...
│ │ │
│ │ └───...
│ └───...
└───events
│ events_zuidoost.xlsx
│
└───instagram
│ │ <csv>
│ │ ...
│
└───ticketmaster
│ <csv>
│ ...
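A quick way to verify the layout above before running the pipeline is a small sanity check. This is a hypothetical helper (not part of the repository), assuming it is run from the repository root:

```python
# Check that the expected data directories from the tree above exist
# under model/data (hypothetical helper, not part of the project).
from pathlib import Path

EXPECTED_DIRS = [
    "gvb",
    "gvb-herkomst",
    "knmi/knmi",
    "knmi/knmi-observations",
    "events",
    "events/instagram",
    "events/ticketmaster",
]

def missing_data_dirs(root="model/data"):
    """Return the expected subdirectories that are absent under root."""
    base = Path(root)
    return [d for d in EXPECTED_DIRS if not (base / d).is_dir()]
```

If `missing_data_dirs()` returns a non-empty list, create the listed folders and add the corresponding datasets before running the model.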
- WARNING: For the model to produce valid predictions, the check-in & check-out data (`model/data/gvb/` & `model/data/gvb-herkomst/`) and weather data (`model/data/knmi/`) must be kept up-to-date manually
- Modify the `usernames` array in `scraper.py` to include the usernames of the accounts you want to scrape
- Go to `instagram-event-scraper/` and run `python scraper.py`
- After execution, `instagram-event-scraper/events.csv` will be updated with the scraped events
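The `usernames` array in `scraper.py` might look like the following. The handles shown are placeholders, not accounts from the project:

```python
# Hypothetical example of the usernames array in scraper.py:
# fill in the Instagram handles of the event venues you want to scrape.
usernames = [
    "example_venue_1",
    "example_venue_2",
]
```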
- Create `ticketmaster-event-fetcher/config.py` containing `api_key=EXAMPLE`, where `EXAMPLE` is a placeholder for your Ticketmaster API key
- Modify the `year_to_fetch` variable in `fetcher.py` to fetch events for the year of your choice
- Go to `ticketmaster-event-fetcher/` and run `python fetcher.py`
- After execution, a file with the format `ticketmaster-event-fetcher/events_amsterdam_center_DATE_TIME_UTC.csv` will be created with the fetched events
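For context, a fetch of one year of events from the Ticketmaster Discovery API can be sketched as below. This is an assumption about how `fetcher.py` might work, not its actual implementation; the query parameters (`city`, `size`, date bounds) are illustrative:

```python
# Hedged sketch of querying the Ticketmaster Discovery API for one year
# of events; api_key would come from ticketmaster-event-fetcher/config.py.
import json
import urllib.parse
import urllib.request

BASE_URL = "https://app.ticketmaster.com/discovery/v2/events.json"

def build_url(api_key, year_to_fetch, city="Amsterdam", size=200):
    """Build the Discovery API URL covering the whole requested year."""
    params = {
        "apikey": api_key,
        "city": city,
        "startDateTime": f"{year_to_fetch}-01-01T00:00:00Z",
        "endDateTime": f"{year_to_fetch}-12-31T23:59:59Z",
        "size": size,
    }
    return f"{BASE_URL}?{urllib.parse.urlencode(params)}"

def fetch_events(api_key, year_to_fetch):
    """Fetch one page of events; Discovery API nests them under _embedded."""
    with urllib.request.urlopen(build_url(api_key, year_to_fetch)) as resp:
        payload = json.load(resp)
    return payload.get("_embedded", {}).get("events", [])
```

A real fetcher would additionally walk the paginated results and write the rows to the `events_amsterdam_center_*.csv` output file.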
- Using `model/predictions.ipynb`:
  - Modify `config.ini` so that the model uses the feature configuration of your choice
  - Run `model/predictions.ipynb`
  - See the "After execution" bullet point below
- Using the front-end and back-end servers:
  - Go to `model/`, run `python predictions_server.py` and wait for the server output to show "Preprocessing finished" and for the server to be up
  - Go to `model/UI/`, run `python test.py` and wait for the front-end server to be up
  - Open the URL of the front-end server in a browser
  - Choose your desired parameters for the model and press "Submit"
- After execution:
  - If you click any of the 3 available metro stations on the map, the graph should be updated with the current predictions
  - Each station's folder in `model/output/` will be updated with a new file with the format `prediction_next_week_CURRENT-DATE.csv`, containing the current predictions
  - NOTE: `model/output/models_log.csv` will be updated with the model's parameters and metrics only if you ran the model using the `model/predictions.ipynb` notebook
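The per-station output files can be loaded for further analysis as sketched below. The column names in the example CSV are assumptions for illustration; inspect the real file's header first:

```python
# Sketch of loading a prediction file from model/output/
# (column names are hypothetical; check the real CSV header).
import csv
from datetime import date

def load_predictions(path):
    """Read a prediction CSV into a list of dicts, one per hour."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

# Example filename following the prediction_next_week_CURRENT-DATE.csv pattern:
filename = f"prediction_next_week_{date.today().isoformat()}.csv"
```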
- Go to