MTA Subway Alert Affected Riders

Project for the MTA Open Data Challenge 2024.

This project provides an interactive visualization platform (mta-subway-alert-affected-riders.vercel.app) that maps the relationship between MTA subway service disruptions and ridership patterns. By correlating service alerts with station entry data, we visualize the number of riders potentially affected by service disruptions through an interactive heatmap and station-level grid cells.

Watch the full timelapse on YouTube

The analysis spans 31 months (February 2022 through August 2024), allowing users to select any date and explore:

A dynamic heatmap showing potentially affected ridership across the subway system
A station-level grid cell map for detailed analysis
Interactive timeline views for each station showing relevant service alerts throughout the day
Hover functionality revealing detailed station and alert information

Important Disclaimer: My estimation provides an upper bound of affected riders. The actual number of riders affected is likely significantly lower because:

Not all stops along a disrupted line are necessarily affected by the reported incident

Some riders may have alternative routes available

The alert may affect only a specific segment of the line

Some riders may have been informed of the disruption before entering the station

Datasets

This repository contains code to process New York MTA (Metropolitan Transportation Authority) data from three main datasets:

MTA Subway Stations - Station locations and route information
MTA Service Alerts - Real-time service alerts and disruptions (2020-04-28 to 2024-08-30 when accessed)
MTA Subway Hourly Ridership - Hourly ridership data by station (2022-02-01 to 2024-10-01 when accessed)

Note: This project analyzes the overlapping period between the alerts and ridership datasets (2022-02-01 to 2024-08-30).

Data Preparation

Pre-processed Data

For convenience, a pre-processed version of the data is available in on Google Drive. You can use these files directly if you don't need to process the raw data yourself.

Processing Raw Data

Due to size limitations, the original datasets are not included in this repository. Please download them from the official sources linked above.

Install required dependencies:

pip install -r requirements.txt

Download the CSV files from the links above and place them in a datasets folder with the following names:
- MTA_Subway_Stations_20241024.csv
- MTA_Service_Alerts__Beginning_April_2020_20241014.csv
- MTA_Subway_Hourly_Ridership__Beginning_February_2022_20241014.csv
Run the data preparation script.

python data-preparation.py

What does the script do

The script data-preparation.py processes these CSV files and generates TSV files. The processed files will be created in the /data folder:

mta_stations.tsv
mta_subway_alerts.tsv
mta_subway_hourly_ridership.tsv

The script performs several key transformations:

Filters for subway lines of interest
Converts timestamps to standardized format
Structures station information for geospatial analysis
Prepares data in TSV format optimized for PostgreSQL import

Populating Database

Create Database Schema

First, create the database tables and indices by running the SQL schema:

psql -d your_database_name -f db-schema.sql

This creates:

Custom enum type for subway lines
Three main tables with appropriate constraints and indices
GIN indices for efficient array operations
B-tree indices for timestamp-based queries

Database Structure

The schema defines three main tables:

subway_stops: Stores station information
- Primary key: complex_id
- Contains: station coordinates and served subway lines
- Includes spatial validation constraints for coordinates
- GIN index on lines array for efficient line-based queries
mta_alerts: Stores service disruption alerts
- Primary key: alert_id
- Contains: alert details, timestamp, and affected subway lines
- Includes unique constraint on alert and event IDs
- Indexed on timestamp and affected_lines for efficient temporal and line-based queries
hourly_ridership: Stores station entry data
- Composite primary key: (timestamp, complex_id)
- Contains: hourly ridership counts for each station
- Foreign key relationship with subway_stops
- Indexed for efficient temporal and station-based queries

Loading Data

After creating the schema, populate the tables with the TSV files generated from the data preparation step:

-- Load subway stations data
\copy subway_stops FROM 'data/mta_stations.tsv' WITH DELIMITER E'\t' CSV HEADER;

-- Load service alerts
\copy mta_alerts FROM 'data/mta_subway_alerts.tsv' WITH DELIMITER E'\t' CSV HEADER;

-- Load hourly ridership data
\copy hourly_ridership FROM 'data/mta_subway_hourly_ridership.tsv' WITH DELIMITER E'\t' CSV HEADER;

The schema includes appropriate indices and constraints to ensure data integrity and query performance. The GIN indices on array columns (lines and affected_lines) are particularly important for efficiently finding stations affected by specific service disruptions.

Deploy the Website

The visualization platform is built with NextJS, which provides both the frontend interface and backend API endpoints to query the database.

Note: The core data aggregation logic for calculating potentially affected ridership is implemented in the backend API router at website/src/server/api/routers/mta-alert.ts. This TypeScript file contains the queries and algorithms for:

Correlating alerts with station ridership

Calculating temporal overlaps

Aggregating affected passenger counts

Setup Development Environment

Navigate to the website directory:

cd website

Install dependencies:

npm install

Configure database connection:
- Create a .env file in the website directory
- Add your database connection URL to the .env file:

DATABASE_URL="postgresql://username:password@host:port/database"

Development

Run the development server:

npm run dev

The site will be available at http://localhost:3000. The development server includes:

Hot reloading for real-time code changes
API route testing
Development error messages

Production Deployment

Build the production version:

npm run build

After building, you can start the production server:

npm run start

The website (mta-subway-alerts-affected-rider.vercel.app) provides:

Interactive heatmap of potentially affected ridership
Station-level grid cell visualization
Timeline views of service alerts
Date selection for historical analysis

Note: Ensure your database is accessible from your deployment environment and the DATABASE_URL is properly configured in your production environment.

Preliminary Data Analysis

The /analysis directory contains exploratory visualizations examining the relationship between MTA service alerts and ridership patterns from February 2022 through August 2024.

While these visualizations offer insights into weekly patterns, monthly trends, and correlation between alerts and affected ridership, they use a simplified methodology that provides upper-bound estimates.

The analysis counts riders entering stations with disrupted lines within 30 minutes of alerts, but could be improved by considering specific line segments affected, alternative routes, alert severity, and transfer patterns. For a more detailed view, visit the interactive visualization platform at (mta-subway-alerts-affected-rider.vercel.app), which implements some of these methodological improvements.

Name		Name	Last commit message	Last commit date
Latest commit History 18 Commits
analysis		analysis
assets		assets
data		data
datasets		datasets
website		website
.gitattributes		.gitattributes
.gitignore		.gitignore
README.md		README.md
data-preparation.py		data-preparation.py
db-schema.sql		db-schema.sql
requirement.txt		requirement.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

MTA Subway Alert Affected Riders

Datasets

Data Preparation

Pre-processed Data

Processing Raw Data

What does the script do

Populating Database

Create Database Schema

Database Structure

Loading Data

Deploy the Website

Setup Development Environment

Development

Production Deployment

Preliminary Data Analysis

About

Releases

Packages

Languages

yz3440/mta-subway-alert-affected-riders

Folders and files

Latest commit

History

Repository files navigation

MTA Subway Alert Affected Riders

Datasets

Data Preparation

Pre-processed Data

Processing Raw Data

What does the script do

Populating Database

Create Database Schema

Database Structure

Loading Data

Deploy the Website

Setup Development Environment

Development

Production Deployment

Preliminary Data Analysis

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages