easadler/capstone

Pronto! Predicted

Galvanize Capstone Project

January 2016

Evan Sadler

Pronto! Predicted was built to help Seattle's cycle sharing program better reshuffle their bicycles. The program has been having a difficult time, and after hearing from an employee that they do not use predictive analytics, I set out to make the single dispatcher's life a little easier using the power of data science!

Pronto! Predicting

Let's see how the application works. Using Pronto's JSON feed to get the number of available bikes at each station and scraping hourly weather forecasts, Pronto! Predicted is already prepared to predict the future as soon as it loads.

When you arrive on the website, you will have the option to look four time periods ahead. Then click "Go!".

Pronto! Predicted Viz

After you click "Go!", the circles will shift according to their predicted size.

Interpreting the Map

Each circle represents a unique station, and its size represents the number of bikes currently docked there. Sizes are comparable across all stations, regardless of how many docks a station has. The colors, on the other hand, represent how full each station is relative to its number of viable docks. The colors mean the following:

  • Blue: Full
  • Green: 50% full
  • Red: Empty

The color scale slides between these colors, so blended colors indicate states between the three above. At any time, hover over a circle to get the station ID and predicted count.
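A sliding scale like this can be sketched as a piecewise linear blend. This is only an illustration, not the site's actual code: the function names and the exact RGB shades are assumptions, and the fill fraction is taken to be bikes divided by docks.

```python
# Anchor colors as (R, G, B). The exact shades are assumptions.
RED, GREEN, BLUE = (255, 0, 0), (0, 255, 0), (0, 0, 255)


def lerp(a, b, t):
    """Linearly interpolate between a and b for t in [0, 1]."""
    return a + (b - a) * t


def fill_color(bikes, docks):
    """Map a station's fill fraction to a blended RGB color.

    0% full -> red, 50% full -> green, 100% full -> blue.
    """
    if docks <= 0:
        raise ValueError("docks must be positive")
    # Clamp so over- or under-counts still map onto the scale.
    frac = min(max(bikes / docks, 0.0), 1.0)
    if frac <= 0.5:
        lo, hi, t = RED, GREEN, frac / 0.5
    else:
        lo, hi, t = GREEN, BLUE, (frac - 0.5) / 0.5
    return tuple(round(lerp(x, y, t)) for x, y in zip(lo, hi))
```

For example, `fill_color(0, 10)` gives pure red and `fill_color(10, 10)` gives pure blue, with everything in between blended.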

How it works

I started with the data from Pronto's data competition, which provides one year of trips and metadata about each station.

| Date-time | Starting Station | Ending Station |
| --- | --- | --- |
| 1/12/2015 8:00:00 | CH-04 | BS-01 |
| 1/12/2015 11:00:00 | PS-02 | SLU-03 |

I then transformed the data with the following steps to create one dataset for supply and one for demand:

  1. Scrub the data
  2. Group by date, hour, and ending station (supply) or starting station (demand)
  3. Impute date-times for every station with zero rentals
  4. Bin hours to balance classes
  5. Create and download features:
     • Create lagged variables
     • Hourly weather data (NOAA)
     • Elevations (Google API)
  6. Remove outliers
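The grouping, zero-imputation, and lagged-variable steps could look roughly like the sketch below. It uses only the standard library rather than whatever tooling the project actually used, and the function names, the one-hour lag, and the month/day/year reading of the dates are all assumptions.

```python
from collections import Counter
from datetime import datetime, timedelta
from itertools import product

# Two sample trips in the layout shown above: (date-time, start, end).
trips = [
    ("1/12/2015 8:00:00", "CH-04", "BS-01"),
    ("1/12/2015 11:00:00", "PS-02", "SLU-03"),
]


def hourly_counts(trips, station_index):
    """Count trips per (hour, station).

    station_index 1 groups by starting station (demand),
    station_index 2 groups by ending station (supply).
    """
    counts = Counter()
    for row in trips:
        # Assumes month/day/year ordering in the raw timestamps.
        ts = datetime.strptime(row[0], "%m/%d/%Y %H:%M:%S")
        hour = ts.replace(minute=0, second=0)
        counts[(hour, row[station_index])] += 1
    return counts


def impute_zeros(counts):
    """Add an explicit zero for every (hour, station) pair with no trips."""
    hours = [h for h, _ in counts]
    stations = sorted({s for _, s in counts})
    first, last = min(hours), max(hours)
    n_hours = int((last - first).total_seconds() // 3600) + 1
    grid = (first + timedelta(hours=i) for i in range(n_hours))
    return {(h, s): counts.get((h, s), 0) for h, s in product(grid, stations)}


def add_lag(series, lag_hours=1):
    """Pair each count with the same station's count lag_hours earlier.

    The lagged value is None when no earlier hour exists in the series.
    """
    return {
        (h, s): (n, series.get((h - timedelta(hours=lag_hours), s)))
        for (h, s), n in series.items()
    }


demand = add_lag(impute_zeros(hourly_counts(trips, 1)))
supply = add_lag(impute_zeros(hourly_counts(trips, 2)))
```

Each entry of `demand` and `supply` maps an (hour, station) pair to a (count, lagged count) tuple, so hours with no rentals appear as explicit zeros instead of missing rows.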

Note

I did not try to determine when rides were intended to start from an empty station or end at a full one. There is another dataset that I could use to help with this. Getting user IDs for each trip would be very useful for this problem as well.

I then trained my models on the datasets I created. First, a binary classifier predicts whether a ride occurred. All predicted rides are then piped to a 4-way classifier, which is trained on the subset of the data with positive rides. The following images outline the models:
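The two-stage flow can be sketched as a small wrapper around any pair of models. The toy classifiers below are hypothetical stand-ins, not the trained models, and using 0 for "no ride" and 1 to 4 for the 4-way classes is an assumed encoding.

```python
from typing import Callable, List, Sequence

Features = Sequence[float]


def two_stage_predict(
    rows: Sequence[Features],
    ride_occurred: Callable[[Features], bool],
    ride_level: Callable[[Features], int],
) -> List[int]:
    """Return 0 for 'no ride', otherwise the stage-two class (1-4)."""
    predictions = []
    for row in rows:
        if ride_occurred(row):
            # Only rows flagged as rides reach the 4-way classifier.
            predictions.append(ride_level(row))
        else:
            predictions.append(0)
    return predictions


# Hypothetical stand-ins for the trained models.
def toy_binary(row: Features) -> bool:
    return row[0] > 0.5


def toy_four_way(row: Features) -> int:
    return min(4, 1 + int(row[1] // 5))


rows = [(0.2, 3.0), (0.9, 3.0), (0.8, 12.0), (0.7, 40.0)]
predicted = two_stage_predict(rows, toy_binary, toy_four_way)
```

Keeping the stages separate means the 4-way classifier never has to model the dominant "no ride" class, which matches training it only on the positive subset.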

Forecasting Inventory

Forecasting Method

Predictive Model

Scoring Model

  • The binary classifiers had AUCs of approximately 0.85.
  • The 4-way classifiers had accuracies of approximately 62%.
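For readers unfamiliar with the two metrics, here is a minimal sketch of how each can be computed from scratch. It is not the scoring code used for the numbers above, and the function names are my own.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """Area under the ROC curve for binary labels (0 or 1).

    Computed as the probability that a randomly chosen positive is
    scored above a randomly chosen negative, with ties counting half.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking. For a balanced 4-way problem, random guessing would give about 25% accuracy.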

About

Pronto bike sharing project
