Working repository for the CASA0007 written investigation assignment. The submitted report can be found here.
The analysis uses data from Transport for London (TfL). Each of the following steps is carried out in a separate Jupyter notebook.
- bike_points.ipynb collects docking station data from the TfL Unified API and converts it into geospatial data for further analysis
- journeys_data_nogeom.ipynb collects journey data from TfL data storage and converts it into a dataframe for analysis
- analysis.ipynb performs the analysis on the prepared datasets
The two data preparation notebooks produce the datasets required for the analysis. Run the notebooks in the order listed above to ensure reproducibility.
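As a rough illustration of the docking station step, the sketch below downloads station records from the public BikePoint endpoint of the TfL Unified API and converts them into GeoJSON points. It is not the code from bike_points.ipynb. The function names and the sample record are hypothetical, and the field names (id, commonName, lat, lon) are assumed from the API's response format, so check them against the notebook.

```python
import json
from urllib.request import urlopen

# Public BikePoint endpoint of the TfL Unified API.
BIKEPOINT_URL = "https://api.tfl.gov.uk/BikePoint"


def fetch_bike_points(url=BIKEPOINT_URL):
    """Download the list of docking stations as parsed JSON (needs network access)."""
    with urlopen(url) as response:
        return json.load(response)


def to_geojson(records):
    """Convert BikePoint records into a GeoJSON FeatureCollection of points."""
    features = []
    for rec in records:
        features.append({
            "type": "Feature",
            # GeoJSON stores coordinates as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [rec["lon"], rec["lat"]]},
            "properties": {"id": rec["id"], "name": rec["commonName"]},
        })
    return {"type": "FeatureCollection", "features": features}


# Hypothetical sample record in the shape assumed above (illustrative values).
sample = [{"id": "BikePoints_1", "commonName": "River Street, Clerkenwell",
           "lat": 51.529163, "lon": -0.10997}]
collection = to_geojson(sample)
```

Calling `to_geojson(fetch_bike_points())` would apply the same conversion to live data. The resulting dictionary can be written out with `json.dump` and opened in most GIS tools.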
This report explores the relationship between elevation and journeys made on the London Cycle Hire Scheme, also known as Santander Cycles.
The analysis takes two perspectives:
- The first half focuses on individual docking stations. We explored the ratio of departures to arrivals, assuming that stations at high elevation have more journeys starting than terminating.
- The second half focuses on origin-destination pairs of journeys. We considered the impact of height difference on the frequency of journeys from one MSOA to another, assuming that downhill journeys occur more frequently than uphill ones.
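For the first perspective, the departure-to-arrival ratio per docking station could be computed along these lines. This is a minimal sketch, not the notebook's implementation. The function name, the station names, and the (start, end) tuple layout are hypothetical.

```python
from collections import Counter


def departure_arrival_ratio(journeys):
    """Return {station: departures / arrivals} for stations with at least one arrival."""
    departures = Counter(start for start, _ in journeys)
    arrivals = Counter(end for _, end in journeys)
    # Iterating over arrivals avoids division by zero; a station with no
    # departures gets a ratio of 0 because Counter returns 0 for missing keys.
    return {
        station: departures[station] / arrivals[station]
        for station in arrivals
    }


# Hypothetical journeys as (start_station, end_station) pairs.
journeys = [("Hilltop", "Riverside"), ("Hilltop", "Riverside"),
            ("Riverside", "Hilltop"), ("Hilltop", "Hilltop")]
ratios = departure_arrival_ratio(journeys)
```

A ratio above 1 means that more journeys start at the station than end there, which is what the assumption predicts for stations at high elevation.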
The analysis produced two main findings:
- The higher a docking station's elevation, the higher its ratio of departures to arrivals, suggesting that users take advantage of downhill journeys.
- Height difference showed no significant effect on journey frequency between origin-destination pairs. Distance (or rather its negative exponential) had the dominant effect, and the residuals were not significantly correlated with height difference.
The cumulative elevation change along a route, rather than the net height difference between origin and destination, may be what influences riders' decisions.
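The second finding can be illustrated with a simple distance-decay check: fit the logarithm of journey counts against distance, then correlate the residuals with height difference. The sketch below assumes a model of the form flow ∝ exp(-β · distance), which may differ from the model used in analysis.ipynb. All data values and function names are hypothetical, and the Pearson coefficient alone does not establish statistical significance.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    var_y = sum((yi - mean_y) ** 2 for yi in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical OD pairs: (distance_km, height_difference_m, journey_count).
od_pairs = [(1.0, -5.0, 400), (2.0, 10.0, 160), (3.0, -12.0, 55),
            (4.0, 3.0, 25), (5.0, 8.0, 9)]
distance = [d for d, _, _ in od_pairs]
height_diff = [h for _, h, _ in od_pairs]
log_flow = [math.log(f) for _, _, f in od_pairs]

# Taking logs turns flow = A * exp(-beta * d) into a straight line in d.
intercept, slope = fit_line(distance, log_flow)
residuals = [lf - (intercept + slope * d) for d, lf in zip(distance, log_flow)]

# Correlation between what distance leaves unexplained and height difference.
r = pearson(residuals, height_diff)
```

A negative `slope` reflects journeys becoming rarer with distance. A value of `r` close to zero would be consistent with height difference adding little once distance is accounted for.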