This repository holds the source code for the semester project of group 26 for the course Big Data Analytics at Copenhagen Business School.
- PostgreSQL with the PostGIS extension installed
- PSQL command line interface
- Ruby with the 'pg' gem installed
- A Unix based operating system to run the shell scripts
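
The requirements above can be checked before starting the setup. The following is a minimal sketch, assuming the tools are installed under the command names `psql` and `ruby`; the helper `missing_tools` is illustrative and not part of this repository.

```shell
#!/bin/sh
# Print the name of every listed command that is not found on PATH.
# missing_tools is a hypothetical helper, not part of the repository.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# The command names are assumptions based on the requirements list.
missing=$(missing_tools psql ruby)
if [ -n "$missing" ]; then
    echo "Missing tools:" $missing
else
    echo "psql and ruby found"
fi

# Further checks, left commented out because they need a working setup
# (DB_NAME is a placeholder for your own database name):
#   ruby -e 'require "pg"'                             # fails if the pg gem is absent
#   psql -d "$DB_NAME" -c 'SELECT PostGIS_Version();'  # fails if PostGIS is missing
```

The two commented checks go one step further: the first fails if the 'pg' gem cannot be loaded, the second if the PostGIS extension is not available in the chosen database.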
- Clone, fork or download this repository.
- Set up a Postgres database and install the PostGIS extension. Then set the database variables in `imports/zipcode_importer.rb`, `imports/import_green_data.sh` and `imports/import_uber_data.sh` to be able to connect to your database.
- Create the database schema by running `database_setup/create_tables.sql`.
- Download the datasets by running `imports/download_green_data.sh` and `imports/download_uber_data.sh`. This creates the `imports/green_data` and `imports/uber_data` directories.
- Import the Green Taxi and Uber data into your database by running `imports/import_green_data.sh` and `imports/import_uber_data.sh`.
- Import the zip code polygons by running `imports/zipcode_importer.rb`.
- To complete the setup, run `database_setup/combine_rides.sql` to populate the aggregated rides table that holds both Green and Uber rides in a normalised format.
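
The setup steps can be chained in one script. This is a sketch under several assumptions: it is run from the repository root, the SQL files are executed with `psql -f`, the shell scripts can be invoked with `sh`, and the database variables inside the import scripts have already been set. `DB_NAME`, `DRY_RUN`, `run_step` and `setup_pipeline` are illustrative names that do not exist in the repository.

```shell
#!/bin/sh
# DB_NAME and DRY_RUN are hypothetical settings, not defined by the repository.
DB_NAME="${DB_NAME:-bigdata}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

# Print a command, then execute it unless DRY_RUN is 1.
# Stops at the first failing step.
run_step() {
    printf '+ %s\n' "$*"
    if [ "$DRY_RUN" != "1" ]; then
        "$@" || { echo "step failed: $*" >&2; exit 1; }
    fi
}

# The setup steps in the order the README lists them.
setup_pipeline() {
    run_step psql -d "$DB_NAME" -f database_setup/create_tables.sql
    run_step sh imports/download_green_data.sh
    run_step sh imports/download_uber_data.sh
    run_step sh imports/import_green_data.sh
    run_step sh imports/import_uber_data.sh
    run_step ruby imports/zipcode_importer.rb
    run_step psql -d "$DB_NAME" -f database_setup/combine_rides.sql
}

setup_pipeline
```

By default the script only prints the commands it would run. Setting `DRY_RUN=0` executes them, which downloads and imports the full datasets.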
- Run `analysis/create_aggregate_tables.sql` to aggregate and group ride counts by a number of variables. Refer to the file for more details.
- The `exports` directory holds the resulting data from these tables in four CSV files.
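
The README does not say how the CSV files were produced. One possible way to run the aggregation and write a table to CSV is psql's `\copy` meta-command, sketched below. The table name `hypothetical_aggregate_table`, the output file name, `DB_NAME`, `DRY_RUN` and the helper functions are all placeholders; the real table names are defined in `analysis/create_aggregate_tables.sql`.

```shell
#!/bin/sh
# DB_NAME and DRY_RUN are hypothetical settings, not defined by the repository.
DB_NAME="${DB_NAME:-bigdata}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

# Print a command, then execute it unless DRY_RUN is 1.
run_step() {
    printf '+ %s\n' "$*"
    if [ "$DRY_RUN" != "1" ]; then
        "$@" || { echo "step failed: $*" >&2; exit 1; }
    fi
}

# Write one table to a CSV file with a header row.
# $1 = table name, $2 = output path (resolved on the client side by \copy).
export_table() {
    run_step psql -d "$DB_NAME" -c "\\copy $1 TO '$2' WITH (FORMAT csv, HEADER)"
}

# Build the aggregate tables, then export one of them.
run_step psql -d "$DB_NAME" -f analysis/create_aggregate_tables.sql
export_table hypothetical_aggregate_table hypothetical_aggregate_table.csv
```

`\copy` writes the file on the machine running psql, so no file system access on the database server is needed.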
NYC Prepared - NYC Zip Code Tabulation Areas