This is a practice project using Kaggle data on NYC taxi trips in 2015.
The goal of this project is to predict the duration of taxi rides in NYC from features such as trip coordinates and pickup date and time. The data consists of 1.5 million training observations and 630k test observations. Each row describes one taxi trip.
We will first study and visualise the original data, engineer new features, and examine potential outliers. We will then add two external data sets, one on NYC weather and one on the theoretically fastest routes, and visualise and analyse the new features they provide and their impact on the target trip_duration values.
Finally, we will view this challenge as a classification problem and finish the project with an XGBoost prediction model.
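
As a rough illustration of the feature-engineering step described above, the sketch below derives a few features from a single trip record: the pickup hour, the weekday, the great-circle (haversine) distance between pickup and dropoff, and the average speed, which can help flag outliers. The column names other than trip_duration (pickup_datetime, pickup_latitude, and so on), the timestamp format, and the example values are assumptions made for illustration; they are not taken from the project itself.

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def engineer_features(trip):
    """Derive time and distance features from one trip record (a dict).

    The keys used here are assumed column names, and the timestamp
    format is an assumption as well.
    """
    pickup = datetime.strptime(trip["pickup_datetime"], "%Y-%m-%d %H:%M:%S")
    distance_km = haversine_km(
        trip["pickup_latitude"], trip["pickup_longitude"],
        trip["dropoff_latitude"], trip["dropoff_longitude"],
    )
    features = {
        "hour": pickup.hour,
        "weekday": pickup.weekday(),  # Monday = 0, Sunday = 6
        "distance_km": distance_km,
    }
    # Average speed is only available where the target is known (training data).
    duration_s = trip.get("trip_duration")
    if duration_s:
        features["speed_kmh"] = distance_km / (duration_s / 3600.0)
    return features


# Illustrative record with made-up values, not a row from the real data.
example_trip = {
    "pickup_datetime": "2015-03-14 17:24:55",
    "pickup_latitude": 40.7679, "pickup_longitude": -73.9822,
    "dropoff_latitude": 40.7656, "dropoff_longitude": -73.9646,
    "trip_duration": 455,  # seconds
}
features = engineer_features(example_trip)
print(features)
```

Trips with an implausibly high or near-zero speed_kmh are natural candidates for the outlier check. The haversine distance is a straight-line lower bound on the driven distance, which is one reason an external data set of fastest routes is a useful addition.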