
# Chicago Taxi Trips Deep Learning with Dashboard

In this project, we built a deep learning model on the well-known City of Chicago Taxi Trips dataset using Keras, along with a dashboard for seeing the model in action, built with Flask and HTML. The dataset is open for public use and contains trips dating back to 2013, amounting to more than 100 million rows, with each record around 1 KB in size.

## Architecture

After looking at various patterns in the data, we decided that 'Trip Seconds', 'Trip Miles', and 'Month' would be useful for predicting the fare of a given trip. The month accounts for seasonal spikes and dips in taxi usage. Trip seconds and trip miles are obvious additions, but we included both because two trips may each be 5 minutes long while one takes local roads to a nearby destination and the other takes a highway to one much farther away; the two features are therefore not perfectly correlated. There were a few hyperparameters we needed to decide on when creating the model:

1. Loss metric
2. Number of hidden layers
3. Number of nodes in the input and hidden layers

Since training on the full dataset takes a very long time, we tested these parameters on a small chunk of data. By extracting just 100,000 rows, we were able to train multiple quick models and evaluate which parameters would work best for a model trained on all the data. We combined the following choices to create 12 candidate models (a sketch of how the candidates can be enumerated follows this list):

1. The number of hidden layers would be 1, 2, or 3.
2. We considered two loss metrics: Mean Absolute Error (MAE) and Mean Squared Error (MSE).
3. Lastly, we considered two options for the number of nodes: 16 nodes in the input layer with 32 nodes in the hidden layers, or 32 nodes in the input layer with 64 nodes in the hidden layers.
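As a rough illustration, here is a minimal sketch of how such candidates could be defined and enumerated in Keras. The function name `build_candidate`, the ReLU activations, and the Adam optimizer are our assumptions for the example, not necessarily the repo's actual code:

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_candidate(input_nodes, hidden_nodes, n_hidden, loss):
    """One candidate: a dense regression network over the three
    features (Trip Seconds, Trip Miles, Month) predicting the fare."""
    model = keras.Sequential()
    model.add(layers.Dense(input_nodes, activation="relu", input_shape=(3,)))
    for _ in range(n_hidden):
        model.add(layers.Dense(hidden_nodes, activation="relu"))
    model.add(layers.Dense(1))  # single output: the predicted fare
    model.compile(optimizer="adam", loss=loss, metrics=["mae", "mse"])
    return model

# 2 node configurations x 3 hidden-layer counts x 2 losses = 12 candidates
candidates = [
    build_candidate(inp, hid, n, loss)
    for (inp, hid) in [(16, 32), (32, 64)]
    for n in (1, 2, 3)
    for loss in ("mae", "mse")
]
```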

## Tuning the Model

The performance of the models is as follows:

*(Tables: MAE vs. MSE for 16 input / 32 hidden-layer nodes, and MAE vs. MSE for 32 input / 64 hidden-layer nodes.)*

There are 12 models in total, and the tables above group them by the number of nodes in the input and hidden layers. Surprisingly, the models with more nodes and the models with fewer nodes performed very comparably, and overall the models with fewer nodes did better, likely because the larger models overfit. Another interesting observation was that the models with 3 hidden layers performed the worst of the three hidden-layer counts. Finally, although the models that optimize MSE achieved a lower MSE than the models that optimize MAE, the MAE-optimized models showed a larger relative improvement in MAE than the MSE-optimized models showed in MSE, so optimizing MAE performed relatively better.

## Evaluating the Model

The best-performing model is the one with fewer nodes, the MAE loss metric, and 2 hidden layers, achieving a mean absolute error of 1.329 and a mean squared error of 354.460. After looking at the data, this result makes a lot of sense. For the node count and hidden-layer count, it is simply a matter of balancing overfitting against underfitting, and that combination seems to be just right. For the loss metric, the advantage lies in the nature of the metric itself. Since MSE squares the error, the model is punished disproportionately for large mistakes: a single large error, once squared, can dominate the loss, and such errors typically come from outliers. Our data contained outliers, which is why we went with mean absolute error. MAE is robust to outliers because it averages the absolute differences between actual and predicted values, so outliers are not magnified. The following is a loss graph for the selected model and a plot of its predicted vs. actual values:

*(Figures: loss curve and predicted vs. actual scatter plot for the 16 input / 32 hidden-layer node model.)*
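As a quick numeric illustration of the outlier argument above (the fares here are made up for the example, not taken from the dataset):

```python
import numpy as np

actual    = np.array([10.0, 12.0,  9.0, 11.0, 10.0])  # true fares
predicted = np.array([10.5, 11.5,  9.5, 10.5, 60.0])  # last prediction is an outlier

errors = actual - predicted
mae = np.mean(np.abs(errors))  # (4 * 0.5 + 50) / 5 = 10.4
mse = np.mean(errors ** 2)     # (4 * 0.25 + 2500) / 5 = 500.2
print(f"MAE = {mae:.1f}, MSE = {mse:.1f}")  # the squared outlier dominates MSE
```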

## Deploying the Model

Once the models were trained, Flask was used to deploy the selected one. Flask is a micro web framework written in Python; it is classified as a microframework because it does not require particular tools or libraries and has no database abstraction layer, form validation, or other components for which pre-existing third-party libraries provide common functions. Since Flask is Python-based, we could use Keras directly for loading and executing the model. The flow of deployment is shown below:

*(Figure: deployment flowchart.)*

The WSGI application (the Flask app) acts as the gateway interface, and Nginx is used as the web server; the setup is designed to handle multiple requests concurrently. When a request comes in, Flask routes it to the matching endpoint and returns the result to the user. When a prediction is requested, the parameters are passed to the model and the response contains its output, as sketched below.
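Here is a minimal sketch of what such a prediction endpoint could look like. The model path `model.h5`, the route name, and the JSON field names are assumptions for illustration, not the repo's actual code:

```python
import numpy as np
from flask import Flask, request, jsonify
from tensorflow import keras

app = Flask(__name__)
model = keras.models.load_model("model.h5")  # assumed path to the saved model

@app.route("/predict", methods=["POST"])
def predict():
    data = request.get_json()
    # The three features the model was trained on.
    features = np.array([[data["trip_seconds"],
                          data["trip_miles"],
                          data["month"]]], dtype=float)
    fare = float(model.predict(features)[0][0])
    return jsonify({"predicted_fare": fare})

if __name__ == "__main__":
    # For development only; in production Nginx proxies to a WSGI server.
    app.run()
```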

## Output

A full video demo of the UI can be viewed in the YouTube Video.

*(Screenshot: demo UI.)*

Other screenshots can be found here.
