MTShannon/scala-spark-cab-rides-predictions

 
 

scala-spark-cab-rides-predictions

A big data project for predicting prices of Uber/Lyft rides depending on the weather.

The dataset was compiled and uploaded to Kaggle: https://www.kaggle.com/ravi72munde/uber-lyft-cab-prices

Contributors:

  • Ravi Munde
  • Karan Barai

Project Structure:

  • cab-price-connector - Data Collection Scala Project
  • Databricks_Prediction_code.html - Analysis and Spark Model (from Databricks.com)
  • Cab_Price_Prediction.ipynb - Random Forest Model in Python

Data Model:

CabPrice

root
    |- cab_type : String
    |- destination : String
    |- distance : Float
    |- id : String
    |- name : String
    |- price : Float
    |- product_id : String
    |- source : String
    |- surge_multiplier : String
    |- time_stamp : Long
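
The CabPrice schema above could be mirrored in Scala roughly as follows. This is a minimal sketch, not the project's actual code: the case class, the `CabPriceExample` object, and every value in the sample record are hypothetical and exist only for illustration. Field names and types are copied from the schema as listed.

```scala
// Hypothetical case class mirroring the CabPrice schema listed above.
// Field names and types follow the README; the class itself is an assumption.
final case class CabPrice(
    cab_type: String,
    destination: String,
    distance: Float,
    id: String,
    name: String,
    price: Float,
    product_id: String,
    source: String,
    surge_multiplier: String, // listed as String in the schema above
    time_stamp: Long
)

object CabPriceExample {
  // Illustrative record only; every value here is a made-up placeholder.
  val sample: CabPrice = CabPrice(
    cab_type = "Uber",
    destination = "destination-placeholder",
    distance = 1.5f,
    id = "id-placeholder",
    name = "name-placeholder",
    price = 10.5f,
    product_id = "product-placeholder",
    source = "source-placeholder",
    surge_multiplier = "1.0",
    time_stamp = 0L
  )
}
```

With Spark on the classpath, a case class like this can be used to derive a schema via `Encoders.product[CabPrice].schema`, assuming the data is loaded as a typed Dataset.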

Weather

root
    |- clouds : Float
    |- humidity : Float
    |- location : Float
    |- location : String
    |- temp : String
    |- pressure : Float
    |- wind : Float

Actor System

Sample log of Actor System Running on EC2

INFO [CabRideSystem-akka.actor.default-dispatcher-2] a.DynamoActor - received 12 number of weather records
INFO [CabRideSystem-akka.actor.default-dispatcher-4] a.DynamoActor - Weather Batch processed on DynamoDB
INFO [CabRideSystem-akka.actor.default-dispatcher-9] a.DynamoActor - received 156 number of cab price records
INFO [CabRideSystem-akka.actor.default-dispatcher-8] a.DynamoActor - Cab Prices Batch processed on DynamoDB
INFO [CabRideSystem-akka.actor.default-dispatcher-7] a.Master - Cab ride data piped to Dynamo Actor
INFO [CabRideSystem-akka.actor.default-dispatcher-13] a.DynamoActor - received 156 number of cab price records
INFO [CabRideSystem-akka.actor.default-dispatcher-15] a.DynamoActor - Cab Prices Batch processed on DynamoDB

NOTE: AWS credentials must be set in environment variables.
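
A start-up check for the credentials might look like the sketch below. It assumes the project relies on the standard variable names read by the AWS SDK (`AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`); the README does not say which names are used. The `AwsEnv` object and its `missing` helper are hypothetical.

```scala
object AwsEnv {
  // Standard variable names read by the AWS SDK's environment credential provider.
  // Assumption: the project uses these names; the README does not specify them.
  val AccessKeyVar = "AWS_ACCESS_KEY_ID"
  val SecretKeyVar = "AWS_SECRET_ACCESS_KEY"

  /** Returns the required variable names that are absent or empty in `env`. */
  def missing(env: Map[String, String] = sys.env): Seq[String] =
    Seq(AccessKeyVar, SecretKeyVar).filterNot(k => env.get(k).exists(_.nonEmpty))
}
```

Calling `AwsEnv.missing()` before starting the actor system and failing fast on a non-empty result would surface a misconfigured environment before the first DynamoDB write is attempted.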

Model Evaluation Metrics

  • Regression R-squared: 0.62
  • Random Forest Regression price prediction accuracy: 92.79%
  • Random Forest Classification surge prediction accuracy: 77.69%
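
For reference, R-squared and classification accuracy are conventionally computed as in the sketch below. The README does not state how the figures above were obtained (in particular, how "accuracy" is defined for the price regression), so this only illustrates the textbook definitions. The `Metrics` object and its method names are hypothetical.

```scala
object Metrics {
  /** Coefficient of determination: 1 - SS_res / SS_tot.
    * Not meaningful when all actual values are identical (SS_tot is 0).
    */
  def rSquared(actual: Seq[Double], predicted: Seq[Double]): Double = {
    require(actual.nonEmpty && actual.length == predicted.length, "inputs must be non-empty and equal length")
    val mean = actual.sum / actual.length
    // Sum of squared residuals between observed and predicted values.
    val ssRes = actual.zip(predicted).map { case (y, p) => (y - p) * (y - p) }.sum
    // Total sum of squares around the mean of the observed values.
    val ssTot = actual.map(y => (y - mean) * (y - mean)).sum
    1.0 - ssRes / ssTot
  }

  /** Fraction of predictions that exactly match the actual label. */
  def accuracy[A](actual: Seq[A], predicted: Seq[A]): Double = {
    require(actual.nonEmpty && actual.length == predicted.length, "inputs must be non-empty and equal length")
    actual.zip(predicted).count { case (y, p) => y == p }.toDouble / actual.length
  }
}
```

A model that always predicts the mean of the observed values scores an R-squared of 0, and a perfect model scores 1, which puts the reported 0.62 in context.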

Confusion Matrix for the Classifier
[Figure: confusion matrix image]

Languages

  • Scala 91.4%
  • Java 8.6%