A big data project for predicting prices of Uber/Lyft rides depending on the weather.
The dataset was compiled and uploaded to Kaggle and is available at https://www.kaggle.com/ravi72munde/uber-lyft-cab-prices
- Ravi Munde
- Karan Barai
- cab-price-connector - Data Collection Scala Project
- Databricks_Prediction_code.html - Analysis and Spark Model (from Databricks.com)
- Cab_Price_Prediction.ipynb - Random Forest Model in Python
root
|- cab_type : String
|- destination : String
|- distance: Float
|- id: String
|- name: String
|- price: Float
|- product_id: String
|- source: String
|- surge_multiplier: String
|- time_stamp: Long
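
For readers who want to work with the ride data outside Spark, the schema above can be mirrored as a typed record. This is a minimal Python sketch and not code from the project: `CabRide` and `parse_cab_ride` are hypothetical names, and the field types simply follow the schema as listed (including `surge_multiplier` as a string).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CabRide:
    """One cab price record; fields and types mirror the schema above."""
    cab_type: str
    destination: str
    distance: float
    id: str
    name: str
    price: float
    product_id: str
    source: str
    surge_multiplier: str
    time_stamp: int


def parse_cab_ride(row: dict) -> CabRide:
    """Build a CabRide from a dict of raw values (hypothetical helper).

    Each value is coerced to the type named in the schema, so string
    input such as a CSV row is accepted.
    """
    return CabRide(
        cab_type=str(row["cab_type"]),
        destination=str(row["destination"]),
        distance=float(row["distance"]),
        id=str(row["id"]),
        name=str(row["name"]),
        price=float(row["price"]),
        product_id=str(row["product_id"]),
        source=str(row["source"]),
        surge_multiplier=str(row["surge_multiplier"]),
        time_stamp=int(row["time_stamp"]),
    )
```

Coercing on construction keeps type errors at the point where a row is read rather than later in the modelling code.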
root
|- clouds : Float
|- humidity : Float
|- location : Float
|- location : String
|- temp : String
|- pressure : Float
|- wind : Float
Sample log of the actor system running on EC2:
INFO [CabRideSystem-akka.actor.default-dispatcher-2] a.DynamoActor - received 12 number of weather records
INFO [CabRideSystem-akka.actor.default-dispatcher-4] a.DynamoActor - Weather Batch processed on DynamoDB
INFO [CabRideSystem-akka.actor.default-dispatcher-9] a.DynamoActor - received 156 number of cab price records
INFO [CabRideSystem-akka.actor.default-dispatcher-8] a.DynamoActor - Cab Prices Batch processed on DynamoDB
INFO [CabRideSystem-akka.actor.default-dispatcher-7] a.Master - Cab ride data piped to Dynamo Actor
INFO [CabRideSystem-akka.actor.default-dispatcher-13] a.DynamoActor - received 156 number of cab price records
INFO [CabRideSystem-akka.actor.default-dispatcher-15] a.DynamoActor - Cab Prices Batch processed on DynamoDB
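
Log lines in this format can be tallied to check how many records each batch delivered. The following is a small Python sketch, not part of the Scala collector: the `count_received` function and the regular expression are assumptions based only on the message wording in the sample log.

```python
import re
from collections import Counter

# Matches messages such as:
#   "received 12 number of weather records"
#   "received 156 number of cab price records"
RECEIVED = re.compile(r"received (\d+) number of (.+?) records")


def count_received(log_lines):
    """Sum the counts reported by 'received N number of X records' lines.

    Returns a dict mapping the record kind (e.g. 'weather') to the total
    number of records reported for it. Lines without such a message are
    ignored.
    """
    totals = Counter()
    for line in log_lines:
        match = RECEIVED.search(line)
        if match:
            totals[match.group(2)] += int(match.group(1))
    return dict(totals)
```

Applied to the sample log, this reports 12 weather records and 312 cab price records (two batches of 156).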
NOTE: AWS credentials need to be set in environment variables.
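
A quick pre-flight check can confirm the variables are present before starting the collector. This Python sketch assumes the project relies on the standard names read by the AWS SDKs' default credential chain, `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`; the README does not name the variables, and `missing_aws_vars` is a hypothetical helper.

```python
import os

# ASSUMPTION: the standard variable names used by the AWS SDKs'
# default credential chain; the project may expect different ones.
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_aws_vars(env=os.environ):
    """Return the names of required AWS variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_aws_vars()
    if missing:
        raise SystemExit("Missing environment variables: " + ", ".join(missing))
    print("AWS credentials found in the environment.")
```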
- Regression R_squared = 0.62
- Random Forest Regression's Price Prediction Accuracy: 92.79%
- Random Forest Classification's Surge Prediction Accuracy: 77.69%
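
For reference, metrics of this kind can be computed as sketched below in plain Python. The README does not say how the regression "accuracy" was defined, so `mape_accuracy_percent` is an assumption (100 minus the mean absolute percentage error); `r_squared` and `accuracy_percent` follow the usual textbook definitions. All three function names are hypothetical and are not taken from the notebook.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def accuracy_percent(actual, predicted):
    """Percentage of predicted labels that exactly match the actual labels."""
    hits = sum(a == p for a, p in zip(actual, predicted))
    return 100 * hits / len(actual)


def mape_accuracy_percent(actual, predicted):
    """ASSUMPTION: regression 'accuracy' taken as 100 - MAPE.

    MAPE is the mean of |actual - predicted| / |actual|; actual values
    must be non-zero.
    """
    mape = sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)
    return 100 * (1 - mape)
```

If the notebook used a different definition, for example a tolerance-based hit rate, the regression figure would not be comparable to the output of `mape_accuracy_percent`.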