Automotive Industry, Data Science, Machine Learning
Objective: Assuming I'm a data scientist at CarDekho, my aim is to enhance the customer experience and streamline the pricing process by leveraging machine learning. I need to create an accurate and user-friendly Streamlit tool that predicts the prices of used cars based on various features. This tool should be deployed as an interactive web application that both customers and sales representatives can use seamlessly.
Project Scope: I have historical data on used car prices from CarDekho, including various features such as make, model, year, fuel type, transmission type, and other relevant attributes from different cities. My task as a data scientist is to develop a machine learning model that can accurately predict the prices of used cars based on these features. The model should be integrated into a Streamlit-based web application to allow users to input car details and receive an estimated price instantly.
Data Cleaning and Preprocessing, Exploratory Data Analysis, Machine Learning Model Development, Price Prediction Techniques, Model Evaluation and Optimization, Model Deployment, Streamlit Application Development, Documentation and Reporting.
- Data Preprocessing: Import and concatenate the data, then clean and preprocess it by handling missing values, standardizing data formats, encoding categorical variables, and normalizing numerical features.
- Exploratory Data Analysis (EDA): Descriptive statistics, data visualization, and feature selection.
- Model Development: Train-test split, model selection (Linear Regression, Decision Trees, Random Forests, Gradient Boosting Machines, etc.), model training, and hyperparameter tuning.
- Model Evaluation: Evaluate models using Mean Absolute Error (MAE), Mean Squared Error (MSE), and R-squared as the primary metrics. Use cross-validation to check model robustness, and compare the models to find the best performer.
- Optimization: Feature engineering and regularization (apply Lasso (L1) and Ridge (L2) regularization to prevent overfitting).
- Prediction: Use the best-performing model to predict the market price of a used car.
- Deployment: Streamlit application (deploy the final model with Streamlit as an interactive web application where users enter car features and get real-time price predictions) and user interface design (ensure the application is user-friendly and intuitive).
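The model development and evaluation steps listed above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the column names (`fuel_type`, `transmission`, `km_driven`, `year`, `price`), the synthetic data, the two candidate models, and the hyperparameters are all hypothetical stand-ins.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for the cleaned dataset; every column name is hypothetical.
rng = np.random.default_rng(0)
n = 300
cars = pd.DataFrame({
    "fuel_type": rng.choice(["Petrol", "Diesel", "CNG"], size=n),
    "transmission": rng.choice(["Manual", "Automatic"], size=n),
    "km_driven": rng.integers(5_000, 150_000, size=n),
    "year": rng.integers(2008, 2023, size=n),
})
cars["price"] = (
    50_000 * (cars["year"] - 2005)
    - 2 * cars["km_driven"]
    + 100_000 * (cars["transmission"] == "Automatic")
    + rng.normal(0, 20_000, size=n)
)

categorical = ["fuel_type", "transmission"]
numeric = ["km_driven", "year"]


def make_preprocessor():
    """One-hot encode categorical columns and standardize numeric ones."""
    return ColumnTransformer([
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
        ("num", StandardScaler(), numeric),
    ])


X = cars[categorical + numeric]
y = cars["price"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Two illustrative candidates: a regularized linear model and a tree ensemble.
candidates = {
    "ridge": Ridge(alpha=1.0),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=42),
}

results = {}
for name, estimator in candidates.items():
    model = Pipeline([("preprocess", make_preprocessor()), ("model", estimator)])
    # 5-fold cross-validated R-squared on the training split only.
    cv_r2 = cross_val_score(model, X_train, y_train, cv=5, scoring="r2").mean()
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        "cv_r2": cv_r2,
        "mae": mean_absolute_error(y_test, pred),
        "mse": mean_squared_error(y_test, pred),
        "r2": r2_score(y_test, pred),
    }

# Pick the candidate with the highest cross-validated R-squared.
best_name = max(results, key=lambda k: results[k]["cv_r2"])
print(best_name, results[best_name])
```

Keeping the preprocessing inside the `Pipeline` means the encoder and scaler are refit on each cross-validation fold, so no information from the held-out fold leaks into training.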
The dataset consists of multiple Excel files, one per city. The columns in each file give an overview of each car, including its details, specifications, and available features.
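Combining the per-city files into a single table might look like the sketch below. The helper names (`combine_city_frames`, `load_city_files`), the `city` column, the assumption that each file is named after its city, and the sample rows are all hypothetical. Reading `.xlsx` files with `pd.read_excel` also typically requires the `openpyxl` package to be installed.

```python
from pathlib import Path

import pandas as pd


def combine_city_frames(frames_by_city):
    """Stack per-city DataFrames into one frame, recording the source city."""
    tagged = [df.assign(city=city) for city, df in frames_by_city.items()]
    return pd.concat(tagged, ignore_index=True)


def load_city_files(data_dir):
    """Read every .xlsx file in data_dir, keyed by file stem (assumed to be the city)."""
    return {p.stem: pd.read_excel(p) for p in sorted(Path(data_dir).glob("*.xlsx"))}


# In-memory example with made-up rows, so the sketch runs without any files.
frames = {
    "chennai": pd.DataFrame({"model": ["Swift", "i20"], "price": [450000, 520000]}),
    "delhi": pd.DataFrame({"model": ["City"], "price": [700000]}),
}
combined = combine_city_frames(frames)
print(combined)
```

With real data, `combine_city_frames(load_city_files("data"))` would produce the same shape of table, where `"data"` is a placeholder for wherever the Excel files live.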
Clone the project
git clone https://github.com/Vijay6383/Cardheko_price_prediction_app.git
Install dependencies
pip install streamlit pandas scikit-learn scipy seaborn joblib
Run App
streamlit run webApp.py
- Price Prediction
- Machine Learning
- Data Preprocessing
- Regression
- Python
- Pandas
- Scikit-Learn
- Exploratory Data Analysis (EDA)
- Streamlit, Model Deployment