Estimation of used car prices using data on 100,000 used cars in the UK

polaternez/predicting-used-car-price-UK

UK Used Car Price Estimator: Project Overview

This project aims to build a predictive tool for estimating used car prices, aimed specifically at new car buyers.

  • Utilized a comprehensive dataset consisting of 100,000 UK used car records sourced from Kaggle.
  • Conducted thorough Exploratory Data Analysis (EDA) to gain insights into the dataset's characteristics.
  • Employed advanced feature engineering techniques to enhance model performance.
  • Trained multiple machine learning models and rigorously evaluated their performance using cross-validation.
  • Developed a user-friendly API using Flask to provide seamless access to the predictive tool.

Code and Resources

Python Version: 3.9
Packages: numpy, pandas, matplotlib, seaborn, scikit-learn, xgboost, flask, json, pickle
Flask API Setup:

  • pip install -r requirements.txt
  • conda env create -n <ENVNAME> -f environment.yaml (Anaconda Environment)

Dataset: https://www.kaggle.com/datasets/adityadesai13/used-car-dataset-ford-and-mercedes

Getting Data

The project uses the 100,000 UK Used Car Dataset from Kaggle. The dataset consists of 100,000 used car listings from the UK, organized into separate files by manufacturer. Each file records, for every car, the price, transmission type, mileage, fuel type, road tax band, miles per gallon (MPG), and engine size.
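
Because the listings are split into one file per manufacturer, they need to be combined before analysis. The following is a minimal sketch of that step with pandas, not the project's actual loading code. The column names, the sample rows, the in-memory stand-ins for the CSV files, and the added `brand` column are all assumptions made for illustration.

```python
import io

import pandas as pd

# Hypothetical stand-ins for the per-manufacturer CSV files. In practice
# these would be paths to the files downloaded from Kaggle. The rows are
# made up for illustration only.
header = "model,year,price,transmission,mileage,fuelType,tax,mpg,engineSize\n"
files = {
    "ford": io.StringIO(
        header
        + "Fiesta,2017,12000,Automatic,15944,Petrol,150,57.7,1.0\n"
        + "Focus,2018,14000,Manual,9083,Petrol,150,57.7,1.0\n"
    ),
    "audi": io.StringIO(
        header + "A1,2017,12500,Manual,15735,Petrol,150,55.4,1.4\n"
    ),
}

frames = []
for brand, source in files.items():
    df = pd.read_csv(source)
    df["brand"] = brand  # remember which file each row came from
    frames.append(df)

# Stack all manufacturers into one table with a fresh row index.
cars = pd.concat(frames, ignore_index=True)
print(cars.shape)
```

Keeping the manufacturer as a column means it can later be used as a feature, which would otherwise be lost when the files are merged.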

EDA

Conducted comprehensive EDA to understand the data distribution and relationships. Key highlights from the analysis include:

[Figures: four EDA plots]

Model Building

  • Split the data into train and test sets with a test size of 20%.
  • Applied standardization to scale the train and test sets.
  • Trained multiple models with cross-validation and compared them on Mean Absolute Error (MAE) and training time. XGBoost was selected because it performed best on both predictive accuracy and efficiency.
  • Fine-tuned the XGBoost model for better performance.
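
The split and standardization steps above can be sketched with scikit-learn as follows. This is an illustration under stated assumptions rather than the project's code: the data is synthetic, the `random_state` value is arbitrary, and fitting the scaler on the training split only (so that test-set statistics do not leak into training) is an assumed choice.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the numeric car features and prices.
rng = np.random.default_rng(0)
X = rng.normal(loc=50.0, scale=10.0, size=(200, 3))
y = X @ np.array([100.0, -20.0, 5.0]) + rng.normal(scale=50.0, size=200)

# Hold out 20% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Learn the mean and standard deviation from the training split only,
# then apply the same transformation to both splits.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.shape, X_test_scaled.shape)
```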

After cross-validation, the models showed the following performances:

[Figure: cross-validation performance of the candidate models]
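
A comparison of this kind, followed by a small tuning step, could be run along the lines below. The sketch uses synthetic data and scikit-learn's `GradientBoostingRegressor` as a stand-in for XGBoost so that it has no extra dependency. `xgboost.XGBRegressor` follows the same scikit-learn estimator interface and could be added to the `models` dictionary. The candidate models, fold counts, and parameter grid are assumptions, not the project's settings.

```python
import time

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV, cross_val_score

# Synthetic, non-linear regression data standing in for the car features.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = 5.0 * (X[:, 0] > 0) * X[:, 1] + X[:, 2] ** 2 + rng.normal(scale=0.1, size=300)

# Candidate models; GradientBoostingRegressor is a stand-in for XGBoost.
models = {
    "linear_regression": LinearRegression(),
    "gradient_boosting": GradientBoostingRegressor(random_state=0),
}

results = {}
for name, model in models.items():
    start = time.perf_counter()
    # scikit-learn maximizes scores, so MAE is reported as a negative value.
    scores = cross_val_score(model, X, y, cv=5, scoring="neg_mean_absolute_error")
    results[name] = {
        "mae": -scores.mean(),
        # Wall-clock time of the whole cross-validation run, used here as a
        # rough proxy for training time.
        "seconds": time.perf_counter() - start,
    }

for name, r in results.items():
    print(f"{name}: MAE={r['mae']:.3f}, time={r['seconds']:.2f}s")

# Small grid search over two hyperparameters of the boosted model.
search = GridSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [2, 3]},
    cv=3,
    scoring="neg_mean_absolute_error",
)
search.fit(X, y)
print(search.best_params_, round(-search.best_score_, 3))
```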

Productionization

Created a user interface using Flask. The API endpoint receives requests and returns estimated vehicle prices.

[Figure: Flask user interface]
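
A prediction endpoint of the kind described above might look like the following minimal sketch. The `/predict` route, the JSON field names, and the `DummyModel` class are hypothetical. In the real project the model would presumably be the trained estimator loaded from a pickle file, and the request would carry the full feature set.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


class DummyModel:
    """Hypothetical placeholder for the trained, pickled model."""

    def predict(self, rows):
        # Toy rule for illustration: price drops with mileage and age.
        return [
            max(500.0, 20000.0 - 0.05 * mileage - 800.0 * (2024 - year))
            for year, mileage in rows
        ]


model = DummyModel()


@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json(silent=True) or {}
    try:
        features = [[int(payload["year"]), float(payload["mileage"])]]
    except (KeyError, TypeError, ValueError):
        return jsonify({"error": "year and mileage must be numbers"}), 400
    price = model.predict(features)[0]
    return jsonify({"estimated_price": round(price, 2)})


# Exercise the endpoint in-process with Flask's test client.
with app.test_client() as client:
    response = client.post("/predict", json={"year": 2018, "mileage": 30000})
    status = response.status_code
    result = response.get_json()
    print(status, result)
```

Using the test client keeps the example runnable without starting a server; calling `app.run()` would serve the same route over HTTP.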
