Diamond Price Predictor is a web application that predicts diamond prices using regression models. The implementation includes Linear Regression, Lasso, Ridge, ElasticNet, Decision Tree Regressor, Random Forest Regressor and K-Nearest Neighbors (KNeighbors) Regressor. The models were compared and evaluated with cross-validation to identify the most effective one. Based on the cross-validation results and metric analysis, the Random Forest Regressor emerged as the best choice, with an accuracy of 97%.
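The model-selection step can be sketched as follows. This is a minimal sketch, not the project's code: it runs 5-fold cross-validation over the seven regressors listed above on synthetic data from `make_regression` (an assumption standing in for the diamonds CSV), scores them with R² (also an assumption, since the metric behind the 97% figure is not stated), and picks the model with the highest mean score.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression, Ridge
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (assumption); the project uses the Kaggle diamonds CSV.
X, y = make_regression(n_samples=300, n_features=9, noise=10.0, random_state=42)

# The seven candidate regressors named in the README, mostly with default settings.
models = {
    "LinearRegression": LinearRegression(),
    "Lasso": Lasso(),
    "Ridge": Ridge(),
    "ElasticNet": ElasticNet(),
    "DecisionTreeRegressor": DecisionTreeRegressor(random_state=42),
    "RandomForestRegressor": RandomForestRegressor(n_estimators=100, random_state=42),
    "KNeighborsRegressor": KNeighborsRegressor(),
}

# Shuffled 5-fold cross-validation; R^2 is an assumed scoring metric.
cv = KFold(n_splits=5, shuffle=True, random_state=42)
scores = {
    name: float(np.mean(cross_val_score(model, X, y, cv=cv, scoring="r2")))
    for name, model in models.items()
}

# Print the candidates from best to worst mean score.
for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<24} mean R^2 = {score:.3f}")

best_model = max(scores, key=scores.get)
print("Best model:", best_model)
```

Because this synthetic data is linear, the linear models will probably rank highest here; the ranking in favour of Random Forest described above comes from the real diamonds data.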
The Random Forest Regressor model has been deployed as a Flask web application on AWS. The application loads the model.pkl file, which contains the trained Random Forest Regressor, and provides a user-friendly interface for diamond price prediction.
- Instances: 193573
- Attributes: 10 diamond-related attributes and 1 output attribute (Price)
- Dataset Source Link: https://www.kaggle.com/competitions/playground-series-s3e8/data?select=train.csv
- Programming Language: Python
- Libraries: Pandas, NumPy, Seaborn, Matplotlib, Scikit-learn, Pickle, Warnings
- Web Framework: Flask
- Frontend: HTML, CSS
- Deployment: AWS Elastic Beanstalk, AWS CodePipeline
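A minimal sketch of loading the dataset into features and target, assuming the Kaggle train.csv contains the columns `id`, `carat`, `cut`, `color`, `clarity`, `depth`, `table`, `x`, `y`, `z` and `price` (ten attributes plus the target). The column lists and the `load_diamonds` helper are hypothetical; check them against the downloaded file.

```python
import pandas as pd

# Assumed column names for the Kaggle Playground Series S3E8 train.csv;
# verify them against the downloaded file.
CATEGORICAL = ["cut", "color", "clarity"]
NUMERICAL = ["carat", "depth", "table", "x", "y", "z"]
TARGET = "price"


def load_diamonds(csv_path):
    """Read the CSV and return (features, target) as a DataFrame and a Series."""
    df = pd.read_csv(csv_path)
    # "id" only identifies the row, so it is dropped rather than used as a feature.
    df = df.drop(columns=["id"], errors="ignore")
    X = df[CATEGORICAL + NUMERICAL]
    y = df[TARGET]
    return X, y
```

For example, `X, y = load_diamonds("train.csv")` yields nine feature columns and the price series.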
This web application is built with Flask and deployed on AWS.
1. Clone the GitHub Repository:
- Open your terminal or command prompt.
- Navigate to the directory where you want to clone the repository.
cd path/to/your/directory
# Run the following command to clone the GitHub repository:
git clone https://github.com/sai-manas/Diamond_Price_Prediction.git
cd Diamond_Price_Prediction
2. Install Git (if not already installed):
- If you don't have Git installed, you can download and install it from the official Git website.
3. Delete Artifacts:
- Before running the training pipeline, delete all files from the 'artifacts' folder to start from a clean slate. This helps prevent conflicts or issues during training.
Option 1: Manual Deletion
- Manually delete all files inside the 'artifacts' folder. You can do this through your file explorer or terminal.
Option 2: Delete Using Command Line (Linux/Mac):
rm -rf artifacts/*
Option 3: Delete Using Command Prompt (Windows):
del /Q artifacts\*
Note: The 'model.pkl' file is generated by the training pipeline and is not uploaded to GitHub because of its size (1.8 GB). It is therefore advisable to delete all files from the 'artifacts' folder and regenerate them locally by following the steps below.
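If you prefer one approach that works on every platform, the hypothetical helper below, `clear_artifacts`, empties the folder from Python using only the standard library. Unlike `rm -rf artifacts/*` it also removes hidden files, and unlike `del /Q` it also removes sub-folders; the folder itself is kept.

```python
import shutil
from pathlib import Path


def clear_artifacts(folder="artifacts"):
    """Delete everything inside `folder` but keep the folder itself.

    Hypothetical cross-platform alternative to `rm -rf artifacts/*` or
    `del /Q artifacts\\*`. Returns the sorted names of the removed entries.
    """
    root = Path(folder)
    removed = []
    if not root.is_dir():
        return removed  # nothing to clean
    for entry in root.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)  # remove sub-folders recursively
        else:
            entry.unlink()  # remove files and symlinks
        removed.append(entry.name)
    return sorted(removed)
```

Call `clear_artifacts()` from the repository root before running the training pipeline.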
4. Install Python and/or Conda (if not already installed):
- If you haven't already, make sure Python is installed on your system. You can download it from the official Python website.
- Alternatively, if you prefer using Conda, you can install it from the official Conda website.
5. Open Terminal or Command Prompt (or VS Code):
- Open your terminal or command prompt. If you prefer using Visual Studio Code (VS Code), you can open it in the project directory:
code .
6. Navigate to the Directory:
- Navigate to the cloned repository:
cd Diamond_Price_Prediction
7. Create and Activate a Virtual Environment (optional but recommended):
- It's good practice to create a virtual environment to isolate your project's dependencies. You can create one with either of the following sets of commands:
# Using conda (if you have conda installed):
conda create --name your_env_name python=3.8
conda activate your_env_name

# OR using Python's built-in venv (if you prefer):
python -m venv your_env_name
# On Windows:
your_env_name\Scripts\activate
# On macOS and Linux:
source your_env_name/bin/activate
8. Install Required Packages using setup.py:
- Navigate to the root folder of your cloned repository and run the following command to install the required Python packages listed in the setup.py file:
python setup.py install
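setup.py files in projects like this one commonly read their dependencies from requirements.txt. The sketch below shows one way to do that; the `get_requirements` helper and the handling of an `-e .` line are assumptions about how the repository's setup.py might work, not a copy of it.

```python
from pathlib import Path

# Editable-install marker that sometimes ends requirements.txt so that
# `pip install -r requirements.txt` also installs the local package (assumption).
EDITABLE_MARKER = "-e ."


def get_requirements(file_path="requirements.txt"):
    """Return requirement lines, skipping blanks, comments and the editable marker."""
    requirements = []
    for line in Path(file_path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or line == EDITABLE_MARKER:
            continue
        requirements.append(line)
    return requirements

# In setup.py the list would be passed to setuptools, e.g. (hypothetical name):
# setup(name="your_package_name", packages=find_packages(),
#       install_requires=get_requirements("requirements.txt"))
```

`python setup.py install` is deprecated in recent setuptools releases; running `pip install .` from the repository root installs the same package through pip.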
9. Run the Training Pipeline:
- Run the training pipeline script in the terminal. This generates the necessary files, such as pickles and CSVs, inside the artifacts folder:
python src/training_pipeline.py
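Conceptually, a training pipeline reads the data, splits it, fits the preprocessing and the model, and pickles the result into the artifacts folder. The sketch below is a minimal illustration under assumptions: ordinal encoding of `cut`, `color` and `clarity` using the standard grade orderings, scaling of the numeric columns, an 80/20 split, and a single scikit-learn `Pipeline` pickled as `artifacts/model.pkl`. The function names and these choices are hypothetical; `src/training_pipeline.py` may instead split the work into separate ingestion, transformation and training components and store the preprocessor separately.

```python
import pickle
from pathlib import Path

from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OrdinalEncoder, StandardScaler

CATEGORICAL = ["cut", "color", "clarity"]
NUMERICAL = ["carat", "depth", "table", "x", "y", "z"]

# Grade orderings from worst to best (assumed to match the dataset's labels).
CATEGORY_ORDER = [
    ["Fair", "Good", "Very Good", "Premium", "Ideal"],
    ["J", "I", "H", "G", "F", "E", "D"],
    ["I1", "SI2", "SI1", "VS2", "VS1", "VVS2", "VVS1", "IF"],
]


def build_pipeline():
    """Preprocessing plus Random Forest bundled into one estimator."""
    preprocessor = ColumnTransformer([
        ("cat", OrdinalEncoder(categories=CATEGORY_ORDER), CATEGORICAL),
        # Scaling matters for the linear and KNN candidates, not for the forest.
        ("num", StandardScaler(), NUMERICAL),
    ])
    return Pipeline([
        ("preprocessor", preprocessor),
        ("model", RandomForestRegressor(n_estimators=100, random_state=42, n_jobs=-1)),
    ])


def run_training(df, artifacts_dir="artifacts"):
    """Fit on 80% of `df`, report R^2 on the rest, and pickle the fitted pipeline."""
    X = df[CATEGORICAL + NUMERICAL]
    y = df["price"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )
    pipeline = build_pipeline()
    pipeline.fit(X_train, y_train)
    score = r2_score(y_test, pipeline.predict(X_test))

    out = Path(artifacts_dir)
    out.mkdir(parents=True, exist_ok=True)
    with (out / "model.pkl").open("wb") as f:
        pickle.dump(pipeline, f)
    return score
```

Pickling preprocessing and model together means the web application only has to load one file and cannot apply a mismatched encoder at prediction time.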
10. Run the Flask Application:
- Once training is complete, you can start the Flask application. The main Flask application file, application.py, is in the repository's root folder:
python application.py
11. Access the Application:
- Your Flask application should now be running. You can access it in your web browser by navigating to http://localhost:5000 or the URL provided by your application.
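Besides the browser, you can call a running instance from Python. This sketch uses only the standard library and assumes the app exposes a JSON endpoint at `/predict` that returns a `price` field; that route and the helper names are hypothetical.

```python
import json
import urllib.request


def build_predict_request(features, base_url="http://localhost:5000"):
    """Build a JSON POST request for a hypothetical /predict route."""
    body = json.dumps(features).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict_price(features, base_url="http://localhost:5000", timeout=10):
    """Send the request to a running server and return the decoded JSON reply."""
    req = build_predict_request(features, base_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

While the server is running, `predict_price({...})` with the nine feature fields would return a dictionary such as `{"price": ...}` if the assumed endpoint exists.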
12. Follow steps 1 to 10 above and push your files to GitHub, then proceed to the next step.
13. Deploy the Application Using AWS Elastic Beanstalk and AWS CodePipeline:
- Follow the steps in this article to deploy a web application on AWS with GitHub as the source: https://dev.to/wardaliaqat01/cicd-pipeline-hands-on-aws-code-pipeline-elastic-beanstalk-github-35n3