Authors: Abdul Safdar, Karlygash Zhakupbayeva, Tengwei Wang
The data set used in this project can be found here. It consists of physicochemical properties of various wine samples. These features were obtained through physicochemical testing and are used to assess the quality of wine. The primary goal of this analysis was to perform a binary classification of wine quality: samples with a quality score greater than 5 were labeled 1 (good quality), while samples with a score of 5 or less were labeled 0 (not good quality). This simplifies the original multi-class problem into a more straightforward binary classification task that suits a decision tree classifier. We used a Decision Tree Classifier to model the relationship between the physicochemical properties and the binary quality label, and tuned the model by optimizing the max_depth parameter, which controls the complexity of the decision tree.
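The labeling rule above (scores greater than 5 become 1, everything else becomes 0) can be sketched as a small helper. This is a minimal illustration, assuming integer quality scores as in the UCI Wine Quality data; the name `binarize_quality` is hypothetical and not taken from this repository.

```python
def binarize_quality(scores, threshold=5):
    """Hypothetical helper: map quality scores to 1 (good) or 0 (not good).

    A score strictly greater than `threshold` is labeled 1; a score equal to
    or below it is labeled 0, so a score of exactly 5 counts as "not good".
    """
    return [1 if score > threshold else 0 for score in scores]


if __name__ == "__main__":
    print(binarize_quality([3, 5, 6, 8]))
```

If the data is loaded into a pandas DataFrame with a column named `quality` (an assumption about the column name), the same rule can be written as `df["quality"].gt(5).astype(int)`.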
In our report, we fit a simple decision tree classifier and selected the best value of its max_depth hyperparameter. Our results indicated that the most important features for predicting wine quality were alcohol, sulphates and volatile acidity. Because the splits of a decision tree can be read directly, the model offers a transparent way to understand how these factors combine to determine wine quality. The binary classification scheme draws a clear line between good and poor-quality wines, making the results accessible to both experts and casual consumers. More information about the analysis and results can be found in the report linked below.
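Selecting max_depth and reading off feature importances can be sketched with scikit-learn's `GridSearchCV` and `DecisionTreeClassifier`. This is a hedged illustration rather than the report's code: synthetic data from `make_classification` stands in for the wine features, and the depth grid, 5-fold cross-validation and accuracy scoring are illustrative assumptions, not the settings used in the analysis. The helper name `tune_max_depth` is hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier


def tune_max_depth(X, y, depths=range(1, 11), cv=5, random_state=123):
    """Hypothetical helper: cross-validate a decision tree over max_depth values.

    Returns the fitted GridSearchCV object; with the default refit=True,
    best_estimator_ is the tree refit on all of X, y at the best depth.
    """
    search = GridSearchCV(
        DecisionTreeClassifier(random_state=random_state),
        param_grid={"max_depth": list(depths)},
        cv=cv,
        scoring="accuracy",  # assumption: the report may use another metric
    )
    search.fit(X, y)
    return search


if __name__ == "__main__":
    # Synthetic stand-in data; feature_0..feature_5 are not the wine features.
    X, y = make_classification(n_samples=500, n_features=6, random_state=0)
    search = tune_max_depth(X, y)
    best_tree = search.best_estimator_
    print("best max_depth:", search.best_params_["max_depth"])
    # Normalized importances (they sum to 1 when the tree has at least one split)
    ranked = sorted(enumerate(best_tree.feature_importances_), key=lambda t: -t[1])
    for i, importance in ranked:
        print(f"feature_{i}: {importance:.3f}")
```

Applied to the real data with named columns, the same ranking of `feature_importances_` is one way to see which physicochemical properties the fitted tree relies on most.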
The final report can be viewed here.
To replicate this analysis on your own machine, first clone this repository using git clone, then follow the steps below.
Install Docker.
IMPORTANT: Make sure Docker Desktop is running.
Then open a terminal and run the following command from the root of the repository:
docker compose up
Wait until Docker finishes pulling and running the image. Then copy the URL from the output, which looks like "http://127.0.0.1:8888/lab?token=xxxxxxxxxx", and paste it into your web browser. An example of this link is shown below, highlighted.
To run the analysis, open a terminal inside the Docker container (for example, a terminal opened from JupyterLab) and run the following commands from the root of the repository. First, clean the existing report:
make clean
Next, to generate the report, enter the following from the root of the repository inside the Docker container:
make all
Once the report has rendered, navigate to the reports folder in the file browser on the left, open wine_quality_analysis.html, and click Trust HTML to view the analysis report.
To run the function tests, enter the following from the root of the repository:
pytest
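pytest collects functions whose names start with `test_` from files named `test_*.py` and runs each one. The sketch below shows what such a test file might look like for the labeling rule described earlier; the file path, the helper `binarize_quality` and its `TypeError` check are all hypothetical and are defined inline so the example is self-contained, rather than reflecting this repository's actual tests.

```python
# tests/test_binarize_quality.py (hypothetical file name)
import pytest


def binarize_quality(scores, threshold=5):
    """Hypothetical helper: 1 if a score is greater than threshold, else 0."""
    if any(not isinstance(s, (int, float)) for s in scores):
        raise TypeError("quality scores must be numeric")
    return [1 if s > threshold else 0 for s in scores]


def test_boundary_score_is_not_good():
    # A score of exactly 5 must map to 0 ("not good quality")
    assert binarize_quality([5]) == [0]


def test_scores_above_five_are_good():
    assert binarize_quality([6, 7, 9]) == [1, 1, 1]


def test_non_numeric_scores_raise():
    # pytest.raises passes only if the block raises the given exception
    with pytest.raises(TypeError):
        binarize_quality(["six"])
```

Testing the boundary value 5 explicitly is a deliberate choice: it is the case most likely to break if `>` were accidentally changed to `>=`.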
To stop and clean up the container, press Ctrl + C in the terminal where you ran docker compose up, and then run:
docker compose rm
This project depends on:
- conda (version 24.9.1 or higher)
- conda-lock (version 2.5.7 or higher)
- jupyterlab (version 4.2.4 or higher)
- mamba (version 1.5.11 or higher)
- Python and the packages listed in environment.yaml
- Docker
To add or update a dependency:
- Working on a new branch, update the environment.yaml file.
- In a terminal, enter the following to rebuild the conda-linux-64.lock file:
conda-lock -k explicit --file environment.yml -p linux-64
- Rebuild the Docker image in your local terminal. On a Mac, enter:
docker build --platform=linux/amd64 --tag <name_of_test_image> .
On other operating systems, enter:
docker build --tag <name_of_test_image> .
Note: these instructions will likely vary depending on your specific OS and chip.
- Update the docker-compose.yml file to use the newly built container image.
- Push your changes to GitHub.
- Open a pull request to have your changes merged into the main branch.
The project report is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License; to view a copy of this license, visit https://creativecommons.org/licenses/by-nc-nd/4.0/. The software in this repository is licensed under the MIT License. More information can be found in the LICENSE.md file in the repository.
UCI Machine Learning Repository. Wine Quality Dataset. https://archive.ics.uci.edu/ml/datasets/wine+quality
Scikit-learn Documentation. Decision Trees. https://scikit-learn.org/stable/modules/tree.html
Kolhatkar, V. (2024). DSCI 571 Supervised Learning I, Lecture 2: ML Fundamentals. https://pages.github.ubc.ca/mds-2024-25/DSCI_571_sup-learn-1_students/lectures/notes/02_ml-fundamentals.html
Ostblom, J. (2024). DSCI 573 Feature and Model Selection, Lecture 1: Classification Metrics. https://pages.github.ubc.ca/mds-2024-25/DSCI_573_feat-model-select_students/lectures/01_classification-metrics.html
VanderPlas, J., & Satyanarayan, A. (2018). Altair: Declarative Visualization in Python. https://altair-viz.github.io/