-
The characteristics of the public dataset were examined. It contains physicochemical features that affect wine quality, such as citric acid, pH, and density. The dataset was checked for missing values and for correct data types in each column. A correlation matrix and a correlation heatmap were used to analyze the positive and negative correlations between the features.
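
The checks described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the helper name `inspect_dataset` is hypothetical, and the demo frame is filled with random placeholder values (not real wine measurements) so the snippet runs without the CSV.

```python
import numpy as np
import pandas as pd


def inspect_dataset(df: pd.DataFrame) -> dict:
    """Collect the basic checks: missing values, dtypes, correlations."""
    return {
        "missing": df.isnull().sum(),                  # missing values per column
        "dtypes": df.dtypes,                           # data type of each column
        "corr": df.select_dtypes("number").corr(),     # pairwise Pearson correlations
    }


# Placeholder data (random, NOT the real dataset), only so the sketch runs.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "citric acid": rng.uniform(0.0, 1.0, 50),
    "pH": rng.uniform(2.8, 4.0, 50),
    "density": rng.uniform(0.99, 1.01, 50),
    "quality": rng.integers(3, 9, 50),
})

report = inspect_dataset(demo)
print(report["missing"])
print(report["dtypes"])
print(report["corr"].round(2))
```

With the real data, `demo` would be replaced by the frame loaded with `pd.read_csv(...)` from the downloaded file, and the heatmap could be drawn by passing `report["corr"]` to something like `seaborn.heatmap` (assuming seaborn is installed).
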
-
When the Quality column was visualized with a bar chart, the classes turned out to be imbalanced. The dataset was balanced using the SMOTE method, and then feature scaling was applied.
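
To show what these two steps do, here is a small NumPy sketch of the core SMOTE idea (interpolating new minority samples between nearest neighbours of the same class) followed by standardization. It is an assumption-laden illustration: the notebook most likely relies on a library implementation such as `SMOTE` from imbalanced-learn, the source does not say which scaler was used (standardization is assumed here), and the function names and demo data are hypothetical.

```python
import numpy as np


def smote_oversample(X, y, k=5, seed=0):
    """Oversample each minority class up to the size of the largest class.

    New points lie on the segment between a minority sample and one of its
    k nearest neighbours from the same class.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [X], [y]
    for cls, count in zip(classes, counts):
        n_new = target - count
        if n_new == 0 or count < 2:
            continue  # already at target size, or too few points to interpolate
        Xc = X[y == cls]
        kk = min(k, count - 1)
        # Pairwise distances within the class; a point is not its own neighbour.
        dist = np.linalg.norm(Xc[:, None, :] - Xc[None, :, :], axis=2)
        np.fill_diagonal(dist, np.inf)
        neighbours = np.argsort(dist, axis=1)[:, :kk]
        base = rng.integers(0, count, n_new)                   # starting samples
        picked = neighbours[base, rng.integers(0, kk, n_new)]  # one neighbour each
        gap = rng.random((n_new, 1))                           # position on the segment
        X_parts.append(Xc[base] + gap * (Xc[picked] - Xc[base]))
        y_parts.append(np.full(n_new, cls))
    return np.vstack(X_parts), np.concatenate(y_parts)


def standardize(X):
    """Scale every feature to zero mean and unit variance."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns unscaled
    return (X - mean) / std


# Hypothetical imbalanced demo data: three quality classes with 40/15/5 samples.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(60, 3))
y_demo = np.array([5] * 40 + [6] * 15 + [8] * 5)

X_bal, y_bal = smote_oversample(X_demo, y_demo)
X_scaled = standardize(X_bal)
print(np.unique(y_bal, return_counts=True))
```

In practice the same two steps are usually written with `imblearn.over_sampling.SMOTE().fit_resample(X, y)` and `sklearn.preprocessing.StandardScaler().fit_transform(X)`.
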
-
Models were trained on the dataset using the Logistic Regression, Decision Tree Classifier, Random Forest Classifier, Extra Tree Classifier, and LGBM Classifier methods. The results were compared based on accuracy and cross-validation score.
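
A comparison of this kind could look roughly like the sketch below. Several things here are assumptions rather than details from the notebook: the hyperparameters, the 80/20 split, the 5-fold cross-validation, the choice of scikit-learn's `ExtraTreesClassifier` for the extra-tree model, and the synthetic data from `make_classification` that stands in for the balanced, scaled wine features. `compare_models` is a hypothetical helper, and LightGBM is included only if it is installed.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

try:  # LightGBM is optional in this sketch
    from lightgbm import LGBMClassifier
except ImportError:
    LGBMClassifier = None

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "Extra Trees": ExtraTreesClassifier(n_estimators=50, random_state=0),
}
if LGBMClassifier is not None:
    models["LGBM"] = LGBMClassifier(random_state=0)


def compare_models(models, X, y, cv=5, seed=0):
    """Return {name: (hold-out accuracy, mean cross-validation accuracy)}."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        acc = accuracy_score(y_test, model.predict(X_test))
        # cross_val_score clones the model and refits it on each fold.
        cv_mean = cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
        results[name] = (acc, cv_mean)
    return results


# Synthetic stand-in data (NOT the wine dataset), only so the sketch runs.
X, y = make_classification(
    n_samples=300, n_features=11, n_informative=6, n_classes=3, random_state=0
)
results = compare_models(models, X, y)
for name, (acc, cv_mean) in results.items():
    print(f"{name}: accuracy={acc:.3f}, cv_mean={cv_mean:.3f}")
```

Keeping the models in a dictionary makes it easy to add or drop a classifier without touching the evaluation loop.
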
You can read the PDF file, or download the .ipynb notebook and the dataset to reproduce the project and improve on it. The dataset: https://www.kaggle.com/datasets/uciml/red-wine-quality-cortez-et-al-2009