In this project, I use a dataset freely available on Kaggle to build a machine learning model that classifies mushrooms as either edible or poisonous.
The objective is a model that accurately predicts, from a mushroom's characteristics, whether it is safe to eat or potentially poisonous.
While solving this problem, I applied several machine learning techniques during the dataset preprocessing phase, including:
- Categorical Column Encoding: I used OneHotEncoder to convert the categorical columns into a numeric format suitable for machine learning.
- Handling Imbalanced Data: To address class imbalance, I applied the Synthetic Minority Over-sampling Technique (SMOTE) to create a more balanced dataset.
- Feature Selection: I used Recursive Feature Elimination (RFE) to select the most relevant features for the classification task, improving the model's efficiency.
Once the dataset was preprocessed, I trained and evaluated several widely used classification models on it, including:
- Logistic Regression
- K-Nearest Neighbors
- Support Vector Machines
- Naive Bayes
- Decision Trees
- Random Forest
- XGBoost
Among these, I explored two ensemble learning algorithms, Random Forest (bagging) and XGBoost (boosting), to improve the classification results.
I am pleased to report that the models achieved nearly 100% accuracy during the evaluation phase.
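A comparison across the models listed above might look like the sketch below. It is illustrative only: the synthetic binary features, the label rule, the hyperparameters, and the choice of `BernoulliNB` as the Naive Bayes variant are assumptions, and XGBoost is included only if the `xgboost` package is installed.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in for the preprocessed (one-hot encoded) features.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(400, 8))
# Label rule invented for illustration only.
y = X[:, 0] | (X[:, 1] & X[:, 2])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "K-Nearest Neighbors": KNeighborsClassifier(n_neighbors=5),
    "Support Vector Machine": SVC(),
    "Naive Bayes": BernoulliNB(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# XGBoost is a separate package; add it only when it is available.
try:
    from xgboost import XGBClassifier
    models["XGBoost"] = XGBClassifier(n_estimators=100, random_state=0)
except ImportError:
    pass

# Fit every model on the same split and record held-out accuracy.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

for name, acc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<24} {acc:.3f}")
```

Scoring every model on the same held-out split keeps the accuracies comparable. The numbers printed for this toy data say nothing about the results on the real mushroom dataset.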