Iris Flower Classification Model

Overview

This project aims to classify Iris flowers into three species—setosa, versicolor, and virginica—based on their sepal and petal measurements using machine learning techniques. The dataset comprises 150 samples evenly distributed among these species, making it a standard benchmark for introductory classification tasks.

Models Used

Two primary models were employed:

Logistic Regression: A linear model suitable for binary and multi-class classification tasks.
Random Forest Classifier: An ensemble of decision trees whose combined votes can capture non-linear relationships between features.

Data Preprocessing

Data Loading

The Iris dataset was loaded from a CSV file containing 150 records and 5 columns: four numeric features (sepal length, sepal width, petal length, petal width) and the species label.
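The CSV's file name and column headers are not given, so the minimal sketch below builds an equivalent 150 × 5 table from scikit-learn's bundled copy of the dataset instead of reading the file. The snake_case column names and the "Iris-" label prefix (chosen to match the class names in the classification report below) are assumptions.

```python
import pandas as pd
from sklearn.datasets import load_iris

# With the project's CSV this step would be pd.read_csv(<path>); the path is
# not documented, so scikit-learn's bundled copy of the dataset is used here.
iris = load_iris(as_frame=True)
df = iris.frame.rename(columns={
    "sepal length (cm)": "sepal_length",   # assumed column names
    "sepal width (cm)": "sepal_width",
    "petal length (cm)": "petal_length",
    "petal width (cm)": "petal_width",
})

# Replace the integer target (0, 1, 2) with species names such as "Iris-setosa".
labels = {i: f"Iris-{name}" for i, name in enumerate(iris.target_names)}
df["species"] = df.pop("target").map(labels)

print(df.shape)  # (150, 5)
print(df["species"].value_counts())
```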

Exploratory Data Analysis (EDA)

Summary Statistics: Provided insights into the distribution and variation of sepal and petal measurements.
Pair Plot: Visualized relationships between features across different species.
Correlation Heatmap: Showed feature correlations, aiding in feature selection.
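The summary statistics and correlation values behind these EDA steps can be computed with pandas. This is a sketch that again uses scikit-learn's bundled dataset as a stand-in for the project's CSV; the per-species means are an extra illustration, not something the project is known to have reported.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Stand-in for the project's CSV.
iris = load_iris(as_frame=True)
features = iris.frame.drop(columns="target")

# Summary statistics: count, mean, std, min, quartiles and max per feature.
summary = features.describe()
print(summary.round(2))

# Per-species means show how strongly the classes differ on each feature.
species = iris.target.map(dict(enumerate(iris.target_names)))
print(features.groupby(species).mean().round(2))

# Pearson correlation matrix: the values a correlation heatmap displays.
corr = features.corr()
print(corr.round(2))
```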

Model Training and Evaluation

Model Training

Splitting Data: The dataset was split into training (80%) and testing (20%) sets.
Logistic Regression: Trained a linear model for classification.
Random Forest Classifier: Trained an ensemble model to handle complex relationships.
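A minimal sketch of the 80/20 split and the training of both models with scikit-learn. The random seed (42) and the `max_iter` value are assumptions; the project's exact settings are not documented here.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# 80% training / 20% testing; random_state is an assumed seed for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

models = {
    # max_iter raised from the default (100) so the lbfgs solver has room
    # to converge on the unscaled features.
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(random_state=42),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # score() returns mean accuracy on the held-out test set.
    scores[name] = model.score(X_test, y_test)
    print(f"{name}: test accuracy = {scores[name]:.3f}")
```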

Model Evaluation

Best Parameters: The optimal parameters found for the Random Forest Classifier were {'max_depth': 20, 'min_samples_leaf': 2, 'min_samples_split': 2, 'n_estimators': 200}. These parameters were selected based on cross-validation to maximize accuracy.
Best Random Forest Accuracy: The model achieved an accuracy of 100% on the test dataset, correctly classifying all 30 test samples (20% of the 150 records).
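Cross-validated hyperparameter selection of this kind is commonly done with scikit-learn's `GridSearchCV`. The grid below is hypothetical: it is sized to contain both parameter sets reported in this README, but the project's actual search space, fold count and seed are not documented.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42  # assumed seed
)

# Hypothetical search grid (8 combinations).
param_grid = {
    "n_estimators": [50, 200],
    "max_depth": [10, 20],
    "min_samples_split": [2, 10],
    "min_samples_leaf": [2],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=5,               # 5-fold cross-validation on the training set only
    scoring="accuracy",
)
search.fit(X_train, y_train)

print("Best parameters:", search.best_params_)
print("Mean CV accuracy:", round(search.best_score_, 3))
# best_estimator_ is refit on the whole training set before test scoring.
test_accuracy = search.best_estimator_.score(X_test, y_test)
print("Test accuracy:", test_accuracy)
```

Keeping the test set out of the search means the reported test accuracy is not used to pick the parameters.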

Interpretation of Classification Report

The classification report provides a detailed breakdown of how well the model performed for each species:

Precision: Of the samples predicted to belong to a class, the fraction that actually belong to it.
Recall: Of the samples that actually belong to a class, the fraction the model correctly identifies.
F1-score: Harmonic mean of precision and recall, providing a single metric to evaluate the model's performance.
Support: Number of samples in each class.
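To make these definitions concrete, the sketch below computes the four columns from scratch in plain Python, using the same definitions as scikit-learn's `classification_report`. The labels are a toy example, not project data.

```python
from collections import Counter


def per_class_report(y_true, y_pred):
    """Return {label: (precision, recall, f1, support)} for every label seen."""
    labels = sorted(set(y_true) | set(y_pred))
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # true positives
    predicted = Counter(y_pred)  # TP + FP per label
    actual = Counter(y_true)     # TP + FN per label, i.e. the support
    report = {}
    for label in labels:
        precision = tp[label] / predicted[label] if predicted[label] else 0.0
        recall = tp[label] / actual[label] if actual[label] else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[label] = (precision, recall, f1, actual[label])
    return report


# Toy labels: one versicolor flower is misclassified as virginica.
y_true = ["setosa", "setosa", "versicolor", "versicolor", "virginica"]
y_pred = ["setosa", "setosa", "versicolor", "virginica", "virginica"]
for label, (p, r, f, s) in per_class_report(y_true, y_pred).items():
    print(f"{label:<12} precision={p:.2f} recall={r:.2f} f1={f:.2f} support={s}")
```

The single error lowers versicolor's recall to 0.50 and virginica's precision to 0.50, while both F1-scores drop to 0.67. A perfect classifier, as in the results below, scores 1.00 in every column.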

For example:

Iris-setosa: The model correctly classified all 10 samples of Iris-setosa, achieving perfect precision, recall, and F1-score.
Iris-versicolor: Similarly, all 9 samples of Iris-versicolor were correctly classified.
Iris-virginica: All 11 samples of Iris-virginica were also classified correctly.

The overall accuracy of 100% shows that the model separated all three species without a single error on this 30-sample test split.

Data Visualization

Pair Plot: Visualizes relationships between sepal length, sepal width, petal length, and petal width across different species.
Correlation Heatmap: Shows the correlation coefficients between these features, aiding in feature selection and understanding feature importance.
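Both plots can be produced with pandas and matplotlib, as in the sketch below. The project may have used a different library (such as seaborn), and the output file names are hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # render to image files; no display needed
import matplotlib.pyplot as plt
from pandas.plotting import scatter_matrix
from sklearn.datasets import load_iris

iris = load_iris(as_frame=True)
features = iris.frame.drop(columns="target")

# Pair plot: a scatter plot for every feature pair, coloured by species,
# with a histogram of each feature on the diagonal.
scatter_matrix(features, c=iris.target.to_numpy(), figsize=(8, 8), diagonal="hist")
plt.savefig("pair_plot.png")
plt.close("all")

# Correlation heatmap with the coefficient printed in each cell.
corr = features.corr()
n = len(corr)
fig, ax = plt.subplots(figsize=(6, 5))
im = ax.imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
ax.set_xticks(range(n))
ax.set_xticklabels(corr.columns, rotation=45, ha="right")
ax.set_yticks(range(n))
ax.set_yticklabels(corr.columns)
for i in range(n):
    for j in range(n):
        ax.text(j, i, f"{corr.iloc[i, j]:.2f}", ha="center", va="center")
fig.colorbar(im, ax=ax)
fig.tight_layout()
fig.savefig("correlation_heatmap.png")
plt.close(fig)
```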

Findings

Data Exploration:

Summary statistics provided insights into the distribution and variation of sepal and petal measurements.
Pair plots visually represented the clustering of different species based on their measurements.
The correlation heatmap highlighted significant relationships between certain features, influencing classification accuracy.

Model Performance:

Both Logistic Regression and Random Forest Classifier achieved 100% accuracy on the test dataset.
Precision, recall, and F1-score metrics confirmed the models' ability to effectively distinguish between Iris species.

Output

The output from the models includes:

Best Parameters for Random Forest Classifier: {'max_depth': 10, 'min_samples_leaf': 2, 'min_samples_split': 10, 'n_estimators': 50}
Best Random Forest Accuracy: 1.0
Classification Reports for Logistic Regression and Random Forest Classifier, showing precision, recall, and F1-score metrics for each Iris species.
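A sketch of how such per-species reports can be printed with scikit-learn's `classification_report`. The Random Forest settings come from the Best Parameters line in this section; the split seed and `max_iter` are assumptions. Class names appear as scikit-learn's "setosa", "versicolor" and "virginica" rather than the CSV's "Iris-" labels.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42  # assumed seed
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    # Parameters from the "Best Parameters" line of the output.
    "Random Forest": RandomForestClassifier(
        n_estimators=50, max_depth=10, min_samples_split=10,
        min_samples_leaf=2, random_state=42,
    ),
}

accuracies = {}
for name, model in models.items():
    y_pred = model.fit(X_train, y_train).predict(X_test)
    accuracies[name] = accuracy_score(y_test, y_pred)
    print(f"== {name} ==")
    # Precision, recall, F1-score and support for each species.
    print(classification_report(y_test, y_pred, target_names=iris.target_names))
```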

Conclusion

This project demonstrated the application of machine learning models to classify Iris flowers based on their morphological measurements with high accuracy. The selected models, Logistic Regression and Random Forest Classifier, performed exceptionally well, showcasing their effectiveness for such classification tasks. By leveraging data preprocessing, visualization, and thorough evaluation techniques, this project provides a robust framework for introductory classification tasks.

noturlee/Iris-DataAnalyis

Folders and files

Latest commit

History

Repository files navigation

Iris Flower Classification Model

Table of Contents

Overview

Models Used

Data Preprocessing

Data Loading

Exploratory Data Analysis (EDA)

Models Training and Evaluation

Model Training

Model Evaluation

Interpretation of Classification Report

Data Visualization

Findings

Data Exploration:

Model Performance:

Output

Conclusion

About

Topics

Resources

License

Security policy

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages