This repository demonstrates the use of popular machine learning classifiers to classify the MNIST dataset, a collection of handwritten digits. The MNIST dataset is widely used as a benchmark in the field of machine learning, making it an excellent starting point for exploring various classifiers.
The following classifiers are implemented and compared in this repository:
- Logistic Regression: A linear classifier that is simple yet effective.
- k-Nearest Neighbors (k-NN): A non-parametric method based on the similarity of data points.
- Support Vector Machine (SVM): A powerful classifier that works well for both linear and non-linear data.
- Various Ensemble Methods: Learning methods that combine many models, such as a random forest, which builds a multitude of decision trees.
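As a rough sketch, the classifier families listed above could be instantiated as follows. This assumes scikit-learn, which the README does not name explicitly; the helper `build_classifiers`, the dictionary keys, and all hyperparameter values are illustrative, and a random forest is used here as one example of an ensemble method.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC


def build_classifiers(random_state=0):
    """Return one untuned estimator per classifier family (hypothetical helper)."""
    return {
        # Linear model; max_iter is raised so the solver has room to converge.
        "logistic_regression": LogisticRegression(max_iter=1000),
        # Non-parametric: predicts from the labels of the 5 closest samples.
        "knn": KNeighborsClassifier(n_neighbors=5),
        # Kernel SVM; the RBF kernel handles non-linear decision boundaries.
        "svm_rbf": SVC(kernel="rbf", gamma="scale"),
        # Assumption: a random forest stands in for the ensemble methods.
        "random_forest": RandomForestClassifier(
            n_estimators=100, random_state=random_state
        ),
    }
```

Every estimator exposes the same `fit` / `predict` interface, so the same training and evaluation code can loop over the dictionary.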
The MNIST dataset consists of 28x28 pixel grayscale images of handwritten digits (0 through 9). It is a classic dataset for introducing image classification concepts.
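To make the 28x28 layout concrete, here is a minimal sketch of turning images into the flat feature vectors that most classifiers expect. It assumes NumPy and 8-bit pixel intensities (0 to 255); `flatten_and_scale` is a hypothetical helper, and the images are random stand-ins rather than real MNIST data.

```python
import numpy as np


def flatten_and_scale(images):
    """Turn (n, 28, 28) uint8 images into (n, 784) floats in [0, 1]."""
    images = np.asarray(images)
    # Each 28x28 image becomes one row of 28 * 28 = 784 features.
    flat = images.reshape(len(images), -1).astype(np.float64)
    # Assumption: pixels are 8-bit, so dividing by 255 maps them to [0, 1].
    return flat / 255.0


# Synthetic stand-in for real MNIST images (random pixel intensities).
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)
features = flatten_and_scale(fake_images)
print(features.shape)  # (5, 784)
```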
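Download the data: https://www.kaggle.com/datasets/zalando-research/fashionmnist (this Kaggle page hosts Fashion-MNIST, which uses the same 28x28 grayscale format as MNIST but contains clothing images rather than handwritten digits)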
- Install Dependencies:
pip install -r requirements.txt
Each notebook contains detailed instructions and explanations for the following steps:
- Loading and Preprocessing the MNIST dataset.
- Implementing and training the respective classifier.
- Evaluating the model's performance.
- Fine-tuning and optimizing parameters (where applicable).
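The steps above can be sketched roughly as follows. This is not the notebooks' actual code: it assumes scikit-learn, uses k-NN as the example classifier, and the function `run_workflow` and the parameter grid are hypothetical.

```python
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler


def run_workflow(X, y, param_grid=None, random_state=0):
    """Split, preprocess, tune, and evaluate a k-NN classifier.

    X is an (n_samples, n_features) array of flattened images, y the labels.
    Returns the fitted grid search and the held-out test accuracy.
    """
    if param_grid is None:
        # make_pipeline names each step after its lowercased class name.
        param_grid = {"kneighborsclassifier__n_neighbors": [1, 3, 5]}

    # Hold out 20% of the data for the final evaluation.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=random_state
    )
    # Scaling lives inside the pipeline so it is fitted on training folds only.
    pipeline = make_pipeline(MinMaxScaler(), KNeighborsClassifier())
    # 3-fold cross-validated search over the candidate hyperparameters.
    search = GridSearchCV(pipeline, param_grid, cv=3)
    search.fit(X_train, y_train)
    test_accuracy = accuracy_score(y_test, search.predict(X_test))
    return search, test_accuracy
```

One way to obtain `X` and `y` is scikit-learn's `fetch_openml("mnist_784", version=1, as_frame=False)`, which needs network access. Swapping in another estimator only requires changing the pipeline's last step and the matching prefix in `param_grid`.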
The results and comparative analysis of each classifier are provided in the notebooks. Feel free to experiment with different hyperparameters and preprocessing techniques, or to explore additional classifiers.
In our analysis, the Support Vector Machine (SVM) with a radial basis function (RBF) kernel was the top-performing classifier, as expected. We paired it with Principal Component Analysis (PCA) for dimensionality reduction: keeping only the components that retain 95% of the variance in the data made training significantly faster than using all features.
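A minimal sketch of the PCA plus RBF-kernel SVM combination, assuming scikit-learn. The helper `build_pca_svm` and the step names are hypothetical, and the SVM hyperparameters are library defaults rather than the tuned values from the notebooks. Passing a float between 0 and 1 as `n_components` tells `PCA` to keep just enough components to explain that fraction of the variance.

```python
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC


def build_pca_svm(variance_to_keep=0.95):
    """Build a pipeline that reduces dimensionality, then fits an RBF SVM."""
    return Pipeline([
        # A float n_components selects the smallest number of principal
        # components whose cumulative explained variance reaches that fraction.
        ("pca", PCA(n_components=variance_to_keep)),
        # Assumption: default C and gamma; the notebooks may use tuned values.
        ("svm", SVC(kernel="rbf")),
    ])
```

After calling `fit`, `pipeline.named_steps["pca"].n_components_` reports how many components were kept, which shows how far the feature count dropped before the SVM was trained.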
This repository serves as a practical, hands-on guide to implementing and comparing popular classifiers on the MNIST dataset.