Classification project in TTT4275

Authors: Bratvold, Torbjørn and Lima-Eriksen, Leik

Subject: Estimation, Detection, and Classification (TTT4275)

Date: April 2020

The project is divided into two parts:

IRIS dataset: The performance of a linear classifier is thoroughly analyzed. Here, we take a closer look at the impact of choosing which samples are used for training and which for testing, and at how a different partitioning may trick us into thinking that the classifier performs better when it in fact performs just as well.

We then evaluate the performance after excluding some of the most overlapping features. Surprisingly, it turns out that the classifier has the same error rate but requires more computations. The reasons are discussed in detail in the report.
MNIST dataset: Different variants of the K-Nearest Neighbours (KNN) classifier are thoroughly analyzed in terms of error rates, confusion matrices and computation times. We start by training a KNN classifier with K=1 on the dataset. We then compare its performance against a KNN classifier with K=4. It turns out that the error rate is marginally better for K=4.

In the last part we first apply K-means clustering to the samples. Then we run a KNN classifier with K=1 on the test data. This results in 10 times fewer errors (5% error rate compared to 51%) and far fewer computations.
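The clustering step described above can be sketched roughly as follows. This is a minimal, dependency-free illustration and not the code in mnist.py: it assumes that K-means is run separately on the training samples of each class, that the resulting cluster centres serve as labelled templates, and that the KNN classifier then compares each test sample against these templates only. The function names (`kmeans`, `build_templates`, `knn_predict`), the parameter `clusters_per_class` and the synthetic two-class data are all hypothetical and chosen for illustration.

```python
import random
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two equally long vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def kmeans(points, n_clusters, n_iter=20, seed=0):
    """Plain Lloyd iteration; returns a list of n_clusters centres."""
    rng = random.Random(seed)
    # Initialise the centres with randomly chosen samples.
    centres = [list(p) for p in rng.sample(points, n_clusters)]
    for _ in range(n_iter):
        groups = [[] for _ in centres]
        for p in points:
            nearest = min(range(n_clusters), key=lambda k: sq_dist(p, centres[k]))
            groups[nearest].append(p)
        for k, members in enumerate(groups):
            if members:  # keep the old centre if a cluster ends up empty
                centres[k] = [sum(col) / len(members) for col in zip(*members)]
    return centres


def build_templates(x_train, y_train, clusters_per_class):
    """Cluster each class on its own and label every centre with that class."""
    templates = []
    for label in sorted(set(y_train)):
        members = [x for x, y in zip(x_train, y_train) if y == label]
        templates += [(c, label) for c in kmeans(members, clusters_per_class)]
    return templates


def knn_predict(x, templates, k=1):
    """Majority vote among the k templates closest to x."""
    nearest = sorted(templates, key=lambda t: sq_dist(x, t[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Synthetic stand-in for the real data: two well separated 2-D classes.
rng = random.Random(1)


def blob(centre, n):
    return [[rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)] for _ in range(n)]


x_train = blob(0.0, 100) + blob(8.0, 100)
y_train = [0] * 100 + [1] * 100
x_test = blob(0.0, 25) + blob(8.0, 25)
y_test = [0] * 25 + [1] * 25

templates = build_templates(x_train, y_train, clusters_per_class=4)
predictions = [knn_predict(x, templates, k=1) for x in x_test]
error_rate = sum(p != t for p, t in zip(predictions, y_test)) / len(y_test)
print(f"{len(templates)} templates, error rate {error_rate:.2%}")
```

The saving in computation comes from the size of the reference set: each test sample is compared against `clusters_per_class` templates per class instead of against every training sample.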

Prerequisites

Please install the prerequisites before running the scripts:

pip3 install -r requirements.txt
