AI-Final-Project

Final project in Prof. Louzoun's Artificial Intelligence course.

Abstract

The Mushrooms data set was analyzed using artificial intelligence methods to classify the samples by their odor. Three approaches were used. The first was to cluster the data so that each cluster represents a different odor. The second was to train a model that learns the mushroom family from its features. The third was to repeat the other two techniques after creating better features. The goal was to find the best of these approaches. Of the three, the second approach, using an SVM with an RBF kernel and a box constraint of 0.8, performed best, with an average F1 score of 0.86449. Then four approaches for imputing missing values were applied to the mushroom samples with missing data. The imputed data was classified by the same SVM (RBF kernel, box constraint 0.8), which had been found to be the best classification method for the data. Comparing the results showed that imputing missing values with the median of their column is the best imputation approach. Finally, the fit between the classification and the real labels was measured, and the average F1 score was found to be 0.51726.
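To illustrate the column-median imputation mentioned above, here is a minimal sketch in plain Python. It is not the project's own code. It assumes the features are already encoded as numbers (for example ordinal codes) and that a missing entry is represented by None. The names impute_median and MISSING are hypothetical.

```python
from statistics import median_low

# Assumed marker for a missing value; the real data files may use another one.
MISSING = None


def impute_median(rows):
    """Return a copy of rows with each missing entry replaced by its column median."""
    n_cols = len(rows[0])
    medians = []
    for col in range(n_cols):
        observed = [row[col] for row in rows if row[col] is not MISSING]
        # median_low always returns a value that occurs in the column,
        # so an ordinal category code stays a valid code.
        medians.append(median_low(observed))
    return [
        [medians[col] if value is MISSING else value for col, value in enumerate(row)]
        for row in rows
    ]


# Tiny made-up example, not taken from the mushrooms data.
samples = [
    [1, 4, None],
    [2, None, 0],
    [3, 6, 2],
    [None, 8, 2],
]
imputed = impute_median(samples)
```

The sketch uses median_low rather than the plain median so that an even number of observed values never produces a value halfway between two category codes. Whether the project made the same choice is not stated here.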

Data Sets

The data may be found here:

Instructions

The Python code for the unsupervised methods can be run directly from main_file.py. Instructions for the supervised classification scripts are given below. Then, for the data with missing values, run part_2_main_file.py.

Note: The files must be run in that order, since some methods require files created by other methods (e.g. some of the classification methods use the dimension reduction results, as detailed in the PDF paper).
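The run order described above can be sketched as a small helper script. This is only an illustration, not part of the project. It assumes the three scripts are in the current directory and that the classification step is the file Classification.py. The names build_commands and run_all are hypothetical.

```python
import subprocess
import sys

# Assumed script names, listed in the order the instructions above require.
SCRIPTS_IN_ORDER = ["main_file.py", "Classification.py", "part_2_main_file.py"]


def build_commands(scripts=SCRIPTS_IN_ORDER):
    """Build one command per script, using the current Python interpreter."""
    return [[sys.executable, script] for script in scripts]


def run_all(commands):
    """Run the commands one after another, stopping at the first failure."""
    for command in commands:
        # check=True raises CalledProcessError if a script exits with an error,
        # so a later step never runs without the files it depends on.
        subprocess.run(command, check=True)
```

To use it, call run_all(build_commands()) from the repository root. The classification script asks for the dimension reduction method interactively, so it still needs keyboard input while it runs.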

Instructions for classification methods

Running Classification.py generates all of our classification methods. When you run the file, you will be asked to enter the dimension reduction method of your choice; if you do not want to use dimension reduction, just enter normal. The program then creates the classifications. In addition, it runs statistical tests between all the classifications to find the best features for a given method and to find the best method.

Important note: To save the results of the classification (for example the F1 score) you need to create the appropriate directories. The easiest way is to open one of the classification folders and copy its contents. That lets you run any dimension reduction method you want (provided you have the corresponding dimension reduction file).
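One possible way to prepare such a results folder is to recreate the subdirectory layout of an existing one. The sketch below assumes that only the folder structure is needed, not the files inside it, and that folders are named like classifications_PCA. Both are assumptions, and copy_directory_tree is a hypothetical helper.

```python
import os


def copy_directory_tree(source_dir, target_dir):
    """Recreate the subdirectory layout of source_dir under target_dir (no files)."""
    created = []
    for current, _subdirs, _files in os.walk(source_dir):
        # Path of the current folder relative to the source root ("." for the root).
        relative = os.path.relpath(current, source_dir)
        destination = os.path.normpath(os.path.join(target_dir, relative))
        # exist_ok=True makes the call safe to repeat on an existing folder.
        os.makedirs(destination, exist_ok=True)
        created.append(destination)
    return created
```

For example, copy_directory_tree("classifications_PCA", "classifications_ICA") would create the same empty subfolders under the second name. If the scripts also expect files from the source folder, copying the whole folder with shutil.copytree is the simpler option.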

creating_the_plot:

To create the classification plot yourself, just run creating_the_plot.py.

best_champion:

To find the best classification method before and after dimension reduction, just run best_champion.py.

Python Modules

The main modules used in this project are:

Sklearn
Matplotlib
Skfuzzy
Numpy
Pandas
Scipy
Yellowbrick
Torch
Xgboost
NetworkX

