This repository contains all materials for reproducing the paper "To tune or not to tune? A meta-learning approach for recommending important hyperparameters":
- Scripts for collecting performance data of 6 machine learning algorithms on 200 classification tasks from the OpenML environment.
- The collected performance data of the SVM, Decision Tree, Random Forest, AdaBoost, Gradient Boosting, and Extra Trees classifiers.
- Several notebooks, each of which runs one experiment and reports its results.
- New datasets derived from PerformanceData, all stored in the output_csv folders.
- Tools for:
  - Importing and modifying the collected data.
  - Searching for correlations between the dataset metafeatures and classifier performance.
  - Conducting statistical tests to compare the performance of the classifiers over the tasks.
  - Computing the best value for each important hyperparameter.
  - Running the Wilcoxon test to verify the results.
- A script for extracting the metafeatures of the datasets.
- A script for performing fANOVA on the performance data.
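
```python
# Collect performance data for one algorithm over the datasets in Datasets/.
from DataCollection.functions import classification_per_algorithm

path_to_datasets = 'Datasets/'
classification_per_algorithm(path=path_to_datasets, algorithm='DecisionTree')
```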

```python
# Run fANOVA on the collected performance data of one algorithm.
from fANOVA.fanova_functions import do_fanova

do_fanova(dataset_name='PerformanceData/AB_results_total.csv', algorithm='AdaBoost')
```
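
```python
# Extract the metafeatures of every dataset in the datasets folder.
from tools.metafeatures import extract_for_all

path_to_datasets = 'Datasets/'
extract_for_all(path_to_datasets)
```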

```python
# Load the per-dataset accuracies of all six classifiers.
from Tools.database import Database

db = Database()
per_dataset_acc = db.get_per_dataset_accuracies()
per_dataset_acc.head()
```

| | dataset | AB | ET | RF | DT | GB | SVM |
|---|---|---|---|---|---|---|---|
| 0 | AP_Breast_Omentum.csv | 0.981060 | 0.976235 | 0.976462 | 0.973912 | 0.983555 | 0.914538 |
| 1 | AP_Breast_Prostate.csv | 0.995238 | 0.995238 | 0.995238 | 0.995238 | 0.995238 | 0.961498 |
| 2 | AP_Endometrium_Lung.csv | 0.968363 | 0.958392 | 0.957018 | 0.929240 | 0.968363 | 0.894591 |
| 3 | AP_Endometrium_Prostate.csv | 0.992857 | 0.992857 | 0.992857 | 1.000000 | 1.000000 | 0.984615 |
| 4 | AP_Endometrium_Uterus.csv | 0.854854 | 0.837953 | 0.859561 | 0.827924 | 0.860409 | 0.758801 |
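
To illustrate the kind of paired comparison the Wilcoxon test performs on such a table, here is a minimal standard-library sketch applied to the AB and SVM columns of the five rows shown above. The function `signed_rank_test` is a hypothetical helper written for this example and is not part of the repository's tools. It computes an exact two-sided p-value by enumerating all sign assignments, which is only practical for a handful of pairs; for the full set of tasks a library routine such as `scipy.stats.wilcoxon` would be the more realistic choice.

```python
from itertools import product


def signed_rank_test(x, y):
    """Exact two-sided Wilcoxon signed-rank test for paired samples.

    Hypothetical helper for illustration; enumerates 2**n sign patterns,
    so it is only suitable for small n.
    """
    # Zero differences carry no sign information and are dropped.
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank the absolute differences, averaging the ranks of ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    stat = min(w_plus, w_minus)

    # Under the null hypothesis every sign pattern is equally likely.
    total_rank = sum(ranks)
    hits = 0
    for signs in product((0, 1), repeat=n):
        pos = sum(r for r, s in zip(ranks, signs) if s)
        if min(pos, total_rank - pos) <= stat:
            hits += 1
    return stat, hits / 2 ** n


# AB and SVM accuracies copied from the five rows of the table above.
ab = [0.981060, 0.995238, 0.968363, 0.992857, 0.854854]
svm = [0.914538, 0.961498, 0.894591, 0.984615, 0.758801]

stat, p = signed_rank_test(ab, svm)
print(f"statistic={stat}, p-value={p}")
```

With only five pairs the smallest attainable two-sided p-value is 2/32, so this sample alone cannot reach the usual 0.05 threshold even though AB beats SVM on every row; the test becomes informative only with more tasks.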

```python
# Load the extracted metafeatures (reuses the db object created earlier).
metafeatures = db.get_metafeatures()
```
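
Once metafeatures and accuracies are both available, a rank correlation is one simple way to look for a relationship between a metafeature and a classifier's performance. The sketch below is a standard-library implementation of Spearman's rho; `average_ranks` and `spearman` are hypothetical helpers, not functions from this repository. Because no metafeature values are shown in this README, the demo call correlates two accuracy columns from the table above purely to exercise the function. In practice one argument would be a metafeature column and the other a classifier's accuracies, in the same dataset order.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation of the two rank vectors.

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = average_ranks(xs), average_ranks(ys)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


# Stand-in inputs: AB and SVM accuracies from the five table rows above.
ab = [0.981060, 0.995238, 0.968363, 0.992857, 0.854854]
svm = [0.914538, 0.961498, 0.894591, 0.984615, 0.758801]

rho = spearman(ab, svm)
print(f"Spearman rho = {rho:.3f}")
```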