URL for all required datasets to run these notebooks can be found here.
Authors: Ahmad Chaiban, Dusan Sovilj, Hazem Soliman, Geoff Salmon, Xiaodong Lin
There are several notebooks of interest for our study, they are highlighted below,
initial_notebook.ipynb
is where some early investigation was done.Dataset_notebook.ipynb
contains the initial look at 25 datasets.Dataset_20_phase_1_tests.ipynb
&Dataset_20_phase_2_ML.ipynb
, these notebooks pertain to the analysis and ML done on A.K. Singh's dataset.dataset_surtur.ipynb
deals with our custom dataset GAWAIN. It continues its creation, performs some rounds of ML on it and saves a final copy of it wich can be found here.ML_table.ipynb
is the final notebook where XGBoost was trained on the different feature types and their combinations.
As for the folders addon_features
, data_construction
, and img_extract
, these folders contain the scripts that were used to create the datasets and how those features were extracted. Specifically the data_construction
folder contains the necessary pipeline to generate the final dataset GAWAIN
from the study.