For the first part of the project, the data source was a tabular dataset of extracted audio features.
I chose a Bayesian search method to find the best hyperparameters for each algorithm.
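To illustrate how a Bayesian search works, here is a minimal from-scratch sketch. It is not the method or library used in this project. The sketch fits a Gaussian-process surrogate to the scores seen so far and evaluates next the candidate with the highest expected improvement. It makes several assumptions, all hypothetical: a single hyperparameter rescaled to [0, 1], a grid of 101 candidates, an RBF kernel with length scale 0.2, and a toy function `toy_validation_score` standing in for cross-validated accuracy.

```python
import math

# Hypothetical toy setup: one hyperparameter rescaled to [0, 1] and a smooth
# stand-in objective instead of real cross-validated accuracy.


def rbf(a, b, length_scale=0.2):
    """Squared-exponential kernel with unit signal variance."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def cholesky(m):
    """Lower-triangular L with L @ L.T == m (m symmetric positive definite)."""
    n = len(m)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(m[i][i] - s) if i == j else (m[i][j] - s) / L[j][j]
    return L


def forward(L, b):
    """Solve L y = b by forward substitution."""
    y = []
    for i, bi in enumerate(b):
        y.append((bi - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y


def backward(L, y):
    """Solve L.T x = y by back substitution."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def fit_gp(xs, ys, noise=1e-6):
    """Gaussian-process fit: returns the Cholesky factor of K and K^-1 y."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    L = cholesky(K)
    return L, backward(L, forward(L, ys))


def predict(L, alpha, xs, x):
    """Posterior mean and standard deviation of the GP at x."""
    k_star = [rbf(x, a) for a in xs]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = forward(L, k_star)
    var = max(1.0 - sum(vi * vi for vi in v), 1e-12)
    return mean, math.sqrt(var)


def expected_improvement(mean, std, best, xi=0.01):
    """EI for maximization: expected amount by which a point beats best + xi."""
    z = (mean - best - xi) / std
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (mean - best - xi) * cdf + std * pdf


def bayesian_search(objective, n_iter=15, grid_size=101):
    candidates = [i / (grid_size - 1) for i in range(grid_size)]
    xs = [0.0, 0.5, 1.0]                       # small initial design
    ys = [objective(x) for x in xs]
    for _ in range(n_iter):
        L, alpha = fit_gp(xs, ys)              # surrogate model of the objective
        best = max(ys)
        untried = [c for c in candidates if c not in xs]
        if not untried:
            break
        # Spend the next (expensive) evaluation where EI is highest.
        nxt = max(untried,
                  key=lambda c: expected_improvement(*predict(L, alpha, xs, c), best))
        xs.append(nxt)
        ys.append(objective(nxt))
    i = max(range(len(ys)), key=ys.__getitem__)
    return xs[i], ys[i]


def toy_validation_score(x):
    # Hypothetical stand-in for validation accuracy; peaks at x = 0.3.
    return -(x - 0.3) ** 2


best_x, best_score = bayesian_search(toy_validation_score)
print(f"best x = {best_x:.2f}, score = {best_score:.4f}")
```

A plain grid search would spend an evaluation on every candidate. The surrogate model focuses evaluations where a good score is likely or where uncertainty is still high. That property is what makes Bayesian search attractive when each evaluation means training a model. In practice, a library implementation would normally be used instead of this sketch.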
Results were compared for multiple ML algorithms and for two different audio sample lengths.
The best accuracy, 91.24%, was achieved by the LightGBM model trained on 3-second audio samples.
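The summary does not say how the 3-second samples were produced. A common approach is to cut each longer recording into non-overlapping fixed-length windows and drop any remainder. The following sketch shows that arithmetic. It assumes 30-second source clips and a 22,050 Hz sample rate, neither of which the text states, and `split_into_segments` is a hypothetical helper.

```python
def split_into_segments(samples, sample_rate, segment_seconds=3.0):
    """Cut a clip into non-overlapping fixed-length segments (remainder dropped)."""
    seg_len = int(round(sample_rate * segment_seconds))  # samples per segment
    n_segments = len(samples) // seg_len
    return [samples[i * seg_len:(i + 1) * seg_len] for i in range(n_segments)]


# Hypothetical example: a 30-second clip at 22,050 Hz (silence as placeholder audio).
sample_rate = 22050
clip = [0.0] * (30 * sample_rate)
segments = split_into_segments(clip, sample_rate)
print(len(segments), len(segments[0]))  # 10 66150
```

Under these assumptions, each 30-second recording yields ten 3-second examples of 66,150 samples each.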
For the second part of the project, I used a mel-spectrogram representation of the audio.
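A mel spectrogram applies a bank of triangular filters, equally spaced on the mel scale, to the power spectrum of each short-time frame, then takes the log. Stacking the resulting columns over time gives a 2-D image that a CNN can consume. The sketch below computes one such column. It uses the HTK-style mel formula, and the parameters (40 mel bands, FFT size 2048, 22,050 Hz) and the flat placeholder spectrum are illustrative assumptions, not the project's settings. Libraries such as `librosa.feature.melspectrogram` use the Slaney mel formula and area-normalized filters by default, so their values will differ.

```python
import math


def hz_to_mel(f):
    """HTK-style mel scale; compresses high frequencies (1000 Hz ~ 1000 mel)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sample_rate, f_min=0.0, f_max=None):
    """Triangular filters, equally spaced in mel, over the n_fft // 2 + 1 rfft bins."""
    f_max = sample_rate / 2 if f_max is None else f_max
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    # n_mels + 2 mel-spaced edges: filter m spans edges m .. m + 2, peaking at m + 1.
    edges = [mel_to_hz(lo + (hi - lo) * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_freqs = [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for m in range(n_mels):
        left, center, right = edges[m], edges[m + 1], edges[m + 2]
        row = []
        for f in bin_freqs:
            if left < f <= center:
                row.append((f - left) / (center - left))    # rising slope
            elif center < f < right:
                row.append((right - f) / (right - center))  # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def mel_frame_db(power_spectrum, bank, eps=1e-10):
    """One mel-spectrogram column: filter energies of one STFT frame, in dB."""
    return [10.0 * math.log10(max(sum(w * p for w, p in zip(row, power_spectrum)), eps))
            for row in bank]


bank = mel_filterbank(n_mels=40, n_fft=2048, sample_rate=22050)
column = mel_frame_db([1.0] * (2048 // 2 + 1), bank)  # flat placeholder spectrum
print(len(bank), len(bank[0]), len(column))  # 40 1025 40
```

Low-frequency filters are narrow and high-frequency filters are wide. This mirrors human pitch perception and keeps the input compact compared with a linear-frequency spectrogram.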
I compared many CNN architectures and chose the best one, presented below, which achieved an accuracy of 85.06%.