Deep learning classification for insect sounds (grasshoppers, crickets, cicadas) developed by the winning team of the Capgemini Global Data Science Challenge 2023.
The classifier was trained to identify 66 species. The winning team (team name "It is a bug", Capgemini I&D Austria) developed a solution and training pipeline based on EfficientNet applied to audio spectrograms. This repository contains code to create, train and evaluate such ML models.
This README file provides an overview of the model, its functionality and how to use it. Refer to README files located in the sub-directories for in-depth explanations.
This is the expected structure to run the model successfully. Paths with no file ending are folders.
```
├── class_weights/              Class weights computed from various metrics
├── data/                       Train, validation, and test data
├── notebooks/
│   ├── 01_preprocess_waves.ipynb   Preprocesses sound data to uniform length
│   ├── 02_classweights.ipynb       Calculates various class weights
│   ├── 03_scan_lr.ipynb            Trains the model multiple times with different learning rates
│   ├── 04_run_training.ipynb       Trains and evaluates the model
│   └── 05_scan_parameters.ipynb    Loops over multiple parameters for training
├── shell_scripts/              Two shell scripts that help reduce costs on AWS
├── src/
│   ├── custom/
│   │   ├── __init__.py
│   │   ├── data.py
│   │   ├── eval.py
│   │   ├── net.py
│   │   ├── trainer.py
│   │   └── utils.py
│   ├── config.py               Imported libraries and utilities
│   └── gdsc_utils.py           Imported libraries and utilities
└── requirements.txt
```
- Clone this repository.
- Create a Python virtual environment for this work, e.g. using Conda or virtualenv. (The code has been tested with Python 3.8 on Ubuntu, using a virtual environment created with `python -m venv venv_itisabug`.)
- After activating your virtual environment, install the necessary Python packages specified in `requirements.txt`, e.g. using `pip install -r requirements.txt`.
- Make sure that the `data` folder is set up correctly (download the needed data and create folders as needed). See the README file inside the `data` subfolder for full instructions.
- Navigate to the `notebooks` directory and read the README.
- Decide which hyperparameters to use, whether to use a pretrained model, etc.
- Run the necessary notebooks, including `04_run_training`.
Please be warned that some config settings and file paths might need to be adapted, depending on your OS and environment setup.
The model expects waveform files in the data folder, together with a metadata CSV file that contains the filename, path, and label for each training/validation sample, and the filename and path for each test sample. Crucially, all audio files must share the same sampling frequency, but they can vary in length. We split every audio file into smaller windows of uniform length: we noticed large variation in audio length across our training files, and the spectrogram approach relies on inputs of the same length.
The notebook `01_preprocess_waves` contains the steps needed to prepare a dataset for use.
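As a rough illustration of the windowing idea (not the notebook's exact code; the function name, the 5-second default, and the use of the `soundfile` package are assumptions):

```python
# Illustrative sketch only: the function name, 5-second default, and use of
# the soundfile package are assumptions, not the notebook's exact code.
from typing import List

import numpy as np
import soundfile as sf

def split_into_windows(path: str, window_seconds: float = 5.0) -> List[np.ndarray]:
    """Load a waveform and cut it into non-overlapping windows of uniform length."""
    wave, sr = sf.read(path)            # every file must share the same sampling rate
    window = int(window_seconds * sr)   # window length in samples
    n_full = len(wave) // window        # drop the trailing remainder for simplicity
    return [wave[i * window:(i + 1) * window] for i in range(n_full)]
```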
The ML model is built using PyTorch Lightning and fine-tunes a pre-trained backbone. It follows the spectrogram approach to audio classification, i.e. transforming raw waveform data into Mel spectrograms and then solving the resulting image classification problem. For the computer vision part we use EfficientNetV2-S, the small EfficientNetV2 variant pre-trained on ImageNet-21k, which is, as the name implies, quite efficient to fine-tune and run. With roughly 20 million parameters, EfficientNetV2-S trains fast while maintaining state-of-the-art performance. Link to the supporting research: EfficientNetV2: Smaller Models and Faster Training.
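A minimal sketch of this setup, assuming `timm` and `torchaudio` provide the backbone and the Mel transform (the exact model identifier, sampling rate, and channel handling here are illustrative, not necessarily the repository's configuration):

```python
# Illustrative sketch: the timm model identifier, sampling rate, and
# single-channel input handling are assumptions, not the repo's exact config.
import timm
import torch
import torchaudio

mel = torchaudio.transforms.MelSpectrogram(sample_rate=44100, n_mels=128)
model = timm.create_model(
    "tf_efficientnetv2_s.in21k",  # EfficientNetV2-S, pre-trained on ImageNet-21k
    pretrained=True,
    num_classes=66,               # one output per insect species
    in_chans=1,                   # spectrograms are single-channel "images"
)

waveform = torch.randn(1, 44100 * 5)   # dummy 5-second clip
spec = mel(waveform).unsqueeze(0)      # -> (batch, channel, n_mels, time)
logits = model(spec)                   # -> (1, 66)
```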
To counter class imbalance, custom class weights can be passed to PyTorch's CrossEntropyLoss, weighting underrepresented classes higher or overrepresented ones lower, depending on how the weights are calculated. For further details, refer to the 02_classweights notebook. To further reduce overconfident model predictions during training, we use label smoothing of 0.1 in the CrossEntropyLoss; the value can be tuned as necessary.
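A minimal sketch of a weighted, label-smoothed loss; the inverse-frequency weighting shown here is just one of the possible schemes covered in `02_classweights`:

```python
# Minimal sketch: inverse-frequency weighting is just one possible scheme.
import torch
import torch.nn as nn

class_counts = torch.tensor([120.0, 40.0, 800.0])   # dummy per-class sample counts
weights = class_counts.sum() / (len(class_counts) * class_counts)
criterion = nn.CrossEntropyLoss(weight=weights, label_smoothing=0.1)

logits = torch.randn(8, 3)              # dummy batch of model outputs
targets = torch.randint(0, 3, (8,))     # dummy integer labels
loss = criterion(logits, targets)
```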
For data augmentation, we distinguish between augmentations applied to waveforms and those applied to spectrograms. All augmentations are applied on the fly during training with a pre-defined probability per sample or batch. The best results were achieved with moderate levels of augmentation, i.e. a 10-15% chance of applying augmentations to waveforms only. However, the spectrogram augmentations listed below remain available as options in all training notebooks; a short sketch of the mechanism follows the list.
Waveform:
- OneOf(Noise Injection, Gaussian Noise, Pink Noise)
- Impulse response, i.e. to model reverb and room effects
Spectrogram:
- Mixup
- OneOf(MaskFrequency, MaskTime)
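A simplified sketch of how such on-the-fly augmentation can look; the helper names, probability value, and masking parameters are illustrative assumptions, and `torchaudio`'s masking transforms stand in for the MaskFrequency/MaskTime options:

```python
# Simplified sketch: helper names, probabilities, and masking parameters are
# illustrative; torchaudio's masking transforms stand in for MaskFrequency/MaskTime.
import random

import torch
import torchaudio

freq_mask = torchaudio.transforms.FrequencyMasking(freq_mask_param=16)
time_mask = torchaudio.transforms.TimeMasking(time_mask_param=32)

def augment_waveform(wave: torch.Tensor, p: float = 0.15) -> torch.Tensor:
    """With probability p, inject Gaussian noise (standing in for OneOf(...))."""
    if random.random() < p:
        wave = wave + 0.005 * torch.randn_like(wave)
    return wave

def augment_spectrogram(spec: torch.Tensor, p: float = 0.15) -> torch.Tensor:
    """With probability p, apply one of the two masking transforms."""
    if random.random() < p:
        spec = random.choice([freq_mask, time_mask])(spec)
    return spec
```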
Decide whether you want to fine-tune a pretrained model or train a model from scratch. This and other hyperparameters can be set in the 04_run_training notebook.
Running the 04_run_training notebook creates a sub-folder for each training run, containing model checkpoints, hyperparameter save files, events.out.tfevents files for logging, two prediction CSV files, and other useful files. Refer to the notebooks README.
We evaluate a single audio file by calculating a prediction for each of its uniform windows and then averaging those predictions for that file. Finally, we calculate the F1 score, among other metrics, over all file-level predictions. In addition, we return the confusion matrix and use TensorBoard logging, a supported logger for PyTorch Lightning, during our training runs. The saved events.out.tfevents files can be analyzed in TensorBoard to monitor all desired metrics over the course of a training run. For further details, refer to the TensorBoard documentation.
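A hedged sketch of the per-file averaging logic (the DataFrame layout and column names are assumptions for illustration):

```python
# Hedged sketch of the per-file averaging; DataFrame layout and column names
# are assumptions for illustration.
import pandas as pd
from sklearn.metrics import f1_score

# One row per window, carrying its parent file, label, and class probabilities.
window_probs = pd.DataFrame({
    "file": ["a.wav", "a.wav", "b.wav"],
    "label": [0, 0, 1],
    "prob_0": [0.9, 0.7, 0.2],
    "prob_1": [0.1, 0.3, 0.8],
})

file_probs = window_probs.groupby("file").mean()   # average all windows per file
preds = file_probs[["prob_0", "prob_1"]].to_numpy().argmax(axis=1)
print(f1_score(file_probs["label"].astype(int), preds, average="macro"))
```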
The following performance stats correspond to the use of an ml.g4dn.xlarge instance on AWS SageMaker, which provides an NVIDIA T4 GPU.
The model itself only requires ~1.2 GB of GPU memory.
See the table below for details on memory usage and inference time.
Values were calculated using 5-second mel spectrograms (n_bins=128).
Results may vary for longer audio files or larger spectrograms.
| Batch Size | GPU Memory [MB] | Inference Time [ms] |
|---|---|---|
| 1 | 1200 | 25.1 |
| 32 | 4060 | 65.3 |
| 64 | 6950 | 121 |
| 128 | 12180 | 238 |
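For reference, numbers like these can be obtained by timing a forward pass and reading peak GPU memory from PyTorch's CUDA statistics; the following is a sketch under that assumption, not the repository's benchmark code:

```python
# Sketch of one way to obtain such numbers: time a forward pass and read the
# peak GPU memory from PyTorch's CUDA statistics (not the repo's benchmark code).
import time

import torch

def benchmark(model: torch.nn.Module, batch: torch.Tensor):
    model = model.cuda().eval()
    batch = batch.cuda()
    torch.cuda.reset_peak_memory_stats()
    with torch.no_grad():
        torch.cuda.synchronize()
        start = time.perf_counter()
        model(batch)
        torch.cuda.synchronize()
    elapsed_ms = (time.perf_counter() - start) * 1000
    peak_mb = torch.cuda.max_memory_allocated() / 1024 ** 2
    return elapsed_ms, peak_mb
```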
List of people who contributed to creating this model:
- Raffaela Heily
- Lukas Kemetinger
- Dominik Lemm
- Lucas Unterberger
Special thanks to the team at the Naturalis Biodiversity Center, who provided the data, the idea, and the path toward a more sustainable and biodiverse future:
- Dr. Elaine van Ommen Kloeke
- Max Schöttler
- Dr. Dan Stowell
- Marius Faiß
Many thanks to our Capgemini sponsors and the organizational team who made all of this possible:
- Niraj Parihar
- Susanna Ostberg
- TP Deo
- Anne-Laure Thiuellent
- Andris Roling
- Dr. Daniel Kühlwein
- Kanwalmeet Singh Kochar
- Kristin O'Herlihy
- Marc Niedermeier
- Mateusz Gryz
- Nikolai Babic
- Sebastian Sell
- Sophie (Nien-chun) Yin
- Steffen Klempau
- Surobhi Deb
- Timo Abele
- Tomasz Czerniawski
Additionally, we want to thank AWS for providing us with their state-of-the-art cloud infrastructure and enough resources to make this challenge a reality.
The impulse response data used is available on OpenAir, licensed under CC BY 4.0. All additional Python dependencies are licensed under either MIT or Apache 2.0.
For the software license that covers this repository and the InsectEffNet code, see the file LICENSE.