Multi-task speech classification of the accent and gender of English speakers on Mozilla's Common Voice dataset. The paper can be found here.
- To **preprocess** the audio data, run `python main.py preprocess -r <audio_data_path>`
- To **train** the model using the preprocessed audio data, run `python main.py train -r <audio_data_path> -m <model_name>`
  Models implemented: `simple_cnn`, `resnet18`, `resnet34`, `resnet50`, `simple_lstm`, `bi_lstm`, `lstm_attention`, `bi_lstm_attention`
- To **test** the model on the test data, run `python main.py test -r <audio_data_path> -m <model_name> -c <saved_model_path>`
- To perform **inference** on audio files directly, run `python main.py inference -r <audio_files_path> -m <saved_model_path>` (an end-to-end example follows below)
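
For a concrete end-to-end run, the sketch below chains the four commands together. The data directory `data/cv-corpus`, the clips directory `new_clips`, and the checkpoint path `checkpoints/resnet18.pt` are hypothetical placeholders, and `resnet18` is just one of the models listed above; only the subcommands and flags come from the usage documented here.

```bash
# Example workflow (paths and checkpoint name are illustrative, not fixed by the project)

# 1. Preprocess the raw Common Voice audio
python main.py preprocess -r data/cv-corpus

# 2. Train one of the implemented models, e.g. resnet18
python main.py train -r data/cv-corpus -m resnet18

# 3. Evaluate the trained model on the test data, pointing -c at the saved model
python main.py test -r data/cv-corpus -m resnet18 -c checkpoints/resnet18.pt

# 4. Run inference on a directory of new audio files with the saved model
python main.py inference -r new_clips -m checkpoints/resnet18.pt
```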