Multi-task speech classification of the accent and gender of English speakers on Mozilla's Common Voice dataset. The paper can be found here.
- To **preprocess** the audio data, run `python main.py preprocess -r <audio_data_path>`
- To **train** the model using the preprocessed audio data, run `python main.py train -r <audio_data_path> -m <model_name>`
  Models implemented: `simple_cnn`, `resnet18`, `resnet34`, `resnet50`, `simple_lstm`, `bi_lstm`, `lstm_attention`, `bi_lstm_attention`
- To **test** the model on the test data, run `python main.py test -r <audio_data_path> -m <model_name> -c <saved_model_path>`
- To perform **inference** on audio files directly, run `python main.py inference -r <audio_files_path> -m <saved_model_path>` (an end-to-end example follows below)
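
For a concrete end-to-end run, the sketch below chains the four commands together. The data directory `data/cv-corpus`, the clips directory `new_clips`, and the checkpoint path `checkpoints/resnet18.pt` are hypothetical placeholders, and `resnet18` is just one of the models listed above; only the subcommands and flags come from the usage documented here.

```bash
# Example workflow (paths and checkpoint name are illustrative, not fixed by the project)

# 1. Preprocess the raw Common Voice audio
python main.py preprocess -r data/cv-corpus

# 2. Train one of the implemented models, e.g. resnet18
python main.py train -r data/cv-corpus -m resnet18

# 3. Evaluate the trained model on the test data, pointing -c at the saved model
python main.py test -r data/cv-corpus -m resnet18 -c checkpoints/resnet18.pt

# 4. Run inference on a directory of new audio files with the saved model
python main.py inference -r new_clips -m checkpoints/resnet18.pt
```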