This GitHub repository contains the models, scripts, and data splits from our paper accepted at Interspeech 2024, which can be found here.
Source code for training the various supervised and self-supervised models can be found under /src.
/egs contains bash scripts to train models on the MyST and CSLU OGI Kids' datasets, as well as scripts to filter these datasets and obtain the train/test splits.
- Install the dependencies: `transformers==4.32.1`, `torch`, `evaluate`, and `datasets`.
- On older versions of transformers, it may be necessary to make minor edits to trainer.py to allow hot-loading of iterable datasets (when streaming is set to True). Follow the instructions in /egs/MyST/README.txt to make the necessary edits.
- For training NeMo-based models (e.g., Canary and Parakeet), it is necessary to clone the NeMo GitHub repository.
- To train or evaluate a model on a particular dataset, edit the corresponding YAML file stored in the /egs/dataset/config directory and the corresponding bash script; a minimal, illustrative training sketch is shown after this list.
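The sketch below is a rough illustration, not the actual recipe in /src or /egs: it only shows how a dataset loaded with streaming=True (an iterable dataset) can be fed to a transformers Seq2SeqTrainer. The data path, column names, base checkpoint, and hyperparameters are placeholder assumptions.

```python
from datasets import Audio, load_dataset
from transformers import (
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
    WhisperForConditionalGeneration,
    WhisperProcessor,
)

# streaming=True yields an IterableDataset: audio is decoded on the fly
# instead of being materialised on disk first.
train_set = load_dataset(
    "audiofolder", data_dir="data/myst_train", split="train", streaming=True
)
train_set = train_set.cast_column("audio", Audio(sampling_rate=16_000))

processor = WhisperProcessor.from_pretrained("openai/whisper-small")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")

def to_features(example):
    # Raw waveform -> log-Mel input features; transcript -> label ids.
    audio = example["audio"]
    example["input_features"] = processor(
        audio["array"], sampling_rate=audio["sampling_rate"]
    ).input_features[0]
    example["labels"] = processor.tokenizer(example["transcription"]).input_ids
    return example

train_set = train_set.map(to_features)

def collate(examples):
    # Pad features and label ids separately; padded label positions are set
    # to -100 so they are ignored by the loss.
    batch = processor.feature_extractor.pad(
        [{"input_features": e["input_features"]} for e in examples],
        return_tensors="pt",
    )
    labels = processor.tokenizer.pad(
        [{"input_ids": e["labels"]} for e in examples], return_tensors="pt"
    )
    batch["labels"] = labels["input_ids"].masked_fill(
        labels["attention_mask"].eq(0), -100
    )
    # The tokenizer already prepends <|startoftranscript|>; drop it because
    # the model re-adds it as the decoder start token during training.
    if (batch["labels"][:, 0] == model.config.decoder_start_token_id).all():
        batch["labels"] = batch["labels"][:, 1:]
    return batch

# An IterableDataset has no length, so the run is bounded by max_steps
# rather than num_train_epochs.
args = Seq2SeqTrainingArguments(
    output_dir="exp/whisper_small_myst",
    per_device_train_batch_size=16,
    learning_rate=1e-5,
    max_steps=10_000,
)
trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=train_set,
    data_collator=collate,
    tokenizer=processor.feature_extractor,
)
trainer.train()
```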
| Model | MyST test WER (%) | Hugging Face link |
|---|---|---|
| Whisper tiny | 11.6 | model |
| Whisper base | 10.4 | model |
| Whisper small | 9.3 | model |
| Whisper medium | 8.9 | model |
| Whisper large | 13.0 | model |
| Whisper large-v3 | 9.1 | model |
| Canary | 9.2 | model |
| Parakeet | 8.5 | model |
| wav2vec 2.0 Large | 11.1 | model |
| HuBERT Large | 11.3 | model |
| WavLM Large | 10.4 | model |
| Model | OGI test WER (%) | Hugging Face link |
|---|---|---|
| Whisper tiny | 3.0 | model |
| Whisper base | 2.3 | model |
| Whisper small | 1.8 | model |
| Whisper medium | 1.5 | model |
| Whisper large | 1.7 | model |
| Whisper large-v3 | 1.4 | model |
| Canary | 1.5 | model |
| Parakeet | 1.8 | model |
| wav2vec 2.0 Large | 2.5 | model |
| HuBERT Large | 2.5 | model |
| WavLM Large | 1.8 | model |
| Data augmentation | MyST test WER (%) | Hugging Face link |
|---|---|---|
| PP | 8.8 | model |
| VTLP | 9.0 | model |
| SP | 8.9 | model |
| SA | 9.0 | model |
| PEFT method | MyST test WER (%) | Hugging Face link |
|---|---|---|
| Enc | 9.2 | model |
| Dec | 9.5 | model |
| LoRA | 9.6 | model |
| Prompt | 10.4 | model |
| Prefix | 10.2 | model |
| Adapter | 9.3 | model |
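For reference, the snippet below is a rough sketch of scoring one of the released checkpoints with the `evaluate` library. The checkpoint id `username/whisper-small-myst` and the data path are placeholders (substitute the actual Hugging Face links from the tables above), and the paper's scoring may apply additional text normalization beyond simple lower-casing.

```python
import torch
import evaluate
from datasets import Audio, load_dataset
from transformers import pipeline

# Placeholder checkpoint id; replace with a link from the tables above.
asr = pipeline(
    "automatic-speech-recognition",
    model="username/whisper-small-myst",
    device=0 if torch.cuda.is_available() else -1,
)
wer_metric = evaluate.load("wer")

# Placeholder test set laid out as an audiofolder with a transcription column.
test_set = load_dataset("audiofolder", data_dir="data/myst_test", split="train")
test_set = test_set.cast_column("audio", Audio(sampling_rate=16_000))

predictions, references = [], []
for example in test_set:
    out = asr(example["audio"]["array"])
    predictions.append(out["text"].lower())
    references.append(example["transcription"].lower())

# evaluate's WER is a ratio; multiply by 100 to report a percentage.
wer = wer_metric.compute(predictions=predictions, references=references)
print(f"WER: {100 * wer:.1f}%")
```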
If you use this code in your research, please cite it as follows:
@inproceedings{fan24b_interspeech,
title = {Benchmarking Children's ASR with Supervised and Self-supervised Speech Foundation Models},
author = {Ruchao Fan and Natarajan {Balaji Shankar} and Abeer Alwan},
year = {2024},
booktitle = {Interspeech 2024},
pages = {5173--5177},
doi = {10.21437/Interspeech.2024-1353},
}