🗣️ SHEET / MOS-Bench 🎧

Manipulate MOS-Bench with SHEET

MOS-Bench is a benchmark designed to evaluate the generalization abilities of subjective speech quality assessment (SSQA) models. SHEET stands for the Speech Human Evaluation Estimation Toolkit, and was designed to conduct research experiments with MOS-Bench.



Key Features

  • MOS-Bench is the first large-scale collection of training and testing datasets for SSQA, covering a wide range of domains, including synthetic speech from text-to-speech (TTS), voice conversion (VC), and singing voice synthesis (SVS) systems, as well as speech distorted with artificial and real noise, clipping, transmission artifacts, reverberation, etc. Researchers can use the testing sets to benchmark their SSQA models.
  • This repository aims to provide training recipes. While there are many off-the-shelf speech quality evaluators, such as DNSMOS, SpeechMOS and speechmetrics, most of them do not provide training recipes and are thus not research-oriented. Newcomers may use this repo as a starting point for SSQA research.

MOS-Bench Overview

MOS-Bench currently contains 7 training sets and 12 test sets. Below is a screenshot of a summary table from our paper. For more details, please see our paper or egs/README.md.

[Screenshot: summary table of the MOS-Bench training and test sets, from the paper]

Supported models and features

Models
  • LDNet
  • SSL-MOS
  • UTMOS (Strong learner)
    • Original repo link: https://github.com/sarulab-speech/UTMOS22/tree/master/strong
    • Paper link: [arXiv]
    • Example config: egs/bvcc/conf/utmos-strong.yaml
    • Notes: After discussion with the first author of UTMOS, Takaaki, we feel that UTMOS = SSL-MOS + listener modeling + contrastive loss + several model architecture and training differences. Takaaki also felt that using phoneme and reference features is not really helpful for UTMOS strong alone. Therefore, we did not implement every component of UTMOS strong; for instance, we did not use domain ID and data augmentation.
  • Modified AlignNet
Features
  • Modeling
    • Listener modeling (a minimal illustrative sketch is given after this list)
    • Self-supervised learning (SSL) based encoder, supported by S3PRL
      • Find the complete list of supported SSL models here.
  • Training
    • Automatic best-n model saving and early stopping based on a given validation criterion
    • Visualization, including predicted score distribution, scatter plot of utterance and system level scores
    • Model averaging
    • Model ensembling by stacking
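
To give a concrete feel for the listener modeling feature above, here is a minimal sketch of how a listener-aware, SSL-based MOS predictor can be wired up. This is illustrative only and not the actual SHEET implementation: the class name, dimensions, and the stand-in encoder are made up, and in SHEET the encoder role is played by an S3PRL SSL model.

import torch
import torch.nn as nn

class ListenerAwareMOSPredictor(nn.Module):
    # Toy SSL-MOS-style predictor with one learned embedding per listener ID.
    def __init__(self, ssl_encoder, ssl_dim=768, num_listeners=100, listener_dim=128):
        super().__init__()
        self.ssl_encoder = ssl_encoder  # maps wav [B, T] -> frame features [B, T', ssl_dim]
        self.listener_emb = nn.Embedding(num_listeners, listener_dim)
        self.head = nn.Linear(ssl_dim + listener_dim, 1)

    def forward(self, wav, listener_id):
        feats = self.ssl_encoder(wav)           # [B, T', ssl_dim]
        pooled = feats.mean(dim=1)              # mean-pool to utterance level: [B, ssl_dim]
        lis = self.listener_emb(listener_id)    # [B, listener_dim]
        return self.head(torch.cat([pooled, lis], dim=-1)).squeeze(-1)  # [B] MOS estimates

# usage with a stand-in encoder (hypothetical; a real system would use an S3PRL model)
dummy_encoder = lambda wav: torch.randn(wav.size(0), 50, 768)
model = ListenerAwareMOSPredictor(dummy_encoder)
scores = model(torch.randn(2, 16000), torch.tensor([0, 3]))

Note that at inference time the true listener is unknown; a common trick (e.g., LDNet's mean-listener decoding) is to feed a dedicated "mean listener" ID or to average predictions over the listener embeddings.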

Usage

I am new to MOS prediction research. I want to train models!

You are in the right place! This is the main purpose of SHEET.

We provide complete experiment recipes, i.e., sets of scripts to download and process each dataset, and to train and evaluate models. This structure originated from Kaldi, and is also used in many speech processing repositories (ESPnet, ParallelWaveGAN, etc.).

Please follow the installation instructions first, then see egs/README.md for how to start.

I already have my MOS predictor. I just want to do benchmarking!

We provide scripts to conveniently collect the test sets. These scripts can be run on Linux-like platforms with only basic Python requirements, so you do not need to install heavy packages like PyTorch.

Please see the related section in egs/README.md for detailed instructions.

I just want to use your trained MOS predictor!

We utilize torch.hub to provide a convenient way to load pre-trained SSQA models and predict scores of wav files or torch tensors.

# load pre-trained model
>>> import torch
>>> predictor = torch.hub.load("unilight/sheet:v0.1.0", "default", trust_repo=True, force_reload=True)

# you can either provide a path to your wav file
>>> predictor.predict(wav_path="/path/to/wav/file.wav")
3.6066928

# or provide a torch tensor with shape [num_samples]
>>> predictor.predict(wav=torch.rand(16000))
1.5806346
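
The tensor input is handy when you want to preprocess the audio yourself. Below is a sketch of loading a file with torchaudio, downmixing, and resampling before prediction; we assume the predictor expects a mono 16 kHz waveform, which you should verify against the repo's documentation.

import torch
import torchaudio

wav, sr = torchaudio.load("/path/to/wav/file.wav")  # wav: [channels, num_samples]
wav = wav.mean(dim=0)                               # downmix to mono
if sr != 16000:                                     # resample if needed (assumed target rate)
    wav = torchaudio.functional.resample(wav, sr, 16000)
score = predictor.predict(wav=wav)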

Or you can try out our HuggingFace Spaces Demo!

Installation

Editable installation with virtualenv

You don't need to prepare an environment (using conda, etc.) first. The following commands will automatically construct a virtual environment in tools/. When you run the recipes, the scripts will automatically activate the virtual environment.

git clone https://github.com/unilight/sheet.git
cd sheet/tools
make

Information

Citation

If you use the training scripts, benchmarking scripts or pre-trained models from this project, please consider citing the following paper.

@article{huang2024,
      title={MOS-Bench: Benchmarking Generalization Abilities of Subjective Speech Quality Assessment Models}, 
      author={Wen-Chin Huang and Erica Cooper and Tomoki Toda},
      year={2024},
      eprint={2411.03715},
      archivePrefix={arXiv},
      primaryClass={cs.SD},
      url={https://arxiv.org/abs/2411.03715}, 
}

Acknowledgements

This repo is greatly inspired by the following repos; in fact, many code snippets are taken directly from them.

Author

Wen-Chin Huang
Toda Laboratory, Nagoya University
E-mail: wen.chinhuang@g.sp.m.is.nagoya-u.ac.jp
