GitHub - aitor-alvarez/large-speech-models: Fine-tuning Multilingual Large Speech Recognition Models: Wav2vec and Whisper

Fine-tuning Multilingual Large Speech Recognition Models: Wav2vec and Whisper

This repository contains code to easily fine-tune pre-trained large speech recognition models on single or multiple GPUs.

Start by cloning the repository:

git clone https://github.com/aitor-alvarez/large-speech-models

Then install cd into the directory and install requirements:

pip install -r requirements.txt

There is only one configuration file model_asr.sh Inside this file you will find all the parameters needed.

python models/asr.py  \
--model_id='facebook/wav2vec2-xls-r-300m' \
--num_epochs=30 \
--batch_size=16 \
--lang='ar' \
--dataset='mozilla-foundation/common_voice_11_0' \
--output_dir='fine_tuned_models' \
--train_test='train'


Parameters:
-model_id: string use either a huggingface pretrained model (like above) or a local directory with the pre-trained model.
-num_epochs: int
-batch_size: int
-lang: string use language code if using CV (https://huggingface.co/datasets/common_voice).
-dataset: string a dataset from transformers library datasets.
-output_dir: string directorry where the fine-tuned model will be saved.
-train_test: string either 'train' or 'test' depending on whether you are fine-tuning or using a fine-tuned model for inference.

If using a custom dataset you can use the following parameter instead of "dataset":
-data_folder: string provide the path to your local dataset following the format of transformers dataset library (https://huggingface.co/docs/datasets/create_dataset)

If using data_folder you will need to use data_lang with language code.

This code was done with the idea of fine-tuning Wav2vec and Whisper for Arabic.

Some pre-trained models can be found here: https://huggingface.co/aitor-alvarez/wav2vec2-xls-r-300m-ar

Name		Name	Last commit message	Last commit date
Latest commit History 44 Commits
models		models
utils		utils
.gitignore		.gitignore
README.md		README.md
model_asr.sh		model_asr.sh
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Fine-tuning Multilingual Large Speech Recognition Models: Wav2vec and Whisper

About

Releases

Packages

Languages

aitor-alvarez/large-speech-models

Folders and files

Latest commit

History

Repository files navigation

Fine-tuning Multilingual Large Speech Recognition Models: Wav2vec and Whisper

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages