This repository contains the code for the paper *Contextualized Speech Recognition: Rethinking Second-Pass Rescoring with Generative Large Language Models*, published at IJCAI 2024.
Our approach leverages contextual information to enhance speech recognition, particularly when dealing with diverse accents, as demonstrated on the SQuAD-SRC dataset.
Key features include:
- Utilizing pre-trained speech models to generate diverse transcription candidates.
- Exploiting contextual information for on-the-fly in-domain adaptation.
- Employing large language models to refine transcriptions with rich linguistic knowledge.
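As a rough illustration of how these pieces fit together, the second-pass step combines the N-best hypotheses from the speech model with the in-domain context into a single LLM prompt. The sketch below is hypothetical (the function name and prompt wording are mine, not the repository's actual implementation):

```python
def build_prompt(context: str, candidates: list[str]) -> str:
    """Combine an in-domain context passage with N-best ASR hypotheses
    into one zero-shot prompt asking the LLM for a corrected transcript.
    (Illustrative sketch only; not the repository's prompt template.)"""
    numbered = "\n".join(f"{i + 1}. {c}" for i, c in enumerate(candidates))
    return (
        "Context (in-domain passage):\n"
        f"{context}\n\n"
        "Candidate transcriptions from the speech recognizer:\n"
        f"{numbered}\n\n"
        "Using the context, write the single most plausible transcription:"
    )

prompt = build_prompt(
    "The Amazon rainforest covers much of the Amazon basin.",
    ["the amazon rainforest covers much of the amazon bason",
     "the amazon rain forest covers much of the amazon basin"],
)
```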
To run zero-shot prompting:

```shell
python run_regenerate.py
```
This method achieves a 13.6% performance improvement without tuning the pre-trained speech and language models.
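Improvements like this are conventionally reported in terms of word error rate (WER). For reference, WER is the word-level Levenshtein distance between hypothesis and reference, normalized by reference length; a standard self-contained computation (my own helper, not part of this repository) looks like:

```python
def wer(ref: str, hyp: str) -> float:
    """Word error rate: word-level edit distance / reference word count."""
    r, h = ref.split(), hyp.split()
    # dp[i][j] = edit distance between r[:i] and h[:j]
    dp = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        dp[i][0] = i
    for j in range(len(h) + 1):
        dp[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[len(r)][len(h)] / len(r)
```

For example, `wer("the cat sat", "the hat sat")` gives one substitution over three reference words.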
To perform LoRA tuning:
- Tune the model: `python peft.py`
- Run the re-generation using the finetuned model: `python run_regenerate.py`
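Conceptually, LoRA freezes the pre-trained weight matrix W and trains only a low-rank update, so the effective weight becomes W + (α/r)·BA. A toy plain-Python illustration of that update (not the repository's `peft.py`, which operates on the actual language model):

```python
def matmul(a, b):
    """Naive matrix multiply for small nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def lora_effective_weight(W, A, B, alpha):
    """W stays frozen; only the low-rank factors A (r x d_in) and
    B (d_out x r) are trained. Effective weight = W + (alpha / r) * B @ A."""
    r = len(A)  # LoRA rank
    BA = matmul(B, A)
    return [[W[i][j] + (alpha / r) * BA[i][j]
             for j in range(len(W[0]))] for i in range(len(W))]

# Toy example: 2x2 frozen weight, rank-1 update.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]   # r=1, d_in=2
B = [[0.5], [0.0]] # d_out=2, r=1
W_eff = lora_effective_weight(W, A, B, alpha=1.0)  # -> [[1.5, 1.0], [0.0, 1.0]]
```

Because only A and B are trained, the number of tunable parameters is a small fraction of the full model, which is why tuning with as few as 100 examples is practical.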
Results show consistent performance gains as the number of training examples increases:
- Tuning with just 100 examples results in a 19.8% improvement with Whisper Tiny and a 12% improvement with Whisper Medium.
- Whisper Tiny tuned with 500 examples can outperform Whisper Medium, despite having about 20x fewer parameters.