2ndPassContextASR

This repository contains the code for the paper Contextualized Speech Recognition: Rethinking Second-Pass Rescoring with Generative Large Language Models, published at IJCAI 2024.

Overview

Our approach leverages contextual information to enhance speech recognition, particularly when dealing with diverse accents, as demonstrated on the SQuAD-SRC dataset.


Key features include:

  • Utilizing pre-trained speech models to generate diverse transcription candidates.
  • Exploiting contextual information for on-the-fly in-domain adaptation.
  • Employing large language models to refine transcriptions with rich linguistic knowledge.
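The three steps above can be sketched as a single second-pass function: a pre-trained speech model supplies N-best candidates, and a generative LLM, prompted with in-domain context, produces the final transcription. All names below are hypothetical illustrations, not the repository's actual API; the `toy_llm` stand-in just picks the candidate with the most word overlap with the context.

```python
def rescore_with_context(candidates, context, llm_generate):
    """Regenerate a transcription from N-best ASR candidates using context.

    candidates   : list of hypothesis strings from a pre-trained speech model
    context      : in-domain text (e.g. the passage an utterance refers to)
    llm_generate : callable prompt -> string (the generative LLM)
    """
    numbered = "\n".join(f"{i + 1}. {c}" for i, c in enumerate(candidates))
    prompt = (
        "Context:\n" + context + "\n\n"
        "Candidate transcriptions of the same utterance:\n"
        + numbered + "\n\n"
        "Using the context, output the most likely correct transcription:"
    )
    return llm_generate(prompt)

def toy_llm(prompt):
    # Stand-in for a real LLM: choose the candidate sharing the most
    # words with the context section of the prompt.
    context = prompt.split("Context:\n")[1].split("\n\n")[0]
    lines = prompt.split("of the same utterance:\n")[1].split("\n\n")[0]
    cands = [line.split(". ", 1)[1] for line in lines.splitlines()]
    ctx_words = set(context.lower().split())
    return max(cands, key=lambda c: len(ctx_words & set(c.lower().split())))

print(rescore_with_context(
    ["who wrote the decoration of independence",
     "who wrote the declaration of independence"],
    "The Declaration of Independence was drafted by Thomas Jefferson.",
    toy_llm))
# → who wrote the declaration of independence
```

In the real system the LLM sees rich linguistic knowledge rather than simple word overlap, but the prompt shape, context plus numbered candidates plus an instruction, is the essential idea of on-the-fly in-domain adaptation.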

Usage

Zero-Shot Prompting

To run zero-shot prompting:

python run_regenerate.py

This method achieves a 13.6% performance improvement without tuning either the pre-trained speech model or the language model.
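Improvements like the 13.6% figure are typically measured as a relative reduction in word error rate (WER). The exact metric setup here is an assumption; the sketch below shows a standard word-level edit-distance WER and how a relative improvement would be computed from it, with purely illustrative numbers.

```python
def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / reference length."""
    r, h = reference.split(), hypothesis.split()
    prev = list(range(len(h) + 1))  # distances for the empty reference prefix
    for i, rw in enumerate(r, 1):
        cur = [i]
        for j, hw in enumerate(h, 1):
            cur.append(min(prev[j] + 1,            # deletion
                           cur[j - 1] + 1,         # insertion
                           prev[j - 1] + (rw != hw)))  # substitution
        prev = cur
    return prev[-1] / len(r)

# One substitution in a three-word reference -> WER of 1/3.
print(round(wer("declaration of independence", "decoration of independence"), 3))
# → 0.333

# Relative improvement from a baseline WER to a rescored WER
# (illustrative numbers, not results from the paper):
baseline, rescored = 0.250, 0.216
print(round((baseline - rescored) / baseline, 3))
# → 0.136
```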

LoRA Tuning

To perform LoRA tuning:

  1. Tune the model:
python peft.py
  2. Run re-generation with the fine-tuned model:
python run_regenerate.py
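LoRA keeps the pre-trained weight matrix W frozen and learns only a low-rank update, so the tuned layer computes y = Wx + (α/r)·B(Ax) with small trained matrices A (r×d) and B (d×r). The pure-Python sketch below illustrates that arithmetic on a toy layer; it is not the repository's implementation, which presumably applies this via a PEFT library to Whisper's attention weights.

```python
import random

def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]

d, r = 4, 2      # model dimension and LoRA rank (r << d)
alpha = 4        # LoRA scaling factor

W = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]  # frozen
A = [[random.gauss(0.0, 0.02) for _ in range(d)] for _ in range(r)]  # trained
B = [[0.0] * r for _ in range(d)]  # trained, zero-initialized

def lora_forward(x):
    # y = W x + (alpha / r) * B (A x); only A and B receive gradients.
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    return [b + (alpha / r) * dv for b, dv in zip(base, delta)]

x = [1.0, 2.0, 3.0, 4.0]
print(lora_forward(x))  # equals W x at init, since B starts at zero
```

Because B is zero-initialized, training starts exactly from the pre-trained model's behavior, which is one reason a handful of examples (as few as 100 here) can already shift performance.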

Results show consistent performance gains with increasing training examples:

  • Tuning with just 100 examples results in a 19.8% improvement with Whisper Tiny and a 12% improvement with Whisper Medium.
  • Whisper Tiny tuned with 500 examples can outperform Whisper Medium, despite having about 20x fewer parameters.
