This repository has been archived by the owner on Oct 9, 2023. It is now read-only.

speech recognition auto processor #1075

Merged 7 commits on Dec 15, 2021
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -12,6 +12,8 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).

 ### Changed

+- Changed `Wav2Vec2Processor` to `AutoProcessor` and separate it from backbone [optional] ([#1075](https://github.com/PyTorchLightning/lightning-flash/pull/1075))
+
 ### Deprecated

 ### Fixed
13 changes: 11 additions & 2 deletions flash/audio/speech_recognition/model.py
@@ -41,7 +41,7 @@
 )

 if _AUDIO_AVAILABLE:
-    from transformers import Wav2Vec2Processor
+    from transformers import AutoProcessor


 class SpeechRecognition(Task):
@@ -64,6 +64,7 @@ class SpeechRecognition(Task):
     def __init__(
         self,
         backbone: str = "facebook/wav2vec2-base-960h",
+        processor_backbone: str = None,
         optimizer: OPTIMIZER_TYPE = "Adam",
         lr_scheduler: LR_SCHEDULER_TYPE = None,
         learning_rate: float = 1e-5,
@@ -89,7 +90,15 @@ def __init__(
         self.save_hyperparameters()

         self.set_state(SpeechRecognitionBackboneState(backbone))
-        self.set_state(CollateFn(DataCollatorCTCWithPadding(Wav2Vec2Processor.from_pretrained(backbone))))
+        self.set_state(
+            CollateFn(
+                DataCollatorCTCWithPadding(
+                    AutoProcessor.from_pretrained(backbone)
+                    if processor_backbone is None
+                    else AutoProcessor.from_pretrained(processor_backbone)
+                )
+            )
+        )

     def forward(self, batch: Dict[str, torch.Tensor]):
         return self.model(batch["input_values"])
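The change to `model.py` amounts to a fallback: the processor is loaded from `processor_backbone` when one is given, and from `backbone` otherwise. Below is a minimal sketch of that selection, kept free of `transformers` so it runs offline. `resolve_processor_source` and `load_processor` are hypothetical names that do not exist in Flash, and `some-org/custom-processor` is a made-up checkpoint id used only for illustration.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def resolve_processor_source(backbone: str, processor_backbone: Optional[str] = None) -> str:
    """Pick the checkpoint name to load the processor from (hypothetical helper)."""
    return backbone if processor_backbone is None else processor_backbone


def load_processor(
    loader: Callable[[str], T],
    backbone: str,
    processor_backbone: Optional[str] = None,
) -> T:
    """Call the loader once with the resolved checkpoint name.

    In Flash the loader would be AutoProcessor.from_pretrained; it is injected
    here so the sketch does not need network access.
    """
    return loader(resolve_processor_source(backbone, processor_backbone))


# Stand-in loader that simply echoes the checkpoint name it was asked for.
requested = load_processor(lambda name: name, "facebook/wav2vec2-base-960h")
overridden = load_processor(
    lambda name: name,
    "facebook/wav2vec2-base-960h",
    processor_backbone="some-org/custom-processor",  # made-up id, assumption
)
print(requested)
print(overridden)
```

Resolving the name first keeps a single loader call site; the behaviour should match the conditional expression in the diff, which calls `AutoProcessor.from_pretrained` in both branches.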
4 changes: 2 additions & 2 deletions requirements/datatype_audio.txt
@@ -1,4 +1,4 @@
 torchaudio
 librosa>=0.8.1
-transformers>=4.11.0
-datasets>=1.8
+transformers>=4.13.0
+datasets>=1.16.1
2 changes: 1 addition & 1 deletion requirements/datatype_text.txt
@@ -2,5 +2,5 @@ sentencepiece>=0.1.95
 filelock
 transformers>=4.5
 torchmetrics[text]>=0.5.1
-datasets>=1.8,<1.13
+datasets>=1.8
 sentence-transformers
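The PR does not say why the `<1.13` upper bound on `datasets` was dropped from the text requirements. One plausible reading is that the old text pin could not be satisfied together with the new audio pin `datasets>=1.16.1` when both extras are installed. The sketch below checks that reading with a deliberately tiny specifier matcher; `parse_version` and `satisfies` are hypothetical helpers that handle only plain dotted versions and the `>=` and `<` operators, not the full PEP 440 rules that pip applies.

```python
from typing import Tuple


def parse_version(text: str, width: int = 4) -> Tuple[int, ...]:
    """Turn a plain dotted version such as "1.16.1" into a zero-padded tuple."""
    parts = [int(part) for part in text.strip().split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def satisfies(version: str, spec: str) -> bool:
    """Check a version against comma-separated ">=" and "<" clauses only."""
    candidate = parse_version(version)
    for clause in spec.split(","):
        clause = clause.strip()
        if clause.startswith(">="):
            if candidate < parse_version(clause[2:]):
                return False
        elif clause.startswith("<"):
            if candidate >= parse_version(clause[1:]):
                return False
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


audio_minimum = "1.16.1"        # lowest datasets version the audio pin allows
old_text_spec = ">=1.8,<1.13"   # text pin before this PR
new_text_spec = ">=1.8"         # text pin after this PR

old_compatible = satisfies(audio_minimum, old_text_spec)
new_compatible = satisfies(audio_minimum, new_text_spec)
print(old_compatible, new_compatible)
```

Under these assumptions the smallest version accepted by the audio pin falls outside the old text range and inside the new one, which is consistent with the bound being removed to keep the two requirement files installable together.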