
Speaker diarization with whisper transcription #181

Merged · 18 commits · Sep 26, 2024

Conversation

@Jiltseb (Contributor) commented Sep 19, 2024

This PR addresses the feature request #176.

  1. Combines the output of the Speaker Diarization deployment and the Whisper deployment.
  2. Adds a post-processing algorithm that generates a diarized transcription, as shown in the example notebook.
  3. Adds a test for the post-processing algorithm.
  4. Updates the documentation.

@Jiltseb Jiltseb self-assigned this Sep 20, 2024
Review comments (outdated, resolved) on: aana/core/models/asr.py, aana/processors/speaker.py, docs/pages/model_hub/asr.md
@movchan74 (Contributor) commented:

I cannot comment on the notebook, so I will write my comments here.

  1. You can use the following code instead of your data filtering code:
[s.model_dump(include=["text", "time_interval", "speaker"]) for s in segments]
  2. I cannot run the code because I get the following error:
RuntimeError: Traceback (most recent call last):
  File "/workspaces/aana_sdk/aana/deployments/pyannote_speaker_diarization_deployment.py", line 77, in apply_config
    self.diarize_model = Pipeline.from_pretrained(self.model_id)
  File "/root/.cache/pypoetry/virtualenvs/aana-vIr3-B0u-py3.10/lib/python3.10/site-packages/pyannote/audio/core/pipeline.py", line 138, in from_pretrained
    pipeline = Klass(**params)
  File "/root/.cache/pypoetry/virtualenvs/aana-vIr3-B0u-py3.10/lib/python3.10/site-packages/pyannote/audio/pipelines/speaker_diarization.py", line 130, in __init__
    model: Model = get_model(segmentation, use_auth_token=use_auth_token)
  File "/root/.cache/pypoetry/virtualenvs/aana-vIr3-B0u-py3.10/lib/python3.10/site-packages/pyannote/audio/pipelines/utils/getter.py", line 89, in get_model
    model.eval()
AttributeError: 'NoneType' object has no attribute 'eval'

I did set the HF_TOKEN environment variable to my Hugging Face token, and the model page says: "You have been granted access to this model." So I don't know what the problem is.

@Jiltseb (Contributor, Author) commented Sep 23, 2024

I have changed the post-processing function into a class to keep the logic organized and the module-specific functions private. However, it has not been moved into the diarization deployment, because doing so would remove the flexibility to run the diarization and Whisper models in parallel if needed. I think we can keep it this way until we have a dedicated SDK component that can combine deployments with additional functions.
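As a rough illustration of the kind of post-processing being discussed, here is a hedged sketch (not the PR's actual class or API; all names are made up) that labels each ASR segment with the speaker of the diarization turn it overlaps most:

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length of the intersection of two time intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))

def assign_speakers(asr_segments, diar_turns):
    """Attach the best-overlapping speaker label to each ASR segment.

    asr_segments: list of {"start", "end", "text"} dicts from Whisper.
    diar_turns:   list of {"start", "end", "speaker"} dicts from diarization.
    """
    labeled = []
    for seg in asr_segments:
        best = max(
            diar_turns,
            key=lambda t: overlap(seg["start"], seg["end"], t["start"], t["end"]),
            default=None,
        )
        labeled.append({**seg, "speaker": best["speaker"] if best else None})
    return labeled
```

Because this step only consumes the two deployments' outputs, it can run after both models have finished, which is what preserves the option of running diarization and Whisper in parallel.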

@movchan74 (Contributor) left a comment:

Looks good! Great work 👍

@Jiltseb Jiltseb merged commit 9eb34b3 into main Sep 26, 2024
6 checks passed
@Jiltseb Jiltseb deleted the js_diar_transcription branch October 29, 2024 13:04
Successfully merging this pull request may close these issues.

[FEATURE REQUEST] Add optional speaker information with whisper transcription