
Speaker diarization with whisper transcription #181

Merged · 18 commits · Sep 26, 2024

Conversation

@Jiltseb (Contributor) commented Sep 19, 2024

This PR addresses the feature request #176.

  1. Combines the output of the Speaker Diarization deployment and the Whisper deployment.
  2. Adds a post-processing algorithm that generates a diarized transcription, as shown in the example notebook.
  3. Adds a test for the post-processing algorithm.
  4. Updates the documentation.

@Jiltseb Jiltseb self-assigned this Sep 20, 2024
Review comments (outdated, resolved) on: aana/core/models/asr.py, aana/processors/speaker.py, docs/pages/model_hub/asr.md
@movchan74 (Contributor) commented:

I cannot comment on the notebook, so I will write my comments here.

  1. You can use the following code instead of your data filtering code:
[s.model_dump(include=["text", "time_interval", "speaker"]) for s in segments]
  2. I cannot run the code because I get the following error:
RuntimeError: Traceback (most recent call last):
  File "/workspaces/aana_sdk/aana/deployments/pyannote_speaker_diarization_deployment.py", line 77, in apply_config
    self.diarize_model = Pipeline.from_pretrained(self.model_id)
  File "/root/.cache/pypoetry/virtualenvs/aana-vIr3-B0u-py3.10/lib/python3.10/site-packages/pyannote/audio/core/pipeline.py", line 138, in from_pretrained
    pipeline = Klass(**params)
  File "/root/.cache/pypoetry/virtualenvs/aana-vIr3-B0u-py3.10/lib/python3.10/site-packages/pyannote/audio/pipelines/speaker_diarization.py", line 130, in __init__
    model: Model = get_model(segmentation, use_auth_token=use_auth_token)
  File "/root/.cache/pypoetry/virtualenvs/aana-vIr3-B0u-py3.10/lib/python3.10/site-packages/pyannote/audio/pipelines/utils/getter.py", line 89, in get_model
    model.eval()
AttributeError: 'NoneType' object has no attribute 'eval'

I did set the HF_TOKEN environment variable to my Hugging Face token, and the model page says: "You have been granted access to this model." So I don't know what the problem is.

@Jiltseb (Contributor, Author) commented Sep 23, 2024

I have changed the post-processing function into a class to keep the logic organized and the module-specific functions private. However, it has not been moved into the diarization deployment, because doing so would remove the flexibility to run the diarization and Whisper models in parallel if needed. I think we can keep it this way until we have a dedicated SDK component that can combine deployments with additional functions.
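As a rough illustration of the kind of post-processing being discussed, here is a hedged sketch (not the PR's actual class or API; all names are made up) that labels each ASR segment with the speaker of the diarization turn it overlaps most:

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length of the intersection of two time intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))

def assign_speakers(asr_segments, diar_turns):
    """Attach the best-overlapping speaker label to each ASR segment.

    asr_segments: list of {"start", "end", "text"} dicts from Whisper.
    diar_turns:   list of {"start", "end", "speaker"} dicts from diarization.
    """
    labeled = []
    for seg in asr_segments:
        best = max(
            diar_turns,
            key=lambda t: overlap(seg["start"], seg["end"], t["start"], t["end"]),
            default=None,
        )
        labeled.append({**seg, "speaker": best["speaker"] if best else None})
    return labeled
```

Because this step only consumes the two deployments' outputs, it can run after both models have finished, which is what preserves the option of running diarization and Whisper in parallel.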

@movchan74 (Contributor) left a comment:

Looks good! Great work 👍

@Jiltseb Jiltseb merged commit 9eb34b3 into main Sep 26, 2024
6 checks passed
@Jiltseb Jiltseb deleted the js_diar_transcription branch October 29, 2024 13:04
Successfully merging this pull request may close these issues.

[FEATURE REQUEST] Add optional speaker information with whisper transcription