Add speaker-aware transcription #147
base: develop
Conversation
Hey, I am unable to use this:
@C0RE1312 Sounds like a problem with PyTorch not being able to compute the FFT. Have you tried updating the dependencies of both torch and whisper? It's a pretty old PR.
Will this work with faster-whisper or any other faster version of whisper?
BTW, I noticed that the last commit was in April 2023, so this feature has had no new commits for more than a year. Does this mean the implementation is finished but was never merged into the main branch? I noticed that the README of this project has a note stating that this feature is coming soon but not ready yet.
@ywangwxd Unfortunately I haven't had the time to work on this as much as I'd like. I prioritized other things like documentation and testing for #98.
Depends on #144
This PR adds a new `SpeakerAwareTranscription` pipeline that combines streaming diarization and streaming transcription to determine "who says what" in a live conversation. By default, this is shown as colored words in the terminal.

The feature works as expected with `diart.stream` and `diart.serve`/`diart.client`.
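For context, a minimal sketch of how the pipeline could be driven from Python (hypothetical: the import path of `SpeakerAwareTranscription` is assumed, and `MicrophoneAudioSource`/`RealTimeInference` reflect diart's streaming API around the time of this PR, which may have changed since):

```python
# Hypothetical usage sketch for the pipeline added by this PR.
# Import paths and signatures are assumptions based on diart's API of this era.
from diart import SpeakerAwareTranscription  # added by this PR (location assumed)
from diart.inference import RealTimeInference
from diart.sources import MicrophoneAudioSource

pipeline = SpeakerAwareTranscription()
mic = MicrophoneAudioSource(16000)  # sample rate assumed
inference = RealTimeInference(pipeline, mic)
inference()  # streams colored, speaker-attributed words to the terminal
```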
The main thing preventing full compatibility with `diart.benchmark` and `diart.tune` is the evaluation metric. Since the output of the pipeline is annotated text with the format `[speaker0]Hello [speaker1]Hi`, the metric `diart.metrics.WordErrorRate` will count the speaker labels as insertion errors.
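To make the failure mode concrete, here is a toy illustration (hypothetical: it uses the `jiwer` package as a stand-in for diart's own metric, and assumes the labels end up as separate tokens; with the fused format above they would surface as substitutions instead, but plain WER is inflated either way):

```python
# Toy illustration (hypothetical): speaker labels inflate plain WER.
import jiwer

reference = "hello hi"
hypothesis = "[speaker0] hello [speaker1] hi"  # labels as separate tokens (assumption)

# Two label tokens over two reference words: WER = 2 insertions / 2 words = 1.0
print(jiwer.wer(reference, hypothesis))  # 1.0
```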
Next steps: implement a `SpeakerWordErrorRate` that computes the (weighted?) average WER across speakers, along the lines of the sketch below.
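A rough sketch of what such a metric could look like (hypothetical throughout: the class is reduced to a function, the label regex and weighting scheme are assumptions, and `jiwer` stands in for diart's own WER implementation):

```python
import re
from collections import defaultdict

import jiwer  # stand-in for diart.metrics.WordErrorRate (assumption)

# Hypothetical sketch: split annotated text such as "[speaker0]Hello [speaker1]Hi"
# into one transcript per speaker, compute WER per speaker, then average
# weighted by each speaker's reference word count.
LABEL = re.compile(r"\[(\w+)\]")

def split_by_speaker(annotated: str) -> dict:
    """Return {speaker: transcript} parsed from label-annotated text."""
    parts = LABEL.split(annotated)  # ["", "speaker0", "Hello ", "speaker1", "Hi"]
    transcripts = defaultdict(list)
    for speaker, text in zip(parts[1::2], parts[2::2]):
        transcripts[speaker].append(text.strip())
    return {spk: " ".join(chunks) for spk, chunks in transcripts.items()}

def speaker_wer(reference: str, hypothesis: str) -> float:
    ref, hyp = split_by_speaker(reference), split_by_speaker(hypothesis)
    total_words, weighted_sum = 0, 0.0
    for speaker, ref_text in ref.items():
        n = len(ref_text.split())
        hyp_text = hyp.get(speaker, "")
        err = jiwer.wer(ref_text, hyp_text) if hyp_text else 1.0  # absent speaker: all deletions
        weighted_sum += n * err
        total_words += n
    return weighted_sum / total_words if total_words else 0.0

print(speaker_wer("[speaker0]Hello there [speaker1]Hi",
                  "[speaker0]Hello [speaker1]Hi"))  # (2*0.5 + 1*0.0) / 3 ≈ 0.33
```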
Changelog
TBD