Speaker Identity Resolution #230

Closed
jfernandrezj opened this issue Jan 16, 2024 · 2 comments
Labels
question Further information is requested

Comments

@jfernandrezj

Thank you very much @juanmc2005 for this library, it is much appreciated.
One question I have about speaker-aware transcription: could a custom plugin / observer / sink be implemented for speaker identity resolution, and what would be the best pattern to achieve this?
Ideally, on each buffer iteration or speaker change, a speaker identity prediction from a matching model (probably backed by something like faiss / weaviate) could be added either to the RTTM output or to a separate file.

Any input would be much appreciated, thank you!

@juanmc2005 added the question (Further information is requested) label on Feb 2, 2024
@juanmc2005
Owner

Hi @jfernandrezj, you could try recovering the internal speaker centroids of OnlineSpeakerClustering (centers attribute) and matching them against other speakers as you mentioned. For this to work, the reference speaker embeddings must come from the same embedding model used in diart, so that they live in the same space as the centroids.
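
The centroid matching suggested above could look roughly like the sketch below. It uses only NumPy and does not call diart: `centers` stands in for the array read from the clustering block's centers attribute, and `match_centroids`, `enrolled`, and the `0.5` threshold are hypothetical names and values chosen for illustration. It also assumes that unused centroid slots are all-zero or non-finite, which should be checked against the diart version in use.

```python
import numpy as np


def match_centroids(centers, enrolled, threshold=0.5):
    """Map centroid indices to enrolled speaker names by cosine similarity.

    centers:  array of shape (num_slots, dim), e.g. read from the clustering block.
    enrolled: dict of name -> reference embedding from the SAME embedding model.
    Centroids whose best similarity is below `threshold` are left unmatched.
    """
    names = list(enrolled)
    # Stack reference embeddings and L2-normalize them once.
    ref = np.stack([enrolled[n] for n in names]).astype(float)
    ref /= np.linalg.norm(ref, axis=1, keepdims=True)

    result = {}
    for idx, center in enumerate(np.asarray(centers, dtype=float)):
        norm = np.linalg.norm(center)
        # Assumption: an all-zero or non-finite row is an unused centroid slot.
        if norm == 0 or not np.isfinite(norm):
            continue
        sims = ref @ (center / norm)  # cosine similarity to every enrolled speaker
        best = int(np.argmax(sims))
        if sims[best] >= threshold:
            result[idx] = names[best]
    return result


# Toy 3-dimensional embeddings; real speaker embeddings have hundreds of dimensions.
centers = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.9, 0.1]])
enrolled = {"John": np.array([0.9, 0.1, 0.0]), "Mary": np.array([0.0, 1.0, 0.0])}
matches = match_centroids(centers, enrolled)
print(matches)
```

For a large set of enrolled speakers, the brute-force dot product could be swapped for a vector index such as faiss or weaviate, as mentioned in the question; the thresholding logic would stay the same.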

If you want to use a different speaker matching method/model, you can always incorporate it into the pipeline to either replace or complement diart's speaker embedding block, but this could be quite expensive in terms of latency. I would suggest sending audio to a separate speaker matching service and listening for its results, so that each speaker centroid can be labeled at display time (e.g. speaker0 -> John).
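
The display-time labeling idea could be sketched as follows. This is only an illustration under stated assumptions: `SpeakerNames` and `relabel_rttm_line` are hypothetical helpers, not part of diart, and the external matching service is represented only by whoever calls `update`. The speaker name is taken to be the eighth field of an RTTM `SPEAKER` line, as in the standard RTTM layout.

```python
import threading


class SpeakerNames:
    """Thread-safe map from diarization labels (e.g. "speaker0") to resolved names."""

    def __init__(self):
        self._names = {}
        self._lock = threading.Lock()

    def update(self, label, name):
        # Called whenever the (hypothetical) matching service reports a result.
        with self._lock:
            self._names[label] = name

    def resolve(self, label):
        # Fall back to the original label until a name is known.
        with self._lock:
            return self._names.get(label, label)


def relabel_rttm_line(line, names):
    """Replace the speaker field of an RTTM SPEAKER line with its resolved name."""
    fields = line.split()
    if len(fields) >= 8 and fields[0] == "SPEAKER":
        fields[7] = names.resolve(fields[7])
    return " ".join(fields)


names = SpeakerNames()
names.update("speaker0", "John")  # e.g. result received from the matching service

known = relabel_rttm_line("SPEAKER mic 1 0.500 1.250 <NA> <NA> speaker0 <NA> <NA>", names)
unknown = relabel_rttm_line("SPEAKER mic 1 2.000 0.750 <NA> <NA> speaker1 <NA> <NA>", names)
print(known)
print(unknown)
```

Because the mapping is applied only when output is displayed or written, the diarization pipeline itself never waits on the matching service, which keeps the latency concern above out of the streaming path.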

@jfernandrezj
Author

Thank you very much @juanmc2005

2 participants