Speaker Identity Resolution #230

jfernandrezj · 2024-01-16T12:58:15Z

Thank you very much @juanmc2005 for this library, much much appreciated.
One question about speaker-aware transcription: could a custom plugin, observer, or sink be implemented for speaker identity resolution, and what would be the best pattern for it?
Ideally, on each buffer iteration or speaker change, a speaker identity prediction from a matching backend (probably a vector index such as faiss or weaviate) would be added either to the RTTM or to a separate file.

Any input would be much appreciated, thank you!

juanmc2005 · 2024-02-02T15:12:49Z

Hi @jfernandrezj, you could try recovering the internal speaker centroids of OnlineSpeakerClustering (centers attribute) and matching them against known speakers, as you mentioned. For this to work, your reference embeddings need to come from the same embedding model used in diart, so that they are comparable with the centroids.
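To make the centroid matching idea concrete, here is a minimal sketch, not part of diart. It assumes you have already read the centroids from the clustering block (the centers attribute mentioned above) as rows of floats, and that you keep a dictionary of enrolled reference embeddings produced by the same embedding model. The names `resolve_identities` and `enrolled`, and the 0.7 threshold, are hypothetical and would need tuning on real data.

```python
import math
from typing import Dict, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def resolve_identities(
    centers: Sequence[Sequence[float]],
    enrolled: Dict[str, Sequence[float]],
    threshold: float = 0.7,  # hypothetical value, tune on your own data
) -> Dict[int, Optional[str]]:
    """Map each centroid index to the best-matching enrolled name.

    A centroid whose best similarity is below `threshold` maps to None,
    meaning the speaker stays anonymous.
    """
    resolved: Dict[int, Optional[str]] = {}
    for index, center in enumerate(centers):
        best_name: Optional[str] = None
        best_score = threshold
        for name, reference in enrolled.items():
            score = cosine_similarity(center, reference)
            if score >= best_score:
                best_name, best_score = name, score
        resolved[index] = best_name
    return resolved
```

Any all-zero row (for example an unused centroid slot, if the array is preallocated) resolves to None. For a large set of enrolled speakers, the inner loop could be replaced by a nearest-neighbour query against a vector index such as faiss or weaviate.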

If you want to use a different speaker matching method or model, you can always incorporate it into the pipeline to either replace or complement diart's speaker embedding block, but this could be quite expensive in terms of latency. I would suggest sending audio to a separate speaker matching service and listening to its results to label each speaker centroid at display time (e.g. speaker0 -> John).
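The display-time labelling suggested above could look roughly like the sketch below. It assumes a separate matching service (hypothetical, not shown) calls `update` from its own thread whenever it identifies a speaker, while the display code rewrites the speaker name field of each RTTM line just before showing it. `SpeakerLabelMap` and `relabel_rttm_line` are illustrative names, not diart APIs.

```python
import threading
from typing import Dict


class SpeakerLabelMap:
    """Thread-safe mapping from anonymous labels (speaker0) to resolved names."""

    def __init__(self) -> None:
        self._names: Dict[str, str] = {}
        self._lock = threading.Lock()

    def update(self, anonymous_label: str, name: str) -> None:
        """Record a resolved identity; called by the (hypothetical) matching service."""
        with self._lock:
            self._names[anonymous_label] = name

    def resolve(self, anonymous_label: str) -> str:
        """Return the resolved name, or the anonymous label if not resolved yet."""
        with self._lock:
            return self._names.get(anonymous_label, anonymous_label)


def relabel_rttm_line(line: str, labels: SpeakerLabelMap) -> str:
    """Replace the speaker name (8th field) of an RTTM SPEAKER line."""
    fields = line.split()
    if len(fields) >= 8 and fields[0] == "SPEAKER":
        fields[7] = labels.resolve(fields[7])
    return " ".join(fields)
```

Because the diarization pipeline never waits for the matching service, latency is unaffected: a speaker is shown as speaker0 until a name arrives, and as John afterwards. Resolved names should not contain whitespace, since RTTM fields are space-separated.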

jfernandrezj · 2024-02-19T09:49:46Z

Thank you very much @juanmc2005

juanmc2005 added the question label (Further information is requested) Feb 2, 2024

jfernandrezj closed this as completed Feb 19, 2024
