DongKeon/Awesome-Speaker-Diarization

Awesome-Speaker-Diarization

A curated list of papers on speaker diarization (SD).

If you find a paper that is missing from this list, please open an issue or, preferably, a pull request.



Overview

  • DIHARD Keynote Session: The yellow brick road of diarization, challenges and other neural paths [Slides] [Video]

Reviews

  • “A review of speaker diarization: Recent advances with deep learning”, in Computer Speech & Language, Volume 72, 2022. (USC) [Paper]
  • "An Experimental Review of Speaker Diarization methods with application to Two-Speaker Conversational Telephone Speech recordings", in Computer Speech & Language, 2023. [Paper]
  • "Overview of Speaker Modeling and Its Applications: From the Lens of Deep Speaker Representation Learning," submitted to IEEE/ACM TASLP, 2024. [Paper]

EEND (End-to-End Neural Diarization)-based

  • BLSTM-EEND: "End-to-End Neural Speaker Diarization with Permutation-Free Objectives", in Proc. Interspeech, 2019. (Hitachi) [Paper]
  • SA-EEND (1): “End-to-End Neural Speaker Diarization with Self-attention”, in Proc. ASRU, 2019. (Hitachi) [Paper] [Code] [Pytorch] [Review]
  • SA-EEND (2): “End-to-End Neural Diarization: Reformulating Speaker Diarization as Simple Multi-label Classification”, in arXiv:2003.02966, 2020. (Hitachi) [Paper] [Review]
  • SC-EEND: "Neural Speaker Diarization with Speaker-Wise Chain Rule", in arXiv:2006.01796, 2020. (Hitachi) [Paper] [Review]
  • EEND-EDA (1): “End-to-End Speaker Diarization for an Unknown Number of Speakers with Encoder-Decoder Based Attractors”, in Proc. Interspeech, 2020. (Hitachi) [Paper] [Review] [Code]
  • EEND-EDA (2): “Encoder-Decoder Based Attractor Calculation for End-to-End Neural Diarization”, in IEEE/ACM TASLP, 2022. (Hitachi) [Paper] [Review] [Code]
  • CB-EEND: "End-to-end Neural Diarization: From Transformer to Conformer", in Proc. Interspeech, 2021. (Amazon) [Paper] [Review]
  • TDCN-SA: "End-to-End Diarization for Variable Number of Speakers with Local-Global Networks and Discriminative Speaker Embeddings", in Proc. ICASSP, 2021. (Google) [Paper] [Review]
  • "End-to-End Speaker Diarization Conditioned on Speech Activity and Overlap Detection", in Proc. IEEE SLT, 2021. (Hitachi) [Paper]
  • EEND-VC (1): "Integrating end-to-end neural and clustering-based diarization: Getting the best of both worlds", in Proc. ICASSP, 2021. (NTT) [Paper] [Review] [Code]
  • EEND-VC (2): "Advances in integration of end-to-end neural and clustering-based diarization for real conversational speech", in Proc. Interspeech, 2021. (NTT) [Paper] [Review] [Code]
  • "Robust End-to-End Speaker Diarization with Conformer and Additive Margin Penalty," in Proc. Interspeech, 2021. (Fano Labs) [Paper]
  • EEND-GLA: "Towards Neural Diarization for Unlimited Numbers of Speakers Using Global and Local Attractors", in Proc. ASRU, 2021. (Hitachi) [Paper] [Review]
  • "DIVE: End-to-end Speech Diarization via Iterative Speaker Embedding", in Proc. ICASSP, 2022. (Google) [Paper]
  • RX-EEND: “Auxiliary Loss of Transformer with Residual Connection for End-to-End Speaker Diarization”, in Proc. ICASSP, 2022. (GIST) [Paper] [Review]
  • "End-to-end speaker diarization with transformer", in arXiv, 2022. [Paper]
  • EEND-VC-iGMM: "Tight integration of neural and clustering-based diarization through deep unfolding of infinite Gaussian mixture model", in Proc. ICASSP, 2022. (NTT) [Paper]
  • EDA-RC: "Robust End-to-end Speaker Diarization with Generic Neural Clustering", in Proc. Interspeech, 2022. (SJTU) [Paper]
  • EEND-NAA: "End-to-End Neural Speaker Diarization with an Iterative Refinement of Non-Autoregressive Attention-based Attractors", in Proc. Interspeech, 2022. (JHU) [Paper]
  • Graph-PIT: "Utterance-by-utterance overlap-aware neural diarization with Graph-PIT", in Proc. Interspeech, 2022. (NTT) [Paper] [Code]
  • "Efficient Transformers for End-to-End Neural Speaker Diarization", in Proc. IberSPEECH, 2022. [Paper]
  • "Improving Transformer-based End-to-End Speaker Diarization by Assigning Auxiliary Losses to Attention Heads", in Proc. ICASSP, 2023. (HU) [Paper]
  • EEND-NA: “Neural Diarization with Non-Autoregressive Intermediate Attractors”, in Proc. ICASSP, 2023. (LINE) [Paper]
  • EEND-EDA-SpkAtt: "Towards End-to-end Speaker Diarization in the Wild", in arXiv:2211.01299v1, 2022. [Paper]
  • "TOLD: A Novel Two-Stage Overlap-Aware Framework for Speaker Diarization", in Proc. ICASSP, 2023. (Alibaba) [Paper] [Code]
  • EEND-IAAE: "End-to-end neural speaker diarization with an iterative adaptive attractor estimation," in Neural Networks, Elsevier. [Paper] [Code]
  • "Improving End-to-End Neural Diarization Using Conversational Summary Representations", in Proc. Interspeech, 2023. (Fano Labs) [Paper]
  • AED-EEND: “Attention-based Encoder-Decoder Network for End-to-End Neural Speaker Diarization with Target Speaker Attractor”, in Proc. Interspeech, 2023. (SJTU) [Paper] [Review]
  • "Self-Distillation into Self-Attention Heads for Improving Transformer-based End-to-End Neural Speaker Diarization", in Proc. Interspeech, 2023. (HU) [Paper]
  • "Powerset Multi-class Cross Entropy Loss for Neural Speaker Diarization", in Proc. Interspeech, 2023. (Pyannote) [Paper] [Code]
  • "End-to-End Neural Speaker Diarization with Absolute Speaker Loss", in Proc. Interspeech, 2023. (Pyannote) [Paper]
  • "Blueprint Separable Subsampling and Aggregate Feature Conformer-Based End-to-End Neural Diarization", in Electronics, 2023. [Paper]
  • EEND-TA: "Transformer Attractors for Robust and Efficient End-to-End Neural Diarization," in Proc. ASRU, 2023. (Fano Labs) [Paper]
  • "Robust End-to-End Diarization with Domain Adaptive Training and Multi-Task Learning," in Proc. ASRU, 2023. (Fano Labs) [Paper]
  • "NTT speaker diarization system for CHiME-7: multi-domain, multi-microphone End-to-end and vector clustering diarization," in Proc. ICASSP, 2024. (NTT) [Paper]
  • AED-EEND-EE: "Attention-based Encoder-Decoder End-to-End Neural Diarization with Embedding Enhancer," in IEEE/ACM TASLP, 2024. (SJTU) [Paper] [Review]
  • "DiaPer: End-to-End Neural Diarization with Perceiver-Based Attractors," in IEEE/ACM TASLP, 2024. (BUT) [Paper] [Code] [Review]
  • "EEND-DEMUX: End-to-End Neural Speaker Diarization via Demultiplexed Speaker Embeddings," submitted to IEEE SPL, 2024. (SNU) [Paper] [Review]
  • "EEND-M2F: Masked-attention mask transformers for speaker diarization," in arXiv:2401.12600, 2024. (Fano Labs) [Paper] [Review]
  • EEND-NAA (2): "End-to-End Neural Speaker Diarization with Non-Autoregressive Attractors", in IEEE/ACM TASLP, 2024. (JHU) [Paper]
  • "From Modular to End-to-End Speaker Diarization," Ph.D. thesis, 2024. (BUT) [Paper]

Related Speaker information

  • "Do End-to-End Neural Diarization Attractors Need to Encode Speaker Characteristic Information?," in Proc. Odyssey, 2024. [Paper]
  • "Leveraging Speaker Embeddings in End-to-End Neural Diarization for Two-Speaker Scenarios," in Proc. Odyssey, 2024. [Paper]

Simulated Dataset

  • Concat-and-sum: “End-to-end neural speaker diarization with permutation-free objectives”, in Proc. Interspeech, 2019. [Paper]
  • “From simulated mixtures to simulated conversations as training data for end-to-end neural diarization”, in Proc. Interspeech, 2022. (BUT) [Paper] [Code] [Review]
  • Markov selection: “Improving the naturalness of simulated conversations for end-to-end neural diarization”, in Proc. Odyssey, 2022. (Hitachi) [Paper]
  • "Multi-Speaker and Wide-Band Simulated Conversations as Training Data for End-to-End Neural Diarization", in Proc. ICASSP, 2023. (BUT) [Paper] [Code] [Review]
  • EEND-EDA-SpkAtt: "Towards End-to-end Speaker Diarization in the Wild", in arXiv:2211.01299v1, 2022. [Paper]
  • "Property-Aware Multi-Speaker Data Simulation: A Probabilistic Modelling Technique for Synthetic Data Generation," in CHiME-7 Workshop, 2023. (NVIDIA) [Paper]
  • "Enhancing low-latency speaker diarization with spatial dictionary learning," in Proc. ICASSP, 2024. (NTU) [Paper] [Poster]
  • "Improving Neural Diarization through Speaker Attribute Attractors and Local Dependency Modeling," in Proc. ICASSP, 2024. (OSU) [Paper]

Post-Processing

  • EENDasP: "End-to-End Speaker Diarization as Post-Processing", in Proc. ICASSP, 2021. (Hitachi) [Paper] [Review] [Code]
  • Dover-Lap: "DOVER-Lap: A Method for Combining Overlap-aware Diarization Outputs", in Proc. IEEE SLT, 2021. (JHU) [Paper] [Review] [Code]
  • "DiaCorrect: Error Correction Back-end For Speaker Diarization," in Proc. ICASSP, 2024. (BUT) [Paper] [Code]

Using Target Speaker Embedding

  • TS-VAD: "Target-Speaker Voice Activity Detection: a Novel Approach for Multi-Speaker Diarization in a Dinner Party Scenario", in Proc. Interspeech, 2020. [Paper] [Code] [PPT]
  • “The STC system for the CHiME-6 challenge,” in CHiME Workshop, 2020. [Paper]
  • SEND (1): "Speaker Embedding-aware Neural Diarization for Flexible Number of Speakers with Textual Information," in arXiv:2111.13694, 2021. (Alibaba) [Paper]
  • SEND (2): "Speaker Embedding-aware Neural Diarization: an Efficient Framework for Overlapping Speech Diarization in Meeting Scenarios," in arXiv:2203.09767, 2022. (Alibaba) [Paper]
  • MTEAD: "Multi-target Filter and Detector for Unknown-number Speaker Diarization", in IEEE SPL, 2022. [Paper]
  • SOND: "Speaker Overlap-aware Neural Diarization for Multi-party Meeting Analysis", in Proc. EMNLP, 2022. (Alibaba) [Paper] [Code]
  • EDA-TS-VAD: “Target Speaker Voice Activity Detection with Transformers and Its Integration with End-to-End Neural Diarization”, in Proc. ICASSP, 2023. (Microsoft) [Paper]
  • Seq2Seq-TS-VAD: “Target-Speaker Voice Activity Detection via Sequence-to-Sequence Prediction”, in Proc. ICASSP, 2023. (DKU) [Paper] [Review]
  • QM-TS-VAD: "Unsupervised Adaptation with Quality-Aware Masking to Improve Target-Speaker Voice Activity Detection for Speaker Diarization", in Proc. Interspeech, 2023. (USTC) [Paper]
  • "ANSD-MA-MSE: Adaptive Neural Speaker Diarization Using Memory-Aware Multi-Speaker Embedding," in IEEE/ACM TASLP, 2023. (USTC) [Paper] [Code]
  • NSD-MS2S: "Neural Speaker Diarization Using Memory-Aware Multi-Speaker Embedding with Sequence-to-Sequence Architecture," in Proc. ICASSP, 2024. (USTC) [Paper] [Code]
  • PET-TSVAD: "Profile-Error-Tolerant Target-Speaker Voice Activity Detection," in Proc. ICASSP, 2024. (Microsoft) [Paper]

Target Speech Diarization

  • PTSD: "Prompt-driven Target Speech Diarization," in Proc. ICASSP, 2024. (NUS) [Paper]

With Separation or Target Speaker Extraction

  • "Integration of speech separation, diarization, and recognition for multi-speaker meetings: System description, comparison, and analysis," in Proc. SLT, 2021. (JHU) [Paper] [Blog] [Review]
  • EEND-SS: "Joint End-to-End Neural Speaker Diarization and Speech Separation for Flexible Number of Speakers", in Proc. SLT, 2022. (CMU) [Paper] [Review]
  • "TS-SEP: Joint Diarization and Separation Conditioned on Estimated Speaker Embeddings", in IEEE/ACM TASLP, 2024. [Paper]
  • "Continuous Target Speech Extraction: Enhancing Personalized Diarization and Extraction on Complex Recordings," in arXiv:2401.15993, 2024. (Tencent) [Paper] [Demo]
  • "PixIT: Joint Training of Speaker Diarization and Speech Separation from Real-world Multi-speaker Recordings," in Proc. Odyssey, 2024. [Paper] [Code]
  • MC-EEND: "Multi-channel Conversational Speaker Separation via Neural Diarization," in IEEE/ACM TASLP, 2024. (OSU) [Paper]
  • "USED: Universal Speaker Extraction and Diarization," submitted to IEEE/ACM TASLP, 2024. (CUHK) [Paper] [Demo] [Util] [Review]
  • "Neural Blind Source Separation and Diarization for Distant Speech Recognition," in Proc. Interspeech, 2024. (AIST) [Paper]
  • "TalTech-IRIT-LIS Speaker and Language Diarization Systems for DISPLACE 2024," in Proc. Interspeech, 2024. (Pyannote) [Paper]

Multi-Channel

  • "Multi-Channel End-to-End Neural Diarization with Distributed Microphones", in Proc. ICASSP, 2022. (Hitachi) [Paper]
  • "Multi-Channel Speaker Diarization Using Spatial Features for Meetings", in Proc. ICASSP, 2022. (Tencent) [Paper]
  • "Mutual Learning of Single- and Multi-Channel End-to-End Neural Diarization," in Proc. IEEE SLT, 2023. (Hitachi) [Paper]
  • "Semi-supervised multi-channel speaker diarization with cross-channel attention", in Proc. ASRU, 2023. (USTC) [Paper]
  • "UniX-Encoder: A Universal X-Channel Speech Encoder for Ad-Hoc Microphone Array Speech Processing," in arXiv:2310.16367, 2024. (JHU, Tencent) [Paper]
  • "Channel-Combination Algorithms for Robust Distant Voice Activity and Overlapped Speech Detection," in IEEE/ACM TASLP, 2024. [Paper]
  • "A Spatial Long-Term Iterative Mask Estimation Approach for Multi-Channel Speaker Diarization and Speech Recognition," in Proc. ICASSP, 2024. (USTC) [Paper]
  • MC-EEND: "Multi-channel Conversational Speaker Separation via Neural Diarization," in IEEE/ACM TASLP, 2024. (OSU) [Paper]
  • "ASoBO: Attentive Beamformer Selection for Distant Speaker Diarization in Meetings," in Proc. Interspeech, 2024. (LIUM) [Paper]

Online

  • "Supervised online diarization with sample mean loss for multi-domain data", in Proc. ICASSP, 2020. [Paper] [Code]
  • "Online End-to-End Neural Diarization with Speaker-Tracing Buffer", in Proc. IEEE SLT, 2021. (Hitachi) [Paper]
  • BW-EDA-EEND: "BW-EDA-EEND: Streaming End-to-End Neural Speaker Diarization for a Variable Number of Speakers", in Proc. Interspeech, 2021. (Amazon) [Paper]
  • FS-EEND: "Online Streaming End-to-End Neural Diarization Handling Overlapping Speech and Flexible Numbers of Speakers", in Proc. Interspeech, 2021. (Hitachi) [Paper] [Review]
  • Diart: "Overlap-aware low-latency online speaker diarization based on end-to-end local segmentation", in Proc. ASRU, 2021. [Paper] [Code]
  • "Low-Latency Online Speaker Diarization with Graph-Based Label Generation", in Proc. Odyssey, 2022. (DKU) [Paper]
  • EEND-GLA: "Online Neural Diarization of Unlimited Numbers of Speakers Using Global and Local Attractors", in IEEE/ACM TASLP, 2022. (Hitachi) [Paper]
  • Online TS-VAD: "Online Target Speaker Voice Activity Detection for Speaker Diarization", in Proc. Interspeech, 2022. (DKU) [Paper]
  • "Absolute decision corrupts absolutely: conservative online speaker diarisation", in Proc. ICASSP, 2023. (Naver) [Paper]
  • "A Reinforcement Learning Framework for Online Speaker Diarization", under review at NeurIPS, 2023. (CU) [Paper]
  • OTS-VAD: "End-to-end Online Speaker Diarization with Target Speaker Tracking," submitted to IEEE/ACM TASLP, 2023. (DKU) [Paper]
  • FS-EEND: "Frame-wise streaming end-to-end speaker diarization with non-autoregressive self-attention-based attractors," in Proc. ICASSP, 2024. (Hangzhou) [Paper] [Code]
  • "Online speaker diarization of meetings guided by speech separation," in Proc. ICASSP, 2024. (LTCI) [Paper] [Code]
  • "Interrelate Training and Clustering for Online Speaker Diarization," in IEEE/ACM TASLP, 2024. [Paper]

Clustering-based

  • UIS-RNN: "Fully Supervised Speaker Diarization" (Google) [Paper] [Code]
  • DNC: "Discriminative Neural Clustering for Speaker Diarisation", in Proc. IEEE SLT, 2019. [Paper] [Code] [Review]
  • Pyannote: "pyannote.audio: neural building blocks for speaker diarization", in Proc. ICASSP, 2020. (CNRS) [Paper] [Code] [Video]
  • NME-SC: “Auto-Tuning Spectral Clustering for Speaker Diarization Using Normalized Maximum Eigengap”, IEEE SPL, 2019. [Paper] [Code]
  • Resegmentation with VB: “Overlap-Aware Diarization: Resegmentation Using Neural End-to-End Overlapped Speech Detection”, in Proc. ICASSP, 2020. [Paper]
  • Pyannote 2.0: "End-to-end speaker segmentation for overlap-aware resegmentation", in Proc. Interspeech, 2021. (CNRS) [Paper] [Code] [Video]
  • UMAP-Leiden: "Reformulating Speaker Diarization as Community Detection With Emphasis On Topological Structure", in Proc. ICASSP, 2022. (Alibaba) [Paper]
  • SCALE: "Spectral Clustering-aware Learning of Embeddings for Speaker Diarisation", in Proc. ICASSP, 2023. (CAM) [Paper]
  • SHARC: "Supervised Hierarchical Clustering using Graph Neural Networks for Speaker Diarization", in Proc. ICASSP, 2023. (IISC) [Paper]
  • CDGCN: "Community Detection Graph Convolutional Network for Overlap-Aware Speaker Diarization," in Proc. ICASSP, 2023. (XMU) [Paper]
  • "Pyannote.Audio 2.1: Speaker Diarization Pipeline: Principle, Benchmark and Recipe", in Proc. Interspeech, 2023. (CNRS) [Paper]
  • GADEC: "Graph attention-based deep embedded clustering for speaker diarization", in Speech Communication, 2023. (NJUPT) [Paper]
  • "Overlap-aware End-to-End Supervised Hierarchical Graph Clustering for Speaker Diarization," submitted to IEEE/ACM TASLP, 2024. [Paper]
  • "Apollo's Unheard Voices: Graph Attention Networks for Speaker Diarization and Clustering for Fearless Steps Apollo Collection," in Proc. ICASSP, 2024. (UTD) [Paper]
  • "Multi-View Speaker Embedding Learning for Enhanced Stability and Discriminability," in Proc. ICASSP, 2024. (Tsinghua) [Paper]
  • "Towards Unsupervised Speaker Diarization System for Multilingual Telephone Calls Using Pre-trained Whisper Model and Mixture of Sparse Autoencoders," in arXiv:2407.01963, 2024. [Paper]
  • "Investigating Confidence Estimation Measures for Speaker Diarization," in Proc. Interspeech, 2024. [Paper]
  • "Once more Diarization: Improving meeting transcription systems through segment-level speaker reassignment," in Proc. Interspeech, 2024. (PU) [Paper]

Embedding (With Clustering)

  • "Multi-Scale Speaker Diarization With Neural Affinity Score Fusion", in Proc. ICASSP, 2021. (USC) [Paper]
  • AA+DR+NS: "Adapting Speaker Embeddings for Speaker Diarisation", in Proc. Interspeech, 2021. (Naver) [Paper] [Review]
  • GAT+AA: "Multi-scale speaker embedding-based graph attention networks for speaker diarisation", in Proc. ICASSP, 2022. (Naver) [Paper]
  • MSDD: "Multi-scale Speaker Diarization with Dynamic Scale Weighting", in Proc. Interspeech, 2022. (NVIDIA) [Paper] [Code] [Blog]
  • "In Search of Strong Embedding Extractors For Speaker Diarization", in Proc. ICASSP, 2023. (Naver) [Paper] [Review]
  • PRISM: "PRISM: Pre-trained Indeterminate Speaker Representation Model for Speaker Diarization and Speaker Verification", in Proc. Interspeech, 2022. (Alibaba) [Paper]
  • DR-DESA: "Advancing the dimensionality reduction of speaker embeddings for speaker diarisation: disentangling noise and informing speech activity", in Proc. ICASSP, 2023. (Naver) [Paper] [Review]
  • HEE: "High-resolution embedding extractor for speaker diarisation", in Proc. ICASSP, 2023. (Naver) [Paper] [Review]
  • "Frame-wise and overlap-robust speaker embeddings for meeting diarization", in Proc. ICASSP, 2023. (PU) [Paper] [Review]
  • "A Teacher-Student approach for extracting informative speaker embeddings from speech mixtures", in Proc. Interspeech, 2023. (PU) [Paper]
  • "Geodesic interpolation of frame-wise speaker embeddings for the diarization of meeting scenarios", in Proc. ICASSP, 2024. (PU) [Paper] [Review]
  • "Speaker Embeddings With Weakly Supervised Voice Activity Detection For Efficient Speaker Diarization," in Proc. Odyssey, 2024. (IDLab) [Paper]

With Speaker Identification

  • "Uncertainty Quantification in Machine Learning for Joint Speaker Diarization and Identification," submitted to IEEE/ACM TASLP, 2024. [Paper]

Speaker Recognition & Verification

  • "Xi-Vector Embedding for Speaker Recognition," in IEEE SPL. (A*STAR) [Paper] [Review]
  • "Build a SRE Challenge System: Lessons from VoxSRC 2022 and CNSRC 2022," in Proc. Interspeech, 2023. (SJTU) [Paper]
  • RecXi: "Disentangling Voice and Content with Self-Supervision for Speaker Recognition," in Proc. NeurIPS, 2023. (A*STAR) [Paper]
  • "ECAPA2: A Hybrid Neural Network Architecture and Training Strategy for Robust Speaker Embeddings," in Proc. ASRU, 2023. (IDLab) [Paper] [Model] [Review]
  • "Rethinking Session Variability: Leveraging Session Embeddings for Session Robustness in Speaker Verification," in Proc. ICASSP, 2024. (Naver) [Paper]
  • "Leveraging In-the-Wild Data for Effective Self-Supervised Pretraining in Speaker Recognition," in Proc. ICASSP, 2024. (CUHK) [Paper]

Scoring

  • LSTM scoring: "LSTM based Similarity Measurement with Spectral Clustering for Speaker Diarization", in Proc. Interspeech, 2019. (DKU) [Paper]
  • "Self-Attentive Similarity Measurement Strategies in Speaker Diarization", in Proc. Interspeech, 2020. (DKU) [Paper]
  • “Similarity Measurement of Segment-Level Speaker Embeddings in Speaker Diarization”, IEEE/ACM TASLP, 2023. (DKU) [Paper]

Variational Bayes and HMM

VBx Series

  • "Speaker Diarization based on Bayesian HMM with Eigenvoice Priors", in Proc. Odyssey, 2018. (BUT) [Paper]
  • "VB-HMM Speaker Diarization with Enhanced and Refined Segment Representation", in Proc. Odyssey, 2018. (Tsinghua) [Paper]
  • “Analysis of Speaker Diarization Based on Bayesian HMM With Eigenvoice Priors”, IEEE/ACM TASLP, 2019. (BUT) [Paper]
  • "BUT System Description for DIHARD Speech Diarization Challenge 2019", in arXiv:1910.08847, 2019. (BUT) [Paper]
  • "Bayesian HMM Based x-Vector Clustering for Speaker Diarization", in Proc. Interspeech, 2019. (BUT) [Paper]
  • "Optimizing Bayesian HMM Based x-Vector Clustering for the Second DIHARD Speech Diarization Challenge", in Proc. ICASSP, 2020. (BUT) [Paper]
  • "Analysis of the BUT Diarization System for VoxConverse Challenge", in Proc. ICASSP, 2021. (BUT) [Paper] [Code]
  • "Bayesian HMM clustering of x-vector sequences (VBx) in speaker diarization: theory, implementation and analysis on standard tasks", in Computer Speech & Language, 2022. (BUT) [Paper]
  • MS-VBx: "Multi-Stream Extension of Variational Bayesian HMM Clustering (MS-VBx) for Combined End-to-End and Vector Clustering-based Diarization", in Proc. Interspeech, 2023. (NTT) [Paper]
  • DVBx: "Discriminative Training of VBx Diarization", in Proc. ICASSP, 2024. (BUT) [Paper] [Code]

Variational Bayes

  • "Variational Bayesian methods for audio indexing", in Proc. ICMI-MLMI, 2005. [Paper]
  • "Bayesian analysis of speaker diarization with eigenvoice priors", in CRIM, Montreal, Technical Report, 2008. [Paper]
  • "Unsupervised Methods for Speaker Diarization: An Integrated and Iterative Approach", IEEE/ACM TASLP, 2013. [Paper]
  • "Diarization resegmentation in the factor analysis subspace", in Proc. ICASSP, 2015. [Paper]
  • "Diarization is hard: some experiences and lessons learned for the JHU team in the inaugural DIHARD challenge", in Proc. Interspeech, 2018. [Paper]

Normalization

  • "Analysis of i-vector length normalization in speaker recognition systems", in Proc. Interspeech, 2011. [Paper]

PLDA (Probabilistic Linear Discriminant Analysis)

  • "The speaker partitioning problem", in Proc. Odyssey, 2010. [Paper]
  • "Discriminatively trained probabilistic linear discriminant analysis for speaker verification", in Proc. ICASSP, 2011. [Paper]
  • "Speaker diarization with plda i-vector scoring and unsupervised calibration", in Proc. IEEE SLT, 2014. [Paper]
  • "Iterative PLDA Adaptation for Speaker Diarization", in Proc. Interspeech, 2016. [Paper]
  • "Domain Adaptation of PLDA Models in Broadcast Diarization by Means of Unsupervised Speaker Clustering", in Proc. Interspeech, 2017. [Paper]
  • "Estimation of the Number of Speakers with Variational Bayesian PLDA in the DIHARD Diarization Challenge", in Proc. Interspeech, 2018. [Paper]
  • DCA-PLDA: "A Speaker Verification Backend with Robust Performance across Conditions", in Computer Speech & Language, 2022. [Paper] [Code]
  • "Generalized domain adaptation framework for parametric back-end in speaker recognition", in arXiv:2305.15567, 2023. [Paper]

With ASR

  • "Transcribe-to-Diarize: Neural Speaker Diarization for Unlimited Number of Speakers using End-to-End Speaker-Attributed ASR," in Proc. ICASSP, 2022. [Paper]
  • "Tandem Multitask Training of Speaker Diarisation and Speech Recognition for Meeting Transcription", in Proc. Interspeech, 2022. [Paper]
  • "Unified Modeling of Multi-Talker Overlapped Speech Recognition and Diarization with a Sidecar Separator", in Proc. Interspeech, 2023. (CUHK) [Paper]
  • "Multi-resolution Approach to Identification of Spoken Languages and to Improve Overall Language Diarization System using Whisper Model", in Proc. Interspeech, 2023.
  • "Speaker Diarization for ASR Output with T-vectors: A Sequence Classification Approach", in Proc. Interspeech, 2023. [Paper]
  • "Lexical Speaker Error Correction: Leveraging Language Models for Speaker Diarization Error Correction", in Proc. Interspeech, 2023. (Amazon) [Paper]
  • "Enhancing Speaker Diarization with Large Language Models: A Contextual Beam Search Approach", in Proc. ICASSP, 2024. (NVIDIA) [Paper]
  • WEEND: "Towards Word-Level End-to-End Neural Speaker Diarization with Auxiliary Network," in arXiv:2309.08489, 2024. (Google) [Paper] [Supplementary]
  • "One model to rule them all ? Towards End-to-End Joint Speaker Diarization and Speech Recognition", in Proc. ICASSP, 2024. (CMU) [Paper]
  • "Meeting Recognition with Continuous Speech Separation and Transcription-Supported Diarization," in arXiv:2309.16482, 2024. (PU) [Paper]
  • "Joint Inference of Speaker Diarization and ASR with Multi-Stage Information Sharing," in Proc. ICASSP, 2024. (DKU) [Paper]
  • "Multitask Speech Recognition and Speaker Change Detection for Unknown Number of Speakers," in Proc. ICASSP, 2024. (Idiap) [Paper]
  • "A Spatial Long-Term Iterative Mask Estimation Approach for Multi-Channel Speaker Diarization and Speech Recognition," in Proc. ICASSP, 2024. (USTC) [Paper]

    Speaker-attributed ASR

    • "SA-Paraformer: Non-autoregressive End-to-End Speaker-Attributed ASR," in Proc. ASRU, 2023. (Alibaba) [Paper]
    • "Speaker Mask Transformer for Multi-talker Overlapped Speech Recognition," in arXiv:2312.10959, 2024. (NICT) [Paper]
    • "On Speaker Attribution with SURT," in Proc. Odyssey, 2024. (JHU) [Paper]
    • "Improving Speaker Assignment in Speaker-Attributed ASR for Real Meeting Applications," in Proc. Odyssey, 2024. (CNRS) [Paper]

Language Diarization

  • "End-to-End Spoken Language Diarization with Wav2vec Embeddings", in Proc. Interspeech, 2023. [Paper] [Code]
  • "Multi-resolution Approach to Identification of Spoken Languages and To Improve Overall Language Diarization System Using Whisper Model," in Proc. Interspeech, 2023. [Paper]

With NLP (LLM)

  • "Exploring Speaker-Related Information in Spoken Language Understanding for Better Speaker Diarization", in Proc. ACL, 2023. (Alibaba) [Paper]
  • MMSCD: "Encoder-decoder multimodal speaker change detection", in Proc. Interspeech, 2023. (Naver) [Paper]
  • "Aligning Speakers: Evaluating and Visualizing Text-based Diarization Using Efficient Multiple Sequence Alignment", in Proc. ICTAI, 2023. [Paper]
  • "DiariST: Streaming Speech Translation with Speaker Diarization," in Proc. ICASSP, 2024. (Microsoft) [Paper] [Code]
  • JPCP: "Improving Speaker Diarization using Semantic Information: Joint Pairwise Constraints Propagation," in arXiv:2309.10456, 2024. (Alibaba) [Paper]
  • "DiarizationLM: Speaker Diarization Post-Processing with Large Language Models," submitted to ICLR, 2024. (Google) [Paper] [Code]
  • "LLM-based speaker diarization correction: A generalizable approach," submitted to IEEE/ACM TASLP, 2024. [Paper]
  • "AG-LSEC: Audio Grounded Lexical Speaker Error Correction," in Proc. Interspeech, 2024. (Amazon) [Paper]

With Vision

  • "Who said that?: Audio-visual speaker diarisation of real-world meetings", in Proc. Interspeech, 2019. (Naver) [Paper]
  • "Self-supervised learning for audio-visual speaker diarization", in Proc. ICASSP, 2020. (Tencent) [Paper] [Blog]
  • AVA-AVD (AVR-Net): "AVA-AVD: Audio-Visual Speaker Diarization in the Wild", in Proc. ACM MM, 2022. [Paper] [Code] [Video]
  • "End-to-End Audio-Visual Neural Speaker Diarization", in Proc. Interspeech, 2022. (USTC) [Paper] [Code] [Review]
  • DyViSE: "DyViSE: Dynamic Vision-Guided Speaker Embedding for Audio-Visual Speaker Diarization", in Proc. MMSP, 2022. (THU) [Paper] [Code]
  • "Audio-Visual Speaker Diarization in the Framework of Multi-User Human-Robot Interaction", in Proc. ICASSP, 2023. [Paper]
  • STHG: "Spatial-Temporal Heterogeneous Graph Learning for Advanced Audio-Visual Diarization", in Proc. CVPR, 2023. (Intel) [Paper]
  • "Speaker Diarization of Scripted Audiovisual Content," in arXiv:2308.02160, 2024. (Amazon) [Paper]
  • "Uncertainty-Guided End-to-End Audio-Visual Speaker Diarization for Far-Field Recordings," in Proc. ACM MM, 2023. [Paper]
  • "Joint Training or Not: An Exploration of Pre-trained Speech Models in Audio-Visual Speaker Diarization," in Springer Computer Science proceedings, 2023. [Paper]
  • EEND-EDA++: "Late Audio-Visual Fusion for In-The-Wild Speaker Diarization," in arXiv:2211.01299v2, 2023. [Paper]
  • "AFL-Net: Integrating Audio, Facial, and Lip Modalities with Cross-Attention for Robust Speaker Diarization in the Wild," in Proc. ICASSP, 2024. (Tencent) [Paper] [Demos]
  • "Multichannel AV-wav2vec2: A Framework for Learning Multichannel Multi-Modal Speech Representation," in Proc. AAAI, 2024. (Tencent) [Paper]
  • "Multi-Input Multi-Output Target-Speaker Voice Activity Detection For Unified, Flexible, and Robust Audio-Visual Speaker Diarization," submitted to IEEE/ACM TASLP. (DKU) [Paper]
  • "3D-Speaker-Toolkit: An Open Source Toolkit for Multi-modal Speaker Verification and Diarization," in arXiv:2403.19971, 2024. (Alibaba) [Paper] [Code]
  • "Target Speech Diarization with Multimodal Prompts," submitted to IEEE/ACM TASLP, 2024. (NUS) [Paper]
  • MFV-KSD: "Multi-Stage Face-Voice Association Learning with Keynote Speaker Diarization," submitted to ACM MM, 2024. [Paper] [Code]

Related Spoofing

  • "Spoof Diarization: "What Spoofed When" in Partially Spoofed Audio," in Proc. Interspeech, 2024. (IITK) [Paper]

Related TTS

Speaker Anonymization

  • "A Benchmark for Multi-speaker Anonymization," submitted to IEEE/ACM TASLP, 2024. (SIT) [Paper] [Code]

Singing Diarization

  • "Song Data Cleansing for End-to-End Neural Singer Diarization Using Neural Analysis and Synthesis Framework," in Proc. Interspeech, 2024. (LY) [Paper]

With Emotion

  • "Speech Emotion Diarization: Which Emotion Appears When?," in Proc. ASRU, 2023. (Zaion) [Paper]
  • "EmoDiarize: Speaker Diarization and Emotion Identification from Speech Signals using Convolutional Neural Networks," in arXiv:2310.12851, 2023. [Paper]
  • "ED-TTS: Multi-scale Emotion Modeling using Cross-domain Emotion Diarization for Emotional Speech Synthesis," in Proc. ICASSP, 2024. [Paper]

Personal VAD

  • "Personal VAD: Speaker-Conditioned Voice Activity Detection", in Proc. Odyssey, 2020. (Google) [Paper]
  • "SVVAD: Personal Voice Activity Detection for Speaker Verification", in Proc. Interspeech, 2023. [Paper]

VAD & OSD & SCD

  • "Overlapped Speech Detection in Broadcast Streams Using X-vectors," in Proc. Interspeech, 2022. [Paper]
  • "Overlapped speech and gender detection with WavLM pre-trained features," in Proc. Interspeech, 2022. [Paper]
  • "Microphone Array Channel Combination Algorithms for Overlapped Speech Detection," in Proc. Interspeech, 2022. [Paper]
  • "Multitask Detection of Speaker Changes, Overlapping Speech and Voice Activity Using wav2vec 2.0," in Proc. ICASSP, 2023. [Paper] [Code]
  • "Semantic VAD: Low-Latency Voice Activity Detection for Speech Interaction," in Proc. Interspeech, 2023. [Paper]
  • "Joint speech and overlap detection: a benchmark over multiple audio setup and speech domains," in arXiv:2307.13012, 2023. [Paper]
  • "Advancing the study of Large-Scale Learning in Overlapped Speech Detection," in arXiv:2308.05987, 2023. [Paper]
  • "USM-SCD: Multilingual Speaker Change Detection Based on Large Pretrained Foundation Models," in Proc. ICASSP, 2024. (Google) [Paper]
  • "Channel-Combination Algorithms for Robust Distant Voice Activity and Overlapped Speech Detection," in IEEE/ACM TASLP, 2024. [Paper]

Dataset

  • VoxConverse: "Spot the conversation: speaker diarisation in the wild", in Proc. Interspeech, 2020. (VGG, Naver) [Paper] [Code] [Dataset]
  • "MSDWild: Multi-modal Speaker Diarization Dataset in the Wild", in Proc. Interspeech, 2022. [Paper] [Dataset]
  • "LibriMix: An Open-Source Dataset for Generalizable Speech Separation," in arXiv:2005.11262, 2020. [Paper] [Code]
  • Ego4D: "Around the World in 3,000 Hours of Egocentric Video," in Proc. CVPR, 2022. (Meta) [Paper] [Code] [Page]
  • AliMeeting: "Summary On The ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Grand Challenge," in Proc. ICASSP, 2022. (Alibaba) [Paper] [Dataset] [Code]
  • "VoxBlink: X-Large Speaker Verification Dataset on Camera", in Proc. ICASSP, 2024. [Paper] [Dataset]
  • "NOTSOFAR-1 Challenge: New Datasets, Baseline, and Tasks for Distant Meeting Transcription," in arXiv:2401.08887, 2024. (MS) [Paper]
  • "A Comparative Analysis of Speaker Diarization Models: Creating a Dataset for German Dialectal Speech," in Proc. ACL, 2024. [Paper]
  • "Conversations in the wild: Data collection, automatic generation and evaluation," in Computer Speech & Language, 2025. [Paper]
  • "ALLIES: A Speech Corpus for Segmentation, Speaker Diarization, Speech Recognition and Speaker Change Detection," in Proc. ACL, 2024. (LIUM) [Paper]

Self-Supervised

  • “Self-supervised Speaker Diarization”, in Proc. Interspeech, 2022. [Paper]
  • CSDA: "Continual Self-Supervised Domain Adaptation for End-to-End Speaker Diarization", in Proc. IEEE SLT, 2022. (CNRS) [Paper] [Code]

Semi-Supervised

  • "Active Learning Based Constrained Clustering For Speaker Diarization", in IEEE/ACM TASLP, 2017. (UT) [Paper]

Measurement

  • BER: “Balanced Error Rate For Speaker Diarization”, in arXiv:2211.04304, 2022. [Paper] [Code]

Child-Adult

  • "Robust Self Supervised Speech Embeddings for Child-Adult Classification in Interactions involving Children with Autism," in Proc. Interspeech, 2023. [Paper]
  • "Exploring Speech Foundation Models for Speaker Diarization in Child-Adult Dyadic Interactions," in Proc. Interspeech, 2024. (USC) [Paper]

Challenge

VoxSRC (VoxCeleb Speaker Recognition Challenge)

VoxSRC-20 Track4

[Workshop]

VoxSRC-21 Track4

[Workshop]

VoxSRC-22 Track4

[Paper] [Workshop]

VoxSRC-23 Track4

[Paper] [Workshop]

M2MeT (Multi-channel Multi-party Meeting Transcription Grand Challenge)

2022 M2MeT

[Introduction Paper] [Summary Paper] [Dataset-AliMeeting] [Code]

MISP (Multimodal Information Based Speech Processing)

2022 MISP Track1

[Introduction Paper] [Page] [Baseline Code]

DIHARD

2020 DIHARD III

[Page] [Paper] [Program]

Track1

Track2

Etc.

"End-to-end speaker diarization system for the third dihard challenge system description," in DIHARD III Tech. Report, 2021.

The DISPLACE Challenge 2023

  • "The DISPLACE Challenge 2023 - DIarization of SPeaker and LAnguage in Conversational Environments," in Proc. Interspeech, 2023. [Paper] [Page]
  • "The SpeeD--ZevoTech submission at DISPLACE 2023," in Proc. Interspeech, 2023. [Paper]

MERLIon CCS Challenge 2023

  • "MERLIon CCS Challenge: A English-Mandarin code-switching child-directed speech corpus for language identification and diarization," in Proc. Interspeech, 2023. [Paper] [Page]

CHiME-6

[Overview] [Paper]

ICMC-ASR Grand Challenge (ICASSP2024)

  • "ICMC-ASR: The ICASSP 2024 In-Car Multi-Channel Automatic Speech Recognition Challenge," 2023. [Paper]
  • "The NUS-HLT System for ICASSP2024 ICMC-ASR Grand Challenge," in Technical Report, 2023. [Paper]

The Second DISPLACE

  • "The Second DISPLACE Challenge : DIarization of SPeaker and LAnguage in Conversational Environments," in Proc. Interspeech, 2024. [Paper]

CHiME-8

  • "The CHiME-8 DASR Challenge for Generalizable and Array Agnostic Distant Automatic Speech Recognition and Diarization," 2024. [Paper]

Other Awesome-list

https://github.com/wq2012/awesome-diarization

https://github.com/xyxCalvin/awesome-speaker-diarization