v1.0.0 – Initial Release

Overview

This is the first official release of WavLMMSDD, which combines Microsoft’s WavLM (a robust speech representation model) with NVIDIA’s MSDD (Multi-Scale Diarization Decoder) for multi-speaker diarization. It pairs WavLM’s feature extraction with MSDD’s segmentation and clustering, with the aim of handling noisy or overlapping speech more precisely.

Key Features

WavLM-Based Embeddings: Robust speaker embeddings that improve speaker identification.
MSDD Integration: Uses multi-scale inference for speaker diarization, including segments with overlapping speech.
Telephony Model Support: Incorporates the diar_msdd_telephonic model (NVIDIA NeMo), which makes it well suited to call-center and other telephone audio.
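
To make the "multi-scale" idea concrete, the sketch below cuts a recording into overlapping windows at several lengths, which is the general shape of input a multi-scale diarizer works from. It is an illustration only and is not code from WavLMMSDD or NeMo: the function name `multiscale_segments` and the window and shift values are assumptions chosen for the example.

```python
from typing import Dict, List, Tuple

Segment = Tuple[int, int]  # (start_ms, end_ms)


def multiscale_segments(duration_ms: int, scales: Dict[int, int]) -> Dict[int, List[Segment]]:
    """Split a recording into overlapping windows at several scales.

    `scales` maps a window length (ms) to its shift (ms). Both the name and
    the values used with it are hypothetical, for illustration only.
    """
    result: Dict[int, List[Segment]] = {}
    for window_ms, shift_ms in scales.items():
        if window_ms <= 0 or shift_ms <= 0:
            raise ValueError("window and shift must be positive")
        segments: List[Segment] = []
        start = 0
        while start < duration_ms:
            # Clip the last window so it never runs past the end of the audio.
            end = min(start + window_ms, duration_ms)
            segments.append((start, end))
            if end == duration_ms:
                break
            start += shift_ms
        result[window_ms] = segments
    return result


# A 3-second clip at a coarse scale (1.5 s windows) and a fine scale (0.5 s windows).
segments = multiscale_segments(3000, {1500: 750, 500: 250})
for window_ms, segs in segments.items():
    print(window_ms, len(segs), segs[:3])
```

Longer windows give more stable speaker embeddings, while shorter windows locate speaker changes more precisely; combining the scales is what lets a multi-scale approach trade between the two.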

Use Cases

Call Centers: Track who is speaking in busy or noisy conversations.
Meeting Transcripts: Separate overlapping voices in multi-participant discussions.
Voice Applications: Serve as a foundation for applications that need accurate speaker segmentation across diverse audio conditions.

Getting Started

Installation: Install from PyPI:
```
pip install wavlmmsdd
```
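
After installing, one way to confirm that the package is visible to the current Python environment is to query its installed version through the standard library. This is a minimal sketch: the helper name `installed_version` is hypothetical, and the distribution name `wavlmmsdd` is taken from the `pip` command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


found = installed_version("wavlmmsdd")
if found is None:
    print("wavlmmsdd is not installed in this environment")
else:
    print(f"wavlmmsdd {found} is installed")
```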

Releases: bunyaminergen/WavLMMSDD

v1.0.0 – Initial Release

v1.0.0 – Initial Release

Overview

Key Features

Use Cases

Getting Started