
Huge acceleration for speaker-diarization pipeline #1442

Closed
davidas1 opened this issue Aug 1, 2023 · 8 comments

davidas1 commented Aug 1, 2023

Due to some implementation detail in the Audio.crop method, I noticed a massive slow-down of the embedding part of the pipeline when working with a filepath.
If I run the same pipeline with pre-loaded (and already resampled) audio, performance is much better.

For example, using the load_audio method from whisperx, I run the following code:

audio = whisperx.load_audio(AUDIO_FILEPATH)
audio_file = {
    'waveform': torch.from_numpy(audio[None, :]),
    'sample_rate': whisperx.audio.SAMPLE_RATE
}
diarize_segments = self.diarize_pipe(audio_file)

This results in a 5-10x faster runtime compared to passing the file path to the pipeline.
I think you should consider making this the default way to run the code (or fixing the issue that causes the slow performance when working with a filepath).
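To check whether the gap reproduces on your own files, a crude wall-clock comparison is enough. The helper below is a minimal sketch; `pipeline`, `audio_file`, and "audio.wav" in the commented usage are placeholders, not names from the original report:

```python
import time

def timed(label, fn, *args, **kwargs):
    """Run fn once and print a rough wall-clock time (crude benchmark)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.2f}s")
    return result, elapsed

# Hypothetical usage against an already-instantiated pyannote pipeline:
#   _, t_path = timed("filepath input", pipeline, "audio.wav")
#   _, t_mem  = timed("in-memory input", pipeline, audio_file)
```

A single run per variant is noisy, but a consistent 5-10x difference should still be obvious.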


github-actions bot commented Aug 1, 2023

Thank you for your issue.
We found the following entry in the FAQ which you may find helpful:

Feel free to close this issue if you found an answer in the FAQ.

If your issue is a feature request, please read this first and update your request accordingly, if needed.

If your issue is a bug report, please provide a minimum reproducible example as a link to a self-contained Google Colab notebook containing everything needed to reproduce the bug:

  • installation
  • data preparation
  • model download
  • etc.

Providing an MRE will increase your chance of getting an answer from the community (either maintainers or other power users).

Companies relying on pyannote.audio in production may contact me via email regarding:

  • paid scientific consulting around speaker diarization and speech processing in general;
  • custom models and tailored features (via the local tech transfer office).

This is an automated reply, generated by FAQtory


remic33 commented Aug 3, 2023

Why did you use whisperX for loading? Is that method especially efficient?


davidas1 commented Aug 3, 2023

I'm using pyannote as part of the whisperX pipeline, which already loads the audio, so I just reuse it for pyannote.
I didn't check exactly how the audio is loaded in pyannote, but in whisperX it is loaded and resampled using FFmpeg, which should be very efficient.
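For reference, FFmpeg-based loading in the spirit of whisperx.load_audio can be sketched as follows. This is an assumption-laden illustration, not whisperX's actual source: the function names (`pcm16_to_float32`, `load_audio_ffmpeg`) are mine, and it assumes an `ffmpeg` binary is on the PATH:

```python
import subprocess
import numpy as np

SAMPLE_RATE = 16000  # whisperx resamples audio to 16 kHz

def pcm16_to_float32(raw: bytes) -> np.ndarray:
    """Convert little-endian 16-bit PCM bytes to float32 in [-1, 1]."""
    return np.frombuffer(raw, np.int16).astype(np.float32) / 32768.0

def load_audio_ffmpeg(path: str, sr: int = SAMPLE_RATE) -> np.ndarray:
    """Decode any file ffmpeg understands to mono float32 at `sr` Hz."""
    cmd = [
        "ffmpeg", "-nostdin", "-i", path,
        "-f", "s16le", "-ac", "1", "-acodec", "pcm_s16le",
        "-ar", str(sr), "-",
    ]
    out = subprocess.run(cmd, capture_output=True, check=True).stdout
    return pcm16_to_float32(out)
```

Decoding and resampling happen in one native ffmpeg pass, which is part of why this style of loading tends to be fast.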


kaixxx commented Aug 16, 2023

I've tried this using the recommended way of processing audio from memory (described here: https://github.com/pyannote/pyannote-audio/blob/develop/tutorials/applying_a_pipeline.ipynb):
import torchaudio
waveform, sample_rate = torchaudio.load(audio_file_path)
audio_in_memory = {"waveform": waveform, "sample_rate": sample_rate}
diarization = pipeline(audio_in_memory, hook=hook)

Sadly, I can see no difference in processing time.

torchaudio.load also returns the audio in float32 dtype (the same as whisperx.load_audio). So I don't see a reason why loading with whisperX should result in such a massive improvement in processing time.

davidas1 commented

@kaixxx I didn't say that whisperx.load_audio is faster than other ways of using an in-memory audio buffer.
The point is that sending a filepath is very slow for some reason.


kaixxx commented Aug 17, 2023

@davidas1 Sorry, I was a bit unclear. What I wanted to say was:
I can see no difference in processing time compared to using the filepath directly (pipeline(audio_file_path)).


sorgfresser commented Aug 25, 2023

I'm curious: where exactly is the slowdown in the crop method? I was unable to track it down. The crop method should seek to the corresponding audio parts; I don't get why it takes longer than loading first and diarizing afterwards. If it is somehow related to the torchaudio.load() part: which torchaudio backend are you using? I found something strange in the sox_io source code; if I read it correctly, we might download the file multiple times when providing a URL. Additionally, using something other than .wav could break it.

Torchaudio is moving its audio I/O to FFmpeg soonish; if the problem is related to torchaudio.load, it might be fixed then.


stale bot commented Feb 22, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the wontfix label Feb 22, 2024
stale bot closed this as completed Mar 30, 2024