
pyannote/speaker-diarization-3.0 slower than pyannote/speaker-diarization? #1481

Closed

hbredin opened this issue Sep 29, 2023 · 16 comments
@hbredin
Member

hbredin commented Sep 29, 2023

I notice that 'pyannote/speaker-diarization-3.0' is noticeably slower than 'pyannote/speaker-diarization', even with the GPU fix. Does anyone observe the same phenomenon? I will put together some sample benchmark code when I have time.

Originally posted by @gau-nernst in #1475 (comment)

@hbredin
Member Author

hbredin commented Sep 29, 2023

There was indeed an issue related to onnxruntime in the 3.0.0 release.
3.0.1 fixes it by relying on onnxruntime-gpu instead.
Make sure that you do not rely on another library that may overwrite it back to onnxruntime (with no GPU support).

I benchmarked both myself and got similar speed between 3.0 and 2.1 (even slightly faster with 3.0). For instance, on DIHARD 3, v3.0 is 43x faster than realtime while v2.1 is only 40x faster.

| Benchmark | Duration | v2.1 | v3.0 |
|---|---|---|---|
| AISHELL-4 | 12h43m | 38m41s (19x) | 37m15s (20x) |
| AliMeeting (channel 1) | 10h46m | 19m32s (33x) | 18m09s (35x) |
| AMI (IHM) | 09h03m | 12m36s (43x) | 11m35s (46x) |
| AMI (SDM) | 09h03m | 12m21s (44x) | 12m48s (42x) |
| AVA-AVD | 04h30m | 05m11s (52x) | 05m04s (53x) |
| DIHARD 3 (full) | 32h57m | 49m12s (40x) | 45m43s (43x) |
| MSDWild | 09h49m | 12m27s (47x) | 11m08s (53x) |
| VoxConverse (v0.3) | 43h32m | 58m47s (44x) | 50m35s (51x) |

Duration = total duration of audio in benchmark.
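
For reference, the speed factors in the table are just total audio duration divided by wall-clock processing time. A minimal timing sketch for a single file (file name and token are placeholders, not from the benchmark setup):

import time
import torch
import torchaudio
from pyannote.audio import Pipeline

pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.0", use_auth_token="HF_TOKEN"  # placeholder token
).to(torch.device("cuda"))

audio_file = "test.wav"  # hypothetical test file
info = torchaudio.info(audio_file)
audio_duration = info.num_frames / info.sample_rate

start = time.perf_counter()
diarization = pipeline(audio_file)
elapsed = time.perf_counter() - start

# Real-time factor as reported in the table: audio duration / processing time.
print(f"{audio_duration:.0f}s of audio processed in {elapsed:.0f}s "
      f"({audio_duration / elapsed:.0f}x faster than realtime)")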

@thomasmol

I am noticing the same with v3.0.1.

Diarization inference takes about 10 minutes for a 25 minute audio file, running on an A40 (hosted at Replicate).

These are the logs of my pipeline:

Starting transcribing
Finished with transcribing, took 84.637 seconds
Finished with diarization, took 592.24 seconds
Finished with merging, took 0.0034907 seconds
Finished with cleaning, took 0.0009923 seconds
Processing time: 676.88 seconds
Finished with inference

So it takes about 1.5 minutes for transcription (with faster_whisper) and almost 10 minutes for running pyannote.
I believe diarization took about 30 seconds with pyannote v2.1 with this same file.

This is how I am setting it up:

import torch
from cog import BasePredictor
from faster_whisper import WhisperModel
from pyannote.audio import Pipeline


class Predictor(BasePredictor):

    def setup(self):
        """Load the models into memory to make running multiple predictions efficient"""
        model_name = "large-v2"
        # faster-whisper model for transcription
        self.model = WhisperModel(
            model_name,
            device="cuda" if torch.cuda.is_available() else "cpu",
            compute_type="float16")
        # pyannote pipeline for diarization, moved to the GPU
        self.diarization_model = Pipeline.from_pretrained(
            "pyannote/speaker-diarization-3.0", use_auth_token="TOKEN").to(torch.device("cuda"))

And calling the pipeline:

    def predict(self, ...) -> Output:
        # other code
        diarization = self.diarization_model(
            audio_file_wav, num_speakers=num_speakers)
        # other code

File I am using is https://thomasmol.com/recordings/aiopensource.mp3

Maybe it's still not loaded onto the GPU correctly?

@hbredin
Member Author

hbredin commented Sep 29, 2023

Related maybe? SYSTRAN/faster-whisper#493

@guilhermehge

guilhermehge commented Sep 29, 2023

@thomasmol when you start the script, try doing this:

import onnxruntime as ort

print(ort.get_device())

If it returns CPU, it is probably because you are using faster_whisper, which has onnxruntime in its requirements. When onnxruntime (from faster_whisper) and onnxruntime-gpu (from pyannote) are both installed, they conflict and onnxruntime always defaults to CPU.
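
As a slightly more detailed check (a sketch on top of the snippet above, not part of the original comment), you can also list the execution providers your onnxruntime install exposes; CUDAExecutionProvider has to be present for anything to run on the GPU through ONNX Runtime:

import onnxruntime as ort

# "GPU" only if the onnxruntime-gpu build is the one actually being imported.
print(ort.get_device())
# Should include "CUDAExecutionProvider"; only "CPUExecutionProvider" means CPU-only.
print(ort.get_available_providers())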

You should change the requirements of faster_whisper to use onnxruntime-gpu; it will not affect faster_whisper's behaviour. I've tested it and it works fine, since faster_whisper only uses ONNX for the Silero VAD.

You can also do watch -n 1 nvidia-smi in bash to check if the GPU is being used while the diarization pipeline is running. If it stays at 0%, your GPU is not being used.

In addition, to avoid conflicts with .mp3 files, you must use torchaudio to load the file into memory:

import torchaudio

waveform, sample_rate = torchaudio.load("audio_file.mp3")
diarization = pipeline({"waveform": waveform, "sample_rate": sample_rate})
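
A slightly fuller, self-contained variant of the snippet above (the mono downmix is an optional addition of mine; pyannote's models work on single-channel audio and the pipeline may also downmix internally):

import torch
import torchaudio
from pyannote.audio import Pipeline

pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.0", use_auth_token="HF_TOKEN"  # placeholder token
).to(torch.device("cuda"))

# Decode the mp3 once in memory instead of handing the pipeline a file path.
waveform, sample_rate = torchaudio.load("audio_file.mp3")

# Optional: downmix stereo to mono.
if waveform.shape[0] > 1:
    waveform = waveform.mean(dim=0, keepdim=True)

diarization = pipeline({"waveform": waveform, "sample_rate": sample_rate})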

@thomasmol

> You should change the requirements of faster_whisper to use onnxruntime-gpu; it will not affect faster_whisper's behaviour. I've tested it and it works fine, since faster_whisper only uses ONNX for the Silero VAD.

Thanks @guilhermehge, that might be the problem, but how do I force faster-whisper to use onnxruntime-gpu? I am building with a Dockerfile, which will just install faster-whisper and all of its required dependencies (and therefore install onnxruntime). See SYSTRAN/faster-whisper#493 (comment)

@guilhermehge

guilhermehge commented Sep 29, 2023

Installing it and uninstalling it afterwards is not recommended, so I believe you should clone faster_whisper's repository to your machine, change the requirements, copy the directory into the container when running the application, build the package there, and then install it. I haven't tested this yet, but it might work.

docker run --rm -v $(pwd):/home/app -it <image> /bin/bash to mount the directory you're in into the container.
python setup.py sdist bdist_wheel to build the newly altered faster_whisper package.
cd dist; pip install <.whl file> to install the new package.

Or, you can just build the package on your local machine and install it inside the container; you will just need to alter the order of the commands above.

Edit: I just thought of a better way of doing this. Clone faster_whisper's repo to your machine, change the requirements, and build the package with python setup.py sdist bdist_wheel. Then, copy the result into your container via the Dockerfile and run pip install in that same Dockerfile. Done: faster_whisper installed with onnxruntime-gpu.

What I did to test whether it works was (even though it is not recommended) to uninstall onnxruntime and force-reinstall onnxruntime-gpu. You can also try that as an initial step, if you like.

pip uninstall onnxruntime
pip install --force-reinstall onnxruntime-gpu

@Saccarab

I can also +1 this.
Which torch and CUDA versions are you guys using?

@thomasmol

Okay, the issue seems to be on faster-whisper's end, and it is indeed about onnxruntime vs onnxruntime-gpu. I forked faster-whisper and updated it so it uses the GPU for VAD, and that fixed the slow pyannote issue for me (it's back to about 20 seconds for a 25-minute audio file instead of 10 minutes). You can try it by installing git+https://github.com/thomasmol/faster-whisper.git@master, which is a fork with the updated requirements. I also created a pull request: SYSTRAN/faster-whisper#499.

I think this issue can be closed since pyannote 3.0.1 runs as fast as (maybe even faster than) 2.1; just make sure you only have onnxruntime-gpu installed and not onnxruntime!
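
One way to double-check that only the GPU build is present (a small sketch, not something from the thread) is to query the installed distributions directly:

from importlib.metadata import PackageNotFoundError, version

# Only onnxruntime-gpu should show a version; a plain onnxruntime package
# installed alongside it tends to shadow the GPU build and force CPU execution.
for dist in ("onnxruntime", "onnxruntime-gpu"):
    try:
        print(f"{dist}: {version(dist)}")
    except PackageNotFoundError:
        print(f"{dist}: not installed")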

@guilhermehge

So, the fix I mentioned here worked properly. Just clone the repo, change the requirements, run setup.py, and install the .whl file; everything should run as normal.

@Saccarab

Saccarab commented Oct 6, 2023

I guess this is kind of a separate issue, but assuming the pipeline depends on onnxruntime-gpu, wouldn't this cause compatibility issues for CUDA 12+, since onnxruntime-gpu needs CUDA 11?

@gau-nernst

Sorry for the late update from my side. After upgrading to 3.0.1, inference speed is as fast as reported by @hbredin. Thank you for the fix. You can close the issue.

@hbredin
Member Author

hbredin commented Oct 12, 2023

Awesome. Thanks for the feedback!

@hbredin
Member Author

hbredin commented Nov 9, 2023

FYI: #1537

@louismorgner

Also running into this issue. Things I was using/trying:

Approach 1: Replicate A40s

Used CUDA 11.8 & pyannote-audio==3.0.1 with the same code snippet suggested by @thomasmol, ensuring that onnxruntime-gpu is used. The moment the model boots there is this error:
The NVIDIA driver on your system is too old (found version 11080). Surprisingly, it works fine on a T4, though.

Approach 2: Custom docker image with runpod

Used CUDA 12 & pyannote-audio==3.0.1. The model boots and works without issues. However, the GPU is not used, despite only onnxruntime-gpu being installed via pip and GPU usage being explicitly forced like so: dz_pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization-3.0", use_auth_token="YOUR_TOKEN").to(torch.device("cuda"))

I stripped away all other packages to try to isolate the issue but unfortunately cannot reproduce the performance claimed above. Maybe someone has some hints?

@hbredin
Member Author

hbredin commented Nov 16, 2023

Latest version no longer relies on ONNX runtime.
Please update to pyannote.audio 3.1 and pyannote/speaker-diarization-3.1 (and open new issues if needed).
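
For completeness, a minimal sketch of the suggested upgrade (token and file name are placeholders); since 3.1 no longer uses ONNX Runtime, moving the pipeline to the GPU with torch is all that is needed:

import torch
from pyannote.audio import Pipeline

# pyannote.audio 3.1 with pyannote/speaker-diarization-3.1: no onnxruntime involved.
pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1", use_auth_token="HF_TOKEN"  # placeholder token
).to(torch.device("cuda"))

diarization = pipeline("audio_file.wav")  # hypothetical file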

@filmo

filmo commented Nov 27, 2023

I found that doing what @guilhermehge suggested with regard to a forced reinstall worked for me:

pip uninstall onnxruntime
pip install --force-reinstall onnxruntime-gpu

It gave me an ugly warning, but my setup seems to work:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
faster-whisper 0.9.0 requires onnxruntime<2,>=1.14, which is not installed.
triton 2.0.0 requires cmake, which is not installed.
triton 2.0.0 requires lit, which is not installed.
mysql-connector-python 8.2.0 requires protobuf<=4.21.12,>=4.21.1, but you have protobuf 4.25.1 which is incompatible.
nemo-toolkit 1.21.0 requires numpy<1.24,>=1.22, but you have numpy 1.26.2 which is incompatible.
tensorboard 2.15.1 requires protobuf<4.24,>=3.19.6, but you have protobuf 4.25.1 which is incompatible.

Went from 310 seconds (i5 13600K CPU) for a test file back down to 31 seconds on GPU (RTX 3090).
