
pyannote/speaker-diarization-3.0 slower than pyannote/speaker-diarization? #1481

Closed

hbredin opened this issue Sep 29, 2023 · 16 comments
@hbredin
Member

hbredin commented Sep 29, 2023

I notice that 'pyannote/speaker-diarization-3.0' is noticeably slower than 'pyannote/speaker-diarization', even with the GPU fix. Does anyone observe the same phenomenon? I will put together some sample benchmark code when I have time.

Originally posted by @gau-nernst in #1475 (comment)

@hbredin
Member Author

hbredin commented Sep 29, 2023

There was indeed an issue related to onnxruntime in the 3.0.0 release.
3.0.1 fixes it by relying on onnxruntime-gpu instead.
Make sure that you do not rely on another library that may overwrite it back to onnxruntime (with no GPU support).

I benchmarked both myself and got similar speed between 3.0 and 2.1 (even slightly faster with 3.0). For instance, on DIHARD 3, v3.0 is 43x faster than realtime while v2.1 is only 40x faster.

| Benchmark | Duration | v2.1 | v3.0 |
|---|---|---|---|
| AISHELL-4 | 12h43m | 38m41s (19x) | 37m15s (20x) |
| AliMeeting (channel 1) | 10h46m | 19m32s (33x) | 18m09s (35x) |
| AMI (IHM) | 09h03m | 12m36s (43x) | 11m35s (46x) |
| AMI (SDM) | 09h03m | 12m21s (44x) | 12m48s (42x) |
| AVA-AVD | 04h30m | 05m11s (52x) | 05m04s (53x) |
| DIHARD 3 (full) | 32h57m | 49m12s (40x) | 45m43s (43x) |
| MSDWild | 09h49m | 12m27s (47x) | 11m08s (53x) |
| VoxConverse (v0.3) | 43h32m | 58m47s (44x) | 50m35s (51x) |

Duration = total duration of audio in benchmark.
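
For reference, the speed factors in the table are just total audio duration divided by wall-clock processing time. A minimal timing sketch for a single file (file name and token are placeholders, not from the benchmark setup):

import time
import torch
import torchaudio
from pyannote.audio import Pipeline

pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.0", use_auth_token="HF_TOKEN"  # placeholder token
).to(torch.device("cuda"))

audio_file = "test.wav"  # hypothetical test file
info = torchaudio.info(audio_file)
audio_duration = info.num_frames / info.sample_rate

start = time.perf_counter()
diarization = pipeline(audio_file)
elapsed = time.perf_counter() - start

# Real-time factor as reported in the table: audio duration / processing time.
print(f"{audio_duration:.0f}s of audio processed in {elapsed:.0f}s "
      f"({audio_duration / elapsed:.0f}x faster than realtime)")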

@thomasmol

I am noticing the same with v3.0.1.

Diarization inference takes about 10 minutes for a 25 minute audio file, running on an A40 (hosted at Replicate).

These are the logs of my pipeline:

Starting transcribing
Finished with transcribing, took 84.637 seconds
Finished with diarization, took 592.24 seconds
Finished with merging, took 0.0034907 seconds
Finished with cleaning, took 0.0009923 seconds
Processing time: 676.88 seconds
Finished with inference

So it takes about 1.5 minutes for transcription (with faster_whisper) and almost 10 minutes for running pyannote.
I believe diarization took about 30 seconds with pyannote v2.1 with this same file.

This is how I am setting it up:

import torch
from cog import BasePredictor
from faster_whisper import WhisperModel
from pyannote.audio import Pipeline


class Predictor(BasePredictor):

    def setup(self):
        """Load the models into memory to make running multiple predictions efficient"""
        model_name = "large-v2"
        # faster-whisper model for transcription
        self.model = WhisperModel(
            model_name,
            device="cuda" if torch.cuda.is_available() else "cpu",
            compute_type="float16")
        # pyannote pipeline for diarization, moved to the GPU
        self.diarization_model = Pipeline.from_pretrained(
            "pyannote/speaker-diarization-3.0", use_auth_token="TOKEN").to(torch.device("cuda"))

And calling the pipeline:

    def predict(self, ...) -> Output:
        # other code
        diarization = self.diarization_model(
            audio_file_wav, num_speakers=num_speakers)
        # other code

File I am using is https://thomasmol.com/recordings/aiopensource.mp3

Maybe it's still not loaded onto the GPU correctly?

@hbredin
Member Author

hbredin commented Sep 29, 2023

Related maybe? SYSTRAN/faster-whisper#493

@guilhermehge

guilhermehge commented Sep 29, 2023

@thomasmol when you start the script, try doing this:

import onnxruntime as ort

print(ort.get_device())

If it returns CPU, it is probably because you are using faster_whisper, which has onnxruntime in its requirements. When onnxruntime (from faster_whisper) and onnxruntime-gpu (from pyannote) are both installed, they conflict and onnxruntime always defaults to CPU.
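
As a slightly more detailed check (a sketch on top of the snippet above, not part of the original comment), you can also list the execution providers your onnxruntime install exposes; CUDAExecutionProvider has to be present for anything to run on the GPU through ONNX Runtime:

import onnxruntime as ort

# "GPU" only if the onnxruntime-gpu build is the one actually being imported.
print(ort.get_device())
# Should include "CUDAExecutionProvider"; only "CPUExecutionProvider" means CPU-only.
print(ort.get_available_providers())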

You should change the requirements of faster_whisper to use onnxruntime-gpu; it will not affect faster_whisper's behaviour. I've tested it and it works fine, since faster_whisper only uses ONNX for the Silero VAD.

You can also do watch -n 1 nvidia-smi in bash to check if the GPU is being used while the diarization pipeline is running. If it stays at 0%, your GPU is not being used.

In addition, to avoid conflicts with .mp3 files, you must use torchaudio to load the file into memory:

import torchaudio

waveform, sample_rate = torchaudio.load("audio_file.mp3")
diarization = pipeline({"waveform": waveform, "sample_rate": sample_rate})
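
A slightly fuller, self-contained variant of the snippet above (the mono downmix is an optional addition of mine; pyannote's models work on single-channel audio and the pipeline may also downmix internally):

import torch
import torchaudio
from pyannote.audio import Pipeline

pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.0", use_auth_token="HF_TOKEN"  # placeholder token
).to(torch.device("cuda"))

# Decode the mp3 once in memory instead of handing the pipeline a file path.
waveform, sample_rate = torchaudio.load("audio_file.mp3")

# Optional: downmix stereo to mono.
if waveform.shape[0] > 1:
    waveform = waveform.mean(dim=0, keepdim=True)

diarization = pipeline({"waveform": waveform, "sample_rate": sample_rate})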

@thomasmol

> You should change the requirements of faster_whisper to use onnxruntime-gpu; it will not affect faster_whisper's behaviour. I've tested it and it works fine, since faster_whisper only uses ONNX for the Silero VAD.

Thanks @guilhermehge, that might be the problem, but how do I force faster-whisper to use onnxruntime-gpu? I am building with a Dockerfile, which will just install faster-whisper and all of its required dependencies (and therefore install onnxruntime). See SYSTRAN/faster-whisper#493 (comment)

@guilhermehge

guilhermehge commented Sep 29, 2023

Installing it and uninstalling it afterwards is not recommended, so I believe you should clone faster_whisper's repository to your machine, change the requirements, copy the directory into the container when running the application, build the package there, and then install it. I haven't tested this yet, but it might work.

docker run --rm -v $(pwd):/home/app -it <image> /bin/bash to mount the directory you're in into the container.
python setup.py sdist bdist_wheel to build the newly altered faster_whisper package.
cd dist; pip install <.whl file> to install the new package.

Or, you can just build the package on your local machine and install it inside the container; you will just need to alter the order of the commands above.

Edit: I just thought of a better way of doing this. Clone faster_whisper's repo to your machine, change the requirements, and build the package with python setup.py sdist bdist_wheel. Then, copy the result into your container via the Dockerfile and run pip install in that same Dockerfile. Done: faster_whisper installed with onnxruntime-gpu.

What I did to test whether it works was (even though it is not recommended) to uninstall onnxruntime and force-reinstall onnxruntime-gpu. You can also try that as an initial step, if you like.

pip uninstall onnxruntime
pip install --force-reinstall onnxruntime-gpu

@Saccarab

I can also +1 this.
Which torch and CUDA versions are you guys using?

@thomasmol

Okay, the issue seems to be on faster-whisper's end, and it is indeed about onnxruntime vs onnxruntime-gpu. I forked faster-whisper and updated it so it uses the GPU for VAD, and that fixed the slow pyannote issue for me (it's back to about 20 seconds for a 25-minute audio file instead of 10 minutes). You can try it by installing git+https://github.com/thomasmol/faster-whisper.git@master, which is a fork with the updated requirements. I also created a pull request: SYSTRAN/faster-whisper#499.

I think this issue can be closed since pyannote 3.0.1 runs as fast as (maybe even faster than) 2.1; just make sure you only have onnxruntime-gpu installed and not onnxruntime!
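
One way to double-check that only the GPU build is present (a small sketch, not something from the thread) is to query the installed distributions directly:

from importlib.metadata import PackageNotFoundError, version

# Only onnxruntime-gpu should show a version; a plain onnxruntime package
# installed alongside it tends to shadow the GPU build and force CPU execution.
for dist in ("onnxruntime", "onnxruntime-gpu"):
    try:
        print(f"{dist}: {version(dist)}")
    except PackageNotFoundError:
        print(f"{dist}: not installed")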

@guilhermehge

So, the fix I mentioned here worked properly. Just clone the repo, change the requirements, run setup.py, and install the .whl file; everything should run as normal.

@Saccarab

Saccarab commented Oct 6, 2023

I guess this is kind of a separate issue, but assuming the pipeline depends on onnxruntime-gpu, wouldn't this cause compatibility issues for CUDA 12+, since onnxruntime-gpu needs CUDA 11?

@gau-nernst

Sorry for the late update from my side. After upgrading to 3.0.1, inference speed is as fast as reported by @hbredin. Thank you for the fix. You can close the issue.

@hbredin
Member Author

hbredin commented Oct 12, 2023

Awesome. Thanks for the feedback!

@hbredin
Member Author

hbredin commented Nov 9, 2023

FYI: #1537

@louismorgner

Also running into this issue. Things I was using/trying:

Approach 1: Replicate A40s

Used CUDA 11.8 & pyannote-audio==3.0.1 with the same code snippet suggested by @thomasmol, ensuring that onnxruntime-gpu is used. The moment the model boots there is this error:
The NVIDIA driver on your system is too old (found version 11080). Surprisingly, it works fine on a T4, though.

Approach 2: Custom docker image with runpod

Used CUDA 12 & pyannote-audio==3.0.1. The model boots and works without issues. However, the GPU is not used, despite only onnxruntime-gpu being installed via pip and GPU usage being explicitly forced like so: dz_pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization-3.0", use_auth_token="YOUR_TOKEN").to(torch.device("cuda"))

I stripped away all other packages to try to isolate the issue but unfortunately cannot reproduce the performance claimed above. Maybe someone has some hints?

@hbredin
Member Author

hbredin commented Nov 16, 2023

Latest version no longer relies on ONNX runtime.
Please update to pyannote.audio 3.1 and pyannote/speaker-diarization-3.1 (and open new issues if needed).
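
For completeness, a minimal sketch of the suggested upgrade (token and file name are placeholders); since 3.1 no longer uses ONNX Runtime, moving the pipeline to the GPU with torch is all that is needed:

import torch
from pyannote.audio import Pipeline

# pyannote.audio 3.1 with pyannote/speaker-diarization-3.1: no onnxruntime involved.
pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1", use_auth_token="HF_TOKEN"  # placeholder token
).to(torch.device("cuda"))

diarization = pipeline("audio_file.wav")  # hypothetical file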

@filmo

filmo commented Nov 27, 2023

I found that doing what @guilhermehge suggested with regard to a forced reinstall worked for me:

pip uninstall onnxruntime
pip install --force-reinstall onnxruntime-gpu

It gave me an ugly warning, but my setup seems to work:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
faster-whisper 0.9.0 requires onnxruntime<2,>=1.14, which is not installed.
triton 2.0.0 requires cmake, which is not installed.
triton 2.0.0 requires lit, which is not installed.
mysql-connector-python 8.2.0 requires protobuf<=4.21.12,>=4.21.1, but you have protobuf 4.25.1 which is incompatible.
nemo-toolkit 1.21.0 requires numpy<1.24,>=1.22, but you have numpy 1.26.2 which is incompatible.
tensorboard 2.15.1 requires protobuf<4.24,>=3.19.6, but you have protobuf 4.25.1 which is incompatible.

Went from 310 seconds (i5 13600K CPU) for a test file back down to 31 seconds on GPU (RTX 3090).
