pyannote/speaker-diarization-3.0 runs slower than pyannote/speaker-diarization@2.1 #499

kaihe-stori · 2023-09-29T16:15:05Z

Line 22 in 07fafa3

    
           ] + ["pyannote.audio @ git+https://github.com/pyannote/pyannote-audio@db24eb6c60a26804b1f07a6c2b39055716beb852"],

Currently pyannote.audio is pinned to 3.0.0, but it has been reported to run slower because the embeddings model runs on CPU. A new release, 3.0.1, fixed this by replacing onnxruntime with onnxruntime-gpu.

It makes sense for whisperX to update pyannote.audio to 3.0.1. However, there is a conflict with faster_whisper over onnxruntime, as discussed here. Until it is resolved on the faster_whisper side, installing both leaves onnxruntime in CPU mode, and performance stays slow.

My current workaround is to run the following commands after installation:

pip install pyannote.audio==3.0.1
pip uninstall onnxruntime
pip install --force-reinstall onnxruntime-gpu

Alternatively, use the old 2.1 model.

model = whisperx.DiarizationPipeline(model_name='pyannote/speaker-diarization@2.1', use_auth_token=YOUR_AUTH_TOKEN, device='cuda')
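To check whether the workaround took effect, one option is to ask onnxruntime which execution providers it exposes. The following is a minimal sketch under that assumption and is not part of whisperX: `has_cuda_provider` and `available_providers` are hypothetical helper names, while `onnxruntime.get_available_providers()` is the library call they wrap.

```python
def has_cuda_provider(providers):
    """Return True if the CUDA execution provider is in the given list."""
    return "CUDAExecutionProvider" in providers


def available_providers():
    """Return onnxruntime's provider list, or None if it is not importable."""
    try:
        import onnxruntime as ort
    except ImportError:
        return None
    return ort.get_available_providers()


if __name__ == "__main__":
    providers = available_providers()
    if providers is None:
        print("onnxruntime is not installed in this environment")
    elif has_cuda_provider(providers):
        print("GPU build detected:", providers)
    else:
        print("CPU-only providers:", providers)
```

If only CPU providers are listed, the CPU build of onnxruntime is most likely the one being imported.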


sam1am · 2023-09-30T01:30:06Z

Brilliant, thank you. I thought I was crazy. Your fix worked for me. Diarization went from around 8 minutes to 30 seconds on a 2-speaker, ~45-minute audio file.

Asofwar · 2023-10-02T13:26:36Z

thank you!

Sing303 · 2023-10-02T16:43:21Z

Oh, man, thank you. I thought I was going crazy too, not long ago everything was working fast and now it's very very slow ....

9throok · 2023-10-03T11:54:06Z

hey @kaihe-stori, I tried to use your approach, but I still get the error for onnxruntime

pkg_resources.DistributionNotFound: The 'onnxruntime<2,>=1.14' distribution was not found and is required by faster-whisper

Any suggestions how can I deal with that?

m-bain · 2023-10-05T21:12:27Z

Great find @kaihe-stori, you could send PR to README if you want

kaihe-stori · 2023-10-05T22:02:02Z

> hey @kaihe-stori, I tried to use your approach, but I still get the error for onnxruntime
> pkg_resources.DistributionNotFound: The 'onnxruntime<2,>=1.14' distribution was not found and is required by faster-whisper
> Any suggestions how can I deal with that?

Did you get the error during package installation or running code?

Run my commands only after you install whisperx (thus faster-whisper).
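The error message names the distribution `onnxruntime`, which is a separate distribution from `onnxruntime-gpu` even though both provide the `onnxruntime` module. A quick way to see which of the two is present is to query the installed package metadata. This is only a sketch using the standard library's `importlib.metadata`; `installed_version` and `onnxruntime_status` are hypothetical helper names.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def onnxruntime_status():
    """Map each onnxruntime distribution name to its installed version."""
    names = ("onnxruntime", "onnxruntime-gpu")
    return {name: installed_version(name) for name in names}


if __name__ == "__main__":
    for name, version in onnxruntime_status().items():
        print(f"{name}: {version or 'not installed'}")
```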

kaihe-stori · 2023-10-05T22:05:47Z

> Great find @kaihe-stori, you could send PR to README if you want

Sure, happy to do it. Is "Limitations" a good section to put this in?

m-bain · 2023-10-05T22:15:01Z

I think setup is best! https://github.com/m-bain/whisperX#setup-%EF%B8%8F

9throok · 2023-10-12T07:51:45Z

> > hey @kaihe-stori, I tried to use your approach, but I still get the error for onnxruntime
> > pkg_resources.DistributionNotFound: The 'onnxruntime<2,>=1.14' distribution was not found and is required by faster-whisper
> > Any suggestions how can I deal with that?
>
> Did you get the error during package installation or running code?
>
> Run my commands only after you install whisperx (thus faster-whisper).

thanks.. that worked :)

remic33 · 2023-10-12T09:57:57Z

Thanks! I was investigating it and did not understand what was happening.
Trying your solution now.
Could we maybe add something in the setup file to correct it?

dylorr · 2023-10-20T04:13:54Z

Hey! Noticed this problem while executing and monitoring GPU usage. Tried this approach and am still getting 0% GPU usage during the diarization step. Can you further explain at which point you are executing the 3 lines of code you mentioned?

pip install pyannote.audio==3.0.1
pip uninstall onnxruntime
pip install --force-reinstall onnxruntime-gpu

I was able to get the old 2.1 model working fine w/ the GPU but for whatever reason, using the workaround for the newer model isn't working. Ty for bringing light to this issue!

Context/TLDR:

Mac using Google Colab w/ GPU
I figured adding these install/uninstalls after pulling an updated install of whisperx would do the trick, but no luck
Assume that I'm using the Python starter code from the readme
any way related to this issue (Sadly onnxruntime-gpu dependency kills Mac support pyannote/pyannote-audio#1505)
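For anyone who wants to watch GPU usage during the diarization step from a script rather than by eye, here is a minimal sketch. It assumes an NVIDIA GPU with the `nvidia-smi` tool on the PATH; `parse_utilization` and `gpu_utilization` are hypothetical helper names, and the function returns None when `nvidia-smi` is not found (for example on a Mac).

```python
import shutil
import subprocess


def parse_utilization(output):
    """Parse one integer percentage per line, skipping non-numeric lines."""
    values = []
    for line in output.splitlines():
        text = line.strip()
        if text.isdigit():
            values.append(int(text))
    return values


def gpu_utilization():
    """Return per-GPU utilization in percent, or None without nvidia-smi."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=utilization.gpu",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_utilization(result.stdout)


if __name__ == "__main__":
    print(gpu_utilization())
```

Calling `gpu_utilization()` from a second process while diarization runs gives a rough picture of whether the GPU is being used at all.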

7k50 · 2023-10-20T11:58:56Z

Off-the-cuff question, but is there any reason to believe that the newer "3.0" versions of pyannote segmentation/diarization are worse than "2.1" for WhisperX diarization quality (not speed, in this case)? I just made a couple of transcripts with 3.0 for the first time, and I wasn't happy with the quality of the speaker segmentation and thus speaker recognition. I've been quite pleased in the past with the previous models with WhisperX. Just anecdotal, I haven't investigated this.

remic33 · 2023-10-20T14:07:06Z

> Off-the-cuff question, but is there any reason to believe that the newer "3.0" versions of pyannote segmentation/diarization are worse than "2.1" for WhisperX diarization quality (not speed, in this case)? I just made a couple of transcripts with 3.0 for the first time, and I wasn't happy with the quality of the speaker segmentation and thus speaker recognition. I've been quite pleased in the past with the previous models with WhisperX. Just anecdotal, I haven't investigated this.

It should not. pyannote 3.0 integrates a new model that is supposed to get better results, especially on overlapping speech. You can see their results on public datasets in the release notes here.
However, it is research oriented. It is possible that your data are not like those datasets, in which case you can get worse results.
Another possibility is a problem in the whisperX process.
To be sure, you should compare manually. (And... that is not the easiest thing to do.)

On the other subject, uninstalling and reinstalling does not work for me either, and that is a big problem.

kaihe-stori · 2023-10-20T14:13:00Z

> Mac using Google Colab w/ GPU
>
> I figured adding these install/uninstalls after pulling an updated install of whisperx would do the trick, but no luck

Yes, the workaround is intended to be run right after installing whisperx to alter the dependencies, before running your application code.

Not sure about Mac, as my experience is on AWS (a GPU instance with pytorch+cuda container). Sorry.

dylorr · 2023-10-20T15:01:40Z

> > Mac using Google Colab w/ GPU
> >
> > I figured adding these install/uninstalls after pulling an updated install of whisperx would do the trick, but no luck
>
> Yes, the workaround is intended to be run right after installing whisperx to alter the dependencies, before running your application code.
>
> Not sure about Mac, as my experience is on AWS (a GPU instance with pytorch+cuda container). Sorry.

np, ty for the response! will be setting up a proper environment shortly, figured I would see if I was understanding correctly 👍

justinwlin · 2023-10-24T19:40:44Z

:( The current setup.py change breaks Mac, which sucks. Not sure I understood / I tried solution people pointed out about the slower pyannote, but no matter what I do there is no onnxruntime available as @dylorr pointed out for mac. So I don't think the solution really helps Mac users out.

Seems like this has to do with the pyannote dependency and not really whisperx, so I created a Docker container following the advice in the pyannote issue, which said to use 3.0.0 or lower, by cloning whisperx and modifying the setup.py.

For those on Mac, here is the repository / docker image if you want to use it and just have a plug-and-go solution.

I was trying to get something working in the setup.py to detect the environment but didn't try too hard / was running into a bit of issues, so just decided to hard-code it for now to 3.0.0 and maybe that detect OS can be a future thing.
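One possible way to express "3.0.0 on Mac, 3.0.1 elsewhere" without OS-detection code in setup.py is a pair of PEP 508 environment markers. The sketch below only illustrates that idea under the version pins discussed in this thread; it is not what whisperX or the modified repository does, and `pyannote_requirements` and `pin_for` are hypothetical names.

```python
def pyannote_requirements():
    """Return requirement strings that pick a pyannote.audio pin per platform."""
    return [
        # macOS: stay on 3.0.0, since onnxruntime-gpu is not available there.
        'pyannote.audio==3.0.0; sys_platform == "darwin"',
        # Everything else: use 3.0.1, which depends on onnxruntime-gpu.
        'pyannote.audio==3.0.1; sys_platform != "darwin"',
    ]


def pin_for(sys_platform):
    """Mirror the markers above for a given sys.platform value."""
    return "3.0.0" if sys_platform == "darwin" else "3.0.1"


if __name__ == "__main__":
    import sys

    print("This platform would get pyannote.audio ==", pin_for(sys.platform))
```

The strings returned by `pyannote_requirements()` could be appended to the `install_requires` list of a setup.py, and pip would then evaluate the markers at install time.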

Link to the modified repo:
https://github.com/justinwlin/WhisperXMac

Docker image:
https://hub.docker.com/layers/justinwlin/whisperxmac/1.0/images/sha256-3e56473cc25de95269955ef1a9c596ea7e62a9b83da682cf9bc3e91abe5d8798?context=repo

I didn't test the Docker image too hard, I just made sure the below worked:

docker run --rm -it -v $(pwd):/app justinwlin/whisperxmac:1.0 /bin/bash
whisperx input.mp3 --compute_type int8 --language en

Also just made sure that import whisperx works in Python, and since the CLI in my Docker container is just a bash shell passing the arguments to the Python function, I assume it works for any Python scripts too.

remic33 · 2023-10-26T12:43:49Z

Still no solution to make it work directly on onnx GPU without having to uninstall and force reinstall?
Going to pyannote 3.1 is an easy move, but it still does not work properly with the GPU.

grazder · 2023-10-30T11:00:48Z

It's not working for me either, even after updating to 3.0.1 and removing onnxruntime from the dependencies.
My env dependencies look like this:

[tool.poetry.dependencies]
python = "~3.8"
loguru = "^0.7.2"
torch = { version = "^2.1.0", source = "torch_cuda118" }
torchaudio = { version = "^2.1.0", source = "torch_cuda118" }
torchvision = { version = "^0.16.0", source = "torch_cuda118" }
"pyannote.audio" = "3.0.1"

[[tool.poetry.source]]
name = "torch_cuda118"
url = "https://download.pytorch.org/whl/cu118/"
priority = "supplemental"

I've got cuda 11.8 and cudnn 8.7.0.84
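When comparing environments like this one, it can help to print what the installed PyTorch build itself reports, since the CUDA version it was compiled against may differ from the system toolkit. A minimal sketch follows; `torch_build_info` is a hypothetical helper, and it returns None when torch is not importable.

```python
def torch_build_info():
    """Summarize the installed torch build, or return None without torch."""
    try:
        import torch
    except ImportError:
        return None
    return {
        "torch": torch.__version__,
        # CUDA version the wheel was built against (None for CPU-only builds).
        "cuda_build": torch.version.cuda,
        # cuDNN version as an integer, or None if cuDNN is unavailable.
        "cudnn": torch.backends.cudnn.version(),
        # Whether a usable CUDA device is visible to this process.
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    print(torch_build_info())
```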

remic33 · 2023-10-30T13:21:39Z

It's not working for me either with GPU. But if I remember well, PyTorch 2.1 requires Cuda 12+ to work.

grazder · 2023-10-30T14:12:46Z

It has builds for lower CUDA versions. I'm using 11.8 here.

EricOliveira90 · 2023-11-05T16:22:39Z

> > > hey @kaihe-stori, I tried to use your approach, but I still get the error for onnxruntime
> > > pkg_resources.DistributionNotFound: The 'onnxruntime<2,>=1.14' distribution was not found and is required by faster-whisper
> > > Any suggestions how can I deal with that?
> >
> > Did you get the error during package installation or running code?
> > Run my commands only after you install whisperx (thus faster-whisper).
>
> thanks.. that worked :)

How did you fix it? I ran the commands only after installing whisperx, and this error message popped up both during installation and when running code. It stopped showing the error after I reinstalled onnxruntime, but the performance issues continue.

remic33 · 2023-11-07T10:54:24Z

Same

grazder · 2023-11-08T06:58:10Z

Here are some changes that can fix pyannote performance; you can try them.

(already merged)
pyannote/pyannote-audio#1529

(i described what helped me)
pyannote/pyannote-audio#1523

MyraBaba · 2023-11-13T18:18:38Z

is this problem solved ?

really hectic.

How can we go back to the previous version of whisperx, which was running without problems on the GPU?

remic33 · 2023-11-15T15:00:40Z

It seems it is not solved. Right now the whole lib is kind of useless without GPU acceleration on diarization. I did not find a way to solve it for now... @m-bain Submitted changes, but it did not change the fact that it is not using the GPU for now.
We may be at a dead end.

remic33 · 2023-11-16T15:53:50Z

Pyannote just released 3.1 without onnx.
It should work fine for us. I'll work on it tomorrow if nobody did before.
Release note : https://github.com/pyannote/pyannote-audio/releases/tag/3.1.0

eplinux · 2024-04-11T15:19:14Z

Unfortunately, this issue seems to be back for me. I had no problems whatsoever but then upgraded to the latest version this week and now diarization takes ages to complete with high CPU and RAM load. Maybe this is related to this issue?

grazder · 2024-04-11T15:21:46Z

Try increasing OMP_NUM_THREADS, it works for me
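A small sketch of how the variable could be set from Python, for anyone who prefers that to exporting it in the shell. It assumes the value must be in place before torch or whisperx is imported, because OpenMP runtimes typically read it once at initialization; `set_omp_threads` is a hypothetical helper and the value 4 is only an example, not a recommendation from this thread.

```python
import os


def set_omp_threads(n):
    """Set OMP_NUM_THREADS for libraries imported later in this process."""
    if n < 1:
        raise ValueError("thread count must be at least 1")
    os.environ["OMP_NUM_THREADS"] = str(n)
    return os.environ["OMP_NUM_THREADS"]


# Call this at the very top of the script, before importing torch or
# whisperx, so the OpenMP runtime sees the value when it starts up.
set_omp_threads(4)  # example value only
```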

eplinux · 2024-04-11T15:33:07Z

> Try increasing OMP_NUM_THREADS, it works for me

Thanks for the suggestion. To what value did you increase it? Currently trying 4.

eplinux · 2024-04-11T15:38:12Z

It's just weird because it seemed to run seamlessly and complete within a few minutes before. Now it's pretty much stuck, using similar files.

danxvv · 2024-04-13T08:17:33Z

Same here, two days ago it became too slow.

remic33 · 2024-04-15T07:52:50Z

That's weird, the only thing that did change is the usage of torchaudio>=2.2
What is the version of torchaudio in your project?

eplinux · 2024-04-15T14:37:34Z

> That's weird, the only thing that did change is the usage of torchaudio>=2.2
> What is the version of torchaudio in your project?

So after discovering that it didn't work, I just cleaned my whole environment and started from scratch - following the instructions in the readme file:

pip show torchaudio
Name: torchaudio
Version: 2.0.1+cu118
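A version string like the one above carries two pieces of information: the release number and, after the `+`, a local tag naming the CUDA build. The sketch below splits the two and compares the release against a minimum such as the torchaudio>=2.2 mentioned earlier. It handles only simple numeric versions, and `split_local_version` and `meets_minimum` are hypothetical helpers.

```python
def split_local_version(version):
    """Split '2.0.1+cu118' into ((2, 0, 1), 'cu118'); the tag may be None."""
    public, _, local = version.partition("+")
    # Keep only purely numeric components; pre-release suffixes are ignored.
    release = tuple(int(part) for part in public.split(".") if part.isdigit())
    return release, (local or None)


def meets_minimum(version, minimum):
    """Return True if the release part is at least the given tuple."""
    release, _ = split_local_version(version)
    return release >= minimum


if __name__ == "__main__":
    reported = "2.0.1+cu118"
    print(split_local_version(reported))
    print("satisfies >=2.2:", meets_minimum(reported, (2, 2)))
```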

eplinux · 2024-04-15T16:30:01Z

> That's weird, the only thing that did change is the usage of torchaudio>=2.2
> What is the version of torchaudio in your project?

Awesome, thanks for the hint! It seems I fixed it by reinstalling torchaudio et al., so now my GPU is running at full capacity again during diarization. It still takes very long, though. Is the new diarization model so much more resource hungry? Also, maybe we should edit the readme, then? I ran the following to reinstall torch-2.0.0+cu118 & torchaudio-2.0.1+cu118 (Windows, CUDA 11.8):

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

remic33 · 2024-04-16T15:13:21Z

We can update the requirements.txt file to avoid that.

astubbs · 2024-09-17T03:57:32Z

Fresh install, diarization painfully slow (1 hour for 1h12m of audio on M2 Pro). Not sure how to debug this further...

pipx list
venvs are in /Users/astubbs/.local/pipx/venvs
apps are exposed on your $PATH at /Users/astubbs/.local/bin
manual pages are exposed at /Users/astubbs/.local/share/man
<snip>
   package whisperx 3.1.1, installed using Python 3.12.6
    - whisperx
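One generic first step when debugging slowness like this is to time each pipeline stage separately, so it is clear whether transcription, alignment, or diarization dominates. The sketch below is a plain timing helper and makes no assumptions about whisperX internals; `timed` is a hypothetical name and the summed range stands in for a real stage.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Record the wall-clock duration of the enclosed block in results."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


timings = {}
with timed("example stage", timings):
    sum(range(100_000))  # stand-in for a real pipeline stage

for label, seconds in timings.items():
    print(f"{label}: {seconds:.3f}s")
```

Wrapping each call from the readme's Python starter code in its own `with timed(...)` block would show where the hour is going.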

kaihe-stori mentioned this issue Sep 29, 2023

Huggingface Authentication Issues probably related to pyannote #498

Closed

kaihe-stori mentioned this issue Oct 12, 2023

Add a special note about Speaker-Diarization-3.0 in readme #521

Merged

justinwlin mentioned this issue Oct 24, 2023

pyannote version 3.0.1 breaks MacOS compatibility #540

Open

sam1am mentioned this issue Oct 25, 2023

Diarization process in Whisperx does not utilize GPU #542

Open

jim60105 mentioned this issue Nov 14, 2023

is diarization using gpu or cpu ? jim60105/docker-whisperX#18

Closed

remic33 mentioned this issue Nov 17, 2023

Update pyannote to 3.1.0 #586

Merged

m-bain closed this as completed in #586 Nov 17, 2023

jimmy6DOF mentioned this issue Dec 26, 2023

Update pyannote to v3.1.1 to fix a diarization problem (and diarize.py) #646

Merged

Lucidology mentioned this issue May 1, 2024

404 Client Error for speaker-embedding.onnx #666

Open
