fix: fix WeSpeakerPretrainedSpeakerEmbedding GPU support #1478

hbredin · 2023-09-27T11:05:30Z

I would love feedback from @doublex @guilhermehge @realfolkcode

realfolkcode

I have modified the Colab MRE from #1475 to install this fix. Here is the link. However, it made the wall time even worse: 6m 45s. The fix does allocate memory in VRAM, but for some reason the embedding model is still extremely slow.

onnxruntime-gpu version: 1.16.0
CUDA version: 11.8

hbredin · 2023-09-27T12:52:03Z

Thanks a lot. That really helped me narrow things down.

I think the issue is that the default onnxruntime behavior is to optimize the computation graph for each new input shape, and the pyannote speaker diarization pipeline may use many different input shapes when processing a file.

microsoft/onnxruntime#6978

I just pushed a new commit. Can you try again?
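For readers following along: the per-shape slowdown described above is consistent with the CUDA execution provider's `cudnn_conv_algo_search` option, whose default value (`EXHAUSTIVE`) benchmarks cuDNN convolution algorithms. The sketch below assumes that this is the setting the new commit changes (the commit is titled "use default search algorithm instead of benchmarking"); `build_providers`, `make_session`, and the model path are hypothetical names for illustration, not pyannote's actual code.

```python
from typing import Any


def build_providers(use_cuda: bool, device_id: int = 0) -> list[Any]:
    """Return a `providers` list suitable for onnxruntime.InferenceSession.

    Hypothetical helper: it only builds the configuration and does not
    import onnxruntime itself.
    """
    if not use_cuda:
        return ["CPUExecutionProvider"]

    cuda_options = {
        "device_id": device_id,
        # Accepted values: "EXHAUSTIVE" (onnxruntime's default), "HEURISTIC", "DEFAULT".
        # "DEFAULT" skips the benchmarking of convolution algorithms.
        "cudnn_conv_algo_search": "DEFAULT",
    }
    # Keep the CPU provider as a fallback for nodes that CUDA cannot run.
    return [("CUDAExecutionProvider", cuda_options), "CPUExecutionProvider"]


def make_session(model_path: str, use_cuda: bool = True):
    """Create an inference session (assumes onnxruntime-gpu is installed)."""
    import onnxruntime as ort  # imported lazily so build_providers works without it

    return ort.InferenceSession(model_path, providers=build_providers(use_cuda))
```

Passing the options as a `(name, dict)` tuple keeps the setting local to one session rather than relying on environment-wide defaults.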

guilhermehge · 2023-09-27T13:25:53Z

I was testing the solution in an isolated environment using the docker image nvidia/cuda:11.7.1-cudnn8-runtime-ubuntu20.04.

First I tested only adding onnxruntime-gpu==1.16.0 to my requirements along with pyannote.audio==3.0.0, but the time didn't change and the GPU was not used.

Second, I tried using only this commit, and I still got the same time with no GPU usage.

What I want to point out here is that GPU IS NOT BEING USED even though onnxruntime-gpu is installed. Is it possible that we need to allocate the pipeline to the GPU in a different manner? Using the onnx library for instance?

Since you've pushed another commit, I'll build the image again and come back here with the results.
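A quick way to narrow down a "GPU is not being used" report like the one above is to ask onnxruntime which execution providers it can see in the current environment. This is a minimal diagnostic sketch, not part of the pull request; `cuda_provider_listed` and `report` are hypothetical helper names.

```python
def cuda_provider_listed(providers) -> bool:
    """True if the CUDA execution provider appears in a provider list."""
    return "CUDAExecutionProvider" in providers


def report() -> str:
    """Describe whether the installed onnxruntime build exposes CUDA."""
    try:
        import onnxruntime as ort
    except ImportError:
        return "onnxruntime is not installed"

    # Providers compiled into the installed wheel. The CPU-only
    # `onnxruntime` wheel does not list CUDAExecutionProvider here.
    available = ort.get_available_providers()
    if cuda_provider_listed(available):
        return f"CUDA provider available: {available}"
    return f"CUDA provider NOT available: {available}"


if __name__ == "__main__":
    print(report())
```

Once a session exists, its `get_providers()` method gives the same kind of list for the providers that session actually uses, which can differ from what is available if CUDA initialization fails.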

guilhermehge · 2023-09-27T14:10:18Z

@hbredin it is still not working with the new commit: I still get the same embedding time and the GPU is not being used. Here's a snippet of nvidia-smi while the embedding was at 40%:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.12    Driver Version: 525.85.12    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00000001:00:00.0 Off |                  Off |
| N/A   37C    P0    25W /  70W |   4863MiB / 16384MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

I remind you that this is in an isolated environment using a docker container. The GPU works fine for the older diarization pipeline @2.1 and for the faster_whisper algorithm, but for the embedding model of the new pipeline, it does not work.

hbredin · 2023-09-27T14:22:42Z

That's weird because it solves the issue on Google Colab.
pyannote/speaker-diarization-3.0 is even slightly faster than pyannote/speaker-diarization-2.1.

hbredin · 2023-09-27T14:24:43Z

I have no knowledge of docker containers.

Could it be something related to an incompatibility between onnxruntime-gpu and docker/cuda images?

Are you 100% sure that it used the latest commit and no cache was used?

guilhermehge · 2023-09-27T14:47:20Z

Yes, I am sure: I rebuilt the image from scratch and checked that your commit was in fact in the code. I'll go check the Colab with your solution. As you mentioned, the problem might be a dependency problem with the specific image that I'm using in docker. I'll check it out and let you know.

Just FYI, docker containers are isolated environments that only run what we need for the application that we're using. It should work for all cases, not only for Colab.

Edit: Indeed it worked in my MRE colab. I'll check it out in my docker container to see if I can make it work.

guilhermehge · 2023-09-27T15:17:27Z

Update: I did a pip install --force-reinstall onnxruntime-gpu and it worked on the docker container, but when loading the pipeline, I got the following warning:

2023-09-27 15:15:43.525160455 [W:onnxruntime:, session_state.cc:1162 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2023-09-27 15:15:43.525193659 [W:onnxruntime:, session_state.cc:1164 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.

Do you know what it might be?

I believe I know what the problem is: I'm also installing faster_whisper in this environment, and faster_whisper's requirements are:

av==10.*
ctranslate2>=3.17,<4
huggingface_hub>=0.13
tokenizers>=0.13,<0.15
onnxruntime>=1.14,<2

So faster_whisper installs onnxruntime while your library installs onnxruntime-gpu. I'll see if I can sort this out.
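The conflict described above can be detected from Python, since both distributions provide the same `onnxruntime` import package and can overwrite each other's files. The following is a minimal sketch using only the standard library; the helper names are hypothetical, and treating "both installed" as a conflict is an assumption drawn from this thread.

```python
from importlib import metadata

# The two distributions discussed in this thread.
ORT_DISTRIBUTIONS = ("onnxruntime", "onnxruntime-gpu")


def installed_ort_distributions(names=ORT_DISTRIBUTIONS) -> dict:
    """Map each installed distribution name to its version string."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # This distribution is not installed in the current environment.
            pass
    return found


def has_conflict(found: dict) -> bool:
    """Assumption: having both wheels installed at once is the problem case."""
    return len(found) > 1


if __name__ == "__main__":
    found = installed_ort_distributions()
    print(found)
    if has_conflict(found):
        print("Both onnxruntime and onnxruntime-gpu are installed.")
```

Running this inside the container before and after `pip install --force-reinstall onnxruntime-gpu` would show which distributions pip believes are present.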

I believe you can go ahead and merge this pull request: this is a problem on my end, and your code is working.

Question: Will you publish these changes to the PyPI package? Like a 3.0.1 version?

hbredin · 2023-09-27T15:35:26Z

Thanks. Will make a few more tests on my side and will then merge.

Question: Will you publish these changes to the PyPI package? Like a 3.0.1 version?

Yes, it will be released as 3.0.1.

guilhermehge · 2023-09-27T16:15:13Z

Just as a side note, I believe your model will be used with faster_whisper, and using onnxruntime-gpu may make it incompatible with that library. I am going to run a few more tests and let you know my results, but so far faster_whisper stopped working when I uninstalled onnxruntime to leave only onnxruntime-gpu. Do you believe there is another alternative? Like porting your model outside of onnx?

I posted an issue on faster_whisper's repo to address the situation.

hbredin · 2023-09-27T16:24:49Z

Do you believe there is another alternative? Like porting your model outside of onnx?

The point is that this is not my model. pyannote does not (yet) have a good speaker embedding model of its own. It uses external ones.

Working on it, though ;-)

guilhermehge · 2023-09-27T18:54:01Z

Oh, fair enough, but is it possible to convert it so that it does not use onnx?

hbredin · 2023-09-28T06:54:11Z

Oh, fair enough, but is it possible to convert it so that it does not use onnx?

Issue #1477 has already been opened related to this particular aspect.
Let's continue this discussion there. But, short answer: I don't (yet) know how to do that.

hbredin · 2023-09-28T20:27:06Z

I just released 3.0.1, including this fix.

guilhermehge · 2023-09-29T13:33:37Z

Awesome! Just checked PyPI, great job! FYI, I believe it's not showing in GitHub's releases yet.

hbredin · 2023-11-09T12:04:56Z

FYI: #1537

wip: switch to onnxruntime-gpu

fc2beef

hbredin changed the title ~~fix: fix WeSpeakerPretrainedSpeakerEmbedding.to("cuda")~~ fix: fix WeSpeakerPretrainedSpeakerEmbedding GPU support Sep 27, 2023

realfolkcode reviewed Sep 27, 2023

wip: use default search algorithm instead of benchmarking

f7a041b

hbredin added 2 commits September 28, 2023 21:34

doc: update changelog

73d8063

ci: bump version

36d224a

hbredin merged commit e478d57 into develop Sep 28, 2023
3 checks passed

hbredin deleted the fix/onnxruntime-gpu branch September 28, 2023 19:36

kaihe-stori mentioned this pull request Sep 29, 2023

pyannote/speaker-diarization-3.0 runs slower than pyannote/speaker-diarization@2.1 m-bain/whisperX#499

Closed

fimad added a commit to fimad/whisperX that referenced this pull request Oct 13, 2023

Bump pyannote dependency to 3.0.1

7c78017

pyannote 3.0.0 has a bug where the new embedding model does not run on the GPU. This is fixed in version 3.0.1 via pyannote/pyannote-audio#1478.

hbredin mentioned this pull request Nov 9, 2023

Get rid of ONNX WeSpeaker in favor of its pytorch implementation #1537

Closed
