Shared Memory while multi-gpu? #429
Closed
Bigfield77 started this conversation in General

Hello,
I am able to load llama3 70b instruct at 5.0bpw using exllamav2_hf in ooba if I expose only one GPU (extremely slow).
If I expose both of my GPUs (3090s), there is not enough VRAM to load it, and it does not fall back to using shared memory like it does with a single GPU.
Is this something that can be enabled, e.g. falling back to shared memory on the last available GPU?
I am running under Windows.
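For clarity, this is the kind of thing I mean by "exposing" GPUs, and how free VRAM per visible card can be checked. This is a minimal sketch using only `CUDA_VISIBLE_DEVICES` and plain PyTorch, nothing ooba-specific; the device indices are just examples:

```python
# Sketch: limit which GPUs the process can see, then report free/total VRAM.
# CUDA_VISIBLE_DEVICES must be set before CUDA is initialized (i.e. before
# the first torch.cuda call), otherwise it has no effect.
import os

os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"  # "0" to expose a single 3090

import torch

for i in range(torch.cuda.device_count()):
    free, total = torch.cuda.mem_get_info(i)  # bytes
    name = torch.cuda.get_device_name(i)
    print(f"cuda:{i} ({name}): {free / 2**30:.1f} GiB free of {total / 2**30:.1f} GiB")
```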
Replies: 3 comments
-
Do you know if that is a limitation of exllamav2, PyTorch, or the NVIDIA driver suite?
-
I'm not really sure. This would be down to the NVIDIA driver, and to my knowledge there isn't any way to control the sysmem fallback behavior from software. (?)
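One rough way to see whether the driver is applying the fallback at all is to over-allocate on purpose and see whether you get an OOM error or a slow-but-successful allocation. A sketch, assuming PyTorch is available in the same environment; the ~30 GiB figure is only chosen to exceed a 3090's 24 GB:

```python
# Sketch: deliberately allocate more than a 3090's 24 GB of VRAM.
# With the Windows driver's sysmem fallback active this tends to succeed
# (spilling into shared system memory, very slowly); without it, PyTorch
# raises torch.cuda.OutOfMemoryError instead.
import torch

device = "cuda:0"
n_bytes = 30 * 2**30  # ~30 GiB, larger than the card

try:
    x = torch.empty(n_bytes, dtype=torch.uint8, device=device)
    print("Allocation succeeded -> driver likely spilled into shared memory")
    del x
    torch.cuda.empty_cache()
except torch.cuda.OutOfMemoryError:
    print("OOM -> no sysmem fallback for this allocation")
```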