Shared Memory while multi-gpu? #429
Closed
Bigfield77 started this conversation in General

Hello,
I am able to load llama3 70b instruct at 5.0bpw using exllamav2_hf in ooba if I expose only one GPU (extremely slow).
If I expose both of my GPUs (3090s), there is not enough VRAM to load it, and it does not fall back to using shared memory like it does with a single GPU.
Is this something that can be enabled, e.g. falling back to shared memory on the last available GPU?
I am running under Windows.
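For clarity, this is the kind of thing I mean by "exposing" GPUs, and how free VRAM per visible card can be checked. This is a minimal sketch using only `CUDA_VISIBLE_DEVICES` and plain PyTorch, nothing ooba-specific; the device indices are just examples:

```python
# Sketch: limit which GPUs the process can see, then report free/total VRAM.
# CUDA_VISIBLE_DEVICES must be set before CUDA is initialized (i.e. before
# the first torch.cuda call), otherwise it has no effect.
import os

os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"  # "0" to expose a single 3090

import torch

for i in range(torch.cuda.device_count()):
    free, total = torch.cuda.mem_get_info(i)  # bytes
    name = torch.cuda.get_device_name(i)
    print(f"cuda:{i} ({name}): {free / 2**30:.1f} GiB free of {total / 2**30:.1f} GiB")
```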
Replies: 3 comments
-
Do you know if that is a limitation of exllamav2, PyTorch, or the NVIDIA driver suite?
-
I'm not really sure. This would be down to the NVIDIA driver, and to my knowledge there isn't any way to control the sysmem fallback behavior from software. (?)
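One rough way to see whether the driver is applying the fallback at all is to over-allocate on purpose and see whether you get an OOM error or a slow-but-successful allocation. A sketch, assuming PyTorch is available in the same environment; the ~30 GiB figure is only chosen to exceed a 3090's 24 GB:

```python
# Sketch: deliberately allocate more than a 3090's 24 GB of VRAM.
# With the Windows driver's sysmem fallback active this tends to succeed
# (spilling into shared system memory, very slowly); without it, PyTorch
# raises torch.cuda.OutOfMemoryError instead.
import torch

device = "cuda:0"
n_bytes = 30 * 2**30  # ~30 GiB, larger than the card

try:
    x = torch.empty(n_bytes, dtype=torch.uint8, device=device)
    print("Allocation succeeded -> driver likely spilled into shared memory")
    del x
    torch.cuda.empty_cache()
except torch.cuda.OutOfMemoryError:
    print("OOM -> no sysmem fallback for this allocation")
```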