I have set up llama-server successfully so that it uses my RTX 4000 via CUDA 11, both via Docker and running locally.
But when I use the Python bindings (llama-cpp-python), the GPU does not seem to be utilized at all; everything runs on the CPU only, which takes much longer.
I installed the library with
CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python
What else do I need in order to enable GPU support?
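One common cause is that pip reuses a previously built CPU-only wheel from its cache, so the CMAKE_ARGS never reach a fresh build. A minimal sketch of forcing a rebuild from source, assuming a recent llama-cpp-python (FORCE_CMAKE and these pip flags are documented install options, but verify against the project's current README):

CMAKE_ARGS="-DGGML_CUDA=on" FORCE_CMAKE=1 pip install --upgrade --force-reinstall --no-cache-dir llama-cpp-python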
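Even with a CUDA build, llama-cpp-python loads models on the CPU by default: the Llama constructor's n_gpu_layers parameter defaults to 0, so layers must be offloaded explicitly. A minimal sketch, with a placeholder model path:

from llama_cpp import Llama

# n_gpu_layers defaults to 0 (CPU only); -1 offloads all layers to the GPU.
llm = Llama(
    model_path="./models/your-model.gguf",  # placeholder path
    n_gpu_layers=-1,   # offload every layer; use a smaller number if VRAM is tight
    verbose=True,      # the startup log should mention CUDA if the build supports it
)
out = llm("Q: Name the planets in the solar system. A:", max_tokens=32)
print(out["choices"][0]["text"])

If the verbose startup log never mentions a CUDA device, the installed wheel was built without GPU support, and the forced reinstall shown above is the likely fix.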
Replies: 1 comment

Try this