feat: add CuBLAS support in Docker images #403
Signed-off-by: Sébastien Prud'homme <sebastien.prudhomme@gmail.com>
Hello there! How do you set how many layers are offloaded to the GPU?
Hi, you need to set "gpu_layers" in the model definition.
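As a minimal sketch of what that could look like, the snippet below writes a small model definition with "gpu_layers" set. The models directory, the file name `manticore.yaml`, the value 60, and the placement of `gpu_layers` at the top level of the definition are assumptions made for illustration, not something this thread confirms; adjust them to your own setup.

```shell
# Hypothetical models directory and file name; adjust to your setup.
MODELS_DIR="${MODELS_DIR:-./models}"
mkdir -p "$MODELS_DIR"

# Write a minimal model definition with GPU offloading enabled.
# Assumption: gpu_layers sits at the top level, next to backend and f16.
cat > "$MODELS_DIR/manticore.yaml" <<'EOF'
name: manticore
backend: llama
parameters:
  model: manticore
f16: true
# Number of layers to offload to the GPU (illustrative value).
gpu_layers: 60
EOF

# Show the setting that was written.
grep 'gpu_layers' "$MODELS_DIR/manticore.yaml"
```

A larger value offloads more layers and uses more VRAM, so it may need tuning for the card in use.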
@marianbastiUNRN regarding the problem you described on Discord: which CUDA version and NVIDIA driver version are you using? Try to build the image with the same CUDA_MAJOR_VERSION and CUDA_MINOR_VERSION as your driver. CUDA and driver compatibility: https://docs.nvidia.com/deploy/cuda-compatibility/index.html#minor-version-compatibility
@sebastien-prudhomme that's amazing! Thank you! This is looking good at a first pass. I'll review it later and try to give it a shot locally too.
Thanks for the reply!
My config file looks like this (formatted in UTF-8 and validated):
---
name: gpt-3.5-turbo
description: |
    Manticore 13B - (previously Wizard Mega)
license: N/A
config_file: |
    backend: llama
    parameters:
      model: manticore
      top_k: 80
      temperature: 0.2
      top_p: 0.7
    context_size: 1024
    f16: true
    template:
      completion: manticore-completion
      chat: manticore-chat
prompt_templates:
- name: manticore-completion
  content: |
    ### Instruction: Complete the following sentence: {{.Input}}
    ### Assistant:
- name: manticore-chat
  content: |
    ### Instruction: {{.Input}}
    ### Assistant:
gpu_layers: 60
Other relevant comment of mine here
Fantastic, thanks @sebastien-prudhomme!
Description
This PR partially fixes #280. The CI also needs to be modified to build multiple Docker images corresponding to different compilation options.
Notes for Reviewers
I've chosen CUDA 11 as the default version. This can be changed when building the Docker image by using the CUDA_MAJOR_VERSION and CUDA_MINOR_VERSION args.
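A sketch of such a build, assuming the arguments are passed with `docker build --build-arg` from the repository root. The version numbers (12 and 1) and the image tag `localai:cublas` are placeholders, and passing BUILD_TYPE as a build arg is also an assumption; pick the CUDA version that matches your driver.

```shell
# Placeholder values: choose the CUDA version that matches your NVIDIA driver.
CUDA_MAJOR_VERSION=12
CUDA_MINOR_VERSION=1
# Hypothetical image tag.
IMAGE_TAG=localai:cublas

# Assemble the build command; BUILD_TYPE=cublas is assumed to be a build arg.
BUILD_CMD="docker build --build-arg BUILD_TYPE=cublas --build-arg CUDA_MAJOR_VERSION=${CUDA_MAJOR_VERSION} --build-arg CUDA_MINOR_VERSION=${CUDA_MINOR_VERSION} -t ${IMAGE_TAG} ."

# Print the command for inspection (dry run).
echo "${BUILD_CMD}"

# Uncomment to actually run the build from the repository root:
# ${BUILD_CMD}
```

Printing the command first makes it easy to check the chosen versions before starting what can be a long build.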
I've also chosen to install the needed libraries only when BUILD_TYPE is "cublas". I've adapted things for the "openblas" and "stablediffusion" options in the same way.
Be careful: the rebuild performed when the image starts will now only allow rebuilding with the same options that were provided at build time.
For people who want to test on Linux: you need an NVIDIA card, a recent NVIDIA driver and the nvidia-container-toolkit. Then just launch the container with
docker run --gpus all ...
and don't forget to configure "gpu_layers" in your model definition. You should see VRAM offloading when the model is loaded.
Signed commits