feat: add CuBLAS support in Docker images #403

sebastien-prudhomme · 2023-05-28T16:16:11Z

Description

This PR partially fixes #280. The CI also needs to be modified to build multiple Docker images corresponding to the different compilation options.

Notes for Reviewers

I've chosen CUDA 11 as the default version. This can be changed when building the Docker image by using the CUDA_MAJOR_VERSION and CUDA_MINOR_VERSION build args:

docker build --build-arg CUDA_MAJOR_VERSION=12 --build-arg CUDA_MINOR_VERSION=1 ...
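If the target CUDA release is only available as a single version string, the two build args can be derived from it. A minimal sketch, assuming a POSIX shell; the helper name cuda_build_args is hypothetical and not part of this PR:

```shell
# Hypothetical helper (not part of this PR): turn a CUDA version such as
# "12.1" or "12.1.1" into the two --build-arg flags described above.
cuda_build_args() {
    version="$1"
    case "$version" in
        *.*) ;;  # has at least MAJOR.MINOR
        *) echo "expected MAJOR.MINOR, got '$version'" >&2; return 1 ;;
    esac
    major="${version%%.*}"   # text before the first dot
    rest="${version#*.}"     # text after the first dot
    minor="${rest%%.*}"      # drop any patch component, e.g. "1.1" -> "1"
    printf '%s\n' "--build-arg CUDA_MAJOR_VERSION=${major} --build-arg CUDA_MINOR_VERSION=${minor}"
}
```

With this, `docker build $(cuda_build_args 12.1) ...` expands to the same flags as the command above (the substitution is left unquoted on purpose so the flags are split into separate words).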

I've also chosen to install the required libraries only when BUILD_TYPE is "cublas". I've adapted the "openblas" and "stablediffusion" options in the same way.

Be careful: the rebuild performed when the image starts can now only rebuild with the same options that were provided at build time.

For people who want to test on Linux: you need an NVIDIA card, a recent NVIDIA driver, and the nvidia-container-toolkit. Then just launch the container with docker run --gpus all ... and don't forget to configure "gpu_layers" in your model definition.

You should see VRAM offloading when the model is loaded:

llama.cpp: loading model from /build/models/vicuna
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 1024
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size =    0.07 MB
llama_model_load_internal: mem required  = 1932.71 MB (+ 2052.00 MB per state)
llama_model_load_internal: [cublas] offloading 32 layers to GPU
llama_model_load_internal: [cublas] total VRAM used: 3475 MB
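To check for offloading without reading the whole log, the two "[cublas]" lines can be filtered out. A rough sketch, assuming the log format shown above and a POSIX awk; the helper name cublas_offload_summary is hypothetical:

```shell
# Hypothetical helper: read llama.cpp load output on stdin and print the
# values from its "[cublas]" lines. Exits non-zero if no layers were offloaded.
cublas_offload_summary() {
    awk '
        /\[cublas\] offloading [0-9]+ layers to GPU/ {
            # the layer count is the field right after "offloading"
            for (i = 1; i <= NF; i++) if ($i == "offloading") layers = $(i + 1)
        }
        /\[cublas\] total VRAM used:/ {
            # the size in MB is the field right after "used:"
            for (i = 1; i <= NF; i++) if ($i == "used:") vram = $(i + 1)
        }
        END {
            if (layers == "") exit 1
            printf "layers=%s vram_mb=%s\n", layers, vram
        }
    '
}
```

Piping the container log into the function (for example the output of docker logs for your container) prints one summary line, or fails when the model was loaded on the CPU only.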

Signed commits

Yes, I signed my commits.

Signed-off-by: Sébastien Prud'homme <sebastien.prudhomme@gmail.com>

ghost · 2023-05-28T23:15:58Z

Hello there! How do you set how many layers are offloaded to GPU?

sebastien-prudhomme · 2023-05-29T06:27:18Z

> Hello there! How do you set how many layers are offloaded to GPU?

Hi, you need to set "gpu_layers" in the model definition:

backend: llama
context_size: 1024
name: vicuna
parameters:
  model: vicuna
  temperature: 0.2
  top_k: 80
  top_p: 0.7
template:
  chat: vicuna-chat
  completion: vicuna-completion
gpu_layers: 32
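A quick way to confirm that a model definition actually carries the setting is to look for the unindented key. A minimal sketch, assuming "gpu_layers" sits at the top level of the file as in the example above; the helper name model_gpu_layers is hypothetical, and this is a text match rather than a real YAML parse:

```shell
# Hypothetical helper: print the top-level gpu_layers value from a model
# definition read on stdin; exits non-zero when the key is absent.
model_gpu_layers() {
    # only an unindented "gpu_layers:" counts, so nested keys are ignored
    value=$(sed -n 's/^gpu_layers: *\([0-9][0-9]*\).*/\1/p' | head -n 1)
    [ -n "$value" ] || return 1
    printf '%s\n' "$value"
}
```

Feeding the definition above to the function on stdin prints 32; a definition without the key makes it fail, which matches the case where nothing is offloaded.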

sebastien-prudhomme · 2023-05-29T06:36:18Z

@marianbastiUNRN regarding the problem you described on Discord: which CUDA version and NVIDIA driver version are you using? See the nvidia-smi output.

Try to build the image with the same CUDA_MAJOR_VERSION and CUDA_MINOR_VERSION as your driver.

CUDA and driver compatibility: https://docs.nvidia.com/deploy/cuda-compatibility/index.html#minor-version-compatibility
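The CUDA version supported by the driver can be read from the nvidia-smi header. A small sketch, assuming the header contains a "CUDA Version: X.Y" field, as recent drivers print; the helper name driver_cuda_version is hypothetical:

```shell
# Hypothetical helper: read `nvidia-smi` output on stdin and print the
# CUDA version reported in its header line.
driver_cuda_version() {
    # keep only the digits and dots that follow "CUDA Version:"
    sed -n 's/.*CUDA Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}
```

Run it as `nvidia-smi | driver_cuda_version`; the major and minor parts of the result are the values to pass as CUDA_MAJOR_VERSION and CUDA_MINOR_VERSION.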

mudler · 2023-05-29T07:15:51Z

@sebastien-prudhomme that's amazing! thank you! this is looking good at a first pass, I'll review it later and try to give it a shot locally too

ghost · 2023-05-29T12:11:06Z

> @marianbastiUNRN for your problem described on Discord: CUDA version?, NVIDIA driver version? See nvidia-smi output.
>
> Try to build the image with the same CUDA_MAJOR_VERSION and CUDA_MINOR_VERSION as your driver.
>
> CUDA and driver compatibility: https://docs.nvidia.com/deploy/cuda-compatibility/index.html#minor-version-compatibility

Thanks for the reply!
I'm using the image nvidia/cuda:12.1.1-cudnn8-devel-ubuntu20.04 (CUDA 12.1.1 with cuDNN 8, as you can tell), the same as my drivers. Last night I was able to offload work to the GPU by hardcoding the GPU layers in bindings.cpp and options.go and then restarting the container, forcing a rebuild. So the problem is definitely with my config-file.yaml.
As I mentioned on Discord, I get an error like

  line 1: cannot unmarshal !!map into []*api.Config

My config file looks like this (encoded in UTF-8 and validated):

---
name: gpt-3.5-turbo
description: |
  Manticore 13B - (previously Wizard Mega) 
license: N/A
config_file: |
  backend: llama
  parameters:
    model: manticore
    top_k: 80
    temperature: 0.2
    top_p: 0.7
  context_size: 1024
  f16: true
  template:
    completion: manticore-completion
    chat: manticore-chat
prompt_templates:
  - name: manticore-completion
    content: |
      ### Instruction: Complete the following sentence: {{.Input}}

      ### Assistant:
  - name: manticore-chat
    content: |
      ### Instruction: {{.Input}}

      ### Assistant:

gpu_layers: 60

Other relevant comment of mine here
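Going only by the wording of the error quoted above, the loader was decoding into a []*api.Config, that is, a list, while the file starts with a mapping. Whether that is the actual cause here is not confirmed in this thread. A rough heuristic for checking the top-level shape of a YAML file, assuming a POSIX awk; the helper name yaml_top_level_kind is hypothetical, and this is not a YAML parser:

```shell
# Hypothetical helper: guess whether the YAML read on stdin starts with a
# sequence ("- item") or a mapping ("key: value"). Heuristic only.
yaml_top_level_kind() {
    awk '
        /^[ \t]*$/     { next }                                  # blank line
        /^[ \t]*#/     { next }                                  # comment
        /^---/         { next }                                  # document start marker
        /^-( |$)/      { print "sequence"; found = 1; exit }     # first item of a list
        /^[^ \t].*:/   { print "mapping"; found = 1; exit }      # unindented key
                       { print "unknown"; found = 1; exit }
        END            { if (!found) print "empty" }
    '
}
```

For the config file above the function reports "mapping", which is consistent with the "cannot unmarshal !!map" part of the message.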

Dockerfile

mudler

fantastic, thanks @sebastien-prudhomme !

sebastien-prudhomme added 2 commits May 28, 2023 17:58

feat: add CuBLAS support in Docker images

642e6e9

Signed-off-by: Sébastien Prud'homme <sebastien.prudhomme@gmail.com>

Merge remote-tracking branch 'upstream/master' into dockerfile-cublas

2aee540

mudler reviewed May 29, 2023

sebastien-prudhomme added 2 commits May 29, 2023 18:45

Merge remote-tracking branch 'upstream/master' into dockerfile-cublas

a7fe65a

fix: always install OpenBLAS and Stable Diffusion requirements

65809f3

mudler approved these changes May 29, 2023

mudler merged commit 2272324 into mudler:master May 29, 2023

mudler added the enhancement label May 30, 2023

sebastien-prudhomme deleted the dockerfile-cublas branch May 31, 2023 13:12
