CUDA Error 801: Operation Not Supported #870

Closed
t19cs045-sub opened this issue Nov 4, 2023 · 2 comments

t19cs045-sub commented Nov 4, 2023

I encountered a CUDA error while running a script that uses the Llama model via llama-cpp-python. The error message is “CUDA error 801 at ggml-cuda.cu:6799: operation not supported” (current device: 0).

Code Snippet:

from llama_cpp import Llama

def question(message):
    # LLM setup: load the GGUF model and offload 32 layers to the GPU
    llm = Llama(
        model_path="./japanese-stablelm-instruct-gamma-7b-q8_0.gguf",
        n_gpu_layers=32,
    )

    # Build the prompt from the user message
    # (assumed instruct-style template; not shown in the original snippet)
    prompt = f"指示:\n{message}\n応答:\n"

    # Run inference; stop on the instruction/input/response markers
    output = llm(
        prompt,
        temperature=1,
        top_p=0.95,
        stop=["指示:", "入力:", "応答:"],
        echo=False,
        max_tokens=1024,
    )
    return output
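A minimal sketch of how question() might be invoked, assuming the prompt handling reconstructed above; the actual calling code is not included in the report:

# Hypothetical usage; the real calling script is not shown in the issue.
if __name__ == "__main__":
    result = question("What is the capital of Japan?")
    print(result["choices"][0]["text"])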

Error Message:
llm_load_tensors: ggml ctx size = 0.11 MB
llm_load_tensors: using CUDA for GPU acceleration
llm_load_tensors: mem required = 132.92 MB
llm_load_tensors: offloading 32 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 35/35 layers to GPU
llm_load_tensors: VRAM used: 7205.83 MB
...................................................................................................
llama_new_context_with_model: n_ctx = 512
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: offloading v cache to GPU
llama_kv_cache_init: offloading k cache to GPU
llama_kv_cache_init: VRAM kv self = 64.00 MB
llama_new_context_with_model: kv self size = 64.00 MB
llama_build_graph: non-view tensors processed: 740/740
llama_new_context_with_model: compute buffer total size = 79.63 MB
llama_new_context_with_model: VRAM scratch buffer: 73.00 MB
llama_new_context_with_model: total VRAM used: 7342.83 MB (model: 7205.83 MB, context: 137.00 MB)
AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 |

CUDA error 801 at ggml-cuda.cu:6799: operation not supported
current device: 0

Environment:

NVIDIA-SMI: 545.23.06
Driver Version: 545.23.06
CUDA Version: 12.3
GPU: NVIDIA Quadro M4000 (8 GB)

Any help in resolving this issue would be greatly appreciated.


Ph0rk0z commented Nov 4, 2023

This is an upstream bug that broke multi-GPU support.

abetlen (Owner) commented Nov 8, 2023

@Ph0rk0z do you have a link to the upstream issue for this?


Ph0rk0z commented Nov 8, 2023

ggerganov/llama.cpp#3930 (comment)

ggerganov/llama.cpp#3944

It's resolved for me. But I think the latest refactoring broke llama.cpp_hf in textgen. Something to do with swapping the ctx in the cache.
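
A quick way to check whether your install already pulls in that fix is to look at the installed llama-cpp-python version (sketch only; which release first bundles the patched llama.cpp is not stated in this thread, so compare against the linked PRs):

# Minimal version check; compare the printed version against the release
# notes to confirm it includes the upstream llama.cpp fix (assumption to verify).
from importlib.metadata import version

print(version("llama-cpp-python"))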

abetlen closed this as completed Nov 10, 2023

al-fk commented Aug 1, 2024

I have the same error.
@t19cs045-sub Did you find a solution?
