CUDA problems (no kernel image is available for execution on the device) #474
joshuachris2001 asked this question in Q&A (unanswered)
Lately I've been having the common
`no kernel image is available for execution on the device`
error, and nothing I do seems to fix it. I have tried manually compiling llama.cpp with CUDA support and it works fine, but not through llama-cpp-python. My usual command is `CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir`.
I have also tried a fresh Python environment and still get the same error. Am I doing something wrong? For reference, my GPU's maximum CUDA compute capability is 5.0.
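Since this error usually means the installed binary carries no kernel image for the GPU's compute capability, one thing I plan to try next is pinning the target architecture at build time. A minimal sketch, assuming llama.cpp's CMake build honors the standard `CMAKE_CUDA_ARCHITECTURES` variable:

```bash
# Rebuild llama-cpp-python from source, explicitly targeting compute
# capability 5.0 (Maxwell) so nvcc emits a matching kernel image.
# Assumption: llama.cpp's CMake honors the standard CMAKE_CUDA_ARCHITECTURES.
CMAKE_ARGS="-DLLAMA_CUBLAS=on -DCMAKE_CUDA_ARCHITECTURES=50" FORCE_CMAKE=1 \
  pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir
```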
Here's the log:
```
Using embedded DuckDB with persistence: data will be stored in: db
ggml_init_cublas: found 1 CUDA devices:
  Device 0: NVIDIA GeForce GPU, compute capability 5.0
llama.cpp: loading model from models/13B/Manticore/Manticore-13B.ggmlv3.q4_0.bin
llama_model_load_internal: format = ggjt v3 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 2500
llama_model_load_internal: n_embd = 5120
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 40
llama_model_load_internal: n_layer = 40
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 2 (mostly Q4_0)
llama_model_load_internal: n_ff = 13824
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size = 0.09 MB
llama_model_load_internal: using CUDA for GPU acceleration
llama_model_load_internal: mem required = 9031.71 MB (+ 1608.00 MB per state)
llama_model_load_internal: offloading 0 repeating layers to GPU
llama_model_load_internal: offloaded 0/43 layers to GPU
llama_model_load_internal: total VRAM used: 516 MB
llama_new_context_with_model: kv self size = 1953.12 MB
AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
Enter a query: Hello?
CUDA error 209 at /tmp/pip-install-y26ymipg/llama-cpp-python_df790ba1ff86401bb68c221d8a0e2d7b/vendor/llama.cpp/ggml-cuda.cu:2830: no kernel image is available for execution on the device
```
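For what it's worth, here is a minimal script I would use to check whether the error reproduces outside the surrounding app. The model path and `n_ctx` mirror the log above; the rest is a hypothetical sketch, not the app's actual code:

```python
# Minimal repro sketch for the CUDA error, independent of the surrounding app.
# Model path and n_ctx are taken from the log above; other values are guesses.
from llama_cpp import Llama

llm = Llama(
    model_path="models/13B/Manticore/Manticore-13B.ggmlv3.q4_0.bin",
    n_ctx=2500,
    n_gpu_layers=0,  # matches "offloaded 0/43 layers to GPU" in the log
)

# With cuBLAS builds, prompt evaluation can run GPU kernels even with zero
# offloaded layers, which is presumably where "no kernel image" is raised.
print(llm("Hello?", max_tokens=16)["choices"][0]["text"])
```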
I don't know what is going on; even PyTorch works almost perfectly on the same GPU.