CUDA problems (no kernel image is available for execution on the device) #474
joshuachris2001 asked this question in Q&A (unanswered)
Lately I've been having the common
`no kernel image is available for execution on the device`
error, and nothing I do seems to fix it. I have tried manually compiling llama.cpp with CUDA support and it works fine, but not through llama-cpp-python. My usual command is `CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir`.
I have also tried a fresh Python environment and still get the same error. Am I doing something wrong? For reference, my GPU's maximum CUDA compute capability is 5.0.
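Since this error usually means the installed binary carries no kernel image for the GPU's compute capability, one thing I plan to try next is pinning the target architecture at build time. A minimal sketch, assuming llama.cpp's CMake build honors the standard `CMAKE_CUDA_ARCHITECTURES` variable:

```bash
# Rebuild llama-cpp-python from source, explicitly targeting compute
# capability 5.0 (Maxwell) so nvcc emits a matching kernel image.
# Assumption: llama.cpp's CMake honors the standard CMAKE_CUDA_ARCHITECTURES.
CMAKE_ARGS="-DLLAMA_CUBLAS=on -DCMAKE_CUDA_ARCHITECTURES=50" FORCE_CMAKE=1 \
  pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir
```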
Here's the log:
```
Using embedded DuckDB with persistence: data will be stored in: db
ggml_init_cublas: found 1 CUDA devices:
  Device 0: NVIDIA GeForce GPU, compute capability 5.0
llama.cpp: loading model from models/13B/Manticore/Manticore-13B.ggmlv3.q4_0.bin
llama_model_load_internal: format = ggjt v3 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 2500
llama_model_load_internal: n_embd = 5120
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 40
llama_model_load_internal: n_layer = 40
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 2 (mostly Q4_0)
llama_model_load_internal: n_ff = 13824
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size = 0.09 MB
llama_model_load_internal: using CUDA for GPU acceleration
llama_model_load_internal: mem required = 9031.71 MB (+ 1608.00 MB per state)
llama_model_load_internal: offloading 0 repeating layers to GPU
llama_model_load_internal: offloaded 0/43 layers to GPU
llama_model_load_internal: total VRAM used: 516 MB
llama_new_context_with_model: kv self size = 1953.12 MB
AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
Enter a query: Hello?
CUDA error 209 at /tmp/pip-install-y26ymipg/llama-cpp-python_df790ba1ff86401bb68c221d8a0e2d7b/vendor/llama.cpp/ggml-cuda.cu:2830: no kernel image is available for execution on the device
```
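For what it's worth, here is a minimal script I would use to check whether the error reproduces outside the surrounding app. The model path and `n_ctx` mirror the log above; the rest is a hypothetical sketch, not the app's actual code:

```python
# Minimal repro sketch for the CUDA error, independent of the surrounding app.
# Model path and n_ctx are taken from the log above; other values are guesses.
from llama_cpp import Llama

llm = Llama(
    model_path="models/13B/Manticore/Manticore-13B.ggmlv3.q4_0.bin",
    n_ctx=2500,
    n_gpu_layers=0,  # matches "offloaded 0/43 layers to GPU" in the log
)

# With cuBLAS builds, prompt evaluation can run GPU kernels even with zero
# offloaded layers, which is presumably where "no kernel image" is raised.
print(llm("Hello?", max_tokens=16)["choices"][0]["text"])
```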
I don't know what is going on; even PyTorch works almost perfectly on the same GPU.