-
Notifications
You must be signed in to change notification settings - Fork 951
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CUDA error: unspecified launch failure on inference on Nvidia V100 GPUs #1624
Comments
I am having the same issue. The fact that it complains the kernel wasn't compiled for an arch version (700) that is later listed in the compiled list is very LoL. mmq.cuh:2422: ERROR: CUDA kernel mul_mat_q has no device code compatible with CUDA arch 700. ggml-cuda.cu was compiled for: 500,520,530,600,610,620,700,720,750,800,860,870,890,900 Environment: |
nvidia smi - """+---------------------------------------------------------------------------------------+ +---------------------------------------------------------------------------------------+ |
The issue looks like ollama/ollama#5571 In this case, it will be enough to delete the following line
|
Prerequisites
Please answer the following questions for yourself before submitting an issue.
Expected Behavior
I'm running the llama-cpp-python OpenAI-compatible API servers on my VM that has 1x Nvidia V100 16GB GPU allocated to it. The server can start, but once a request is sent to the server, it falls over.
Current Behavior
When the server receives a request it errors with a CUDA error. The error is identical to an issue already reported in the Ollama GitHub page ollama/ollama#5571
Environment and Context
Please provide detailed information about your computer setup. This is important in case the issue is not reproducible except for under certain specific conditions.
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 46 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 6
On-line CPU(s) list: 0-5
Vendor ID: GenuineIntel
Model name: Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz
CPU family: 6
Model: 79
Thread(s) per core: 1
Core(s) per socket: 6
Socket(s): 1
Stepping: 1
BogoMIPS: 5187.98
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pcid ss
e4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap xsaveopt md_clear
Virtualization features:
Hypervisor vendor: Microsoft
Virtualization type: full
Caches (sum of all):
L1d: 192 KiB (6 instances)
L1i: 192 KiB (6 instances)
L2: 1.5 MiB (6 instances)
L3: 35 MiB (1 instance)
NUMA:
NUMA node(s): 1
NUMA node0 CPU(s): 0-5
Vulnerabilities:
Gather data sampling: Not affected
Itlb multihit: KVM: Mitigation: VMX unsupported
L1tf: Mitigation; PTE Inversion
Mds: Mitigation; Clear CPU buffers; SMT Host state unknown
Meltdown: Mitigation; PTI
Mmio stale data: Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
Retbleed: Not affected
Spec rstack overflow: Not affected
Spec store bypass: Vulnerable
Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Spectre v2: Mitigation; Retpolines; STIBP disabled; RSB filling; PBRSB-eIBRS Not affected; BHI Retpoline
Srbds: Not affected
Tsx async abort: Mitigation; Clear CPU buffers; SMT Host state unknown
22.04.1-Ubuntu SMP Mon Jun 17 18:38:57 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Failure Information (for bugs)
The text was updated successfully, but these errors were encountered: