
ERROR: Could not build wheels for llama-cpp-python #1617

Open · inst32i opened this issue Jul 23, 2024 · 16 comments

Labels: bug (Something isn't working)

Comments

@inst32i commented Jul 23, 2024

Current Behavior

I ran the following:
CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python --verbose

An error occurred:
ERROR: Failed building wheel for llama-cpp-python

Environment and Context

  • Physical hardware:
    Architecture: x86_64
    CPU op-mode(s): 32-bit, 64-bit
    Address sizes: 46 bits physical, 48 bits virtual
    Byte Order: Little Endian
    CPU(s): 16
    On-line CPU(s) list: 0-15
    Vendor ID: GenuineIntel
    Model name: Intel Xeon Processor (Skylake, IBRS)
    CPU family: 6
    Model: 85
    Thread(s) per core: 1
    Core(s) per socket: 1
    Socket(s): 16
    Stepping: 4
    BogoMIPS: 4389.68
    Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti ssbd ibrs ibpb stibp fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 arat pku ospke avx512_vnni md_clear
    Virtualization features:
    Hypervisor vendor: KVM
    Virtualization type: full
    Caches (sum of all):
    L1d: 512 KiB (16 instances)
    L1i: 512 KiB (16 instances)
    L2: 64 MiB (16 instances)
    L3: 256 MiB (16 instances)
    NUMA:
    NUMA node(s): 1
    NUMA node0 CPU(s): 0-15
    Vulnerabilities:
    Itlb multihit: KVM: Mitigation: VMX unsupported
    L1tf: Mitigation; PTE Inversion
    Mds: Mitigation; Clear CPU buffers; SMT Host state unknown
    Meltdown: Mitigation; PTI
    Mmio stale data: Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
    Retbleed: Mitigation; IBRS
    Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
    Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
    Spectre v2: Mitigation; IBRS, IBPB conditional, RSB filling, PBRSB-eIBRS Not affected
    Srbds: Not affected
    Tsx async abort: Mitigation; Clear CPU buffers; SMT Host state unknown

  • Operating System:
    Ubuntu 22.04

  • SDK version:

$ python3 --version  # 3.11
$ make --version     # 4.3
$ g++ --version      # 11.4.0

Failure Information (for bugs)

...
FAILED: vendor/llama.cpp/examples/llava/llama-llava-cli
: && /usr/bin/g++ -pthread -B /mnt/x_env/compiler_compat -O3 -DNDEBUG vendor/llama.cpp/examples/llava/CMakeFiles/llava.dir/llava.cpp.o vendor/llama.cpp/examples/llava/CMakeFiles/llava.dir/clip.cpp.o vendor/llama.cpp/examples/llava/CMakeFiles/llama-llava-cli.dir/llava-cli.cpp.o -o vendor/llama.cpp/examples/llava/llama-llava-cli -Wl,-rpath,/tmp/tmp6bws6ysg/build/vendor/llama.cpp/src:/tmp/tmp6bws6ysg/build/vendor/llama.cpp/ggml/src: vendor/llama.cpp/common/libcommon.a vendor/llama.cpp/src/libllama.so vendor/llama.cpp/ggml/src/libggml.so && :
/mnt/x_env/compiler_compat/ld: warning: libcuda.so.1, needed by vendor/llama.cpp/ggml/src/libggml.so, not found (try using -rpath or -rpath-link)
/mnt/x_env/compiler_compat/ld: warning: libgomp.so.1, needed by vendor/llama.cpp/ggml/src/libggml.so, not found (try using -rpath or -rpath-link)
/mnt/x_env/compiler_compat/ld: warning: libdl.so.2, needed by /usr/local/cuda-12.4/lib64/libcudart.so.12, not found (try using -rpath or -rpath-link)
/mnt/x_env/compiler_compat/ld: warning: libpthread.so.0, needed by /usr/local/cuda-12.4/lib64/libcudart.so.12, not found (try using -rpath or -rpath-link)
/mnt/x_env/compiler_compat/ld: warning: librt.so.1, needed by /usr/local/cuda-12.4/lib64/libcudart.so.12, not found (try using -rpath or -rpath-link)
/mnt/x_env/compiler_compat/ld: vendor/llama.cpp/ggml/src/libggml.so: undefined reference to `cuMemCreate'
/mnt/x_env/compiler_compat/ld: vendor/llama.cpp/ggml/src/libggml.so: undefined reference to `GOMP_barrier@GOMP_1.0'
/mnt/x_env/compiler_compat/ld: vendor/llama.cpp/ggml/src/libggml.so: undefined reference to `cuMemAddressReserve'
/mnt/x_env/compiler_compat/ld: vendor/llama.cpp/ggml/src/libggml.so: undefined reference to `cuMemUnmap'
/mnt/x_env/compiler_compat/ld: vendor/llama.cpp/ggml/src/libggml.so: undefined reference to `GOMP_parallel@GOMP_4.0'
/mnt/x_env/compiler_compat/ld: vendor/llama.cpp/ggml/src/libggml.so: undefined reference to `cuMemSetAccess'
/mnt/x_env/compiler_compat/ld: vendor/llama.cpp/ggml/src/libggml.so: undefined reference to `cuDeviceGet'
/mnt/x_env/compiler_compat/ld: vendor/llama.cpp/ggml/src/libggml.so: undefined reference to `omp_get_thread_num@OMP_1.0'
/mnt/x_env/compiler_compat/ld: vendor/llama.cpp/ggml/src/libggml.so: undefined reference to `cuMemAddressFree'
/mnt/x_env/compiler_compat/ld: vendor/llama.cpp/ggml/src/libggml.so: undefined reference to `cuGetErrorString'
/mnt/x_env/compiler_compat/ld: vendor/llama.cpp/ggml/src/libggml.so: undefined reference to `GOMP_single_start@GOMP_1.0'
/mnt/x_env/compiler_compat/ld: vendor/llama.cpp/ggml/src/libggml.so: undefined reference to `cuDeviceGetAttribute'
/mnt/x_env/compiler_compat/ld: vendor/llama.cpp/ggml/src/libggml.so: undefined reference to `cuMemMap'
/mnt/x_env/compiler_compat/ld: vendor/llama.cpp/ggml/src/libggml.so: undefined reference to `cuMemRelease'
/mnt/x_env/compiler_compat/ld: vendor/llama.cpp/ggml/src/libggml.so: undefined reference to `omp_get_num_threads@OMP_1.0'
/mnt/x_env/compiler_compat/ld: vendor/llama.cpp/ggml/src/libggml.so: undefined reference to `cuMemGetAllocationGranularity'
collect2: error: ld returned 1 exit status
ninja: build stopped: subcommand failed.

*** CMake build failed
error: subprocess-exited-with-error

× Building wheel for llama-cpp-python (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> See above for output.

Steps to Reproduce

  1. conda activate <my_env>
  2. CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python --verbose

Nvidia Driver Version: 550.54.14
CUDA Toolkit Version: V12.4.99

@gillbates

same issue here ...

@bteinstein

same issue here too

@XingchenMengxiang

same issue here too

@TobiasKlapper

Same here

@SweetestRug

Same here as well.

@bodybreaker

Same here too

@bodybreaker

(quoting the original report above)

I solved this problem.
This happens when the CUDA driver version differs from the CUDA toolkit version.

Check the CUDA version with nvidia-smi,

and check the CUDA toolkit version with conda list | grep cuda-toolkit.

My versions were 12.2 and 11.8.
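
A minimal check sketch, assuming a conda-managed toolkit (the package may be named cuda-toolkit or cudatoolkit depending on how it was installed):

# driver-side CUDA version (shown in the nvidia-smi header)
nvidia-smi

# toolkit version in the active conda environment
conda list | grep -i cuda

# toolkit version of the nvcc the build will use
nvcc --version

If the two disagree, align them (upgrade the driver or install a matching toolkit) before rebuilding the wheel.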

@Viagounet commented Jul 31, 2024

Same here.
Installation worked fine with CMAKE_ARGS="-DLLAMA_CUBLAS=on" for llama-cpp-python <= 2.79.0.
I now get the same error as OP for llama-cpp-python >= 2.80.0, whether I use CMAKE_ARGS="-DLLAMA_CUBLAS=on" or CMAKE_ARGS="-DGGML_CUDA=on".

@hhhhpaaa

same issue here too in WSL2

@gilbertc

same issue here too, WSL2 on Windows 10.

@tigert1998

same issue here

@tigert1998

I found a workaround to fix this issue (a command-line sketch follows after the steps):

  1. clone this project and check out the version you would like to install
  2. build this project with CMake
  3. then here comes the key part: overwrite pyproject.toml with the following content:
# [build-system]
# requires = ["scikit-build-core[pyproject]>=0.9.2"]
# build-backend = "scikit_build_core.build"

[build-system]
requires = ["setuptools>=61.0"]
build-backend = "setuptools.build_meta"

[project]
name = "llama_cpp_python"
dynamic = ["version"]
description = "Python bindings for the llama.cpp library"
readme = "README.md"
license = { text = "MIT" }
authors = [
    { name = "Andrei Betlen", email = "abetlen@gmail.com" },
]
dependencies = [
    "typing-extensions>=4.5.0",
    "numpy>=1.20.0",
    "diskcache>=5.6.1",
    "jinja2>=2.11.3",
]
requires-python = ">=3.8"
classifiers = [
    "Programming Language :: Python :: 3",
    "Programming Language :: Python :: 3.8",
    "Programming Language :: Python :: 3.9",
    "Programming Language :: Python :: 3.10",
    "Programming Language :: Python :: 3.11",
    "Programming Language :: Python :: 3.12",
]


[project.optional-dependencies]
server = [
    "uvicorn>=0.22.0",
    "fastapi>=0.100.0",
    "pydantic-settings>=2.0.1",
    "sse-starlette>=1.6.1",
    "starlette-context>=0.3.6,<0.4",
    "PyYAML>=5.1",
]
test = [
    "pytest>=7.4.0",
    "httpx>=0.24.1",
    "scipy>=1.10",
]
dev = [
    "black>=23.3.0",
    "twine>=4.0.2",
    "mkdocs>=1.4.3",
    "mkdocstrings[python]>=0.22.0",
    "mkdocs-material>=9.1.18",
    "pytest>=7.4.0",
    "httpx>=0.24.1",
]
all = [
    "llama_cpp_python[server,test,dev]",
]

# [tool.scikit-build]
# wheel.packages = ["llama_cpp"]
# cmake.verbose = true
# cmake.minimum-version = "3.21"
# minimum-version = "0.5.1"
# sdist.include = [".git", "vendor/llama.cpp/*"]

[tool.setuptools.packages.find]
include = ["llama_cpp"]

[tool.setuptools.package-data]
"llama_cpp" = ["lib/*"]

[tool.scikit-build.metadata.version]
provider = "scikit_build_core.metadata.regex"
input = "llama_cpp/__init__.py"

[project.urls]
Homepage = "https://github.com/abetlen/llama-cpp-python"
Issues = "https://github.com/abetlen/llama-cpp-python/issues"
Documentation = "https://llama-cpp-python.readthedocs.io/en/latest/"
Changelog = "https://llama-cpp-python.readthedocs.io/en/latest/changelog/"

[tool.pytest.ini_options]
testpaths = "tests"
  4. run pip install . --verbose
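
For completeness, a minimal command-line sketch of the steps above. The repository URL, the tag name, and the build output paths are assumptions and may need adjusting for your version:

# 1. clone and check out the version you want (tag name is an example)
git clone --recursive https://github.com/abetlen/llama-cpp-python.git
cd llama-cpp-python
git checkout v0.2.90
git submodule update --init --recursive

# 2. build with CMake, CUDA enabled
cmake -B build -DGGML_CUDA=on
cmake --build build --config Release -j

# 3. overwrite pyproject.toml with the setuptools-based content above, then
#    copy the built shared libraries to where [tool.setuptools.package-data]
#    expects them (adjust to wherever your build places libllama/libggml)
mkdir -p llama_cpp/lib
cp build/vendor/llama.cpp/src/libllama.so llama_cpp/lib/
cp build/vendor/llama.cpp/ggml/src/libggml*.so llama_cpp/lib/

# 4. install
pip install . --verbose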

@blkqi commented Aug 29, 2024

Adding the path to libcuda.so to the LD_LIBRARY_PATH environment variable allows the examples to link so that the build can succeed.
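
A minimal sketch of that approach, assuming libcuda.so lives in one of the usual driver locations (the exact directory varies by distro and driver install):

# find where the driver's libcuda.so is installed
ldconfig -p | grep libcuda

# add that directory to the runtime library search path, then rebuild
export LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu:$LD_LIBRARY_PATH
CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python --verbose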

@PurnaChandraPanda

Hello @blkqi,

How did it work for you? Could you share which environment or path settings you used?

@JHH11 commented Sep 16, 2024

Thank you @blkqi. Your advice really helped me. In my case, I used a Dockerfile like this:

ENV LD_LIBRARY_PATH=/usr/local/cuda-12.4/compat/libcuda.so
RUN CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python==0.2.90
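
As an aside (not part of the original comment): LD_LIBRARY_PATH is a colon-separated list of directories, so pointing it at the compat directory rather than at the .so file itself is the more conventional form, e.g.:

ENV LD_LIBRARY_PATH=/usr/local/cuda-12.4/compat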

@levn commented Oct 6, 2024

sudo apt install libcuda-12.4-1
which installs
/usr/lib/x86_64-linux-gnu/libcuda.so
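
To confirm the library is now on the default linker search path (the package name above may vary by driver and repository setup):

ldconfig -p | grep libcuda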
