Python package looks for library in wrong path #5106
Hi @david-cortes, thank you for your report. Can you provide some more details on how you installed the package? I've had this happen to me when I installed …
I didn't install it. I compiled it from source from the current master branch using the CMake build system and tried to import it in Python from its folder. Doing that works correctly in other setups.
I see. Did you have it installed already in your environment? I just tried:
conda create -n lgb_test -y python=3.9 scipy
conda activate lgb_test
python -c 'import lightgbm'
and it works correctly.
I did have it installed from pip, and it works correctly if I import the pip-installed version. It doesn't work in this setup if I import the non-installed version that's compiled from source.
I guess if you …
Still the same error after uninstalling the pip version in this conda environment.
I think either of the two options I described (creating a new environment or using …
I tried different conda environments and got the same issue. I also tried installing it with …
You can build a wheel file first and after that install it via … (lines 153 and 170 at commit 46ebfd2).
Still the same error after building a binary wheel and then installing it with pip.
While working on #5169, I ran into this issue and have been doing some investigation. I think I've identified a fully reproducible example (using a container) and a strategy for fixing this, based on what I've learned about the way that … I'll post those details here in the next few days, when I have time. I just wanted to let others here know I'm actively looking into this.
OK, so! I've been able to collect my thoughts on this.

Reproducible Example

Given the following Dockerfile, pinned to the …
docker build \
--no-cache \
-t lgb-glibc-demo:local \
- < ./Dockerfile

Installing … works:

docker run \
--rm \
--workdir /usr/local/src/LightGBM/python-package \
-it lgb-glibc-demo:local \
/bin/bash -c "pip install . && python -c 'import lightgbm'"

But if you try …

docker run \
--rm \
--workdir /usr/local/src/LightGBM/python-package \
-it lgb-glibc-demo:local \
/bin/bash -c "conda install -y -n base libstdcxx-ng && pip install . && python -c 'import lightgbm'"
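The error from that last command is reported in terms of a `GLIBCXX_*` symbol version (workaround 1 below compares against "the one from the error message"). A common shell check is `strings libstdc++.so.6 | grep GLIBCXX`; the sketch below does the same scan in Python. `glibcxx_versions` and `max_glibcxx` are hypothetical helpers, not part of LightGBM, and the regex assumes tags of the form `GLIBCXX_<major>.<minor>[.<patch>]`.

```python
import re
from typing import List, Tuple

# Matches symbol-version tags such as b"GLIBCXX_3.4.29" embedded in the library.
# Assumption: tags always look like GLIBCXX_<major>.<minor>[.<patch>].
_GLIBCXX_RE = re.compile(rb"GLIBCXX_(\d+)\.(\d+)(?:\.(\d+))?")


def glibcxx_versions(data: bytes) -> List[Tuple[int, ...]]:
    """Return the sorted, de-duplicated GLIBCXX versions found in raw bytes."""
    found = set()
    for match in _GLIBCXX_RE.finditer(data):
        # drop the optional patch component when it is absent
        found.add(tuple(int(g) for g in match.groups() if g is not None))
    # tuples compare element by element, so (3, 4, 9) < (3, 4, 10)
    return sorted(found)


def max_glibcxx(path: str) -> str:
    """Highest GLIBCXX version advertised by the shared library at `path`."""
    with open(path, "rb") as f:
        versions = glibcxx_versions(f.read())
    if not versions:
        raise ValueError(f"no GLIBCXX version tags found in {path}")
    return "GLIBCXX_" + ".".join(str(part) for part in versions[-1])
```

For example, calling `max_glibcxx()` on both copies reported by the `find` command in workaround 1 (`/root/miniforge/lib/libstdc++.so.6` and `/usr/lib/x86_64-linux-gnu/libstdc++.so.6`) shows whether the conda copy lacks a version that the system copy provides.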
The use of a …

Root Cause (short description)

When … When …

Workarounds with no changes to LightGBM

1. Use conda's CMake and compilers to build LightGBM from source
# install the problematic library
conda install -y -n base \
libstdcxx-ng
# confirm that it results in a `libstdc++.so.6` being added in conda env
find / -name 'libstdc++.so.6'
# /root/miniforge/lib/libstdc++.so.6
# /root/miniforge/pkgs/libstdcxx-ng-11.2.0-he4da1e4_16/lib/libstdc++.so.6
# /usr/lib/x86_64-linux-gnu/libstdc++.so.6
# get conda compilers
conda install -y -n base \
cmake \
gcc_linux-64 \
gxx_linux-64
# it's important to activate the target conda env, to set
# the relevant environment variables pointing to conda's compilers
source activate base
# you can see the effect of this by checking env variables
echo $CC
# /root/miniforge/bin/x86_64-conda-linux-gnu-cc
echo $CXX
# /root/miniforge/bin/x86_64-conda-linux-gnu-c++
cd /usr/local/src/LightGBM
pip uninstall -y lightgbm
rm -rf ./build
rm -f ./lib_lightgbm.so
cd ./python-package
pip install .
# confirm that importing works
python -c "import lightgbm; print(lightgbm.__version__)"
# 3.3.2.99
# confirm that the maximum GLIBCXX version is less than
# the one from the error message, and that the libstdc++.so.6 linked
# is the one from /root/miniforge
LIB_LIGHTGBM_IN_CONDA=$(
find /root/miniforge -name 'lib_lightgbm.so' \
| head -1
)
ldd -v \
"${LIB_LIGHTGBM_IN_CONDA}"

2. Point LD_PRELOAD at the non-conda libstdc++.so.6 before starting Python

# install the problematic library
conda install -y -n base \
libstdcxx-ng
# confirm that it resulted in a `libstdc++.so.6` being added in conda env
find / -name 'libstdc++.so.6'
# /root/miniforge/lib/libstdc++.so.6
# /root/miniforge/pkgs/libstdcxx-ng-11.2.0-he4da1e4_16/lib/libstdc++.so.6
# /usr/lib/x86_64-linux-gnu/libstdc++.so.6
# build LightGBM from source
cd /usr/local/src/LightGBM
pip uninstall -y lightgbm
rm -rf ./build
rm -f ./lib_lightgbm.so
cd ./python-package
pip install .
# try loading lightgbm (this will fail)
python -c "import lightgbm; print(lightgbm.__version__)"
# try loading lightgbm with the system libstdc++.so.6 preloaded via LD_PRELOAD,
# so it is used instead of the copy in /root/miniforge/lib
LD_PRELOAD="${LD_PRELOAD}:/usr/lib/x86_64-linux-gnu/libstdc++.so.6" \
python -c "import lightgbm; print(lightgbm.__version__)"

NOTE: this cannot be done from inside Python, because the dynamic loader only reads LD_PRELOAD when the process starts. The following code will fail:

import os
os.environ["LD_PRELOAD"] = "/usr/lib/x86_64-linux-gnu/libstdc++.so.6"
import lightgbm

3. Modify `lib_lightgbm.so`'s DT_RPATH tag so that it points at the directory containing the system `libstdc++.so.6`

See https://man7.org/linux/man-pages/man3/dlopen.3.html and https://stackoverflow.com/a/20333550/3986677. rpath is a way to embed, in a shared object, a hint about which directories to search for the shared libraries it depends on.

cd /root/miniforge/lib/python3.9/site-packages/lightgbm/
cp lib_lightgbm.so lib_lightgbm2.so
# shows no rpath
chrpath -l lib_lightgbm2.so
# fails
python -c \
"import ctypes; ctypes.cdll.LoadLibrary('/root/miniforge/lib/python3.9/site-packages/lightgbm/lib_lightgbm.so')"
# patch the rpath (by default, patchelf writes this as DT_RUNPATH;
# add --force-rpath to write DT_RPATH instead)
patchelf --set-rpath '/usr/lib/x86_64-linux-gnu' lib_lightgbm2.so
# shows rpath
chrpath -l lib_lightgbm2.so
# succeeds!
python -c \
"import ctypes; ctypes.cdll.LoadLibrary('/root/miniforge/lib/python3.9/site-packages/lightgbm/lib_lightgbm2.so')"

Root Cause (longer description)

I've found this topic very complicated (or at least, new to me), so I have been capturing my running notes and example code snippets at https://github.com/jameslamb/lgb-glibc-demo. Below is a more detailed summary of the issue.

Whenever … (LightGBM/python-package/lightgbm/basic.py, line 117 at commit 416ecd5)
The …

From "Finding shared libraries" in the …

And from "Loading shared libraries" (doc): "underlying platform's …"

For example, see https://man7.org/linux/man-pages/man3/dlopen.3.html for Linux. From those docs, when searching for a library, the following are checked in order: …
Try this, using the container image built higher up in this description.

docker run \
--rm \
--workdir /usr/local/src/LightGBM/python-package \
-it lgb-glibc-demo:local \
/bin/bash

readelf -d /root/miniforge/bin/python \
| grep RPATH

That yields the following: …
That says "look in …". If you look at the copy of … in a new environment:

conda create --name test-env python=3.9
readelf -d /root/miniforge/envs/test-env/bin/python \
| grep RPATH

That shows the same output, which this time means "first look in …". If you …

source activate test-env
cd /usr/local/src/LightGBM/python-package
pip install .
LIB_LIGHTGBM_IN_CONDA=$(
find /root/miniforge -name 'lib_lightgbm.so' \
| head -1
)
# /root/miniforge/envs/test-env/lib/python3.9/site-packages/lightgbm/lib_lightgbm.so
ldd -v "${LIB_LIGHTGBM_IN_CONDA}"

That output contains a lot of information, including the following key line:

libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f9ee66e2000)

That says "…

Changes LightGBM could make to mitigate this

I think the most reliable, portable way for LightGBM to handle this is to attach a DT_RPATH to … See CMake's docs on this at https://gitlab.kitware.com/cmake/community/-/wikis/doc/cmake/RPATH-handling#default-rpath-settings.
And https://gitlab.kitware.com/cmake/community/-/wikis/doc/cmake/RPATH-handling#always-full-rpath

I haven't tested that yet, but I think it's worth exploring as a way to mitigate this issue.
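The reasoning above (conda's python carries an RPATH that beats the system directories, and an RPATH or RUNPATH on `lib_lightgbm.so` would beat it in turn) can be modeled in a few lines. This is a deliberately simplified sketch of the order documented in dlopen(3) and ld.so(8) for a dependency such as `libstdc++.so.6`: the requesting object's DT_RPATH, then the executable's DT_RPATH (both only when the requesting object has no DT_RUNPATH), then LD_LIBRARY_PATH, then the requesting object's DT_RUNPATH, then the cache and default directories. `ElfObject`, `search_dirs` and `resolve` are hypothetical names. The model ignores libraries that are already loaded (those are matched by soname and reused), the RPATHs of intermediate loaders, hardware-capability subdirectories and secure-execution rules, and treats `/etc/ld.so.cache` as a plain directory list. It also assumes conda's python has an RPATH of `$ORIGIN/../lib`, since the readelf output was not preserved here.

```python
import posixpath
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ElfObject:
    """Hypothetical stand-in for an ELF file and its search-path tags."""
    path: str
    rpath: List[str] = field(default_factory=list)    # DT_RPATH entries
    runpath: List[str] = field(default_factory=list)  # DT_RUNPATH entries


def _expand(dirs: List[str], obj: ElfObject) -> List[str]:
    # $ORIGIN expands to the directory of the object that carries the tag
    origin = posixpath.dirname(obj.path)
    return [posixpath.normpath(d.replace("$ORIGIN", origin)) for d in dirs]


def search_dirs(requester: ElfObject, executable: ElfObject,
                ld_library_path: List[str], cache_dirs: List[str],
                default_dirs: Optional[List[str]] = None) -> List[str]:
    """Simplified glibc search order for a dependency of `requester`."""
    dirs: List[str] = []
    if not requester.runpath:
        # DT_RPATH is only consulted when the requesting object has no DT_RUNPATH;
        # glibc then also tries the executable's DT_RPATH (if it has no DT_RUNPATH)
        dirs += _expand(requester.rpath, requester)
        if not executable.runpath:
            dirs += _expand(executable.rpath, executable)
    dirs += ld_library_path
    dirs += _expand(requester.runpath, requester)
    dirs += cache_dirs  # stand-in for entries found via /etc/ld.so.cache
    dirs += default_dirs if default_dirs is not None else ["/lib", "/usr/lib"]
    return dirs


def resolve(name: str, dirs: List[str],
            exists: Callable[[str], bool]) -> Optional[str]:
    """Return the first candidate path that `exists` accepts, else None."""
    for d in dirs:
        candidate = posixpath.join(d, name)
        if exists(candidate):
            return candidate
    return None


if __name__ == "__main__":
    # fake filesystem containing both copies seen in this issue
    files = {"/root/miniforge/lib/libstdc++.so.6",
             "/usr/lib/x86_64-linux-gnu/libstdc++.so.6"}
    cache = ["/usr/lib/x86_64-linux-gnu"]
    # assumption: conda's python has RPATH $ORIGIN/../lib
    python = ElfObject("/root/miniforge/bin/python", rpath=["$ORIGIN/../lib"])
    site = "/root/miniforge/lib/python3.9/site-packages/lightgbm"

    plain = ElfObject(site + "/lib_lightgbm.so")
    print(resolve("libstdc++.so.6", search_dirs(plain, python, [], cache),
                  files.__contains__))
    # /root/miniforge/lib/libstdc++.so.6

    patched = ElfObject(site + "/lib_lightgbm2.so",
                        runpath=["/usr/lib/x86_64-linux-gnu"])
    print(resolve("libstdc++.so.6", search_dirs(patched, python, [], cache),
                  files.__contains__))
    # /usr/lib/x86_64-linux-gnu/libstdc++.so.6
```

In this model, both a DT_RPATH and a DT_RUNPATH on `lib_lightgbm.so` send the lookup to the system copy. The difference is that DT_RUNPATH is searched after LD_LIBRARY_PATH, whereas DT_RPATH is searched before it.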
Both … EDIT: Actually, from a look at the code, using the linker from Python's configured compiler would imply some large changes in the setup and compilation logic.
@jameslamb Wow, brilliant investigation, thanks a lot! If attaching DT_RPATH to … I remember this conda behavior was the reason why we statically link … (lines 323 to 325 at commit 6de9baf).
@david-cortes, more details on "letting the linker do the job" and "using the linker from Python's configured compiler" (like a link to an example or relevant documentation) would be greatly appreciated. I don't know what pattern you're referring to.
The current logic is to use a … That way, if e.g. running it on Windows, there would be a …
Thanks for that information. I'm not familiar with that pattern for Python projects, so I'll look around for some examples and documentation on it. As discussed in #5061, I think it's possible that the package's strategy for compiling … BUT... I also think, based on my investigation above, that setting …
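Whichever way the RPATH ends up being set (through CMake's RPATH settings or with patchelf, as in workaround 3), the result can be checked without `chrpath` or `readelf`. Below is a sketch of a hypothetical helper, `read_dynamic_tags` (not part of LightGBM or any library), that parses the ELF dynamic section directly. It assumes a 64-bit little-endian ELF file with an intact section header table, returns RPATH/RUNPATH strings unsplit and with `$ORIGIN` unexpanded, and does not resolve anything the way `ldd` does.

```python
import struct
from typing import Dict, List

# Constants from the ELF specification
SHT_DYNAMIC = 6
DT_NULL, DT_NEEDED, DT_RPATH, DT_RUNPATH = 0, 1, 15, 29
_TAG_NAMES = {DT_NEEDED: "NEEDED", DT_RPATH: "RPATH", DT_RUNPATH: "RUNPATH"}


def read_dynamic_tags(path: str) -> Dict[str, List[str]]:
    """List NEEDED/RPATH/RUNPATH strings of a 64-bit little-endian ELF file."""
    with open(path, "rb") as f:
        data = f.read()
    # e_ident: magic, EI_CLASS == 2 (64-bit), EI_DATA == 1 (little-endian)
    if len(data) < 64 or data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError(f"{path} is not a 64-bit little-endian ELF file")

    # ELF64 header fields: e_shoff at 0x28, e_shentsize and e_shnum at 0x3A
    (e_shoff,) = struct.unpack_from("<Q", data, 0x28)
    e_shentsize, e_shnum = struct.unpack_from("<HH", data, 0x3A)

    # Elf64_Shdr starts with sh_name, sh_type, sh_flags, sh_addr,
    # sh_offset, sh_size, sh_link
    sections = []
    for i in range(e_shnum):
        fields = struct.unpack_from("<IIQQQQI", data, e_shoff + i * e_shentsize)
        sections.append((fields[1], fields[4], fields[5], fields[6]))

    result: Dict[str, List[str]] = {"NEEDED": [], "RPATH": [], "RUNPATH": []}
    for sh_type, sh_offset, sh_size, sh_link in sections:
        if sh_type != SHT_DYNAMIC:
            continue
        # sh_link of .dynamic is the index of the string table (.dynstr)
        strtab = sections[sh_link][1]
        # each Elf64_Dyn entry is a signed 8-byte d_tag plus an 8-byte d_val
        for pos in range(sh_offset, sh_offset + sh_size, 16):
            d_tag, d_val = struct.unpack_from("<qQ", data, pos)
            if d_tag == DT_NULL:
                break
            if d_tag in _TAG_NAMES:
                start = strtab + d_val
                end = data.index(b"\x00", start)
                result[_TAG_NAMES[d_tag]].append(data[start:end].decode())
    return result
```

For example, calling `read_dynamic_tags("lib_lightgbm2.so")` after the patchelf step in workaround 3 should, with a recent patchelf, list `/usr/lib/x86_64-linux-gnu` under RUNPATH (or under RPATH if `--force-rpath` was used).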
Trying to build the Python package from source without installing it somehow picks the wrong path for system libraries. I get an error about being unable to import scipy.sparse, even though I can import that library in the same session (this is after successfully building lib_lightgbm through the CMake system).