
Reduce wheel package size for faiss-gpu CUDA 11.0 build #57

Open · kyamagu opened this issue Apr 11, 2022 · 19 comments
Labels: enhancement (New feature or request)

kyamagu (Owner) commented Apr 11, 2022

The CUDA 11.0 build in #56 bloats the wheel package size from 85.5 MB to 216.5 MB. We need to investigate how to reduce the file size.

kyamagu added the enhancement label on Apr 11, 2022
kyamagu (Owner) commented Apr 11, 2022

Relevant: pytorch/pytorch#56055

kyamagu (Owner) commented Apr 11, 2022

It seems one approach is to drop architecture-specific binaries from the CUDA static libraries via nvprune, like this:

nvprune \
  -gencode arch=compute_60,code=sm_60 \
  -gencode arch=compute_70,code=sm_70 \
  -gencode arch=compute_75,code=sm_75 \
  -gencode arch=compute_80,code=sm_80 \
  -gencode arch=compute_80,code=compute_80 \
  -o /usr/local/cuda/lib64/libcublas_static_slim.a \
  /usr/local/cuda/lib64/libcublas_static.a

Currently there are four static-library dependencies, and applying nvprune slightly reduces the binary size.

  • libcublas_static.a
  • libcublasLt_static.a
  • libcudart_static.a
  • libculibos.a

In Python 3.9, the original file size of _swigfaiss.cpython-39-x86_64-linux-gnu.so was 341MB; applying nvprune to all the static libs brings it down to 310MB. This is still huge.
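For reference, a minimal sketch of the pruning step across the archives (paths assume the default CUDA install location; libcudart_static.a and libculibos.a contain little or no device code, so pruning them gains almost nothing):

# Architectures to keep; everything else is stripped from the fat binaries.
GENCODE="-gencode arch=compute_60,code=sm_60 \
  -gencode arch=compute_70,code=sm_70 \
  -gencode arch=compute_75,code=sm_75 \
  -gencode arch=compute_80,code=sm_80 \
  -gencode arch=compute_80,code=compute_80"

# The two cuBLAS archives carry nearly all of the device code.
for lib in libcublas_static libcublasLt_static; do
  nvprune $GENCODE \
    -o "/usr/local/cuda/lib64/${lib}_slim.a" \
    "/usr/local/cuda/lib64/${lib}.a"
done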

kyamagu (Owner) commented Apr 11, 2022

The major problem is that CUDA 11.0 splits the cublasLt API into a separate static lib, which seems to significantly increase the final binary size. In CUDA 10.x, the cublasLt API was part of a single static lib.

libcublasLt_static.a 224M
libcublas_static.a 82M
libcudart_static.a 910K
libculibos.a 31K

kyamagu (Owner) commented Apr 11, 2022

Strangely, faiss does not use the cublasLt API. But when omitting -lcublasLt_static from the linker flags in setup.py, we see the following error on import. Why does that happen?

ImportError: /workspace/faiss-wheels/build/lib.linux-x86_64-3.9/faiss/_swigfaiss.cpython-39-x86_64-linux-gnu.so: undefined symbol: cublasLtMatrixTransformDescDestroy
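One plausible explanation (an assumption worth checking, not something confirmed here) is that libcublas_static.a itself carries undefined references to cublasLt symbols; dropping -lcublasLt_static then leaves them unresolved in the extension, and the failure only surfaces when the module is loaded. nm can confirm this:

# If the symbol appears with a 'U' (undefined) marker in libcublas_static.a,
# cublas depends on cublasLt even though faiss itself never calls it.
nm /usr/local/cuda/lib64/libcublas_static.a 2>/dev/null \
  | grep cublasLtMatrixTransformDescDestroy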

kyamagu (Owner) commented Apr 11, 2022

OK, changing the order of the linker flags in setup.py seems to reduce the binary size.
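For context: with static archives, ld scans left to right and extracts only the members that satisfy symbols still unresolved at that point, so the order of the archives determines how much of each library is pulled in. A hypothetical link line illustrating the idea (not the actual setup.py flags):

# Archives that reference symbols must come before the archives that
# provide them; placing cublasLt_static after cublas_static lets the
# linker extract only the cublasLt members that cublas actually needs.
g++ -shared -o _swigfaiss.so build/*.o \
  -lcublas_static -lcublasLt_static -lcudart_static -lculibos \
  -lpthread -ldl -lrt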

kyamagu (Owner) commented Apr 12, 2022

With CUDA 11.6, the resulting wheel grows further, to 345MB on Linux. After nvprune, we get 276MB. This is still not good, as the PyPI default size limit is 60MB.

kyamagu (Owner) commented Apr 12, 2022

An alternative is to give up static linking and rely on dynamic linking. This would significantly reduce the wheel size but would require users to install the CUDA runtime libraries separately.

kyamagu (Owner) commented Nov 17, 2022

With the avx2 extension, the package is ~430MB.

kyamagu (Owner) commented Jan 5, 2023

It seems there are CUDA runtime packages on PyPI.
https://pypi.org/project/nvidia-cuda-runtime-cu11/
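A sketch of how that could be combined with dynamic linking (package names assume the cu11 series; the extension would additionally have to locate these libraries at runtime rather than bundling them):

# Pull the CUDA runtime and cuBLAS from PyPI instead of shipping them in the wheel.
pip install nvidia-cuda-runtime-cu11 nvidia-cublas-cu11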

theLastOfCats commented

Hi!

Did you consider placing the package on a GitLab PyPI index or publishing it to Docker Hub as an image?

Ping me if you need help.

kyamagu (Owner) commented Mar 16, 2023

@theLastOfCats You can manually download packages from the release page.

Di-Is commented Apr 22, 2024

Hi @kyamagu!

For your reference, by switching from static to dynamic linking of CUDA, the wheel size has been reduced to 63MB.
It is dynamically linked against the shared libraries from the nvidia-cublas-cu12 and nvidia-cuda-runtime-cu12 packages, which are published on PyPI.

It seems possible to reduce the wheel size to less than 60MB by either narrowing down the target architecture or switching from static linking to dynamic linking of OpenBLAS.

Fork repository: https://github.com/Di-Is/faiss-wheels/tree/pypi-cuda

Build Script
# Test CMD
CPU_TEST_CMD="pytest {project}/faiss/tests && pytest -s {project}/faiss/tests/torch_test_contrib.py"
GPU_TEST_CMD="cp {project}/faiss/tests/common_faiss_tests.py {project}/faiss/faiss/gpu/test/ && pytest {project}/faiss/faiss/gpu/test/test_*.py && pytest {project}/faiss/faiss/gpu/test/torch_*.py"

# Common Setup
export CIBW_BEFORE_ALL="bash scripts/build_Linux.sh"
export CIBW_TEST_COMMAND="${CPU_TEST_CMD}"
export CIBW_BEFORE_TEST_LINUX="pip install torch --index-url https://download.pytorch.org/whl/cpu"
export CIBW_ENVIRONMENT_LINUX="FAISS_OPT_LEVEL=${FAISS_OPT_LEVEL:-generic} BUILD_PARALLELISM=${BUILD_PARALLELISM:-3} CUDA_VERSION=12.1"
export CIBW_DEBUG_KEEP_CONTAINER=TRUE

if [ "$FAISS_ENABLE_GPU" = "ON" ]; then
    if [ "$CONTAINER_GPU_ACCESS" = "ON" ]; then
        export CIBW_TEST_COMMAND="${CIBW_TEST_COMMAND} && ${GPU_TEST_CMD}"
        export CIBW_CONTAINER_ENGINE="docker; create_args: --gpus all"
        export -n CIBW_BEFORE_TEST_LINUX
    fi
    export CIBW_ENVIRONMENT_LINUX="${CIBW_ENVIRONMENT_LINUX} FAISS_ENABLE_GPU=ON"
    export CIBW_REPAIR_WHEEL_COMMAND="auditwheel repair -w {dest_dir} {wheel} --exclude libcublas.so.12 --exclude libcublasLt.so.12 --exclude libcudart.so.12"
else
    export CIBW_ENVIRONMENT_LINUX="${CIBW_ENVIRONMENT_LINUX} FAISS_ENABLE_GPU=OFF"
    export CIBW_REPAIR_WHEEL_COMMAND="auditwheel repair -w {dest_dir} {wheel}"
fi

python3 -m cibuildwheel --output-dir wheelhouse --platform linux
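A caveat with the --exclude approach: the excluded CUDA libraries must be discoverable at import time. A minimal sketch of how that can be verified, assuming the nvidia-* wheels are installed and use their usual nvidia/<lib>/lib layout (the exact paths are an assumption, not part of the build script above):

# Point the loader at the shared libraries shipped by the nvidia-* wheels,
# then check that the GPU extension imports and can see a device.
SITE=$(python3 -c "import site; print(site.getsitepackages()[0])")
export LD_LIBRARY_PATH="$SITE/nvidia/cublas/lib:$SITE/nvidia/cuda_runtime/lib:$LD_LIBRARY_PATH"
python3 -c "import faiss; print(faiss.get_num_gpus())"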

kyamagu (Owner) commented Apr 23, 2024

@Di-Is CUDA backward compatibility is complicated, and a PyPI release should not expect any external dependencies beyond the few allowed by the manylinux policy. https://github.com/pypa/manylinux

You can build a wheel from the source package for your own environment, but that wheel will not be compatible with other environments.


Di-Is commented Apr 23, 2024

CUDA backward compatibility is complicated,

I believe that installing the appropriate NVIDIA drivers is not a matter of package management but part of system setup, so the responsibility for that lies with the user.
(This is also true for other package managers, e.g., Conda.)
Fortunately, installing the latest driver will work with any version of CUDA and the binaries linked against it.

the PyPI release should not expect any external dependency other than a few linked to CPython binary.

It is correct that wheel files should be self-contained.
However, this matter has been discussed in auditwheel issue #368, and a feature to relax the restriction has been merged into auditwheel.

Di-Is commented Apr 23, 2024

You can build a source package for your environment, but that wheel will not be compatible with other environments.

If the following conditions are met, Faiss installed from the created wheel should work properly.

  1. Run Faiss in an environment with an NVIDIA driver installed that is compatible with the CUDA version being used.
  2. Do not load multiple versions of CUDA shared libraries in a single process (to avoid troublesome issues like symbol conflicts).

Regarding 1., as mentioned earlier, it is the user's responsibility.
Regarding 2., I believe the system/package configuration should be reviewed.

kyamagu (Owner) commented Apr 24, 2024

@Di-Is

However, this matter has been discussed in auditwheel issue pypa/auditwheel#368 (comment), and a feature to relax the restriction has been merged into auditwheel.

This is not a matter of auditwheel but a more fundamental issue in Python dependency management. Under the current PyPI policy, managing GPU dependencies is hard unless there is a standardized toolchain to build and test wheels for combinations of compiler / CUDA / driver / CPU arch / OS / Python versions, and, more recently, compatibility with other packages like PyTorch. At the very least, the current PyPI distribution is not designed for multiple CUDA runtimes. If we ignore that and ship wheels for a very specific runtime configuration, we will end up with a flood of error reports both here and upstream, which is obviously not a good thing. Conda is different from PyPI in that conda does manage runtime environments (e.g., CUDA).

My current approach is to at least keep a source distribution that works in any custom environment. Right now I can't spend time on the GPU binary distribution, but you could try designing a build-and-test matrix to resolve the issues across the configurations above.

CandiedCode commented

@theLastOfCats You can manually download packages from the release page.

Hi @kyamagu,

Until the PyPI situation is resolved, will all releases have wheel packages available for download? Currently only 1.7.3 has them; they are missing from the 1.7.4 and 1.8.0 releases.

Thanks!

kyamagu (Owner) commented Jul 5, 2024

@CandiedCode Currently, there is no plan to support GPU binary wheels. You can build from the source package in your own environment.
