Custom CMAKE_ARGS are overwritten in the "backend/cpp/llama/grpc-server" target for the "llama-cpp" backend #1317

countzero · 2023-11-22T10:36:53Z

LocalAI version:
https://github.com/mudler/LocalAI/tree/763f94ca80827981d0b5e5e41ee6a21fec5f5f67

Environment, CPU architecture, OS, and Version:
Linux 9a4562508d46 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 GNU/Linux
The build is executed in a Docker container based on golang:1.21-bookworm from https://hub.docker.com/_/golang

Describe the bug
Any CMAKE_ARGS set in the environment are overwritten in the Makefile target backend/cpp/llama/grpc-server, because its recipe sets CMAKE_ARGS to ${ADDED_CMAKE_ARGS} alone:

LocalAI/Makefile, line 420 in 763f94c:

           CMAKE_ARGS="${ADDED_CMAKE_ARGS}" LLAMA_VERSION=$(CPPLLAMA_VERSION) $(MAKE) -C backend/cpp/llama grpc-server
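The effect can be reproduced without make. In a command of the form VAR=value cmd, the shell puts VAR=value into cmd's environment, replacing any exported value with the same name. The sketch below uses sh -c as a stand-in for the sub-make, and the ADDED_CMAKE_ARGS value is an illustrative assumption, not what the Makefile actually computes. In the real recipe, make expands ${ADDED_CMAKE_ARGS} (and, after the workaround, ${CMAKE_ARGS}) before the shell runs, but the child ends up with the same environment.

```shell
#!/usr/bin/env bash
# The caller's flags, as passed in the Dockerfile's RUN step.
export CMAKE_ARGS="-DLLAMA_NATIVE=OFF"
# Assumption: illustrative stand-in for the value the Makefile computes.
ADDED_CMAKE_ARGS="-DLLAMA_CUBLAS=ON"

# Pattern at Makefile line 420: the child only sees ADDED_CMAKE_ARGS.
overwritten=$(CMAKE_ARGS="${ADDED_CMAKE_ARGS}" sh -c 'printf "%s" "$CMAKE_ARGS"')

# Pattern from the sed workaround: the caller's flags are kept and extended.
merged=$(CMAKE_ARGS="${CMAKE_ARGS} ${ADDED_CMAKE_ARGS}" sh -c 'printf "%s" "$CMAKE_ARGS"')

echo "overwritten: ${overwritten}"
echo "merged:      ${merged}"
```

Running it shows -DLLAMA_CUBLAS=ON for the overwritten case and -DLLAMA_NATIVE=OFF -DLLAMA_CUBLAS=ON for the merged case.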

To Reproduce

Build the following Dockerfile:

FROM golang:1.21-bookworm

LABEL maintainer="stadt.werk GmbH <info@stadtwerk.org>"

SHELL ["/bin/bash", "-c"]

WORKDIR /opt/stadtwerk

RUN apt-get update && \
    apt-get install --yes \
        ca-certificates \
        cmake \
        curl \
        git \
        patch \
        pip \
        software-properties-common && \
    apt-get clean

RUN apt-add-repository contrib && \
    curl -O https://developer.download.nvidia.com/compute/cuda/repos/debian12/x86_64/cuda-keyring_1.1-1_all.deb && \
    dpkg -i cuda-keyring_1.1-1_all.deb && \
    rm -f cuda-keyring_1.1-1_all.deb && \
    apt-get update && \
    apt-get install --yes \
        cuda-nvcc-12-3 \
        libcublas-dev-12-3 \
        libcusparse-dev-12-3 \
        libcusolver-dev-12-3 && \
    apt-get clean

RUN git clone https://github.com/mudler/LocalAI && \
    git -C ./LocalAI checkout "763f94c"

WORKDIR /opt/stadtwerk/LocalAI

RUN CMAKE_ARGS="-DLLAMA_NATIVE=OFF" \
    BUILD_GRPC_FOR_BACKEND_LLAMA="ON" \
    GRPC_BACKENDS="backend-assets/grpc/llama-cpp" \
    make BUILD_TYPE="cublas" CUDACXX="/usr/local/cuda/bin/nvcc" GO_TAGS="" build

HEALTHCHECK --interval=1m --timeout=10m --retries=10 \
    CMD curl --fail http://localhost:8080/readyz || exit 1

EXPOSE 8080

ENTRYPOINT ["./local-ai", "--debug"]

Expected behavior

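The CMAKE_ARGS value from the environment (-DLLAMA_NATIVE=OFF) should be passed through to the CMake invocation that builds the grpc-server. In the log below, the cmake command line contains the gRPC/Protobuf paths and -DLLAMA_CUBLAS=ON, but not -DLLAMA_NATIVE=OFF.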
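One way to confirm the expected behavior after a build is to read the cache that CMake writes into the build directory (backend/cpp/llama/llama.cpp/build, per the log below). Every -D option given on the cmake command line becomes a cache entry of the form NAME:TYPE=VALUE. The helper name native_off_in_cache is hypothetical, and the sketch only matches the value spelled exactly OFF.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if LLAMA_NATIVE=OFF reached the CMake configure step.
# Default path is the build directory from the log; pass another cache file as $1.
native_off_in_cache() {
  local cache="${1:-backend/cpp/llama/llama.cpp/build/CMakeCache.txt}"
  # Cache entries look like NAME:TYPE=VALUE; accept any type (BOOL, UNINITIALIZED, ...).
  # -s silences the error message when the cache file does not exist.
  grep -Eqs '^LLAMA_NATIVE:[A-Z]+=OFF$' "$cache"
}

if native_off_in_cache "$@"; then
  echo "LLAMA_NATIVE=OFF was passed to CMake"
else
  echo "LLAMA_NATIVE=OFF not found in the CMake cache"
fi
```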

Logs

521.8 cd llama.cpp && mkdir -p build && cd build && cmake .. -Dabsl_DIR=/opt/stadtwerk/LocalAI/backend/cpp/grpc/installed_packages/lib/cmake/absl -DProtobuf_DIR=/opt/stadtwerk/LocalAI/backend/cpp/grpc/installed_packages/lib/cmake/protobuf -Dutf8_range_DIR=/opt/stadtwerk/LocalAI/backend/cpp/grpc/installed_packages/lib/cmake/utf8_range -DgRPC_DIR=/opt/stadtwerk/LocalAI/backend/cpp/grpc/installed_packages/lib/cmake/grpc -DCMAKE_CXX_STANDARD_INCLUDE_DIRECTORIES=/opt/stadtwerk/LocalAI/backend/cpp/grpc/installed_packages/include -DLLAMA_CUBLAS=ON && cmake --build . --config Release
521.9 -- The C compiler identification is GNU 12.2.0
521.9 -- The CXX compiler identification is GNU 12.2.0
521.9 -- Detecting C compiler ABI info
522.0 -- Detecting C compiler ABI info - done
522.0 -- Check for working C compiler: /usr/bin/cc - skipped
522.0 -- Detecting C compile features
522.0 -- Detecting C compile features - done
522.0 -- Detecting CXX compiler ABI info
522.1 -- Detecting CXX compiler ABI info - done
522.1 -- Check for working CXX compiler: /usr/bin/c++ - skipped
522.1 -- Detecting CXX compile features
522.1 -- Detecting CXX compile features - done
522.1 -- Performing Test CMAKE_HAVE_LIBC_PTHREAD
522.1 -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
522.1 -- Found Threads: TRUE
522.1 -- Found CUDAToolkit: /usr/local/cuda/include (found version "12.3.103")
522.2 -- cuBLAS found
522.9 -- The CUDA compiler identification is NVIDIA 12.3.103
522.9 -- Detecting CUDA compiler ABI info
523.6 -- Detecting CUDA compiler ABI info - done
523.7 -- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - skipped
523.7 -- Detecting CUDA compile features
523.7 -- Detecting CUDA compile features - done
523.7 -- Using CUDA architectures: 52;61;70
523.7 GNU ld (GNU Binutils for Debian) 2.40
523.7 -- CMAKE_SYSTEM_PROCESSOR: x86_64
523.7 -- x86 detected
523.7 -- Found ZLIB: /usr/lib/x86_64-linux-gnu/libz.so (found version "1.2.13")
523.7 -- Using protobuf version 24.3.0 | Protobuf_INCLUDE_DIRS:  | CMAKE_CURRENT_BINARY_DIR: /opt/stadtwerk/LocalAI/backend/cpp/llama/llama.cpp/build/examples/grpc-server
523.7 -- Configuring done
523.8 -- Generating done
523.9 -- Build files have been written to: /opt/stadtwerk/LocalAI/backend/cpp/llama/llama.cpp/build
523.9 gmake[2]: Entering directory '/opt/stadtwerk/LocalAI/backend/cpp/llama/llama.cpp/build'
523.9 gmake[3]: Entering directory '/opt/stadtwerk/LocalAI/backend/cpp/llama/llama.cpp/build'
523.9 gmake[4]: Entering directory '/opt/stadtwerk/LocalAI/backend/cpp/llama/llama.cpp/build'
523.9 gmake[4]: Leaving directory '/opt/stadtwerk/LocalAI/backend/cpp/llama/llama.cpp/build'
523.9 gmake[4]: Entering directory '/opt/stadtwerk/LocalAI/backend/cpp/llama/llama.cpp/build'
523.9 [  1%] Building C object CMakeFiles/ggml.dir/ggml.c.o
526.9 In function 'ggml_op_name',
526.9     inlined from 'ggml_get_n_tasks' at /opt/stadtwerk/LocalAI/backend/cpp/llama/llama.cpp/ggml.c:15698:17:
526.9 /opt/stadtwerk/LocalAI/backend/cpp/llama/llama.cpp/ggml.c:2019:24: warning: array subscript 69 is above array bounds of 'const char *[68]' [-Warray-bounds]
526.9  2019 |     return GGML_OP_NAME[op];
526.9       |            ~~~~~~~~~~~~^~~~
526.9 /opt/stadtwerk/LocalAI/backend/cpp/llama/llama.cpp/ggml.c: In function 'ggml_get_n_tasks':
526.9 /opt/stadtwerk/LocalAI/backend/cpp/llama/llama.cpp/ggml.c:1589:21: note: while referencing 'GGML_OP_NAME'
526.9  1589 | static const char * GGML_OP_NAME[GGML_OP_COUNT] = {
526.9       |                     ^~~~~~~~~~~~
530.4 [  2%] Building C object CMakeFiles/ggml.dir/ggml-alloc.c.o
530.8 [  3%] Building C object CMakeFiles/ggml.dir/ggml-backend.c.o
531.4 [  4%] Building C object CMakeFiles/ggml.dir/ggml-quants.c.o
531.7 In file included from /usr/lib/gcc/x86_64-linux-gnu/12/include/immintrin.h:105,
531.7                  from /opt/stadtwerk/LocalAI/backend/cpp/llama/llama.cpp/ggml-impl.h:74,
531.7                  from /opt/stadtwerk/LocalAI/backend/cpp/llama/llama.cpp/ggml-quants.h:3,
531.7                  from /opt/stadtwerk/LocalAI/backend/cpp/llama/llama.cpp/ggml-quants.c:1:
531.7 /usr/lib/gcc/x86_64-linux-gnu/12/include/fmaintrin.h: In function 'ggml_vec_dot_q4_0_q8_0':
531.7 /usr/lib/gcc/x86_64-linux-gnu/12/include/fmaintrin.h:63:1: error: inlining failed in call to 'always_inline' '_mm256_fmadd_ps': target specific option mismatch
531.7    63 | _mm256_fmadd_ps (__m256 __A, __m256 __B, __m256 __C)
531.7       | ^~~~~~~~~~~~~~~
531.7 /opt/stadtwerk/LocalAI/backend/cpp/llama/llama.cpp/ggml-quants.c:2520:15: note: called from here
531.7  2520 |         acc = _mm256_fmadd_ps( d, q, acc );
531.7       |               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
531.7 /usr/lib/gcc/x86_64-linux-gnu/12/include/fmaintrin.h:63:1: error: inlining failed in call to 'always_inline' '_mm256_fmadd_ps': target specific option mismatch
531.7    63 | _mm256_fmadd_ps (__m256 __A, __m256 __B, __m256 __C)
531.7       | ^~~~~~~~~~~~~~~
531.7 /opt/stadtwerk/LocalAI/backend/cpp/llama/llama.cpp/ggml-quants.c:2520:15: note: called from here
531.7  2520 |         acc = _mm256_fmadd_ps( d, q, acc );
531.7       |               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
531.7 /usr/lib/gcc/x86_64-linux-gnu/12/include/fmaintrin.h:63:1: error: inlining failed in call to 'always_inline' '_mm256_fmadd_ps': target specific option mismatch
531.7    63 | _mm256_fmadd_ps (__m256 __A, __m256 __B, __m256 __C)
531.7       | ^~~~~~~~~~~~~~~
531.7 /opt/stadtwerk/LocalAI/backend/cpp/llama/llama.cpp/ggml-quants.c:2520:15: note: called from here
531.7  2520 |         acc = _mm256_fmadd_ps( d, q, acc );
531.7       |               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
531.7 /usr/lib/gcc/x86_64-linux-gnu/12/include/fmaintrin.h:63:1: error: inlining failed in call to 'always_inline' '_mm256_fmadd_ps': target specific option mismatch
531.7    63 | _mm256_fmadd_ps (__m256 __A, __m256 __B, __m256 __C)
531.7       | ^~~~~~~~~~~~~~~
531.7 /opt/stadtwerk/LocalAI/backend/cpp/llama/llama.cpp/ggml-quants.c:2520:15: note: called from here
531.7  2520 |         acc = _mm256_fmadd_ps( d, q, acc );
531.7       |               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
531.7 gmake[4]: *** [CMakeFiles/ggml.dir/build.make:118: CMakeFiles/ggml.dir/ggml-quants.c.o] Error 1
531.7 gmake[4]: Leaving directory '/opt/stadtwerk/LocalAI/backend/cpp/llama/llama.cpp/build'
531.7 gmake[3]: *** [CMakeFiles/Makefile2:664: CMakeFiles/ggml.dir/all] Error 2
531.7 gmake[3]: Leaving directory '/opt/stadtwerk/LocalAI/backend/cpp/llama/llama.cpp/build'
531.7 gmake[2]: Leaving directory '/opt/stadtwerk/LocalAI/backend/cpp/llama/llama.cpp/build'
531.7 gmake[2]: *** [Makefile:146: all] Error 2
531.7 make[1]: *** [Makefile:49: grpc-server] Error 2
531.7 make[1]: Leaving directory '/opt/stadtwerk/LocalAI/backend/cpp/llama'
531.7 make: *** [Makefile:417: backend/cpp/llama/grpc-server] Error 2

Additional context

As a quick workaround, you can patch the Makefile before the build step so that the environment's CMAKE_ARGS are prepended to ADDED_CMAKE_ARGS:

RUN sed -i 's/CMAKE_ARGS="${ADDED_CMAKE_ARGS}"/CMAKE_ARGS="${CMAKE_ARGS} ${ADDED_CMAKE_ARGS}"/g' Makefile
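To see what the sed expression does before baking it into an image, it can be run against the recipe line quoted from the Makefile. This sketch applies the same expression through a pipe instead of -i, so nothing on disk changes; it assumes the line text matches commit 763f94c exactly. In a basic regular expression, $ is only an anchor at the end of the pattern and { is literal, so the pattern matches the text verbatim.

```shell
#!/usr/bin/env bash
# Recipe line from Makefile line 420 at commit 763f94c (single quotes: no expansion).
line='CMAKE_ARGS="${ADDED_CMAKE_ARGS}" LLAMA_VERSION=$(CPPLLAMA_VERSION) $(MAKE) -C backend/cpp/llama grpc-server'

# Same expression as the workaround, applied to stdin instead of editing in place.
patched=$(printf '%s\n' "$line" | sed 's/CMAKE_ARGS="${ADDED_CMAKE_ARGS}"/CMAKE_ARGS="${CMAKE_ARGS} ${ADDED_CMAKE_ARGS}"/g')

echo "$patched"
```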

This also fixes #1196

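We have to use CMAKE_ARGS="-DLLAMA_NATIVE=OFF" to avoid the "inlining failed in call to 'always_inline' ...: target specific option mismatch" error described in ggml-org/llama.cpp#107. That issue reports it for _mm256_cvtph_ps; the log above shows the same failure for _mm256_fmadd_ps.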


countzero added the bug Something isn't working label Nov 22, 2023

countzero assigned mudler Nov 22, 2023

mudler linked a pull request on Nov 25, 2023 that will close this issue: fix: propagate CMAKE_ARGS when building grpc #1334 (merged)

mudler mentioned this issue in fix: propagate CMAKE_ARGS when building grpc #1334 on Nov 25, 2023

mudler closed this as completed in #1334 Nov 25, 2023
