
Running llama.cpp with SYCL in Docker fails with "Unknown PI Error" #5400

Closed
mudler opened this issue Feb 7, 2024 · 2 comments

Comments

@mudler
Contributor

mudler commented Feb 7, 2024

I have an Intel Arc A770 and I'm trying to run llama.cpp with Docker following https://github.com/ggerganov/llama.cpp/blob/master/README-sycl.md, but it fails with:

Native API failed. Native API returns: -999 (Unknown PI error) -999 (Unknown PI error)
Exception caught at file:/app/ggml-sycl.cpp, line:14735, func:operator()
SYCL error: CHECK_TRY_ERROR((*stream) .memcpy((char *)tensor->data + offset, data, size) .wait()): Meet error in this line code!
  in function ggml_backend_sycl_buffer_set_tensor at /app/ggml-sycl.cpp:14735
GGML_ASSERT: /app/ggml-sycl.cpp:2919: !"SYCL error"

I've also been trying to run it with:

docker run -it --rm -v "$(pwd):/app:Z" --device /dev/dri llama-cpp-sycl -m "/app/models/c0c3c83d0ec33ffe925657a56b06771b" -p "Building a website can be done in 10 simple steps:" -n 400 -e -ngl 33  
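
A sanity check here is to confirm the GPU is actually visible to the SYCL runtime inside the container. This is a minimal sketch, assuming sycl-ls is present in the llama-cpp-sycl image (it ships with the oneAPI Base Toolkit the image builds on) and that your distribution uses the usual render group for /dev/dri access:

# List SYCL devices from inside the container; the A770 should appear as a level_zero:gpu entry
> sudo docker run --rm --device /dev/dri --entrypoint sycl-ls llama-cpp-sycl
# If only the CPU/FPGA emulation devices show up, the container user may lack permission on /dev/dri;
# adding the host's render group is one thing to try (the group ID differs between systems)
> sudo docker run --rm --device /dev/dri --group-add "$(getent group render | cut -d: -f3)" --entrypoint sycl-ls llama-cpp-sycl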

I'm also trying this with LocalAI, where I have been manually building a container with SYCL; however, the error there is different (PR mudler/LocalAI#1689):

# Build the image
> sudo docker build --build-arg GO_TAGS="none" --build-arg BUILD_TYPE=sycl_f32 --build-arg IMAGE_TYPE=core --build-arg GRPC_BACKENDS=backend-assets/grpc/llama-cpp -t local-ai .
# Run it with the phi-2 model
# Note: -v /dev/dri, --device, and --privileged all yield the same results
> sudo docker run --privileged -e GGML_SYCL_DEBUG=1 -e DEBUG=true -ti -v $PWD/models:/build/models -p 8080:8080 --device /dev/dri --rm local-ai phi-2
....
11:06PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:44983): stderr Using device 0 (Intel(R) Arc(TM) A770 Graphics) as main device
...
8:59PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:45725): stderr oneapi::mkl::oneapi::mkl::blas::gemm: cannot allocate memory on host
8:59PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:45725): stderr Exception caught at file:/build/backend/cpp/llama/llama.cpp/ggml-sycl.cpp, line:13449, func:operator()          
8:59PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:45725): stdout call ggml_sycl_norm                                                                                             
8:59PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:45725): stdout call ggml_sycl_mul                                                                                              
8:59PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:45725): stdout call ggml_sycl_add                                                                                              
8:59PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:45725): stdout call ggml_sycl_add                                                                                              
8:59PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:45725): stderr SYCL error: CHECK_TRY_ERROR(dpct::gemm_batch( *g_sycl_handles[g_main_device_index], oneapi::mkl::transpose::trans, oneapi::mkl::transpose::nontrans, ne01, ne11, ne10, alpha, (const void **)(ptrs_src.get() + 0 * ne23), dpct::library_data_t::real_half, nb01 / sizeof(sycl::half), (const void **)(ptrs_src.get() + 1 * ne23), dpct::library_data_t::real_half, nb11 / sizeof(float), beta, (void **)(ptrs_dst.get() + 0 * ne23), cu_data_type, ne01, ne23, cu_compute_type)): Meet error in this line code!
8:59PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:45725): stderr   in function ggml_sycl_mul_mat_mat_batched_sycl at /build/backend/cpp/llama/llama.cpp/ggml-sycl.cpp:13449      
8:59PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:45725): stderr GGML_ASSERT: /build/backend/cpp/llama/llama.cpp/ggml-sycl.cpp:2891: !"SYCL error"  

Cards detected:

found 6 SYCL devices:                        
  Device 0: Intel(R) Arc(TM) A770 Graphics,     compute capability 1.3,
        max compute_units 512,  max work group size 1024,       max sub group size 32,  global mem size 16225243136
  Device 1: Intel(R) FPGA Emulation Device,     compute capability 1.2,
        max compute_units 16,   max work group size 67108864,   max sub group size 64,  global mem size 29321728000
  Device 2: AMD Ryzen 7 5700G with Radeon Graphics         ,    compute capability 3.0,
        max compute_units 16,   max work group size 8192,       max sub group size 64,  global mem size 29321728000
  Device 3: Intel(R) Arc(TM) A770 Graphics,     compute capability 3.0,
        max compute_units 512,  max work group size 1024,       max sub group size 32,  global mem size 16225243136
  Device 4: Intel(R) Arc(TM) A770 Graphics,     compute capability 3.0,
        max compute_units 512,  max work group size 1024,       max sub group size 32,  global mem size 16225243136
  Device 5: Intel(R) Arc(TM) A770 Graphics,     compute capability 1.3,
        max compute_units 512,  max work group size 1024,       max sub group size 32,  global mem size 16225243136
Using device 0 (Intel(R) Arc(TM) A770 Graphics) as main device
root@b5f956e23067:/build# sycl-ls
[opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2  [2023.16.12.0.12_195853.xmain-hotfix]
[opencl:cpu:1] Intel(R) OpenCL, AMD Ryzen 7 5700G with Radeon Graphics          OpenCL 3.0 (Build 0) [2023.16.12.0.12_195853.xmain-hotfix]
[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Arc(TM) A770 Graphics 1.3 [1.3.27191]
[ext_oneapi_level_zero:gpu:1] Intel(R) Level-Zero, Intel(R) Arc(TM) A770 Graphics 1.3 [1.3.27191]
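
The runtime enumerates six SYCL devices here, with the A770 showing up several times across backends, so one thing worth trying is restricting enumeration to a single Level Zero GPU before launching. A minimal sketch, assuming a oneAPI 2023.2+ runtime that honours ONEAPI_DEVICE_SELECTOR (older runtimes use SYCL_DEVICE_FILTER instead):

# Expose only the first Level Zero GPU to the SYCL runtime inside the container
docker run -it --rm -v "$(pwd):/app:Z" --device /dev/dri \
    -e ONEAPI_DEVICE_SELECTOR=level_zero:0 \
    llama-cpp-sycl -m "/app/models/c0c3c83d0ec33ffe925657a56b06771b" \
    -p "Building a website can be done in 10 simple steps:" -n 400 -e -ngl 33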

I'm trying this on Ubuntu 22.04 LTS Server (fresh install).

ping (sorry to bug you): @NeoZhangJianyu, @airMeng, @luoyu-intel, @abhilash1910, @ggerganov

Is Docker supposed to work? Am I doing something wrong here?

Note that everything works here directly from the host, without Docker.

@NeoZhangJianyu
Collaborator

NeoZhangJianyu commented Feb 8, 2024

@mudler

  1. Could you add the whole log of llama.cpp here?
    Please run the following commands in Docker and share the whole log (a capture sketch follows below):

./build/bin/ls-sycl-device
./examples/sycl/build.sh
./examples/sycl/run-llama2.sh

  2. Do you hit the same issue on the host (non-Docker)?

Note: we only resolve issues in llama.cpp. If your issue is in code migrated from llama.cpp into another project, we can't guarantee that it works well. You need to reproduce the same issue in llama.cpp itself.
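
A simple way to capture the whole log from those commands (a sketch; the paths assume the default llama.cpp tree used above) is to tee stdout and stderr to files:

# Run each step and keep a copy of the combined output for sharing
> ./build/bin/ls-sycl-device 2>&1 | tee /tmp/ls-sycl-device.log
> ./examples/sycl/build.sh 2>&1 | tee /tmp/sycl-build.log
> ./examples/sycl/run-llama2.sh 2>&1 | tee /tmp/run-llama2.log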

@mudler
Contributor Author

mudler commented Feb 8, 2024

@mudler

1. Could you add the whole log of llama.cpp here?

Re-issuing the same steps now worked this time, so it seems it was something sporadic.

   Please run the following commands in Docker and share the whole log:

./build/bin/ls-sycl-device
./examples/sycl/build.sh
./examples/sycl/run-llama2.sh

2. Do you hit the same issue on the host (non-Docker)?

No, it works just fine from the host.

Note: we only resolve issues in llama.cpp. If your issue is in code migrated from llama.cpp into another project, we can't guarantee that it works well. You need to reproduce the same issue in llama.cpp itself.

Well, thanks. LocalAI was one of the first projects to support llama.cpp, and it always will be. We use the same llama.cpp server code, just with a gRPC server on top.

However, from a downstream project's perspective I find this attitude quite "unfriendly": if LocalAI cannot consume llama.cpp, other projects might hit the same issues as well, and having documented errors and solutions is helpful and in line with how collaboration between open-source projects generally works.

As I see it, LocalAI brings users to llama.cpp, and since llama.cpp is a library that can be imported, integration problems are bound to come up that make sense to solve in the llama.cpp codebase.

Going to close this one. The problem I had with LocalAI I've solved by using the Intel images directly. Unfortunately, using Ubuntu 22.04 and following Intel's steps to install the required dependencies didn't work out and was just a waste of time.
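
For reference, basing the container on Intel's images looks roughly like the sketch below. The intel/oneapi-basekit tag, the package list, and any extra runtime packages are assumptions here, so check Intel's registry and README-sycl.md for current values:

# Dockerfile sketch: build llama.cpp with the SYCL backend on top of Intel's oneAPI base image
FROM intel/oneapi-basekit:2024.0.1-devel-ubuntu22.04
# Depending on the tag you may also need the Intel compute runtime packages (intel-opencl-icd, level-zero)
RUN apt-get update && apt-get install -y git cmake build-essential
RUN git clone https://github.com/ggerganov/llama.cpp /app
WORKDIR /app
# Source the oneAPI environment and build with SYCL enabled (FP32 here, matching BUILD_TYPE=sycl_f32)
RUN . /opt/intel/oneapi/setvars.sh && \
    cmake -B build -DLLAMA_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx && \
    cmake --build build --config Release -j

At runtime the container still needs --device /dev/dri (or --privileged) so the Level Zero driver can see the GPU.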

mudler closed this as completed Feb 8, 2024