
Running llama.cpp with SYCL in Docker fails with "Unknown PI Error" #5400

Closed
mudler opened this issue Feb 7, 2024 · 2 comments

Comments

@mudler
Contributor

mudler commented Feb 7, 2024

I have an Intel Arc A770 and I'm trying to run llama.cpp with Docker following https://github.com/ggerganov/llama.cpp/blob/master/README-sycl.md, but it fails with:

Native API failed. Native API returns: -999 (Unknown PI error) -999 (Unknown PI error)
Exception caught at file:/app/ggml-sycl.cpp, line:14735, func:operator()
SYCL error: CHECK_TRY_ERROR((*stream) .memcpy((char *)tensor->data + offset, data, size) .wait()): Meet error in this line code!
  in function ggml_backend_sycl_buffer_set_tensor at /app/ggml-sycl.cpp:14735
GGML_ASSERT: /app/ggml-sycl.cpp:2919: !"SYCL error"

I've also been trying to run it with:

docker run -it --rm -v "$(pwd):/app:Z" --device /dev/dri llama-cpp-sycl -m "/app/models/c0c3c83d0ec33ffe925657a56b06771b" -p "Building a website can be done in 10 simple steps:" -n 400 -e -ngl 33  
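
A sanity check here is to confirm the GPU is actually visible to the SYCL runtime inside the container. This is a minimal sketch, assuming sycl-ls is present in the llama-cpp-sycl image (it ships with the oneAPI Base Toolkit the image builds on) and that your distribution uses the usual render group for /dev/dri access:

# List SYCL devices from inside the container; the A770 should appear as a level_zero:gpu entry
> sudo docker run --rm --device /dev/dri --entrypoint sycl-ls llama-cpp-sycl
# If only the CPU/FPGA emulation devices show up, the container user may lack permission on /dev/dri;
# adding the host's render group is one thing to try (the group ID differs between systems)
> sudo docker run --rm --device /dev/dri --group-add "$(getent group render | cut -d: -f3)" --entrypoint sycl-ls llama-cpp-sycl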

I'm also trying this with LocalAI, where I have been manually building a container with SYCL; however, the error there is different (PR mudler/LocalAI#1689):

# Build the image
> sudo docker build --build-arg GO_TAGS="none" --build-arg BUILD_TYPE=sycl_f32 --build-arg IMAGE_TYPE=core --build-arg GRPC_BACKENDS=backend-assets/grpc/llama-cpp -t local-ai .
# Run it with the phi-2 model
# Note: -v /dev/dri, --device, and --privileged all yield the same results
> sudo docker run --privileged -e GGML_SYCL_DEBUG=1 -e DEBUG=true -ti -v $PWD/models:/build/models -p 8080:8080 --device /dev/dri --rm local-ai phi-2
....
11:06PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:44983): stderr Using device 0 (Intel(R) Arc(TM) A770 Graphics) as main device
...
8:59PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:45725): stderr oneapi::mkl::oneapi::mkl::blas::gemm: cannot allocate memory on host
8:59PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:45725): stderr Exception caught at file:/build/backend/cpp/llama/llama.cpp/ggml-sycl.cpp, line:13449, func:operator()          
8:59PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:45725): stdout call ggml_sycl_norm                                                                                             
8:59PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:45725): stdout call ggml_sycl_mul                                                                                              
8:59PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:45725): stdout call ggml_sycl_add                                                                                              
8:59PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:45725): stdout call ggml_sycl_add                                                                                              
8:59PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:45725): stderr SYCL error: CHECK_TRY_ERROR(dpct::gemm_batch( *g_sycl_handles[g_main_device_index], oneapi::mkl::transpose::trans, oneapi::mkl::transpose::nontrans, ne01, ne11, ne10, alpha, (const void **)(ptrs_src.get() + 0 * ne23), dpct::library_data_t::real_half, nb01 / sizeof(sycl::half), (const void **)(ptrs_src.get() + 1 * ne23), dpct::library_data_t::real_half, nb11 / sizeof(float), beta, (void **)(ptrs_dst.get() + 0 * ne23), cu_data_type, ne01, ne23, cu_compute_type)): Meet error in this line code!
8:59PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:45725): stderr   in function ggml_sycl_mul_mat_mat_batched_sycl at /build/backend/cpp/llama/llama.cpp/ggml-sycl.cpp:13449      
8:59PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:45725): stderr GGML_ASSERT: /build/backend/cpp/llama/llama.cpp/ggml-sycl.cpp:2891: !"SYCL error"  

Cards detected:

found 6 SYCL devices:                        
  Device 0: Intel(R) Arc(TM) A770 Graphics,     compute capability 1.3,
        max compute_units 512,  max work group size 1024,       max sub group size 32,  global mem size 16225243136
  Device 1: Intel(R) FPGA Emulation Device,     compute capability 1.2,
        max compute_units 16,   max work group size 67108864,   max sub group size 64,  global mem size 29321728000
  Device 2: AMD Ryzen 7 5700G with Radeon Graphics         ,    compute capability 3.0,
        max compute_units 16,   max work group size 8192,       max sub group size 64,  global mem size 29321728000
  Device 3: Intel(R) Arc(TM) A770 Graphics,     compute capability 3.0,
        max compute_units 512,  max work group size 1024,       max sub group size 32,  global mem size 16225243136
  Device 4: Intel(R) Arc(TM) A770 Graphics,     compute capability 3.0,
        max compute_units 512,  max work group size 1024,       max sub group size 32,  global mem size 16225243136
  Device 5: Intel(R) Arc(TM) A770 Graphics,     compute capability 1.3,
        max compute_units 512,  max work group size 1024,       max sub group size 32,  global mem size 16225243136
Using device 0 (Intel(R) Arc(TM) A770 Graphics) as main device
root@b5f956e23067:/build# sycl-ls
[opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2  [2023.16.12.0.12_195853.xmain-hotfix]
[opencl:cpu:1] Intel(R) OpenCL, AMD Ryzen 7 5700G with Radeon Graphics          OpenCL 3.0 (Build 0) [2023.16.12.0.12_195853.xmain-hotfix]
[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Arc(TM) A770 Graphics 1.3 [1.3.27191]
[ext_oneapi_level_zero:gpu:1] Intel(R) Level-Zero, Intel(R) Arc(TM) A770 Graphics 1.3 [1.3.27191]
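
The runtime enumerates six SYCL devices here, with the A770 showing up several times across backends, so one thing worth trying is restricting enumeration to a single Level Zero GPU before launching. A minimal sketch, assuming a oneAPI 2023.2+ runtime that honours ONEAPI_DEVICE_SELECTOR (older runtimes use SYCL_DEVICE_FILTER instead):

# Expose only the first Level Zero GPU to the SYCL runtime inside the container
docker run -it --rm -v "$(pwd):/app:Z" --device /dev/dri \
    -e ONEAPI_DEVICE_SELECTOR=level_zero:0 \
    llama-cpp-sycl -m "/app/models/c0c3c83d0ec33ffe925657a56b06771b" \
    -p "Building a website can be done in 10 simple steps:" -n 400 -e -ngl 33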

I'm trying this on Ubuntu 22.04 LTS Server (fresh install).

ping (sorry to bug you): @NeoZhangJianyu, @airMeng, @luoyu-intel, @abhilash1910, @ggerganov

Is Docker supposed to work? Am I doing something wrong here?

Note that everything works here directly from the host, without Docker.

@NeoZhangJianyu
Collaborator

NeoZhangJianyu commented Feb 8, 2024

@mudler

  1. Could you add the whole log of llama.cpp here?
    Please run the following commands in Docker and share the whole log (a capture sketch follows below):

./build/bin/ls-sycl-device
./examples/sycl/build.sh
./examples/sycl/run-llama2.sh

  2. Do you hit the same issue on the host (non-Docker)?

Note: we only resolve issues in llama.cpp. If your issue is in code migrated from llama.cpp into another project, we can't guarantee that it works well. You need to reproduce the same issue in llama.cpp itself.
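
A simple way to capture the whole log from those commands (a sketch; the paths assume the default llama.cpp tree used above) is to tee stdout and stderr to files:

# Run each step and keep a copy of the combined output for sharing
> ./build/bin/ls-sycl-device 2>&1 | tee /tmp/ls-sycl-device.log
> ./examples/sycl/build.sh 2>&1 | tee /tmp/sycl-build.log
> ./examples/sycl/run-llama2.sh 2>&1 | tee /tmp/run-llama2.log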

@mudler
Contributor Author

mudler commented Feb 8, 2024

@mudler

1. Could you add the whole log of llama.cpp here?

Re-issuing the same steps now worked this time, so it seems it was something sporadic.

   Please run the following commands in Docker and share the whole log:

./build/bin/ls-sycl-device
./examples/sycl/build.sh
./examples/sycl/run-llama2.sh

2. Do you hit the same issue on the host (non-Docker)?

No, it works just fine from the host.

Note: we only resolve issues in llama.cpp. If your issue is in code migrated from llama.cpp into another project, we can't guarantee that it works well. You need to reproduce the same issue in llama.cpp itself.

Well, thanks. LocalAI was one of the first projects to support llama.cpp, and it always will be. We use the same llama.cpp server code, just with a gRPC server on top.

However, from a downstream project's perspective I find this attitude quite "unfriendly": if LocalAI cannot consume llama.cpp, other projects might hit the same issues as well, and having documented errors and solutions is helpful and in line with how collaboration between open-source projects generally works.

As I see it, LocalAI brings users to llama.cpp, and since llama.cpp is a library that can be imported, integration problems are bound to come up that make sense to solve in the llama.cpp codebase.

Going to close this one. The problem I had with LocalAI I've solved by using the Intel images directly. Unfortunately, using Ubuntu 22.04 and following Intel's steps to install the required dependencies didn't work out and was just a waste of time.
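
For reference, basing the container on Intel's images looks roughly like the sketch below. The intel/oneapi-basekit tag, the package list, and any extra runtime packages are assumptions here, so check Intel's registry and README-sycl.md for current values:

# Dockerfile sketch: build llama.cpp with the SYCL backend on top of Intel's oneAPI base image
FROM intel/oneapi-basekit:2024.0.1-devel-ubuntu22.04
# Depending on the tag you may also need the Intel compute runtime packages (intel-opencl-icd, level-zero)
RUN apt-get update && apt-get install -y git cmake build-essential
RUN git clone https://github.com/ggerganov/llama.cpp /app
WORKDIR /app
# Source the oneAPI environment and build with SYCL enabled (FP32 here, matching BUILD_TYPE=sycl_f32)
RUN . /opt/intel/oneapi/setvars.sh && \
    cmake -B build -DLLAMA_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx && \
    cmake --build build --config Release -j

At runtime the container still needs --device /dev/dri (or --privileged) so the Level Zero driver can see the GPU.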

mudler closed this as completed Feb 8, 2024