Unable to use ollama in the ipex-llm docker container #12654

Open
ca1ic0 opened this issue Jan 6, 2025 · 10 comments
@ca1ic0
Contributor

ca1ic0 commented Jan 6, 2025

On the host, I can use ollama with ipex-llm on an Arc A750 GPU, but inside the container it fails. The steps are:
  0. Start the container

#!/bin/bash
export DOCKER_IMAGE=intelanalytics/ipex-llm-inference-cpp-xpu:latest
export CONTAINER_NAME=ipex-llm-inference-cpp-xpu-container
sudo docker run -itd \
                --net=host \
                --device=/dev/dri \
                -v ~/.ollama/models:/root/models \
                -e no_proxy=localhost,127.0.0.1 \
                --memory="32G" \
                --name=$CONTAINER_NAME \
                -e bench_model="mistral-7b-v0.1.Q4_0.gguf" \
                -e DEVICE=Arc \
                --shm-size="16g" \
                $DOCKER_IMAGE
  1. Verify the device map
sycl-ls

root@calico-B450M-HDV-R4-0:/llm/scripts# sycl-ls
[opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2  [2023.16.12.0.12_195853.xmain-hotfix]
[opencl:cpu:1] Intel(R) OpenCL, AMD Ryzen 5 5500                                OpenCL 3.0 (Build 0) [2023.16.12.0.12_195853.xmain-hotfix]
[opencl:gpu:2] Intel(R) OpenCL Graphics, Intel(R) Arc(TM) A750 Graphics OpenCL 3.0 NEO  [24.39.31294.12]
[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Arc(TM) A750 Graphics 1.6 [1.3.31294]
  2. Start ollama
cd /llm/scripts/
# Set the recommended environment variables
source ipex-llm-init --gpu --device $DEVICE
bash start-ollama.sh # Press Ctrl+C to exit; ollama serve keeps running in the background

output:

root@calico-B450M-HDV-R4-0:/llm/scripts# source ipex-llm-init --gpu --device $DEVICE
found oneapi in /opt/intel/oneapi/setvars.sh

:: initializing oneAPI environment ...
   bash: BASH_VERSION = 5.1.16(1)-release
   args: Using "$@" for setvars.sh arguments: --force
:: advisor -- latest
:: ccl -- latest
:: compiler -- latest
:: dal -- latest
:: debugger -- latest
:: dev-utilities -- latest
:: dnnl -- latest
:: dpcpp-ct -- latest
:: dpl -- latest
:: ipp -- latest
:: ippcp -- latest
:: mkl -- latest
:: mpi -- latest
:: tbb -- latest
:: vtune -- latest
:: oneAPI environment initialized ::

/usr/local/lib/python3.11/dist-packages/transformers/utils/generic.py:441: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  _torch_pytree._register_pytree_node(
/usr/local/lib/python3.11/dist-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
  warnings.warn(
/usr/local/lib/python3.11/dist-packages/transformers/utils/generic.py:309: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  _torch_pytree._register_pytree_node(
/usr/local/lib/python3.11/dist-packages/transformers/utils/generic.py:309: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  _torch_pytree._register_pytree_node(
+++++ Env Variables +++++
Internal:
    ENABLE_IOMP     = 1
    ENABLE_GPU      = 1
    ENABLE_JEMALLOC = 0
    ENABLE_TCMALLOC = 0
    LIB_DIR    = /usr/local/lib
    BIN_DIR    = bin64
    LLM_DIR    = /usr/local/lib/python3.11/dist-packages/ipex_llm

Exported:
    LD_PRELOAD             =
    OMP_NUM_THREADS        =
    MALLOC_CONF            =
    USE_XETLA              = OFF
    ENABLE_SDP_FUSION      =
    SYCL_CACHE_PERSISTENT  = 1
    BIGDL_LLM_XMX_DISABLED =
    SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS = 1
+++++++++++++++++++++++++
Complete.
root@calico-B450M-HDV-R4-0:/llm/scripts# bash start-ollama.sh
root@calico-B450M-HDV-R4-0:/llm/scripts# 2025/01/06 13:59:18 routes.go:1197: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/root/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://*] OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:localhost,127.0.0.1]"
time=2025-01-06T13:59:18.275+08:00 level=INFO source=images.go:753 msg="total blobs: 28"
time=2025-01-06T13:59:18.276+08:00 level=INFO source=images.go:760 msg="total unused blobs removed: 0"
time=2025-01-06T13:59:18.276+08:00 level=INFO source=routes.go:1248 msg="Listening on 127.0.0.1:11434 (version 0.4.6-ipexllm-20250104)"
time=2025-01-06T13:59:18.276+08:00 level=INFO source=common.go:135 msg="extracting embedded files" dir=/tmp/ollama4065154349/runners
time=2025-01-06T13:59:18.313+08:00 level=INFO source=common.go:49 msg="Dynamic LLM libraries" runners=[ipex_llm]
  3. Send an HTTP request to ollama and get an error
root@calico-B450M-HDV-R4-0:/llm# curl http://localhost:11434/api/generate -d '
{
   "model": "qwen2.5",
   "prompt": "What is AI?",
   "stream": false
}'
{"error":"llama runner process has terminated: exit status 2"}
  4. The crash log of ollama is:
root@calico-B450M-HDV-R4-0:/llm/scripts# 2025/01/06 13:59:18 routes.go:1197: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/root/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://*] OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:localhost,127.0.0.1]"
time=2025-01-06T13:59:18.275+08:00 level=INFO source=images.go:753 msg="total blobs: 28"
time=2025-01-06T13:59:18.276+08:00 level=INFO source=images.go:760 msg="total unused blobs removed: 0"
time=2025-01-06T13:59:18.276+08:00 level=INFO source=routes.go:1248 msg="Listening on 127.0.0.1:11434 (version 0.4.6-ipexllm-20250104)"
time=2025-01-06T13:59:18.276+08:00 level=INFO source=common.go:135 msg="extracting embedded files" dir=/tmp/ollama4065154349/runners
time=2025-01-06T13:59:18.313+08:00 level=INFO source=common.go:49 msg="Dynamic LLM libraries" runners=[ipex_llm]
time=2025-01-06T14:00:37.530+08:00 level=INFO source=gpu.go:221 msg="looking for compatible GPUs"
time=2025-01-06T14:00:37.530+08:00 level=WARN source=gpu.go:732 msg="unable to locate gpu dependency libraries"
time=2025-01-06T14:00:37.531+08:00 level=WARN source=gpu.go:732 msg="unable to locate gpu dependency libraries"
time=2025-01-06T14:00:37.531+08:00 level=WARN source=gpu.go:732 msg="unable to locate gpu dependency libraries"
time=2025-01-06T14:00:37.535+08:00 level=WARN source=gpu.go:732 msg="unable to locate gpu dependency libraries"
time=2025-01-06T14:00:37.562+08:00 level=INFO source=server.go:105 msg="system memory" total="15.5 GiB" free="14.0 GiB" free_swap="4.0 GiB"
time=2025-01-06T14:00:37.563+08:00 level=INFO source=memory.go:356 msg="offload to device" layers.requested=-1 layers.model=29 layers.offload=0 layers.split="" memory.available="[14.0 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.1 GiB" memory.required.partial="0 B" memory.required.kv="448.0 MiB" memory.required.allocations="[5.1 GiB]" memory.weights.total="4.1 GiB" memory.weights.repeating="3.7 GiB" memory.weights.nonrepeating="426.4 MiB" memory.graph.full="478.0 MiB" memory.graph.partial="730.4 MiB"
time=2025-01-06T14:00:37.563+08:00 level=INFO source=server.go:401 msg="starting llama server" cmd="/tmp/ollama4065154349/runners/ipex_llm/ollama_llama_server --model /root/.ollama/models/blobs/sha256-2bada8a7450677000f678be90653b85d364de7db25eb5ea54136ada5f3933730 --ctx-size 8192 --batch-size 512 --n-gpu-layers 999 --threads 6 --no-mmap --parallel 4 --port 40091"
time=2025-01-06T14:00:37.563+08:00 level=INFO source=sched.go:449 msg="loaded runners" count=1
time=2025-01-06T14:00:37.563+08:00 level=INFO source=server.go:580 msg="waiting for llama runner to start responding"
time=2025-01-06T14:00:37.563+08:00 level=INFO source=server.go:614 msg="waiting for server to become available" status="llm server error"
time=2025-01-06T14:00:37.595+08:00 level=INFO source=runner.go:956 msg="starting go runner"
time=2025-01-06T14:00:37.596+08:00 level=INFO source=runner.go:957 msg=system info="AVX = 1 | AVX_VNNI = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | RISCV_VECT = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | cgo(gcc)" threads=6
time=2025-01-06T14:00:37.596+08:00 level=INFO source=.:0 msg="Server listening on 127.0.0.1:40091"
llama_model_loader: loaded meta data with 34 key-value pairs and 339 tensors from /root/.ollama/models/blobs/sha256-2bada8a7450677000f678be90653b85d364de7db25eb5ea54136ada5f3933730 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = qwen2
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = Qwen2.5 7B Instruct
llama_model_loader: - kv   3:                           general.finetune str              = Instruct
llama_model_loader: - kv   4:                           general.basename str              = Qwen2.5
llama_model_loader: - kv   5:                         general.size_label str              = 7B
llama_model_loader: - kv   6:                            general.license str              = apache-2.0
llama_model_loader: - kv   7:                       general.license.link str              = https://huggingface.co/Qwen/Qwen2.5-7...
llama_model_loader: - kv   8:                   general.base_model.count u32              = 1
llama_model_loader: - kv   9:                  general.base_model.0.name str              = Qwen2.5 7B
llama_model_loader: - kv  10:          general.base_model.0.organization str              = Qwen
llama_model_loader: - kv  11:              general.base_model.0.repo_url str              = https://huggingface.co/Qwen/Qwen2.5-7B
llama_model_loader: - kv  12:                               general.tags arr[str,2]       = ["chat", "text-generation"]
llama_model_loader: - kv  13:                          general.languages arr[str,1]       = ["en"]
llama_model_loader: - kv  14:                          qwen2.block_count u32              = 28
llama_model_loader: - kv  15:                       qwen2.context_length u32              = 32768
llama_model_loader: - kv  16:                     qwen2.embedding_length u32              = 3584
llama_model_loader: - kv  17:                  qwen2.feed_forward_length u32              = 18944
llama_model_loader: - kv  18:                 qwen2.attention.head_count u32              = 28
llama_model_loader: - kv  19:              qwen2.attention.head_count_kv u32              = 4
llama_model_loader: - kv  20:                       qwen2.rope.freq_base f32              = 1000000.000000
llama_model_loader: - kv  21:     qwen2.attention.layer_norm_rms_epsilon f32              = 0.000001
llama_model_loader: - kv  22:                          general.file_type u32              = 15
llama_model_loader: - kv  23:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  24:                         tokenizer.ggml.pre str              = qwen2
llama_model_loader: - kv  25:                      tokenizer.ggml.tokens arr[str,152064]  = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  26:                  tokenizer.ggml.token_type arr[i32,152064]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  27:                      tokenizer.ggml.merges arr[str,151387]  = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
llama_model_loader: - kv  28:                tokenizer.ggml.eos_token_id u32              = 151645
llama_model_loader: - kv  29:            tokenizer.ggml.padding_token_id u32              = 151643
llama_model_loader: - kv  30:                tokenizer.ggml.bos_token_id u32              = 151643
llama_model_loader: - kv  31:               tokenizer.ggml.add_bos_token bool             = false
llama_model_loader: - kv  32:                    tokenizer.chat_template str              = {%- if tools %}\n    {{- '<|im_start|>...
llama_model_loader: - kv  33:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:  141 tensors
llama_model_loader: - type q4_K:  169 tensors
llama_model_loader: - type q6_K:   29 tensors
llm_load_vocab: special tokens cache size = 22
llm_load_vocab: token to piece cache size = 0.9310 MB
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = qwen2
llm_load_print_meta: vocab type       = BPE
llm_load_print_meta: n_vocab          = 152064
llm_load_print_meta: n_merges         = 151387
llm_load_print_meta: vocab_only       = 0
llm_load_print_meta: n_ctx_train      = 32768
llm_load_print_meta: n_embd           = 3584
llm_load_print_meta: n_layer          = 28
llm_load_print_meta: n_head           = 28
llm_load_print_meta: n_head_kv        = 4
llm_load_print_meta: n_rot            = 128
llm_load_print_meta: n_swa            = 0
llm_load_print_meta: n_embd_head_k    = 128
llm_load_print_meta: n_embd_head_v    = 128
llm_load_print_meta: n_gqa            = 7
llm_load_print_meta: n_embd_k_gqa     = 512
llm_load_print_meta: n_embd_v_gqa     = 512
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-06
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale    = 0.0e+00
llm_load_print_meta: n_ff             = 18944
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: causal attn      = 1
llm_load_print_meta: pooling type     = 0
llm_load_print_meta: rope type        = 2
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 1000000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_ctx_orig_yarn  = 32768
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: ssm_d_conv       = 0
llm_load_print_meta: ssm_d_inner      = 0
llm_load_print_meta: ssm_d_state      = 0
llm_load_print_meta: ssm_dt_rank      = 0
llm_load_print_meta: ssm_dt_b_c_rms   = 0
llm_load_print_meta: model type       = ?B
llm_load_print_meta: model ftype      = Q4_K - Medium
llm_load_print_meta: model params     = 7.62 B
llm_load_print_meta: model size       = 4.36 GiB (4.91 BPW)
llm_load_print_meta: general.name     = Qwen2.5 7B Instruct
llm_load_print_meta: BOS token        = 151643 '<|endoftext|>'
llm_load_print_meta: EOS token        = 151645 '<|im_end|>'
llm_load_print_meta: PAD token        = 151643 '<|endoftext|>'
llm_load_print_meta: LF token         = 148848 'ÄĬ'
llm_load_print_meta: EOT token        = 151645 '<|im_end|>'
llm_load_print_meta: EOG token        = 151643 '<|endoftext|>'
llm_load_print_meta: EOG token        = 151645 '<|im_end|>'
llm_load_print_meta: max token length = 256
time=2025-01-06T14:00:37.815+08:00 level=INFO source=server.go:614 msg="waiting for server to become available" status="llm server loading model"
ggml_sycl_init: GGML_SYCL_FORCE_MMQ:   no
ggml_sycl_init: SYCL_USE_XMX: yes
ggml_sycl_init: found 1 SYCL devices:
llm_load_tensors: ggml ctx size =    0.30 MiB
llm_load_tensors: offloading 28 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 29/29 layers to GPU
llm_load_tensors:      SYCL0 buffer size =  4168.09 MiB
llm_load_tensors:  SYCL_Host buffer size =   292.36 MiB
SIGBUS: bus error
PC=0x77c6f51788ca m=3 sigcode=2 addr=0x77c526b14000
signal arrived during cgo execution

goroutine 6 gp=0xc000007dc0 m=3 mp=0xc000077008 [syscall]:
runtime.cgocall(0x56502fb0f9f0, 0xc000085b90)
        runtime/cgocall.go:157 +0x4b fp=0xc000085b68 sp=0xc000085b30 pc=0x56502f89046b
ollama/llama/llamafile._Cfunc_llama_load_model_from_file(0x77c688000d40, {0x3e7, 0x1, 0x0, 0x0, 0x0, 0x56502fb0f3e0, 0xc000014308, 0x0, 0x0, ...})
        _cgo_gotypes.go:692 +0x50 fp=0xc000085b90 sp=0xc000085b68 pc=0x56502f98e310
ollama/llama/llamafile.LoadModelFromFile.func1({0x7ffea36e20ef?, 0x0?}, {0x3e7, 0x1, 0x0, 0x0, 0x0, 0x56502fb0f3e0, 0xc000014308, 0x0, ...})
        ollama/llama/llamafile/llama.go:225 +0xfa fp=0xc000085c78 sp=0xc000085b90 pc=0x56502f990c1a
ollama/llama/llamafile.LoadModelFromFile({0x7ffea36e20ef, 0x62}, {0x3e7, 0x0, 0x0, 0x0, {0x0, 0x0, 0x0}, 0xc0000441a0, ...})
        ollama/llama/llamafile/llama.go:225 +0x2d5 fp=0xc000085db8 sp=0xc000085c78 pc=0x56502f990955
main.(*Server).loadModel(0xc0000ca120, {0x3e7, 0x0, 0x0, 0x0, {0x0, 0x0, 0x0}, 0xc0000441a0, 0x0}, ...)
        ollama/llama/runner/runner.go:861 +0xc5 fp=0xc000085f10 sp=0xc000085db8 pc=0x56502fb0cf65
main.main.gowrap1()
        ollama/llama/runner/runner.go:990 +0xda fp=0xc000085fe0 sp=0xc000085f10 pc=0x56502fb0e95a
runtime.goexit({})
        runtime/asm_amd64.s:1695 +0x1 fp=0xc000085fe8 sp=0xc000085fe0 pc=0x56502f8f8e81
created by main.main in goroutine 1
        ollama/llama/runner/runner.go:990 +0xc6c

goroutine 1 gp=0xc0000061c0 m=nil [IO wait]:
runtime.gopark(0xc000050008?, 0x0?, 0xc0?, 0x61?, 0xc000049898?)
        runtime/proc.go:402 +0xce fp=0xc000049860 sp=0xc000049840 pc=0x56502f8c70ae
runtime.netpollblock(0xc0000498f8?, 0x2f88fbc6?, 0x50?)
        runtime/netpoll.go:573 +0xf7 fp=0xc000049898 sp=0xc000049860 pc=0x56502f8bf2f7
internal/poll.runtime_pollWait(0x77c6f4588020, 0x72)
        runtime/netpoll.go:345 +0x85 fp=0xc0000498b8 sp=0xc000049898 pc=0x56502f8f3b45
internal/poll.(*pollDesc).wait(0x3?, 0x3fe?, 0x0)
        internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc0000498e0 sp=0xc0000498b8 pc=0x56502f943a67
internal/poll.(*pollDesc).waitRead(...)
        internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Accept(0xc0000f8080)
        internal/poll/fd_unix.go:611 +0x2ac fp=0xc000049988 sp=0xc0000498e0 pc=0x56502f944f2c
net.(*netFD).accept(0xc0000f8080)
        net/fd_unix.go:172 +0x29 fp=0xc000049a40 sp=0xc000049988 pc=0x56502f9b3b49
net.(*TCPListener).accept(0xc0000901e0)
        net/tcpsock_posix.go:159 +0x1e fp=0xc000049a68 sp=0xc000049a40 pc=0x56502f9c487e
net.(*TCPListener).Accept(0xc0000901e0)
        net/tcpsock.go:327 +0x30 fp=0xc000049a98 sp=0xc000049a68 pc=0x56502f9c3bd0
net/http.(*onceCloseListener).Accept(0xc0000ca1b0?)
        <autogenerated>:1 +0x24 fp=0xc000049ab0 sp=0xc000049a98 pc=0x56502faeade4
net/http.(*Server).Serve(0xc0000fe000, {0x56502fe16560, 0xc0000901e0})
        net/http/server.go:3260 +0x33e fp=0xc000049be0 sp=0xc000049ab0 pc=0x56502fae1bfe
main.main()
        ollama/llama/runner/runner.go:1015 +0x10cd fp=0xc000049f50 sp=0xc000049be0 pc=0x56502fb0e5cd
runtime.main()
        runtime/proc.go:271 +0x29d fp=0xc000049fe0 sp=0xc000049f50 pc=0x56502f8c6c7d
runtime.goexit({})
        runtime/asm_amd64.s:1695 +0x1 fp=0xc000049fe8 sp=0xc000049fe0 pc=0x56502f8f8e81

goroutine 2 gp=0xc000006c40 m=nil [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:402 +0xce fp=0xc000070fa8 sp=0xc000070f88 pc=0x56502f8c70ae
runtime.goparkunlock(...)
        runtime/proc.go:408
runtime.forcegchelper()
        runtime/proc.go:326 +0xb8 fp=0xc000070fe0 sp=0xc000070fa8 pc=0x56502f8c6f38
runtime.goexit({})
        runtime/asm_amd64.s:1695 +0x1 fp=0xc000070fe8 sp=0xc000070fe0 pc=0x56502f8f8e81
created by runtime.init.6 in goroutine 1
        runtime/proc.go:314 +0x1a

goroutine 3 gp=0xc000007180 m=nil [GC sweep wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:402 +0xce fp=0xc000071780 sp=0xc000071760 pc=0x56502f8c70ae
runtime.goparkunlock(...)
        runtime/proc.go:408
runtime.bgsweep(0xc0000240e0)
        runtime/mgcsweep.go:278 +0x94 fp=0xc0000717c8 sp=0xc000071780 pc=0x56502f8b1bf4
runtime.gcenable.gowrap1()
        runtime/mgc.go:203 +0x25 fp=0xc0000717e0 sp=0xc0000717c8 pc=0x56502f8a6725
runtime.goexit({})
        runtime/asm_amd64.s:1695 +0x1 fp=0xc0000717e8 sp=0xc0000717e0 pc=0x56502f8f8e81
created by runtime.gcenable in goroutine 1
        runtime/mgc.go:203 +0x66

goroutine 4 gp=0xc000007340 m=nil [GC scavenge wait]:
runtime.gopark(0xc0000240e0?, 0x56502fb8e208?, 0x1?, 0x0?, 0xc000007340?)
        runtime/proc.go:402 +0xce fp=0xc000071f78 sp=0xc000071f58 pc=0x56502f8c70ae
runtime.goparkunlock(...)
        runtime/proc.go:408
runtime.(*scavengerState).park(0x56502ffe0680)
        runtime/mgcscavenge.go:425 +0x49 fp=0xc000071fa8 sp=0xc000071f78 pc=0x56502f8af5e9
runtime.bgscavenge(0xc0000240e0)
        runtime/mgcscavenge.go:653 +0x3c fp=0xc000071fc8 sp=0xc000071fa8 pc=0x56502f8afb7c
runtime.gcenable.gowrap2()
        runtime/mgc.go:204 +0x25 fp=0xc000071fe0 sp=0xc000071fc8 pc=0x56502f8a66c5
runtime.goexit({})
        runtime/asm_amd64.s:1695 +0x1 fp=0xc000071fe8 sp=0xc000071fe0 pc=0x56502f8f8e81
created by runtime.gcenable in goroutine 1
        runtime/mgc.go:204 +0xa5

goroutine 5 gp=0xc000007c00 m=nil [finalizer wait]:
runtime.gopark(0xc000070648?, 0x56502f89a025?, 0xa8?, 0x1?, 0xc0000061c0?)
        runtime/proc.go:402 +0xce fp=0xc000070620 sp=0xc000070600 pc=0x56502f8c70ae
runtime.runfinq()
        runtime/mfinal.go:194 +0x107 fp=0xc0000707e0 sp=0xc000070620 pc=0x56502f8a5767
runtime.goexit({})
        runtime/asm_amd64.s:1695 +0x1 fp=0xc0000707e8 sp=0xc0000707e0 pc=0x56502f8f8e81
created by runtime.createfing in goroutine 1
        runtime/mfinal.go:164 +0x3d

goroutine 7 gp=0xc0000fc000 m=nil [semacquire]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:402 +0xce fp=0xc000072e08 sp=0xc000072de8 pc=0x56502f8c70ae
runtime.goparkunlock(...)
        runtime/proc.go:408
runtime.semacquire1(0xc0000ca128, 0x0, 0x1, 0x0, 0x12)
        runtime/sema.go:160 +0x22c fp=0xc000072e70 sp=0xc000072e08 pc=0x56502f8d94cc
sync.runtime_Semacquire(0x0?)
        runtime/sema.go:62 +0x25 fp=0xc000072ea8 sp=0xc000072e70 pc=0x56502f8f5305
sync.(*WaitGroup).Wait(0x0?)
        sync/waitgroup.go:116 +0x48 fp=0xc000072ed0 sp=0xc000072ea8 pc=0x56502f913d88
main.(*Server).run(0xc0000ca120, {0x56502fe16ba0, 0xc0000a40a0})
        ollama/llama/runner/runner.go:315 +0x47 fp=0xc000072fb8 sp=0xc000072ed0 pc=0x56502fb09627
main.main.gowrap2()
        ollama/llama/runner/runner.go:995 +0x28 fp=0xc000072fe0 sp=0xc000072fb8 pc=0x56502fb0e848
runtime.goexit({})
        runtime/asm_amd64.s:1695 +0x1 fp=0xc000072fe8 sp=0xc000072fe0 pc=0x56502f8f8e81
created by main.main in goroutine 1
        ollama/llama/runner/runner.go:995 +0xd3e

goroutine 8 gp=0xc0000fc1c0 m=nil [IO wait]:
runtime.gopark(0x94?, 0xc0000f1958?, 0x40?, 0x19?, 0xb?)
        runtime/proc.go:402 +0xce fp=0xc0000f1910 sp=0xc0000f18f0 pc=0x56502f8c70ae
runtime.netpollblock(0x56502f92d5f8?, 0x2f88fbc6?, 0x50?)
        runtime/netpoll.go:573 +0xf7 fp=0xc0000f1948 sp=0xc0000f1910 pc=0x56502f8bf2f7
internal/poll.runtime_pollWait(0x77c6f4587f28, 0x72)
        runtime/netpoll.go:345 +0x85 fp=0xc0000f1968 sp=0xc0000f1948 pc=0x56502f8f3b45
internal/poll.(*pollDesc).wait(0xc0000f8100?, 0xc000200000?, 0x0)
        internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc0000f1990 sp=0xc0000f1968 pc=0x56502f943a67
internal/poll.(*pollDesc).waitRead(...)
        internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0xc0000f8100, {0xc000200000, 0x1000, 0x1000})
        internal/poll/fd_unix.go:164 +0x27a fp=0xc0000f1a28 sp=0xc0000f1990 pc=0x56502f9445ba
net.(*netFD).Read(0xc0000f8100, {0xc000200000?, 0xc0000f1a98?, 0x56502f943f25?})
        net/fd_posix.go:55 +0x25 fp=0xc0000f1a70 sp=0xc0000f1a28 pc=0x56502f9b2a45
net.(*conn).Read(0xc000074098, {0xc000200000?, 0x0?, 0xc0000b2ed8?})
        net/net.go:185 +0x45 fp=0xc0000f1ab8 sp=0xc0000f1a70 pc=0x56502f9bcd05
net.(*TCPConn).Read(0xc0000b2ed0?, {0xc000200000?, 0xc0000f8100?, 0xc0000f1af0?})
        <autogenerated>:1 +0x25 fp=0xc0000f1ae8 sp=0xc0000f1ab8 pc=0x56502f9c86e5
net/http.(*connReader).Read(0xc0000b2ed0, {0xc000200000, 0x1000, 0x1000})
        net/http/server.go:789 +0x14b fp=0xc0000f1b38 sp=0xc0000f1ae8 pc=0x56502fad7a0b
bufio.(*Reader).fill(0xc00004e480)
        bufio/bufio.go:110 +0x103 fp=0xc0000f1b70 sp=0xc0000f1b38 pc=0x56502fa94303
bufio.(*Reader).Peek(0xc00004e480, 0x4)
        bufio/bufio.go:148 +0x53 fp=0xc0000f1b90 sp=0xc0000f1b70 pc=0x56502fa94433
net/http.(*conn).serve(0xc0000ca1b0, {0x56502fe16b68, 0xc0000b2db0})
        net/http/server.go:2079 +0x749 fp=0xc0000f1fb8 sp=0xc0000f1b90 pc=0x56502fadd769
net/http.(*Server).Serve.gowrap3()
        net/http/server.go:3290 +0x28 fp=0xc0000f1fe0 sp=0xc0000f1fb8 pc=0x56502fae1fe8
runtime.goexit({})
        runtime/asm_amd64.s:1695 +0x1 fp=0xc0000f1fe8 sp=0xc0000f1fe0 pc=0x56502f8f8e81
created by net/http.(*Server).Serve in goroutine 1
        net/http/server.go:3290 +0x4b4

rax    0x77c526b14000
rbx    0x5d727200
rcx    0x3800
rdx    0x3800
rdi    0x77c526b14000
rsi    0x77c644fc4730
rbp    0x77c6961fe980
rsp    0x77c6961fe7a8
r8     0x77c526b14000
r9     0x105b2f000
r10    0x1
r11    0x246
r12    0x77c644fc4730
r13    0x77c526b14000
r14    0x77c68b9daac0
r15    0x77c644fc7f40
rip    0x77c6f51788ca
rflags 0x10206
cs     0x33
fs     0x0
gs     0x0
time=2025-01-06T14:00:38.075+08:00 level=INFO source=server.go:614 msg="waiting for server to become available" status="llm server error"
time=2025-01-06T14:00:38.325+08:00 level=ERROR source=sched.go:455 msg="error loading llama server" error="llama runner process has terminated: exit status 2"
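
For a crash dump as long as the one above, it can help to pull out just the fatal signal and the cgo call that was running. The sketch below assumes the log was saved to a file; `summarize_crash` is a hypothetical helper, not part of ollama or ipex-llm.

```shell
#!/bin/bash
# Hypothetical helper: print the fatal signal line and the first cgo
# function name (_Cfunc_*) found in a saved ollama runner log.
summarize_crash() {
    local log_file="$1"
    # Signal line, e.g. "SIGBUS: bus error"
    grep -m1 -E '^SIG[A-Z]+:' "$log_file"
    # First cgo frame, e.g. "_Cfunc_llama_load_model_from_file"
    grep -m1 -oE '_Cfunc_[A-Za-z0-9_]+' "$log_file"
}

# Usage (the log path is an assumption):
#   summarize_crash /tmp/ollama.log
```

Applied to the log in this report, the two lines it picks out are the SIGBUS line and the llama_load_model_from_file cgo frame, which suggests the runner died while loading the model rather than while serving the request.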

@ca1ic0
Contributor Author

ca1ic0 commented Jan 6, 2025

I noticed some abnormal lines in the log:

time=2025-01-06T14:00:37.530+08:00 level=WARN source=gpu.go:732 msg="unable to locate gpu dependency libraries"
time=2025-01-06T14:00:37.531+08:00 level=WARN source=gpu.go:732 msg="unable to locate gpu dependency libraries"
time=2025-01-06T14:00:37.531+08:00 level=WARN source=gpu.go:732 msg="unable to locate gpu dependency libraries"
time=2025-01-06T14:00:37.535+08:00 level=WARN source=gpu.go:732 msg="unable to locate gpu dependency libraries"

@hzjane
Contributor

hzjane commented Jan 7, 2025

@ca1ic0 I can't reproduce your problem in your calico-B450M-HDV-R4-0 environment. After starting ollama and running ./ollama pull qwen2.5, I can query qwen2.5 with curl normally.
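
A reproduction attempt can be made a little more mechanical by checking the JSON reply for an error field. The sketch below assumes the reply is a single-line JSON object like the one in the report; `ollama_error_of` is a hypothetical helper, and the sed expression is a shortcut rather than a real JSON parser.

```shell
#!/bin/bash
# Hypothetical helper: print the value of the "error" field of a
# single-line JSON reply, or nothing if there is no such field.
# Note: this is a sed shortcut, not a full JSON parser.
ollama_error_of() {
    printf '%s\n' "$1" | sed -n 's/.*"error":"\([^"]*\)".*/\1/p'
}

# Usage, with the endpoint and model taken from the report:
#   reply=$(curl -s http://localhost:11434/api/generate \
#       -d '{"model": "qwen2.5", "prompt": "What is AI?", "stream": false}')
#   err=$(ollama_error_of "$reply")
#   [ -n "$err" ] && echo "request failed: $err"
```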

@ca1ic0
Contributor Author

ca1ic0 commented Jan 7, 2025

> @ca1ic0 I can't reproduce your problem in your calico-B450M-HDV-R4-0 environment. After starting ollama and running ./ollama pull qwen2.5, I can query qwen2.5 with curl normally.

I got it. It seems the volume mapping '-v ~/.ollama/models:/root/models' is wrong, and ipex-llm itself works fine.
Thanks for your support!
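
One way to check a volume mapping like this is to compare its container-side path with the models directory that ollama reports. In the startup log above, the server config shows OLLAMA_MODELS:/root/.ollama/models, while the -v option mounts the host directory at /root/models. The sketch below only handles the simple host:container form, and `mount_target_matches` is a hypothetical helper, not a Docker or ollama command. Whether this mismatch has anything to do with the crash is not established by it.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the container-side path of a
# "host:container" volume spec equals the expected models directory.
# Only the simple two-part form is handled (no ":ro" style suffix).
mount_target_matches() {
    local volume_spec="$1"
    local expected_dir="$2"
    local target="${volume_spec#*:}"   # text after the first ':'
    [ "$target" = "$expected_dir" ]
}

# Usage with the values from this issue:
#   mount_target_matches "$HOME/.ollama/models:/root/models" /root/.ollama/models \
#       || echo "models are not mounted where ollama looks for them"
```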

@ca1ic0
Contributor Author

ca1ic0 commented Jan 7, 2025

> @ca1ic0 I can't reproduce your problem in your calico-B450M-HDV-R4-0 environment. After starting ollama and running ./ollama pull qwen2.5, I can query qwen2.5 with curl normally.

It's weird, I reproduced the problem again. Is there any difference between your steps and mine?

@ca1ic0
Contributor Author

ca1ic0 commented Jan 7, 2025

This time I pulled qwen2.5 directly, without mapping the .ollama directory.

@hzjane
Contributor

hzjane commented Jan 7, 2025

> > @ca1ic0 I can't reproduce your problem in your calico-B450M-HDV-R4-0 environment. After starting ollama and running ./ollama pull qwen2.5, I can query qwen2.5 with curl normally.
>
> It's weird, I reproduced the problem again. Is there any difference between your steps and mine?

Do you still hit the error? I used your script to start Docker on root@calico-B450M-HDV-R4-0, followed the steps below, and got normal results.
[image]

@ca1ic0
Contributor Author

ca1ic0 commented Jan 7, 2025

> > > @ca1ic0 I can't reproduce your problem in your calico-B450M-HDV-R4-0 environment. After starting ollama and running ./ollama pull qwen2.5, I can query qwen2.5 with curl normally.
> >
> > It's weird, I reproduced the problem again. Is there any difference between your steps and mine?
>
> Do you still hit the error? I used your script to start Docker on root@calico-B450M-HDV-R4-0, followed the steps below, and got normal results. [image]

Okay, now I see the point 😳. It might actually be caused by running this command in the container:

source ipex-llm-init --gpu --device $DEVICE

If I don't run that command and start ollama directly in the ipex-llm container, it works well.
But this step comes from the docker_cpp_xpu_quickstart.md page.
So the doc gives the wrong step and needs to be fixed?
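
To narrow down which setting from ipex-llm-init makes the difference, one option is to list the environment variables that sourcing a script adds or changes. The following is a minimal sketch, not something taken from the quickstart; `env_added_by` is a hypothetical helper. It runs in a subshell, so the calling shell's environment is left untouched.

```shell
#!/bin/bash
# Hypothetical helper: print environment entries that are new or changed
# after sourcing a script. Extra arguments are passed on to the script
# as its positional parameters. Everything runs in a subshell.
env_added_by() {
    (
        script="$1"
        shift
        before=$(mktemp)
        after=$(mktemp)
        env | LC_ALL=C sort > "$before"
        # Source the script; silence whatever it prints.
        . "$script" >/dev/null 2>&1
        env | LC_ALL=C sort > "$after"
        # comm -13 keeps lines that appear only in the second file.
        LC_ALL=C comm -13 "$before" "$after"
        rm -f "$before" "$after"
    )
}

# Usage inside the container (arguments as in the report):
#   cd /llm/scripts && env_added_by ipex-llm-init --gpu --device "$DEVICE"
```

The variables it prints could then be set one at a time before running start-ollama.sh to see which one, if any, triggers the crash.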

@hzjane
Contributor

hzjane commented Jan 8, 2025

Thanks for your feedback. We did not encounter any problems on an Arc A770 with an Intel CPU, so it may be caused by the AMD CPU.
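
To record which CPU a reproduction ran on, the vendor string can be read from /proc/cpuinfo on Linux. A small sketch follows, with `cpu_vendor` as a hypothetical helper name; on x86 machines the value is typically AuthenticAMD or GenuineIntel.

```shell
#!/bin/bash
# Hypothetical helper: print the CPU vendor string from a cpuinfo file
# (/proc/cpuinfo by default), using the first vendor_id line only.
cpu_vendor() {
    local cpuinfo="${1:-/proc/cpuinfo}"
    sed -n 's/^vendor_id[[:space:]]*:[[:space:]]*//p' "$cpuinfo" | head -n 1
}

# Usage:
#   echo "CPU vendor: $(cpu_vendor)"
```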

@Xyz00777

Hi, I may have a similar issue, but I'm not sure. I have an Ubuntu 24 LTS VM as the base, and inside it I run Docker with the ollama instance. I passed the Arc A770 through to the VM and can see it inside the VM. I gave the container access to the GPU with /dev/dri:/dev/dri, and it's not working.
A few days ago I had an Ubuntu 22 machine where I was still building ollama myself, so I also had to download the complete build suite, and there the container worked with the same settings.
So my guess is: could the VM be missing the drivers, so that they are also missing inside the container for llama.cpp to use?
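
A quick way to test the missing-driver theory is to look for DRM render nodes, first in the VM and then inside the container, and then run sycl-ls as in the original report. A minimal sketch follows; `list_render_nodes` is a hypothetical helper, and the renderD* naming is the usual Linux DRM convention rather than something stated in this thread.

```shell
#!/bin/bash
# Hypothetical helper: list DRM render nodes (renderD*) under a directory,
# /dev/dri by default. Returns non-zero if none are found.
list_render_nodes() {
    local dri_dir="${1:-/dev/dri}"
    local found=1
    local node
    for node in "$dri_dir"/renderD*; do
        # When nothing matches, the pattern is left unexpanded; skip it.
        [ -e "$node" ] || continue
        echo "$node"
        found=0
    done
    return "$found"
}

# Usage: run in the VM and again inside the container, then compare.
#   list_render_nodes || echo "no render node visible"
#   sycl-ls
```

If a render node shows up in the VM but not in the container, the device mapping is the likelier problem; if it is missing in the VM as well, the passthrough or the kernel driver would be the place to look.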

@ca1ic0
Contributor Author

ca1ic0 commented Jan 14, 2025

> a few days ago i had a ubuntu 22 machine where i was still building the ollama and because of that also had to download the complete build suite and there the container with the

I used PVE as the hypervisor before, and passthrough of the A750 worked fine, even with Windows as the VM OS.
