
[bug] ollama returns garbage for longer texts #12702

Open
frzifus opened this issue Jan 12, 2025 · 3 comments
frzifus commented Jan 12, 2025

I have installed ollama on a system with an Intel Arc A770 and loaded llama3.2:3b.
The initial loading of the model takes a long time, but it works.
Initial requests are successfully answered at ~1000 t/s. As the chat continues, things get a bit weird: in the middle of a story, the text turned into JavaScript and then into pure garbage.

(screenshot of the garbled chat output attached)

That's the deployment I used:

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: open-webui-config
  namespace: ollama
data:
  OLLAMA_BASE_URL: "http://ollama:11434"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama
  namespace: ollama
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ollama
  template:
    metadata:
      labels:
        app: ollama
    spec:
      containers:
        - name: ollama
          image: intelanalytics/ipex-llm-inference-cpp-xpu:2.2.0-SNAPSHOT
          env:
            - name: OLLAMA_HOST
              value: "0.0.0.0:11434"
            - name: ZES_ENABLE_SYSMAN
              value: "1"
            - name: OLLAMA_INTEL_GPU
              value: "true"
          command:
            - /bin/sh
            - -c
            - |
              mkdir -p /llm/ollama
              cd /llm/ollama
              init-ollama
              ./ollama serve
          ports:
            - containerPort: 11434
          securityContext:
            privileged: true
          volumeMounts:
            - mountPath: /root/.ollama
              name: ollama-data
          resources:
            requests:
              memory: "4096Mi"
              cpu: "1"
            limits:
              cpu: "4"
              memory: "8192Mi"
      volumes:
        - name: ollama-data
          persistentVolumeClaim:
            claimName: ollama-data

Logs:

found 1 SYCL devices:
|  |                   |                                       |       |Max    |        |Max  |Global |                     |
|  |                   |                                       |       |compute|Max work|sub  |mem    |                     |
|ID|        Device Type|                                   Name|Version|units  |group   |group|size   |       Driver version|
|--|-------------------|---------------------------------------|-------|-------|--------|-----|-------|---------------------|
| 0| [level_zero:gpu:0]|                Intel Arc A770 Graphics|    1.6|    512|    1024|   32| 16225M|            1.3.31294|
llama_kv_cache_init:      SYCL0 KV buffer size =   896.00 MiB
llama_new_context_with_model: KV self size  =  896.00 MiB, K (f16):  448.00 MiB, V (f16):  448.00 MiB
llama_new_context_with_model:  SYCL_Host  output buffer size =     2.00 MiB
llama_new_context_with_model:      SYCL0 compute buffer size =   256.50 MiB
llama_new_context_with_model:  SYCL_Host compute buffer size =    22.01 MiB
llama_new_context_with_model: graph nodes  = 790
llama_new_context_with_model: graph splits = 2
time=2025-01-12T19:31:36.457+08:00 level=WARN source=runner.go:894 msg="%s: warming up the model with an empty run - please wait ... " !BADKEY=loadModel
time=2025-01-12T19:31:45.794+08:00 level=INFO source=server.go:619 msg="llama runner started in 11.28 seconds"
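The KV-cache figure in the log can be sanity-checked against the model's architecture. Below is a minimal sketch, assuming llama3.2:3b's published dimensions (28 layers, 8 KV heads, head dim 128) and an f16 cache; with an effective context of 8192 tokens (e.g. a 2048-token window multiplied by four parallel slots, which ollama allocates by default) the formula reproduces the 896 MiB SYCL0 KV buffer reported above:

```python
def kv_cache_bytes(n_layers, n_ctx, n_kv_heads, head_dim, bytes_per_elem=2):
    """Size of the KV cache: 2 tensors (K and V) per layer, each holding
    n_ctx * n_kv_heads * head_dim elements (f16 = 2 bytes per element)."""
    return 2 * n_layers * n_ctx * n_kv_heads * head_dim * bytes_per_elem

# Assumed llama3.2:3b dimensions: 28 layers, 8 KV heads, head dim 128.
size = kv_cache_bytes(n_layers=28, n_ctx=8192, n_kv_heads=8, head_dim=128)
print(size / (1024 * 1024))  # 896.0 MiB, matching the log
```

This also shows why a larger context window eats VRAM quickly: the cache grows linearly with `n_ctx`.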

@sgwhat sgwhat self-assigned this Jan 13, 2025
ACupofAir (Collaborator) commented
We could not reproduce the problem; the output remains normal after multiple rounds of sessions.


frzifus commented Jan 14, 2025

Hmm, let me try again and get back to you.


frzifus commented Jan 17, 2025

It worked until this log line occurred:

time=2025-01-17T09:34:37.836+08:00 level=WARN source=runner.go:129 msg="truncating input prompt" limit=2048 prompt=2175 keep=5 new=2048
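That WARN line means the prompt (2175 tokens) exceeded the runner's context limit (2048), so it was truncated. A rough sketch of the behavior the log fields suggest, assuming the runner keeps the first `keep` tokens and fills the remainder from the tail of the prompt (the actual ollama runner logic may differ):

```python
def truncate_prompt(tokens, limit, keep):
    """Drop tokens from the middle of the prompt: keep the first `keep`
    tokens and as many trailing tokens as still fit within `limit`."""
    if len(tokens) <= limit:
        return tokens
    return tokens[:keep] + tokens[-(limit - keep):]

# Reproduce the numbers from the log: prompt=2175, limit=2048, keep=5.
prompt = list(range(2175))
new = truncate_prompt(prompt, limit=2048, keep=5)
print(len(new))  # 2048, matching new=2048 in the log line
```

Truncation alone should degrade coherence, not produce outright garbage, which is consistent with the later finding that this line was a red herring.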

Update

It seems to be unrelated to the log line listed above.
The next test did not log anything, yet the following error occurred:

(screenshot of the error attached)

Update

It appears to be a problem when VRAM fills up: as soon as I reduce the context length, the problem disappears.
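For reference, one way to pin a smaller context window is to bake `num_ctx` into a derived model via a Modelfile (a sketch; the derived model name and the value 2048 are arbitrary choices, not from this thread):

```
# Modelfile: derive a variant of llama3.2:3b with a smaller context window
FROM llama3.2:3b
PARAMETER num_ctx 2048
```

Build it with `ollama create llama3.2-3b-small -f Modelfile` and chat against that model; alternatively, API clients can pass `options.num_ctx` per request.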

3 participants