
rpc error: code = Unknown desc = unimplemented #800

Open · allenhaozi opened this issue Jul 25, 2023 · 25 comments
Labels: bug (Something isn't working)

@allenhaozi

What went wrong? Is this a settings issue?

Image: quay.io/go-skynet/local-ai:master-cublas-cuda11

request:

{
    "model": "llama-7b-hf",
    "messages": [
        {
            "role": "user",
            "content": "Hello! What is your name?"
        }
    ]
}

response:

{
    "error": {
        "code": 500,
        "message": "rpc error: code = Unknown desc = unimplemented",
        "type": ""
    }
}

log:

@@@@@
Skipping rebuild
@@@@@
If you are experiencing issues with the pre-compiled builds, try setting REBUILD=true
If you are still experiencing issues with the build, try setting CMAKE_ARGS and disable the instructions set as needed:
CMAKE_ARGS="-DLLAMA_F16C=OFF -DLLAMA_AVX512=OFF -DLLAMA_AVX2=OFF -DLLAMA_FMA=OFF"
see the documentation at: https://localai.io/basics/build/index.html
Note: See also https://github.com/go-skynet/LocalAI/issues/288
@@@@@
CPU info:
model name      : Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb cat_l3 cdp_l3 invpcid_single intel_ppin intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdt_a rdseed adx smap xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts
CPU:    AVX    found OK
CPU:    AVX2   found OK
CPU: no AVX512 found
@@@@@
5:38AM DBG no galleries to load
5:38AM INF Starting LocalAI using 4 threads, with models path: /llm-model-volume
5:38AM INF LocalAI version: 12fe093 (12fe0932c41246914e455c4175269a431fb8cf60)
5:38AM DBG Extracting backend assets files to /tmp/localai/backend_data

 ┌───────────────────────────────────────────────────┐ 
 │                   Fiber v2.48.0                   │ 
 │               http://127.0.0.1:8080               │ 
 │       (bound on host 0.0.0.0 and port 8080)       │ 
 │                                                   │ 
 │ Handlers ............ 32  Processes ........... 1 │ 
 │ Prefork ....... Disabled  PID ................ 14 │ 
 └───────────────────────────────────────────────────┘ 

6:21AM DBG Request received: 
6:21AM DBG Configuration read: &{PredictionOptions:{Model:llama-7b-hf Language: N:0 TopP:0.7 TopK:80 Temperature:0.9 Maxtokens:512 Echo:false Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 FrequencyPenalty:0 TFZ:0 TypicalP:0 Seed:0} Name: StopWords:[] Cutstrings:[] TrimSpace:[] ContextSize:512 F16:false NUMA:false Threads:4 Debug:true Roles:map[] Embeddings:false Backend: TemplateConfig:{Chat: ChatMessage: Completion: Edit: Functions:} MirostatETA:0 MirostatTAU:0 Mirostat:0 NGPULayers:0 MMap:false MMlock:false LowVRAM:false TensorSplit: MainGPU: ImageGenerationAssets: PromptCachePath: PromptCacheAll:false PromptCacheRO:false Grammar: PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: FunctionsConfig:{DisableNoAction:false NoActionFunctionName: NoActionDescriptionName:} SystemPrompt:}
6:21AM DBG Parameters: &{PredictionOptions:{Model:llama-7b-hf Language: N:0 TopP:0.7 TopK:80 Temperature:0.9 Maxtokens:512 Echo:false Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 FrequencyPenalty:0 TFZ:0 TypicalP:0 Seed:0} Name: StopWords:[] Cutstrings:[] TrimSpace:[] ContextSize:512 F16:false NUMA:false Threads:4 Debug:true Roles:map[] Embeddings:false Backend: TemplateConfig:{Chat: ChatMessage: Completion: Edit: Functions:} MirostatETA:0 MirostatTAU:0 Mirostat:0 NGPULayers:0 MMap:false MMlock:false LowVRAM:false TensorSplit: MainGPU: ImageGenerationAssets: PromptCachePath: PromptCacheAll:false PromptCacheRO:false Grammar: PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: FunctionsConfig:{DisableNoAction:false NoActionFunctionName: NoActionDescriptionName:} SystemPrompt:}
6:21AM DBG Prompt (before templating): Hello! What is your name?
6:21AM DBG Template failed loading: failed loading a template for llama-7b-hf
6:21AM DBG Prompt (after templating): Hello! What is your name?
6:21AM DBG Model already loaded in memory: llama-7b-hf
6:21AM DBG Model 'llama-7b-hf' already loaded
[172.27.128.150]:43283  500  -  POST     /v1/chat/completions
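
Side note: the "Template failed loading" lines above are only warnings; when no template is found, the prompt is passed through unmodified. If templating is wanted, LocalAI resolves it from a model YAML plus a Go-template file in the models path. A minimal sketch with hypothetical file names, following the documented model-config layout (the models path /llm-model-volume is taken from the log above, and {{.Input}} matches the template variable seen in later logs in this thread):

cat > /llm-model-volume/llama-7b-hf.yaml <<'EOF'
name: llama-7b-hf
backend: llama
parameters:
  model: llama-7b-hf
template:
  chat: llama-chat      # resolved as llama-chat.tmpl in the same directory
EOF

cat > /llm-model-volume/llama-chat.tmpl <<'EOF'
{{.Input}}
EOF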


@allenhaozi (Author)

Deployed in Kubernetes. The GPUs have been configured, but they do not seem to take effect:

    resources:
      limits:
        cpu: "1"
        memory: 50Gi
        nvidia.com/gpu: "4"
      requests:
        cpu: "1"
        memory: 50Gi
        nvidia.com/gpu: "4"
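
Two quick checks that may help (a sketch; the pod name is hypothetical, and the image is assumed to ship nvidia-smi). Also note that the debug log above shows NGPULayers:0, so no layers would be offloaded even when a GPU is visible:

kubectl exec -it local-ai-0 -- nvidia-smi
kubectl exec -it local-ai-0 -- printenv NVIDIA_VISIBLE_DEVICES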

@rozek commented Jul 25, 2023

I have the same problem when running LocalAI in a Docker container. The logs contain numerous lines of the form:

rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:33427: connect: connection refused"

with varying port numbers.

@rozek commented Jul 25, 2023

FYI: the problem occurs both with local Docker builds and with the ":latest" image from go-skynet.

@nabbl commented Jul 25, 2023

Yes, same here. I used the latest version with a GPT4All model and it just returns errors.
Same on Kubernetes and locally.

@Mer0me commented Jul 25, 2023

In case it helps, here is my (very similar) error message:

curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
     "model": "ggml-gpt4all-j",
     "messages": [{"role": "user", "content": "How are you?"}],
     "temperature": 0.9 
   }'

{"error":{"code":500,"message":"rpc error: code = Unknown desc = unimplemented","type":""}}

@rozek commented Jul 26, 2023

I am currently trying to compile previous releases to find out up to which version LocalAI worked without this problem.

Unfortunately, the Docker build command seems to expect the source to have been checked out as a Git project and refuses to build from an unpacked ZIP archive...

Thus, I directly checked out v1.21.0, built a Docker image locally, ran it... and had the same problem as before.

For the record, here is what I did:

git clone https://github.com/go-skynet/LocalAI.git --branch v1.21.0
cd LocalAI
docker build -t localai .
docker run --rm --name localai \
  -v "/path/to/your/local/models/folder":/build/models \
  -p 127.0.0.1:8080:8080 \
  localai

I also tried v1.20.1 and v1.20.0 - but these builds failed with "ggml.c:(.text+0x2e860): multiple definition of `clear_numa_thread_affinity'; /build/go-llama/libbinding.a(ggml.o):ggml.c:(.text+0x2e860): first defined here"

Building v1.19.2 succeeded - but it could not load my model (LLaMA 2), which makes it useless for me...

I don't have the time to check every previous version, but perhaps somebody else has...

@mudler (Owner) commented Jul 27, 2023

Did you try running with REBUILD=true? Also, please attach full logs with DEBUG=true.

@rozek commented Jul 28, 2023

OK, so I:

  • checked out the latest version from GitHub,
  • uncommented REBUILD=true and DEBUG=true in .env,
  • rebuilt using docker build -t localai . and
  • started like above using
docker run --rm --name localai \
  -v "/path/to/your/local/models/folder":/build/models \
  -p 127.0.0.1:8080:8080 \
  localai
  • and tested with
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json"  -d '{
  "model":"llama-2-13b.ggmlv3.q4_0",
  "messages": [
    {"role":"system", "content":"Perform the instructions to the best of your ability.\n"},
    {"role": "user", "content": "### Instruction: who was Joseph Weizenbaum?\n### Response:"}
  ],
  "temperature": 0.0,
  "max_tokens": 256,
  "stream": false
}'

with the same result as before. Here are the logs (note the "Skipping rebuild"):

@@@@@
Skipping rebuild
@@@@@
If you are experiencing issues with the pre-compiled builds, try setting REBUILD=true
If you are still experiencing issues with the build, try setting CMAKE_ARGS and disable the instructions set as needed:
CMAKE_ARGS="-DLLAMA_F16C=OFF -DLLAMA_AVX512=OFF -DLLAMA_AVX2=OFF -DLLAMA_FMA=OFF"
see the documentation at: https://localai.io/basics/build/index.html
Note: See also https://github.com/go-skynet/LocalAI/issues/288
@@@@@
CPU info:
CPU: no AVX    found
CPU: no AVX2   found
CPU: no AVX512 found
@@@@@
3:31AM DBG no galleries to load
3:31AM INF Starting LocalAI using 4 threads, with models path: /build/models
3:31AM INF LocalAI version: v1.22.0-19-gdde12b4 (dde12b492b2da4f14d66047a42b66bff80e223af)

 ┌───────────────────────────────────────────────────┐ 
 │                   Fiber v2.48.0                   │ 
 │               http://127.0.0.1:8080               │ 
 │       (bound on host 0.0.0.0 and port 8080)       │ 
 │                                                   │ 
 │ Handlers ............ 31  Processes ........... 1 │ 
 │ Prefork ....... Disabled  PID ................ 14 │ 
 └───────────────────────────────────────────────────┘ 

rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:38673: connect: connection refused"
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:41499: connect: connection refused"
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:41677: connect: connection refused"
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:36709: connect: connection refused"
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:37449: connect: connection refused"
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:33973: connect: connection refused"
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:42041: connect: connection refused"
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:45563: connect: connection refused"
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:34759: connect: connection refused"
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:42351: connect: connection refused"
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:44449: connect: connection refused"
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:36327: connect: connection refused"
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:35585: connect: connection refused"
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:45917: connect: connection refused"
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:35687: connect: connection refused"
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:33939: connect: connection refused"

Here is my .env:

## Set number of threads.
## Note: prefer the number of physical cores. Overbooking the CPU degrades performance notably.
THREADS=4

## Specify a different bind address (defaults to ":8080")
ADDRESS=0.0.0.0:8080

## Default models context size
CONTEXT_SIZE=4096

## Define galleries.
## models to install will be visible in `/models/available`
#GALLERIES=[{"name":"model-gallery", "url":"github:go-skynet/model-gallery/index.yaml"}, {"name":"huggingface","url":"github:go-skynet/model-gallery/huggingface.yaml"}]

## CORS settings
CORS=true
CORS_ALLOW_ORIGINS=*

## Default path for models
# MODELS_PATH=/models

## Enable debug mode
DEBUG=true

## Specify a build type. Available: cublas, openblas, clblas.
# BUILD_TYPE=metal

## Uncomment and set to true to enable rebuilding from source
REBUILD=true

## Enable go tags, available: stablediffusion, tts
## stablediffusion: image generation with stablediffusion
## tts: enables text-to-speech with go-piper 
## (requires REBUILD=true)
#
# GO_TAGS=stablediffusion

## Path where to store generated images
# IMAGE_PATH=/tmp

## Specify a default upload limit in MB (whisper)
# UPLOAD_LIMIT

@rozek commented Jul 28, 2023

Here is the environment of the running container as reported by Docker (note the REBUILD=false):

Environment
  PATH=/usr/local/cuda/bin:/go/bin:/usr/local/go/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
  GOLANG_VERSION=1.20.6
  GOPATH=/go
  BUILD_TYPE=
  EXTERNAL_GRPC_BACKENDS=huggingface-embeddings:/build/extra/grpc/huggingface/huggingface.py
  REBUILD=false
  HEALTHCHECK_ENDPOINT=http://localhost:8080/readyz

Mounts
  /build/models  (host: /Users/andreas/rozek/AI/models/meta-ai/llama2)

Port
  8080/tcp -> 127.0.0.1:8080
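
That REBUILD=false is expected: a plain docker run does not read .env (only docker-compose applies it via env_file), so the values uncommented there never reach the container. Passing the file explicitly should pick them up - a sketch based on the run command above:

docker run --rm --name localai \
  --env-file .env \
  -v "/path/to/your/local/models/folder":/build/models \
  -p 127.0.0.1:8080:8080 \
  localai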

@Sean-McAuliffe

Adding to this: same issue here, both with local Docker and on EKS via AL2 (amd64).

I can reach /v1/models OK, but I can't do anything with a model; I get a timeout and various forms of:

rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp [::1]:33427: connect: connection refused"

@Sean-McAuliffe

It seems it might just be an issue with the LocalAI image. Building from scratch in a container works OK:

FROM public.ecr.aws/amazonlinux/amazonlinux:latest

RUN yum install git -y
RUN yum install golang -y
RUN yum group install "Development Tools" -y
RUN yum install cmake -y

RUN git clone https://github.com/go-skynet/LocalAI.git

WORKDIR /LocalAI

RUN make build

COPY . .

EXPOSE 8080

ENTRYPOINT [ "./local-ai", "--debug", "--models-path", "./models" ]
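
To build and run it, something along these lines should work (a sketch; the host models path is illustrative, mounted into the WORKDIR used above):

docker build -t localai-scratch .
docker run --rm -p 8080:8080 \
  -v "/path/to/your/models":/LocalAI/models \
  localai-scratch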

@swoh816 commented Aug 4, 2023

@rozek @nabbl @Mer0me I had precisely the same error message as you, so our problems may be the same. I inspected the hardware resource usage of my Docker containers, and at least in my case it was a memory-limit issue: Docker Desktop (on Ubuntu 22.04) ships with a default memory limit smaller than the size of the LLM (gpt4all in my case). Once I raised the memory limit to 10 GB, large enough to hold gpt4all, it worked.

It was difficult to figure out that it was a memory-limit issue because the error message does not say so directly. Also, I don't know much about Docker or LLMs, so it took me some time to find the source of the problem on my machine. I think it would definitely help to add a note to the getting-started page about increasing Docker's memory limit enough to hold the LLM in memory: https://localai.io/basics/getting_started/index.html

Note that I also uncommented REBUILD=true in the .env file.
Also, increasing the container's memory with --memory when running it did not help; at least on my machine I had to increase the limit in the Docker Desktop application, which seems to be a common point of confusion (see https://stackoverflow.com/a/44533437).
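
A quick way to see the limit the Docker VM actually grants (a sketch; the container name is illustrative):

# total memory visible inside a container - on Docker Desktop this is the VM's limit
docker run --rm alpine sh -c 'grep MemTotal /proc/meminfo'

# live memory usage and limit of a running container
docker stats --no-stream localai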

@swoh816 commented Aug 4, 2023

@allenhaozi Also, given that your debug log says it failed to load the template, I wonder whether the issue is (1) a wrong path set for finding the model template, or (2) not enough memory to load the template.

@allenhaozi (Author)

@allenhaozi Also given that your debug log says it failed to load the template, I wonder if it is the issue of (1) the wrong path set to find model template, or (2) not enough memory to load template.

@swoh816 Using the quay.io/go-skynet/local-ai:v1.23.2-cublas-cuda11 image, I get the following errors.
request:

{
    "model": "chatglm2-6b",
    "messages": [
        {
            "role": "user",
            "content": "How are you?"
        }
    ],
    "temperature": 0.9
}

response:

{
    "error": {
        "code": 500,
        "message": "rpc error: code = Unknown desc = unimplemented",
        "type": ""
    }
}

log:

4:07AM DBG Request received: 
4:07AM DBG Configuration read: &{PredictionOptions:{Model:chatglm2-6b Language: N:0 TopP:0.7 TopK:80 Temperature:0.9 Maxtokens:512 Echo:false Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 FrequencyPenalty:0 TFZ:0 TypicalP:0 Seed:0 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0} Name: StopWords:[] Cutstrings:[] TrimSpace:[] ContextSize:512 F16:false NUMA:false Threads:1 Debug:true Roles:map[] Embeddings:false Backend: TemplateConfig:{Chat: ChatMessage: Completion: Edit: Functions:} MirostatETA:0 MirostatTAU:0 Mirostat:0 NGPULayers:0 MMap:false MMlock:false LowVRAM:false TensorSplit: MainGPU: ImageGenerationAssets: PromptCachePath: PromptCacheAll:false PromptCacheRO:false Grammar: PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: FunctionsConfig:{DisableNoAction:false NoActionFunctionName: NoActionDescriptionName:} SystemPrompt: RMSNormEps:0 NGQA:0}
4:07AM DBG Parameters: &{PredictionOptions:{Model:chatglm2-6b Language: N:0 TopP:0.7 TopK:80 Temperature:0.9 Maxtokens:512 Echo:false Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 FrequencyPenalty:0 TFZ:0 TypicalP:0 Seed:0 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0} Name: StopWords:[] Cutstrings:[] TrimSpace:[] ContextSize:512 F16:false NUMA:false Threads:1 Debug:true Roles:map[] Embeddings:false Backend: TemplateConfig:{Chat: ChatMessage: Completion: Edit: Functions:} MirostatETA:0 MirostatTAU:0 Mirostat:0 NGPULayers:0 MMap:false MMlock:false LowVRAM:false TensorSplit: MainGPU: ImageGenerationAssets: PromptCachePath: PromptCacheAll:false PromptCacheRO:false Grammar: PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: FunctionsConfig:{DisableNoAction:false NoActionFunctionName: NoActionDescriptionName:} SystemPrompt: RMSNormEps:0 NGQA:0}
4:07AM DBG Prompt (before templating): How are you?
4:07AM DBG Template failed loading: failed loading a template for chatglm2-6b
4:07AM DBG Prompt (after templating): How are you?
4:07AM DBG Model already loaded in memory: chatglm2-6b
4:07AM DBG Model 'chatglm2-6b' already loaded
[10.100.116.87]:50320  500  -  POST     /v1/chat/completions

@mokkin commented Sep 5, 2023

I just followed the example and have the same issue here, with docker-compose version 1.29.2, build unknown.

@chriswells0

Increasing the memory as described by @swoh816 is what resolved this error for me.

Additionally, once that was fixed, text generation was extremely slow. The fix for that was to set threads equal to the number of CPUs on the Kubernetes node.
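
For example (a sketch; nproc reports logical CPUs, and the paths and image tag are illustrative, following the .env and compose file elsewhere in this thread):

docker run --rm -p 8080:8080 \
  -e THREADS=$(nproc) -e MODELS_PATH=/models \
  -v "$PWD/models":/models \
  quay.io/go-skynet/local-ai:latest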

@Mathematinho

I increased the memory limit to 64 GB and still get the same message. I am using the example from "getting started".

When I uncommented REBUILD=true in the .env file, I got the following error:

curl: (56) Recv failure: Connection reset by peer

Anything else I can try?

@shankara-n commented Oct 15, 2023

Could someone share a hardware/system configuration on which this builds and runs successfully?

@kkkkkkjd

I followed the example and ran into the same problem here

@TheRealAlexV commented Dec 4, 2023

Also getting a similar issue here.

.env

THREADS=8
CONTEXT_SIZE=4096
GALLERIES=[{"name":"model-gallery", "url":"github:go-skynet/model-gallery/index.yaml"}, {"url": "github:go-skynet/model-gallery/huggingface.yaml","name":"huggingface"}]
MODELS_PATH=/models
DEBUG=true
COMPEL=0
SINGLE_ACTIVE_BACKEND=true
BUILD_TYPE=cublas
REBUILD=true
GO_TAGS=stablediffusion
IMAGE_PATH=/tmp

docker-compose.yaml

version: '3.6'
services:
  api:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    image: quay.io/go-skynet/local-ai:sha-238fec2-cublas-cuda12-ffmpeg-core
    tty: true # enable colorized logs
    restart: always # should this be on-failure ?
    ports:
      - 8080:8080
    env_file:
      - .env
    volumes:
      - ./models:/models
      - ./images/:/tmp/generated/images/
    command: ["/usr/bin/local-ai" ]

Request & Error

curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
    "model": "llama2-7b-chat-gguf",
    "messages": [{"role": "user", "content": "How are you?"}],
    "temperature": 0.9
  }'

{"error":{"code":500,"message":"could not load model: rpc error: code = Unknown desc = failed loading model","type":""}}

Container Logs

2023-12-03 19:58:44 12:58AM ERR error processing message {SystemPrompt:You are a helpful assistant, below is a conversation, please respond with the next message and do not ask follow-up questions Role:User: RoleName:user Content:How are you? MessageIndex:0} using template "llama2-7b-chat-gguf-chat": template: prompt:3:5: executing "prompt" at <.Input>: can't evaluate field Input in type model.ChatMessageTemplateData. Skipping!
2023-12-03 19:58:44 rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:46545: connect: connection refused"

Debug

2023-12-03 20:06:12 [127.0.0.1]:39930 200 - GET /readyz
2023-12-03 20:06:52 1:06AM DBG Request received: 
2023-12-03 20:06:52 1:06AM DBG Configuration read: &{PredictionOptions:{Model: Language: N:0 TopP:0.7 TopK:80 Temperature:0.9 Maxtokens:0 Echo:false Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 FrequencyPenalty:0 TFZ:0 TypicalP:0 Seed:0 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 UseFastTokenizer:false ClipSkip:0 Tokenizer:} Name:llama2-7b-chat-gguf F16:false Threads:8 Debug:true Roles:map[assistant:Assitant: assistant_function_call:Function Call: function:Function Result: system:System: user:User:] Embeddings:false Backend:llama TemplateConfig:{Chat: ChatMessage:llama2-7b-chat-gguf-chat Completion: Edit: Functions:} PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: FunctionsConfig:{DisableNoAction:false NoActionFunctionName: NoActionDescriptionName:} FeatureFlag:map[] LLMConfig:{SystemPrompt:You are a helpful assistant, below is a conversation, please respond with the next message and do not ask follow-up questions TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0 MirostatTAU:0 Mirostat:0 NGPULayers:0 MMap:false MMlock:false LowVRAM:false Grammar: StopWords:[] Cutstrings:[] TrimSpace:[] ContextSize:4096 NUMA:false LoraAdapter: LoraBase: LoraScale:0 NoMulMatQ:false DraftModel: NDraft:0 Quantization: MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0} AutoGPTQ:{ModelBaseName: Device: Triton:false UseFastTokenizer:false} Diffusers:{PipelineType: SchedulerType: CUDA:false EnableParameters: CFGScale:0 IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0} VallE:{AudioPath:}}
2023-12-03 20:06:52 1:06AM DBG Parameters: &{PredictionOptions:{Model: Language: N:0 TopP:0.7 TopK:80 Temperature:0.9 Maxtokens:0 Echo:false Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 FrequencyPenalty:0 TFZ:0 TypicalP:0 Seed:0 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 UseFastTokenizer:false ClipSkip:0 Tokenizer:} Name:llama2-7b-chat-gguf F16:false Threads:8 Debug:true Roles:map[assistant:Assitant: assistant_function_call:Function Call: function:Function Result: system:System: user:User:] Embeddings:false Backend:llama TemplateConfig:{Chat: ChatMessage:llama2-7b-chat-gguf-chat Completion: Edit: Functions:} PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: FunctionsConfig:{DisableNoAction:false NoActionFunctionName: NoActionDescriptionName:} FeatureFlag:map[] LLMConfig:{SystemPrompt:You are a helpful assistant, below is a conversation, please respond with the next message and do not ask follow-up questions TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0 MirostatTAU:0 Mirostat:0 NGPULayers:0 MMap:false MMlock:false LowVRAM:false Grammar: StopWords:[] Cutstrings:[] TrimSpace:[] ContextSize:4096 NUMA:false LoraAdapter: LoraBase: LoraScale:0 NoMulMatQ:false DraftModel: NDraft:0 Quantization: MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0} AutoGPTQ:{ModelBaseName: Device: Triton:false UseFastTokenizer:false} Diffusers:{PipelineType: SchedulerType: CUDA:false EnableParameters: CFGScale:0 IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0} VallE:{AudioPath:}}
2023-12-03 20:06:52 1:06AM ERR error processing message {SystemPrompt:You are a helpful assistant, below is a conversation, please respond with the next message and do not ask follow-up questions Role:User: RoleName:user Content:How are you? MessageIndex:0} using template "llama2-7b-chat-gguf-chat": template: prompt:3:5: executing "prompt" at <.Input>: can't evaluate field Input in type model.ChatMessageTemplateData. Skipping!
2023-12-03 20:06:52 1:06AM DBG Prompt (before templating): User:How are you?
2023-12-03 20:06:52 1:06AM DBG Template failed loading: failed loading a template for 
2023-12-03 20:06:52 1:06AM DBG Prompt (after templating): User:How are you?
2023-12-03 20:06:52 1:06AM DBG Loading model llama from 
2023-12-03 20:06:52 1:06AM DBG Stopping all backends except ''
2023-12-03 20:06:52 1:06AM DBG Loading model in memory from file: /models
2023-12-03 20:06:52 1:06AM DBG Loading Model  with gRPC (file: /models) (backend: llama): {backendString:llama model: threads:8 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc0001da5a0 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama:/build/backend/python/exllama/run.sh huggingface-embeddings:/build/backend/python/sentencetransformers/run.sh petals:/build/backend/python/petals/run.sh sentencetransformers:/build/backend/python/sentencetransformers/run.sh transformers:/build/backend/python/transformers/run.sh vall-e-x:/build/backend/python/vall-e-x/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 singleActiveBackend:true parallelRequests:false}
2023-12-03 20:06:52 1:06AM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/llama
2023-12-03 20:06:52 1:06AM DBG GRPC Service for  will be running at: '127.0.0.1:34533'
2023-12-03 20:06:52 1:06AM DBG GRPC Service state dir: /tmp/go-processmanager3341423294
2023-12-03 20:06:52 1:06AM DBG GRPC Service Started
2023-12-03 20:06:53 rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:34533: connect: connection refused"
2023-12-03 20:06:53 1:06AM DBG GRPC(-127.0.0.1:34533): stderr 2023/12/04 01:06:53 gRPC Server listening at 127.0.0.1:34533
2023-12-03 20:06:55 1:06AM DBG GRPC Service Ready
2023-12-03 20:06:55 1:06AM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:<nil>} sizeCache:0 unknownFields:[] Model: ContextSize:4096 Seed:0 NBatch:512 F16Memory:false MLock:false MMap:false VocabOnly:false LowVRAM:false Embeddings:false NUMA:false NGPULayers:0 MainGPU: TensorSplit: Threads:8 LibrarySearchPath: RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/models Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType: SchedulerType: CUDA:false CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 Tokenizer: LoraBase: LoraAdapter: LoraScale:0 NoMulMatQ:false DraftModel: AudioPath: Quantization: MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0}
2023-12-03 20:06:55 1:06AM DBG GRPC(-127.0.0.1:34533): stderr create_gpt_params_cuda: loading model /models
2023-12-03 20:06:55 1:06AM DBG GRPC(-127.0.0.1:34533): stderr ggml_init_cublas: found 1 CUDA devices:
2023-12-03 20:06:55 1:06AM DBG GRPC(-127.0.0.1:34533): stderr   Device 0: NVIDIA GeForce RTX 4080, compute capability 8.9
2023-12-03 20:06:55 1:06AM DBG GRPC(-127.0.0.1:34533): stderr gguf_init_from_file: invalid magic number 00000000
2023-12-03 20:06:55 1:06AM DBG GRPC(-127.0.0.1:34533): stderr error loading model: llama_model_loader: failed to load model from /models
2023-12-03 20:06:55 1:06AM DBG GRPC(-127.0.0.1:34533): stderr 
2023-12-03 20:06:55 1:06AM DBG GRPC(-127.0.0.1:34533): stderr llama_load_model_from_file: failed to load model
2023-12-03 20:06:55 1:06AM DBG GRPC(-127.0.0.1:34533): stderr llama_init_from_gpt_params: error: failed to load model '/models'
2023-12-03 20:06:55 1:06AM DBG GRPC(-127.0.0.1:34533): stderr load_binding_model: error: unable to load model
2023-12-03 20:06:55 [172.18.0.1]:54898 500 - POST /v1/chat/completions
2023-12-03 20:07:12 [127.0.0.1]:37870 200 - GET /readyz
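
Note the empty Model: and ModelFile:/models in the debug output above: the backend tried to load the models directory itself as a model file, hence the "invalid magic number 00000000". That usually means the model YAML lacks a parameters.model entry pointing at an actual model file. A minimal sketch, with a hypothetical file name:

cat > models/llama2-7b-chat-gguf.yaml <<'EOF'
name: llama2-7b-chat-gguf
backend: llama
parameters:
  model: llama-2-7b-chat.Q4_K_M.gguf   # must name a real .gguf file in /models
EOF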

If I change to the lunademo model from the model-gallery (also used in the model setup how-to), I get many more errors in debug:

2023-12-03 20:12:13 [127.0.0.1]:51240 200 - GET /readyz
2023-12-03 20:12:28 1:12AM DBG Request received: 
2023-12-03 20:12:28 1:12AM DBG Configuration read: &{PredictionOptions:{Model:luna-ai-llama2-uncensored.Q4_K_M.gguf Language: N:0 TopP:0 TopK:0 Temperature:0.9 Maxtokens:0 Echo:false Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 FrequencyPenalty:0 TFZ:0 TypicalP:0 Seed:0 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 UseFastTokenizer:false ClipSkip:0 Tokenizer:} Name:lunademo F16:false Threads:10 Debug:true Roles:map[] Embeddings:false Backend:llama TemplateConfig:{Chat:luna-chat-message ChatMessage: Completion: Edit: Functions:} PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: FunctionsConfig:{DisableNoAction:false NoActionFunctionName: NoActionDescriptionName:} FeatureFlag:map[] LLMConfig:{SystemPrompt: TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0 MirostatTAU:0 Mirostat:0 NGPULayers:0 MMap:false MMlock:false LowVRAM:false Grammar: StopWords:[] Cutstrings:[] TrimSpace:[] ContextSize:4096 NUMA:false LoraAdapter: LoraBase: LoraScale:0 NoMulMatQ:false DraftModel: NDraft:0 Quantization: MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0} AutoGPTQ:{ModelBaseName: Device: Triton:false UseFastTokenizer:false} Diffusers:{PipelineType: SchedulerType: CUDA:false EnableParameters: CFGScale:0 IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0} VallE:{AudioPath:}}
2023-12-03 20:12:28 1:12AM DBG Parameters: &{PredictionOptions:{Model:luna-ai-llama2-uncensored.Q4_K_M.gguf Language: N:0 TopP:0 TopK:0 Temperature:0.9 Maxtokens:0 Echo:false Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 FrequencyPenalty:0 TFZ:0 TypicalP:0 Seed:0 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 UseFastTokenizer:false ClipSkip:0 Tokenizer:} Name:lunademo F16:false Threads:10 Debug:true Roles:map[] Embeddings:false Backend:llama TemplateConfig:{Chat:luna-chat-message ChatMessage: Completion: Edit: Functions:} PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: FunctionsConfig:{DisableNoAction:false NoActionFunctionName: NoActionDescriptionName:} FeatureFlag:map[] LLMConfig:{SystemPrompt: TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0 MirostatTAU:0 Mirostat:0 NGPULayers:0 MMap:false MMlock:false LowVRAM:false Grammar: StopWords:[] Cutstrings:[] TrimSpace:[] ContextSize:4096 NUMA:false LoraAdapter: LoraBase: LoraScale:0 NoMulMatQ:false DraftModel: NDraft:0 Quantization: MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0} AutoGPTQ:{ModelBaseName: Device: Triton:false UseFastTokenizer:false} Diffusers:{PipelineType: SchedulerType: CUDA:false EnableParameters: CFGScale:0 IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0} VallE:{AudioPath:}}
2023-12-03 20:12:28 1:12AM DBG Prompt (before templating): How are you?
2023-12-03 20:12:28 1:12AM DBG Template found, input modified to: How are you?
2023-12-03 20:12:28 
2023-12-03 20:12:28 ASSISTANT:
2023-12-03 20:12:28 
2023-12-03 20:12:28 1:12AM DBG Prompt (after templating): How are you?
2023-12-03 20:12:28 
2023-12-03 20:12:28 ASSISTANT:
2023-12-03 20:12:28 
2023-12-03 20:12:28 1:12AM DBG Loading model llama from luna-ai-llama2-uncensored.Q4_K_M.gguf
2023-12-03 20:12:28 1:12AM DBG Stopping all backends except 'luna-ai-llama2-uncensored.Q4_K_M.gguf'
2023-12-03 20:12:28 1:12AM DBG [single-backend] Stopping 
2023-12-03 20:12:28 1:12AM DBG Loading model in memory from file: /models/luna-ai-llama2-uncensored.Q4_K_M.gguf
2023-12-03 20:12:28 1:12AM DBG Loading Model luna-ai-llama2-uncensored.Q4_K_M.gguf with gRPC (file: /models/luna-ai-llama2-uncensored.Q4_K_M.gguf) (backend: llama): {backendString:llama model:luna-ai-llama2-uncensored.Q4_K_M.gguf threads:10 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc0001da5a0 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama:/build/backend/python/exllama/run.sh huggingface-embeddings:/build/backend/python/sentencetransformers/run.sh petals:/build/backend/python/petals/run.sh sentencetransformers:/build/backend/python/sentencetransformers/run.sh transformers:/build/backend/python/transformers/run.sh vall-e-x:/build/backend/python/vall-e-x/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 singleActiveBackend:true parallelRequests:false}
2023-12-03 20:12:28 1:12AM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/llama
2023-12-03 20:12:28 1:12AM DBG GRPC Service for luna-ai-llama2-uncensored.Q4_K_M.gguf will be running at: '127.0.0.1:45223'
2023-12-03 20:12:28 1:12AM DBG GRPC Service state dir: /tmp/go-processmanager3150385545
2023-12-03 20:12:28 1:12AM DBG GRPC Service Started
2023-12-03 20:12:28 rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:45223: connect: connection refused"
2023-12-03 20:12:28 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr 2023/12/04 01:12:28 gRPC Server listening at 127.0.0.1:45223
2023-12-03 20:12:30 1:12AM DBG GRPC Service Ready
2023-12-03 20:12:30 1:12AM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:<nil>} sizeCache:0 unknownFields:[] Model:luna-ai-llama2-uncensored.Q4_K_M.gguf ContextSize:4096 Seed:0 NBatch:512 F16Memory:false MLock:false MMap:false VocabOnly:false LowVRAM:false Embeddings:false NUMA:false NGPULayers:0 MainGPU: TensorSplit: Threads:10 LibrarySearchPath: RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/models/luna-ai-llama2-uncensored.Q4_K_M.gguf Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType: SchedulerType: CUDA:false CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 Tokenizer: LoraBase: LoraAdapter: LoraScale:0 NoMulMatQ:false DraftModel: AudioPath: Quantization: MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0}
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr create_gpt_params_cuda: loading model /models/luna-ai-llama2-uncensored.Q4_K_M.gguf
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr ggml_init_cublas: found 1 CUDA devices:
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr   Device 0: NVIDIA GeForce RTX 4080, compute capability 8.9
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: loaded meta data with 19 key-value pairs and 291 tensors from /models/luna-ai-llama2-uncensored.Q4_K_M.gguf (version GGUF V2 (latest))
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor    0:                token_embd.weight q4_K     [  4096, 32000,     1,     1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor    1:              blk.0.attn_q.weight q4_K     [  4096,  4096,     1,     1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor    2:              blk.0.attn_k.weight q4_K     [  4096,  4096,     1,     1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor    3:              blk.0.attn_v.weight q6_K     [  4096,  4096,     1,     1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor    4:         blk.0.attn_output.weight q4_K     [  4096,  4096,     1,     1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor    5:            blk.0.ffn_gate.weight q4_K     [  4096, 11008,     1,     1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor    6:            blk.0.ffn_down.weight q6_K     [ 11008,  4096,     1,     1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor    7:              blk.0.ffn_up.weight q4_K     [  4096, 11008,     1,     1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor    8:           blk.0.attn_norm.weight f32      [  4096,     1,     1,     1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor    9:            blk.0.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor   10:              blk.1.attn_q.weight q4_K     [  4096,  4096,     1,     1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor   11:              blk.1.attn_k.weight q4_K     [  4096,  4096,     1,     1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor   12:              blk.1.attn_v.weight q6_K     [  4096,  4096,     1,     1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor   13:         blk.1.attn_output.weight q4_K     [  4096,  4096,     1,     1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor   14:            blk.1.ffn_gate.weight q4_K     [  4096, 11008,     1,     1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor   15:            blk.1.ffn_down.weight q6_K     [ 11008,  4096,     1,     1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor   16:              blk.1.ffn_up.weight q4_K     [  4096, 11008,     1,     1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor   17:           blk.1.attn_norm.weight f32      [  4096,     1,     1,     1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor   18:            blk.1.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor   19:              blk.2.attn_q.weight q4_K     [  4096,  4096,     1,     1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor   20:              blk.2.attn_k.weight q4_K     [  4096,  4096,     1,     1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor   21:              blk.2.attn_v.weight q6_K     [  4096,  4096,     1,     1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor   22:         blk.2.attn_output.weight q4_K     [  4096,  4096,     1,     1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor   23:            blk.2.ffn_gate.weight q4_K     [  4096, 11008,     1,     1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor   24:            blk.2.ffn_down.weight q6_K     [ 11008,  4096,     1,     1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor   25:              blk.2.ffn_up.weight q4_K     [  4096, 11008,     1,     1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor   26:           blk.2.attn_norm.weight f32      [  4096,     1,     1,     1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor   27:            blk.2.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor   28:              blk.3.attn_q.weight q4_K     [  4096,  4096,     1,     1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor   29:              blk.3.attn_k.weight q4_K     [  4096,  4096,     1,     1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor   30:              blk.3.attn_v.weight q6_K     [  4096,  4096,     1,     1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor   31:         blk.3.attn_output.weight q4_K     [  4096,  4096,     1,     1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor   32:            blk.3.ffn_gate.weight q4_K     [  4096, 11008,     1,     1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor   33:            blk.3.ffn_down.weight q6_K     [ 11008,  4096,     1,     1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor   34:              blk.3.ffn_up.weight q4_K     [  4096, 11008,     1,     1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor   35:           blk.3.attn_norm.weight f32      [  4096,     1,     1,     1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor   36:            blk.3.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor   37:              blk.4.attn_q.weight q4_K     [  4096,  4096,     1,     1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor   38:              blk.4.attn_k.weight q4_K     [  4096,  4096,     1,     1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor   39:              blk.4.attn_v.weight q4_K     [  4096,  4096,     1,     1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor   40:         blk.4.attn_output.weight q4_K     [  4096,  4096,     1,     1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor   41:            blk.4.ffn_gate.weight q4_K     [  4096, 11008,     1,     1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor   42:            blk.4.ffn_down.weight q4_K     [ 11008,  4096,     1,     1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor   43:              blk.4.ffn_up.weight q4_K     [  4096, 11008,     1,     1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor   44:           blk.4.attn_norm.weight f32      [  4096,     1,     1,     1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor   45:            blk.4.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor   46:              blk.5.attn_q.weight q4_K     [  4096,  4096,     1,     1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor   47:              blk.5.attn_k.weight q4_K     [  4096,  4096,     1,     1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor   48:              blk.5.attn_v.weight q4_K     [  4096,  4096,     1,     1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor   49:         blk.5.attn_output.weight q4_K     [  4096,  4096,     1,     1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor   50:            blk.5.ffn_gate.weight q4_K     [  4096, 11008,     1,     1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor   51:            blk.5.ffn_down.weight q4_K     [ 11008,  4096,     1,     1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor   52:              blk.5.ffn_up.weight q4_K     [  4096, 11008,     1,     1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor   53:           blk.5.attn_norm.weight f32      [  4096,     1,     1,     1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor   54:            blk.5.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor   55:              blk.6.attn_q.weight q4_K     [  4096,  4096,     1,     1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor   56:              blk.6.attn_k.weight q4_K     [  4096,  4096,     1,     1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor   57:              blk.6.attn_v.weight q6_K     [  4096,  4096,     1,     1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor   58:         blk.6.attn_output.weight q4_K     [  4096,  4096,     1,     1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor   59:            blk.6.ffn_gate.weight q4_K     [  4096, 11008,     1,     1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor   60:            blk.6.ffn_down.weight q6_K     [ 11008,  4096,     1,     1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor   61:              blk.6.ffn_up.weight q4_K     [  4096, 11008,     1,     1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor   62:           blk.6.attn_norm.weight f32      [  4096,     1,     1,     1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor   63:            blk.6.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor   64:              blk.7.attn_q.weight q4_K     [  4096,  4096,     1,     1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor   65:              blk.7.attn_k.weight q4_K     [  4096,  4096,     1,     1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor   66:              blk.7.attn_v.weight q4_K     [  4096,  4096,     1,     1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor   67:         blk.7.attn_output.weight q4_K     [  4096,  4096,     1,     1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor   68:            blk.7.ffn_gate.weight q4_K     [  4096, 11008,     1,     1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor   69:            blk.7.ffn_down.weight q4_K     [ 11008,  4096,     1,     1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor   70:              blk.7.ffn_up.weight q4_K     [  4096, 11008,     1,     1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor   71:           blk.7.attn_norm.weight f32      [  4096,     1,     1,     1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor   72:            blk.7.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor   73:              blk.8.attn_q.weight q4_K     [  4096,  4096,     1,     1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor   74:              blk.8.attn_k.weight q4_K     [  4096,  4096,     1,     1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor   75:              blk.8.attn_v.weight q4_K     [  4096,  4096,     1,     1 ]
[... per-tensor llama_model_loader output elided: tensors 76-290 cover blk.8 through blk.31 (attn_q/attn_k/attn_v/attn_output/ffn_gate/ffn_down/ffn_up weights in q4_K or q6_K, attn_norm/ffn_norm in f32) plus output_norm.weight (f32) and output.weight (q6_K); the type totals are summarized below ...]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - kv   0:                       general.architecture str     
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - kv   1:                               general.name str     
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - kv   2:                       llama.context_length u32     
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - kv   3:                     llama.embedding_length u32     
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - kv   4:                          llama.block_count u32     
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - kv   5:                  llama.feed_forward_length u32     
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - kv   6:                 llama.rope.dimension_count u32     
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - kv   7:                 llama.attention.head_count u32     
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - kv   8:              llama.attention.head_count_kv u32     
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - kv   9:     llama.attention.layer_norm_rms_epsilon f32     
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - kv  10:                          general.file_type u32     
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - kv  11:                       tokenizer.ggml.model str     
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - kv  12:                      tokenizer.ggml.tokens arr     
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - kv  13:                      tokenizer.ggml.scores arr     
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - kv  14:                  tokenizer.ggml.token_type arr     
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - kv  15:                tokenizer.ggml.bos_token_id u32     
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - kv  16:                tokenizer.ggml.eos_token_id u32     
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - kv  17:            tokenizer.ggml.padding_token_id u32     
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - kv  18:               general.quantization_version u32     
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - type  f32:   65 tensors
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - type q4_K:  193 tensors
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - type q6_K:   33 tensors
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llm_load_print_meta: format         = GGUF V2 (latest)
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llm_load_print_meta: arch           = llama
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llm_load_print_meta: vocab type     = SPM
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llm_load_print_meta: n_vocab        = 32000
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llm_load_print_meta: n_merges       = 0
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llm_load_print_meta: n_ctx_train    = 2048
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llm_load_print_meta: n_ctx          = 4096
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llm_load_print_meta: n_embd         = 4096
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llm_load_print_meta: n_head         = 32
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llm_load_print_meta: n_head_kv      = 32
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llm_load_print_meta: n_layer        = 32
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llm_load_print_meta: n_rot          = 128
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llm_load_print_meta: n_gqa          = 1
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llm_load_print_meta: f_norm_eps     = 0.0e+00
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llm_load_print_meta: f_norm_rms_eps = 1.0e-05
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llm_load_print_meta: n_ff           = 11008
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llm_load_print_meta: freq_base      = 10000.0
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llm_load_print_meta: freq_scale     = 1
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llm_load_print_meta: model type     = 7B
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llm_load_print_meta: model ftype    = mostly Q4_K - Medium
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llm_load_print_meta: model params   = 6.74 B
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llm_load_print_meta: model size     = 3.80 GiB (4.84 BPW) 
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llm_load_print_meta: general.name   = tap-m_luna-ai-llama2-uncensored
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llm_load_print_meta: BOS token = 1 '<s>'
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llm_load_print_meta: EOS token = 2 '</s>'
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llm_load_print_meta: UNK token = 0 '<unk>'
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llm_load_print_meta: PAD token = 0 '<unk>'
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llm_load_print_meta: LF token  = 13 '<0x0A>'
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llm_load_tensors: ggml ctx size = 3891.34 MB
2023-12-03 20:12:32 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llm_load_tensors: using CUDA for GPU acceleration
2023-12-03 20:12:32 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llm_load_tensors: mem required  = 3891.34 MB (+ 4096.00 MB per state)
2023-12-03 20:12:32 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llm_load_tensors: offloading 0 repeating layers to GPU
2023-12-03 20:12:32 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llm_load_tensors: offloaded 0/35 layers to GPU
2023-12-03 20:12:32 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llm_load_tensors: VRAM used: 0 MB
2023-12-03 20:12:35 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr ..................................................................................................
2023-12-03 20:12:37 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_new_context_with_model: kv self size  = 4096.00 MB
2023-12-03 20:12:37 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_new_context_with_model: compute buffer total size =  281.47 MB
2023-12-03 20:12:37 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_new_context_with_model: VRAM scratch buffer: 280.00 MB
2023-12-03 20:13:13 [127.0.0.1]:34806 200 - GET /readyz

benm5678 commented Dec 6, 2023

Is there any solution/workaround? I get the same error with various models when deployed to EKS (locally I can run it fine on minikube).

@chris-hatton

Another frustrated user here: I can't get anything to work, including the 'Getting started' instructions (trying a cuBLAS build with Docker). I get the feeling the LocalAI architecture is failing to surface the back-end errors that would reveal the problem; requested in #1416.

@FarhanSajid1

Same issue here.

benm5678 commented Jan 24, 2024

In case it helps: I was facing similar errors trying to host a llama2 model on AWS EKS with an A10 GPU. First, we upgraded the NVIDIA driver to the latest 5.* release. Second, I also needed to deploy a model YAML file to set f16/gpu_layers (it wasn't enough to have those as the env params the helm chart pushes). The LocalAI API methods below make this easy: you can search for a model in the gallery and install it with the settings you want (you can also specify the backend here, so LocalAI doesn't have to guess):

Get available from gallery

curl http://localhost:8000/models/available | jq '.[] | select(.name | contains("llama2"))'

Install from gallery

curl http://localhost:8000/models/apply -H "Content-Type: application/json" -d '{
     "id": "thebloke__llama2-chat-ayt-13b-gguf__llama2-chat-ayt-13b.q5_k_s.gguf",
     "overrides": {
        "backend": "llama",
        "f16": true,
        "gpu_layers": 43
     }
   }'
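
For reference, a minimal sketch of such a model YAML file dropped into the models directory; the file name, model name, and values are illustrative, not taken verbatim from this thread:

# hypothetical example: models/llama2-chat.yaml
name: llama2-chat                          # name you pass as "model" in API requests
backend: llama                             # pin the backend so LocalAI doesn't guess
f16: true
gpu_layers: 43                             # layers to offload to the GPU
parameters:
  model: llama2-chat-ayt-13b.q5_k_s.gguf   # GGUF file in the models directory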

@jeryaiwei

@mudler

class BackendServiceImpl final : public backend::Backend::Service {
public:
    grpc::Status Health(ServerContext* context, const backend::HealthMessage* request, backend::Reply* reply) {
        // Implement Health RPC
        reply->set_message("OK");
        return Status::OK;
    }

    grpc::Status LoadModel(ServerContext* context, const backend::ModelOptions* request, backend::Result* result) {
        // Implement LoadModel RPC
        gpt_params params;
        params_parse(request, params);
        llama_backend_init();
        llama_numa_init(params.numa);
        // load the model
        if (!llama.load_model(params)) {
            result->set_message("Failed loading model");
            result->set_success(false);
            return Status::CANCELLED;
        }
        llama.initialize();
        result->set_message("Loading succeeded");
        result->set_success(true);
        loaded_model = true;
        return Status::OK;
    }

    grpc::Status PredictStream(grpc::ServerContext* context, const backend::PredictOptions* request, grpc::ServerWriter<backend::Reply>* writer) override {
        json data = parse_options(true, request, llama);
        const int task_id = llama.queue_tasks.get_new_id();
        llama.queue_results.add_waiting_task_id(task_id);
        llama.request_completion(task_id, data, false, false, -1);
        while (true) {
            task_result result = llama.queue_results.recv(task_id);
            if (!result.error) {
                const std::string str =
                    "data: " +
                    result.result_json.dump(-1, ' ', false, json::error_handler_t::replace) +
                    "\n\n";
                LOG_VERBOSE("data stream", {
                    { "to_send", str }
                });
                backend::Reply reply;
                // print it
                std::string completion_text = result.result_json.value("content", "");
                reply.set_message(completion_text);
                // Send the reply
                writer->Write(reply);
                if (result.stop) {
                    break;
                }
            } else {
                break;
            }
        }
        return grpc::Status::OK;
    }

    grpc::Status Predict(ServerContext* context, const backend::PredictOptions* request, backend::Reply* reply) {
        json data = parse_options(false, request, llama);
        const int task_id = llama.queue_tasks.get_new_id();
        llama.queue_results.add_waiting_task_id(task_id);
        llama.request_completion(task_id, data, false, false, -1);
        std::string completion_text;
        task_result result = llama.queue_results.recv(task_id);
        if (!result.error && result.stop) {
            completion_text = result.result_json.value("content", "");
            reply->set_message(completion_text);
        }
        return grpc::Status::OK;
    }
};

1. There is no grpc::Status Embedding(ServerContext* context, const backend::PredictOptions* request, backend::EmbeddingResult* reply) method implemented. So with backend = llama-cpp, an embeddings request fails with:
{ "error": { "code": 500, "message": "rpc error: code = Unknown desc = unimplemented", "type": "" } }
(Note that bert.cpp has been integrated into llama.cpp; see ggerganov/llama.cpp#5423 and the discussions there. Updated forks: iamlemec/bert.cpp, xyzhang626/embeddings.cpp.)
2. backend/go/llm/llama is not used.
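
For illustration only (this is not code from the repo): a minimal sketch of the missing method, stubbed so callers would receive a proper gRPC status code instead of the generic error string:

    grpc::Status Embedding(ServerContext* context, const backend::PredictOptions* request, backend::EmbeddingResult* reply) override {
        // Hypothetical stub: report UNIMPLEMENTED explicitly so the HTTP layer can
        // surface a clear error instead of "Unknown desc = unimplemented".
        return grpc::Status(grpc::StatusCode::UNIMPLEMENTED, "Embedding is not supported by this backend");
    }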
