rpc error: code = Unknown desc = unimplemented #800
Deployed in k8s; the GPU has been configured, but it does not seem to take effect.
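(A quick sanity check, sketched here as an aside rather than taken from the report: verify the GPU is actually visible inside the pod; pod name and namespace are placeholders, and it assumes nvidia-smi is available in the image.)

```bash
# Sketch: check whether the GPU is exposed inside the LocalAI pod at all.
# Pod name and namespace are placeholders; assumes nvidia-smi exists in the image.
kubectl exec -it <local-ai-pod> -n <namespace> -- nvidia-smi
```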
I have the same problem when running LocalAI in a Docker container. The logs contain numerous lines of the form:
with varying port numbers.
FYI: the problem occurs both in local Docker builds and the ":latest" image from go-skynet.
Yes, same here. I used the latest version with a GPT4All model and it just gives errors.
In case it helps, here is my (very similar) error message:
I am currently trying to compile a previous release in order to see up to which version LocalAI worked without this problem. Unfortunately, the Docker build command seems to expect the source to have been checked out as a Git project and refuses to build from an unpacked ZIP archive... Thus, I directly checked out v1.21.0, built a Docker image locally, ran it... and had the same problem as before. For the record, here is what I did:
I also tried v1.20.1 and v1.20.0, but these builds failed with "ggml.c:(.text+0x2e860): multiple definition of `clear_numa_thread_affinity'; /build/go-llama/libbinding.a(ggml.o):ggml.c:(.text+0x2e860): first defined here".

Building v1.19.2 succeeded, but it could not load my model (LLaMA 2), which makes it useless for me...
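(As an aside, a generic sketch of this kind of tag-pinned local build; these are not the commenter's exact commands, and the tag and image name are just examples.)

```bash
# Generic sketch of building a specific release tag locally, e.g. to bisect
# for the last working version. Tag and image name are examples only.
git clone https://github.com/go-skynet/LocalAI.git
cd LocalAI
git checkout v1.21.0   # or any other release tag
docker build -t local-ai:v1.21.0 .
```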
Did you try running with
Ok, so I
with the same result as before. Here are the logs (mind the "skipping rebuild")
Here is my .env:
Here is the environment of the running container as reported by Docker (note the "REBUILD=false"):
Adding to this: same issues here, both local Docker and EKS via AL2 (amd64). I can get through to /v1/models OK, but can't do anything with a model; otherwise I get a timeout and various forms of:
Seems it might just be an issue with the LocalAI image. Building from scratch in a container works OK:

```Dockerfile
RUN yum install git -y
RUN git clone https://github.com/go-skynet/LocalAI.git
WORKDIR /LocalAI
RUN make build
COPY . .
EXPOSE 8080
ENTRYPOINT [ "./local-ai", "--debug", "--models-path", "./models", "" ]
```
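(A rough usage sketch for that Dockerfile; it assumes a suitable FROM base-image line, which the snippet above does not show, and the image name is just an example.)

```bash
# Sketch: build and run the from-source image described above.
docker build -t local-ai-from-source .
docker run -p 8080:8080 local-ai-from-source
```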
@rozek @nabbl @Mer0me I had precisely the same error message as you had, so our problems may be the same. I inspected the hardware resources used by the Docker containers, and at least in my case, it was a memory limit issue. Docker Desktop (on Ubuntu 22.04) ships with a default memory limit smaller than the size of the LLM (gpt4all in my case). So I set the memory limit to 10 GB, large enough to hold gpt4all, and then it worked.

It was difficult to figure out that it was a memory limit issue because the error message does not say so directly. Also, I don't know much about Docker, nor about LLMs, so it took some time to track down the source of the problem on my machine. I think it would definitely help to include a note on the getting started page about increasing Docker's memory limit enough to hold the LLM in memory: https://localai.io/basics/getting_started/index.html

Note that I also uncommented
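(For anyone running the image directly rather than through Docker Desktop, a minimal sketch of setting an explicit memory limit; the 10g value, tag, port, and paths are placeholders, and with Docker Desktop the limit is raised in its settings UI instead.)

```bash
# Minimal sketch: run the LocalAI image with a memory limit large enough to
# hold the model in RAM. Tag, port, and paths are placeholders.
docker run -d --name local-ai \
  --memory=10g \
  -p 8080:8080 \
  -v "$PWD/models:/models" \
  quay.io/go-skynet/local-ai:latest --models-path /models
```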
@allenhaozi Also, given that your debug log says it failed to load the template, I wonder if the issue is (1) the wrong path set for finding the model template, or (2) not enough memory to load the template.
@swoh816, using the quay.io/go-skynet/local-ai:v1.23.2-cublas-cuda11 image, I got the following errors.
response:
log:
I just followed the example and have the same issue here with |
Increasing the memory as described by @swoh816 is what resolved this error for me. Additionally, once that was fixed, text generation was extremely slow. The fix for that was to set threads equal to the number of CPUs on the Kubernetes node.
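(A small sketch of that thread-count fix, assuming the example .env where THREADS is the variable the compose setup reads; adjust to however your deployment passes it.)

```bash
# Sketch: match the thread count to the node's CPU count. THREADS is assumed
# to be the variable the example .env exposes to the container.
nproc                                  # e.g. prints 8 on an 8-vCPU node
# then set THREADS=8 in .env (or the pod spec) and recreate the container:
docker compose up -d --force-recreate
```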
I increased the memory limit to 64 GB and still get the same message. I am using the example from "Getting started". When I uncommented REBUILD=true in the .env file, I got the following error: curl: (56) Recv failure: Connection reset by peer. Anything else I can try?
Could someone share a hardware/system configuration on which this does build and run successfully?
I followed the example and ran into the same problem here |
Also getting a similar issue here.

.env
docker-compose.yaml
Request & Error
Container Logs
Debug
If I change to the
Is there any solution/workaround? I get the same error with various models when deployed to EKS (locally I can run it fine on minikube).
Another frustrated user here; I can't get anything to work, including the 'Getting started' instructions. I am trying a cublas build with Docker. I get the feeling the LocalAI architecture is failing to surface errors from the backend that would reveal the problem. Requested in #1416.
Same issue.
In case it helps, I was facing similar errors trying to host a llama2 model on AWS EKS with an A10 GPU. First, we upgraded Nvidia to the latest 5.* driver. Second, I needed to also deploy a model YAML file to set f16/gpu_layers (it wasn't enough just to have those as env params the way the Helm chart pushes them). The LocalAI API methods below help you do it all easily: you can search for a model via their gallery and push it with the settings you want (here you can also specify the backend, so it doesn't guess).

Get available from gallery
Install from gallery
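(The commenter's exact commands are not preserved above; roughly, the two gallery calls look like the sketch below. Endpoints follow the LocalAI model-gallery API; the model id and override values are placeholders, not recommendations.)

```bash
# Get available from gallery: list models, filtering for llama as an example.
curl http://localhost:8080/models/available | grep -i llama

# Install from gallery: apply a model with explicit settings such as backend,
# f16 and gpu_layers. The id and values here are placeholders.
curl http://localhost:8080/models/apply -H "Content-Type: application/json" -d '{
  "id": "model-gallery@llama2-13b-chat",
  "overrides": { "backend": "llama", "f16": true, "gpu_layers": 35 }
}'
```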
Referenced code: LocalAI/backend/cpp/llama/grpc-server.cpp, lines 2284 to 2369 (commit 3851b51)
1. There is no implemented grpc::Status Embedding(ServerContext* context, const backend::PredictOptions* request, backend::EmbeddingResult* reply) method. If backend = llama-cpp:
{ "error": { "code": 500, "message": "rpc error: code = Unknown desc = unimplemented", "type": "" } }
Bert.cpp has been integrated into llama.cpp! See ggerganov/llama.cpp#5423 and the discussions. Updated forks: iamlemec/bert.cpp, xyzhang626/embeddings.cpp
2. backend/go/llm/llama is not used.
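(For context, a sketch of the kind of embeddings request that surfaces that error against a model configured with the llama-cpp backend; host and model name are placeholders.)

```bash
# Sketch: an OpenAI-style embeddings call; with the llama-cpp backend this is
# the call that comes back with the "unimplemented" error quoted above.
curl http://localhost:8080/v1/embeddings -H "Content-Type: application/json" -d '{
  "model": "my-model",
  "input": "test sentence"
}'
```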
What went wrong? Settings?
quay.io/go-skynet/local-ai:master-cublas-cuda11
request:
response:
log: