Server crashes while processing 2 concurrent requests #256

Closed
LLukas22 opened this issue Apr 30, 2024 · 2 comments · Fixed by #268
Labels
bug Something isn't working

Comments

@LLukas22
Contributor

Describe the bug
If two requests are sent to the server at roughly the same time, it will start to respond to both requests and then crash with the following error message:

ERROR mistralrs_core::engine: completion - Model failed with error: ShapeMismatchCat { dim: 0, first_shape: [2, 32, 111, 96], n: 2, nth_shape: [4, 32, 1, 96] }
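The two shapes in this error differ in two positions: index 0 (2 vs 4) and index 2 (111 vs 1). Concatenation generally requires every dimension other than the concatenation axis to match, so no single axis can join these tensors. The following is a minimal Python sketch of that rule, not code from mistral.rs or its tensor library; the helper names `mismatched_dims` and `can_concatenate` are hypothetical and exist only for illustration.

```python
def mismatched_dims(first_shape, nth_shape):
    """Return the indices at which two equal-rank shapes differ."""
    if len(first_shape) != len(nth_shape):
        raise ValueError("shapes have different ranks")
    return [i for i, (a, b) in enumerate(zip(first_shape, nth_shape)) if a != b]


def can_concatenate(first_shape, nth_shape, axis):
    """Concatenation along `axis` needs every other dimension to match."""
    return all(i == axis for i in mismatched_dims(first_shape, nth_shape))


# Shapes copied from the error message above.
first_shape = [2, 32, 111, 96]
nth_shape = [4, 32, 1, 96]

print(mismatched_dims(first_shape, nth_shape))  # [0, 2]
# No axis works, because two dimensions differ at once.
print([axis for axis in range(4) if can_concatenate(first_shape, nth_shape, axis)])  # []
```

If only index 2 differed (for example `[2, 32, 111, 96]` and `[2, 32, 1, 96]`), concatenation along axis 2 would be valid, so the mismatch at index 0 is what makes these two tensors incompatible.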

Docker Compose file used:

version: '3.8'

services:
  text-generation:
    image: ghcr.io/ericlbuehler/mistral.rs:cuda-89-latest
    ports:
      - 12005:80
    volumes:
      - /data/hf-cache:/data:z
    command: --isq Q4K plain -m microsoft/Phi-3-mini-128k-instruct -a phi3
    environment:
      - HUGGING_FACE_HUB_TOKEN=[TOKEN]
      - KEEP_ALIVE_INTERVAL=100
    healthcheck:
      test: curl --fail http://localhost/health || exit 1
      interval: 30s
      retries: 5
      start_period: 300s
      timeout: 10s
    restart: unless-stopped
    deploy:
      resources:
        reservations:
          devices:
          - driver: nvidia
            capabilities: [gpu]
            count: all

This could also be an issue specific to Phi-3; I have to do some further testing.

Latest commit
4ffe68d

LLukas22 added the bug label Apr 30, 2024
@EricLBuehler
Owner

This should not be a problem with the phi3 model specifically. I'll look into what could be the cause.

@EricLBuehler
Owner

I was able to reproduce the error by running the following command multiple times in quick succession:

curl http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer EMPTY" \
-d '{
"model": "",
"messages": [
{
    "role": "system",
    "content": "You are Mistral.rs, an AI assistant."
},
{
    "role": "user",
    "content": "Write a story about Rust error handling."
}
]
}' &
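For a more repeatable reproduction, the same request can be submitted from several threads at once. This is a minimal Python sketch using only the standard library. It assumes the server listens at the base URL you pass in (for example `http://localhost:8080`, as in the curl command above); the function names are hypothetical and not part of mistral.rs.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Same payload as the curl command above.
PAYLOAD = {
    "model": "",
    "messages": [
        {"role": "system", "content": "You are Mistral.rs, an AI assistant."},
        {"role": "user", "content": "Write a story about Rust error handling."},
    ],
}


def build_request(base_url):
    """Build the POST request that the curl command sends."""
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(PAYLOAD).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer EMPTY",
        },
        method="POST",
    )


def send_one(base_url, timeout=300):
    """Send one request and return the HTTP status code."""
    with urllib.request.urlopen(build_request(base_url), timeout=timeout) as resp:
        resp.read()  # drain the body so the server finishes the completion
        return resp.status


def send_concurrently(base_url, n=2):
    """Submit n requests at roughly the same time and return their status codes."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        futures = [pool.submit(send_one, base_url) for _ in range(n)]
        return [f.result() for f in futures]
```

Calling `send_concurrently("http://localhost:8080")` submits two requests at roughly the same time. If the server crashes mid-response, `f.result()` is expected to raise a connection or HTTP error instead of returning a status code.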
