
[Usage]: Context window crashes web window when full #12221

Open
1 task done
seabastard opened this issue Jan 20, 2025 · 1 comment
Labels
usage How to use vllm

Comments

@seabastard

Your current environment

PyTorch version: 2.5.1+cu124
Is debug build: False
CUDA used to build PyTorch: 12.4
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.5 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.35

Python version: 3.10.12 (main, Nov  6 2024, 20:22:13) [GCC 11.4.0] (64-bit runtime)
Python platform: Linux-5.15.0-130-generic-x86_64-with-glibc2.35
Is CUDA available: True
CUDA runtime version: 11.5.119
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration:
GPU 0: NVIDIA A40
GPU 1: NVIDIA A40

Nvidia driver version: 565.57.01
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

How would you like to use vllm

I'm toying around with setting up a small self-hosted LLM for a small pool of users. I have an Ubuntu VM with 2x NVIDIA A40 GPUs.
I'm using open-webui in Docker as a front end and a vLLM Docker container to serve the LLM. It all works fine, but no matter what I try and which model I use, whenever the context window fills up the LLM grinds to a halt and then crashes with an error saying the context window is full (for that chat window). Am I doing something stupid, or have I missed something? I would have thought there would be some kind of sliding context window, or some other way of managing it without having to start a new chat window every time it fills up. I just want each user to have a context window of around 4,000 tokens at any one time.
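For context: the vLLM server does not silently evict old context; once a prompt exceeds the model length configured at startup (e.g. via `--max-model-len`), the request is rejected, which is what the front end surfaces as a crash. One common workaround is to trim the chat history client-side before each request so it always fits a fixed budget. The sketch below illustrates this idea; the 4-characters-per-token estimate and the 4,000-token budget are illustrative assumptions, not exact accounting.

```python
# Sliding-window history trimming (sketch). Assumes OpenAI-style chat
# messages: a list of {"role": ..., "content": ...} dicts.

def estimate_tokens(text: str) -> int:
    """Very rough token estimate (~4 characters per token for English)."""
    return max(1, len(text) // 4)

def trim_messages(messages, budget=4000):
    """Keep any system prompt plus the newest messages that fit the budget."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    used = sum(estimate_tokens(m["content"]) for m in system)
    kept = []
    for m in reversed(rest):  # walk from newest to oldest
        cost = estimate_tokens(m["content"])
        if used + cost > budget:
            break  # oldest messages beyond the budget are dropped
        kept.append(m)
        used += cost
    return system + list(reversed(kept))
```

The trimmed list would then be sent to the vLLM OpenAI-compatible endpoint as usual; a tokenizer-based count (rather than the character heuristic above) would be more accurate.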

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
@seabastard seabastard added the usage How to use vllm label Jan 20, 2025

DavideHe commented Jan 20, 2025

How did you create this [Usage] issue? When I try to create a new issue, a dialog box always pops up saying "Unable to create issue."

@seabastard seabastard changed the title [Usage]: [Usage]: Context window crashes web window when full Jan 20, 2025