Your current environment

PyTorch version: 2.5.1+cu124
Is debug build: False
CUDA used to build PyTorch: 12.4
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.04.5 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.35
Python version: 3.10.12 (main, Nov 6 2024, 20:22:13) [GCC 11.4.0] (64-bit runtime)
Python platform: Linux-5.15.0-130-generic-x86_64-with-glibc2.35
Is CUDA available: True
CUDA runtime version: 11.5.119
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration:
GPU 0: NVIDIA A40
GPU 1: NVIDIA A40
Nvidia driver version: 565.57.01
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
How would you like to use vllm
I'm toying around with setting up a small self-hosted LLM for a small pool of users. I have an Ubuntu VM with 2x NVIDIA A40 GPUs.

I'm using open-webui in Docker as the front end and the vLLM Docker container to serve the LLM. It all works fine, but no matter what I try and whichever model I use, whenever the context window fills up the LLM grinds to a halt and then crashes with an error saying the context window is full (for that chat window). Am I doing something stupid, or have I missed something? I would have thought there would be some kind of sliding context window or some other way of managing it without having to start a new chat every time it fills up. I just want each user to have a context window of around 4000 tokens at any one time.
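For illustration, here is a minimal client-side sketch of the kind of sliding window being asked about. vLLM itself rejects any request whose prompt exceeds the server's maximum model length (e.g. when started with --max-model-len 4096), so trimming old turns is normally the job of the frontend (open-webui or a custom client), not the server. The base URL, model name, and the rough 4-characters-per-token estimate below are placeholders, not the actual setup:

```python
# Sketch: keep each conversation under a token budget by dropping the oldest
# turns before sending the request to a vLLM OpenAI-compatible server.
# Assumes the server was started with roughly --max-model-len 4096.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

MAX_PROMPT_TOKENS = 3000  # leave headroom below the 4096-token context limit


def rough_token_count(messages):
    # Crude estimate: ~4 characters per token. A real client would use the
    # model's tokenizer for an exact count.
    return sum(len(m["content"]) for m in messages) // 4


def trim_history(messages):
    # Keep the system prompt (first message) and drop the oldest
    # user/assistant turns until the estimated prompt fits the budget.
    system, rest = messages[:1], messages[1:]
    while rest and rough_token_count(system + rest) > MAX_PROMPT_TOKENS:
        rest = rest[1:]
    return system + rest


history = [{"role": "system", "content": "You are a helpful assistant."}]


def chat(user_text):
    history.append({"role": "user", "content": user_text})
    trimmed = trim_history(history)
    reply = client.chat.completions.create(
        model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model name
        messages=trimmed,
        max_tokens=512,
    )
    answer = reply.choices[0].message.content
    history.append({"role": "assistant", "content": answer})
    return answer
```

With trimming like this in place, the prompt never exceeds the server's limit, so each chat can continue indefinitely instead of erroring out once the context fills up.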
Before submitting a new issue...
Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.