[Bug]: OpenAI server errors out with "ZMQError Too many open files" under heavy load #7920

zifeitong · 2024-08-27T17:45:10Z

Your current environment

The output of `python collect_env.py`

Your output of `python collect_env.py` here

🐛 Describe the bug

@robertgshaw2-neuralmagic, @njhill

I am running vLLM at commit 6653040, which includes #7394.

Reproducer:

# Start the server first:
# vllm serve meta-llama/Meta-Llama-3-8B-Instruct --disable-log-requests

import asyncio

import openai

N = 800  # number of concurrent streaming requests

client = openai.AsyncOpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

async def generate_streaming(prompt: str):
    # Yield the text of each streamed completion chunk.
    async for req_output in await client.completions.create(
        model="meta-llama/Meta-Llama-3-8B-Instruct",
        prompt=prompt,
        stream=True,
    ):
        yield req_output.choices[0].text

async def generate_output(prompt: str):
    # Drain the stream and keep only the last chunk.
    final_output = None  # stays None if the stream yields nothing
    async for output in generate_streaming(prompt):
        final_output = output
    return final_output


async def main():
    prompts = [str(i) for i in range(N)]
    # asyncio.TaskGroup requires Python 3.11+; all N requests start at once.
    async with asyncio.TaskGroup() as tg:
        tasks = [tg.create_task(generate_output(prompt)) for prompt in prompts]

asyncio.run(main())

Error message:

    | Traceback (most recent call last):
    |   File ".venv/lib/python3.11/site-packages/starlette/responses.py", line 261, in wrap
    |     await func()
    |   File ".venv/lib/python3.11/site-packages/starlette/responses.py", line 250, in stream_response
    |     async for chunk in self.body_iterator:
    |   File "vllm/vllm/entrypoints/openai/serving_completion.py", line 231, in completion_stream_generator
    |     async for prompt_idx, res in result_generator:
    |   File "vllm/vllm/utils.py", line 468, in merge_async_iterators
    |     item = await d
    |            ^^^^^^^
    |   File "vllm/vllm/entrypoints/openai/rpc/client.py", line 424, in generate
    |     await self.abort(request_id)
    |   File "vllm/vllm/entrypoints/openai/rpc/client.py", line 350, in abort
    |     await self._send_one_way_rpc_request(
    |   File "vllm/vllm/entrypoints/openai/rpc/client.py", line 256, in _send_one_way_rpc_request
    |     with self.to_proxy_socket() as socket:
    |   File "/usr/lib/python3.11/contextlib.py", line 137, in __enter__
    |     return next(self.gen)
    |            ^^^^^^^^^^^^^^
    |   File "vllm/vllm/entrypoints/openai/rpc/client.py", line 195, in to_proxy_socket
    |     socket = self.context.socket(zmq.constants.DEALER)
    |              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |   File ".venv/lib/python3.11/site-packages/zmq/sugar/context.py", line 354, in socket
    |     socket_class(  # set PYTHONTRACEMALLOC=2 to get the calling frame
    |   File ".venv/lib/python3.11/site-packages/zmq/_future.py", line 218, in __init__
    |     super().__init__(context, socket_type, **kwargs)  # type: ignore
    |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |   File ".venv/lib/python3.11/site-packages/zmq/sugar/socket.py", line 156, in __init__
    |     super().__init__(
    |   File "_zmq.py", line 690, in zmq.backend.cython._zmq.Socket.__init__
    | zmq.error.ZMQError: Too many open files

This is arguably not normal online serving traffic. That said, with --disable-frontend-multiprocessing enabled, the server handles N=8192 with no issue.

strace shows a long run of eventfd2 calls, each returning a new file descriptor, until the call fails with EMFILE. This might be related to https://www.mail-archive.com/zeromq-dev@lists.zeromq.org/msg31244.html

730059 eventfd2(0, EFD_CLOEXEC)         = 976
730059 eventfd2(0, EFD_CLOEXEC)         = 977
730059 eventfd2(0, EFD_CLOEXEC)         = 978
730059 eventfd2(0, EFD_CLOEXEC)         = 979
730059 eventfd2(0, EFD_CLOEXEC)         = 980
730059 eventfd2(0, EFD_CLOEXEC)         = 981
730059 eventfd2(0, EFD_CLOEXEC)         = 982
730059 eventfd2(0, EFD_CLOEXEC)         = 983
730059 eventfd2(0, EFD_CLOEXEC)         = 984
730059 eventfd2(0, EFD_CLOEXEC)         = 985
730059 eventfd2(0, EFD_CLOEXEC)         = 986
730059 eventfd2(0, EFD_CLOEXEC)         = 987
730059 eventfd2(0, EFD_CLOEXEC)         = 988
730059 eventfd2(0, EFD_CLOEXEC)         = 989
730059 eventfd2(0, EFD_CLOEXEC)         = 990
730059 eventfd2(0, EFD_CLOEXEC)         = 991
730059 eventfd2(0, EFD_CLOEXEC)         = 992
730059 eventfd2(0, EFD_CLOEXEC)         = 993
730059 eventfd2(0, EFD_CLOEXEC)         = 994
730059 eventfd2(0, EFD_CLOEXEC)         = 995
730059 eventfd2(0, EFD_CLOEXEC)         = 996
730059 eventfd2(0, EFD_CLOEXEC)         = 997
730059 eventfd2(0, EFD_CLOEXEC)         = 998
730059 eventfd2(0, EFD_CLOEXEC)         = 999
730059 eventfd2(0, EFD_CLOEXEC)         = 1000
730059 eventfd2(0, EFD_CLOEXEC)         = 1001
730059 eventfd2(0, EFD_CLOEXEC)         = 1002
730059 eventfd2(0, EFD_CLOEXEC)         = 1003
730059 eventfd2(0, EFD_CLOEXEC <unfinished ...>
730059 <... eventfd2 resumed>)          = 1004
730059 eventfd2(0, EFD_CLOEXEC)         = 1005
730059 eventfd2(0, EFD_CLOEXEC)         = 1006
730059 eventfd2(0, EFD_CLOEXEC)         = 1007
730059 eventfd2(0, EFD_CLOEXEC)         = 1008
730059 eventfd2(0, EFD_CLOEXEC)         = 1009
730059 eventfd2(0, EFD_CLOEXEC)         = 1010
730059 eventfd2(0, EFD_CLOEXEC)         = 1011
730059 eventfd2(0, EFD_CLOEXEC)         = 1012
730059 eventfd2(0, EFD_CLOEXEC)         = 1013
730059 eventfd2(0, EFD_CLOEXEC)         = 1014
730059 eventfd2(0, EFD_CLOEXEC)         = 1015
730059 eventfd2(0, EFD_CLOEXEC)         = 1016
730059 eventfd2(0, EFD_CLOEXEC)         = 1017
730059 eventfd2(0, EFD_CLOEXEC)         = 1018
730059 eventfd2(0, EFD_CLOEXEC)         = 1019
730059 eventfd2(0, EFD_CLOEXEC)         = 1020
730059 eventfd2(0, EFD_CLOEXEC)         = 1021
730059 eventfd2(0, EFD_CLOEXEC)         = 1022
730059 eventfd2(0, EFD_CLOEXEC)         = 1023
730059 eventfd2(0, EFD_CLOEXEC)         = -1 EMFILE (Too many open files)



Br1tBreaker · 2024-08-27T17:50:25Z

Right, so it's a different issue then, bruv. "Too many open files", eh? That's a classic. Sounds like your system's getting overwhelmed with all those requests.

First things first, check your system's open file limit. It's probably set too low. You can bump it up with ulimit -n. Give it a generous number, like 65536 or even higher.
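For reference, here is a minimal sketch of checking the open-file limit from inside a Python process and raising the soft limit toward the hard limit, using the standard-library `resource` module. The helper name `raise_soft_nofile` is made up for illustration. It only affects the calling process and its children, so it is not a replacement for running ulimit -n in the shell that launches the server.

```python
import resource


def raise_soft_nofile(target: int) -> int:
    """Raise the soft RLIMIT_NOFILE toward `target`, capped at the hard limit.

    Returns the soft limit in effect afterwards. Never lowers the limit.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return soft  # already unlimited, nothing to do
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)  # an unprivileged process cannot exceed hard
    if target > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
        soft = target
    return soft


if __name__ == "__main__":
    print("before:", resource.getrlimit(resource.RLIMIT_NOFILE))
    print("soft limit now:", raise_soft_nofile(65536))
```

Raising the limit would only postpone the error if descriptor usage keeps growing with the number of requests.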

Next, have a look at how you're handling those asyncio tasks. Are you creating too many at once? Try limiting the number of concurrent requests with asyncio.Semaphore.
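A rough sketch of what that could look like on the client side, assuming the goal is simply to cap how many requests are in flight at once. `run_bounded` is a hypothetical helper, not part of the openai or vllm packages, and the `limit` value is arbitrary.

```python
import asyncio


async def run_bounded(coro_fns, limit: int):
    """Run zero-argument coroutine functions with at most `limit` in flight."""
    sem = asyncio.Semaphore(limit)

    async def guarded(fn):
        # Each task waits here until one of the `limit` slots is free.
        async with sem:
            return await fn()

    # gather preserves the order of the inputs in its result list.
    return await asyncio.gather(*(guarded(fn) for fn in coro_fns))


if __name__ == "__main__":
    async def fake_request():
        await asyncio.sleep(0.01)  # stand-in for one streaming completion
        return "done"

    results = asyncio.run(run_bounded([fake_request] * 100, limit=16))
    print(len(results))
```

In the reproducer, each entry would be something like `functools.partial(generate_output, prompt)`. This only throttles the client; it does not change how many descriptors the server uses per request.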

If that doesn't do the trick, you might need to get a bit more creative. Consider using a connection pool to manage your requests. That way, you can reuse connections instead of opening new ones all the time.

And if you're still hitting that limit, it might be time to look at your system architecture. Are you running vllm on a machine with enough resources? Maybe it's time for an upgrade, mate.

Don't let this get you down, bruv. We'll get those files under control.

robertgshaw2-neuralmagic · 2024-08-27T18:10:54Z

Can you share the value of ulimit?

zifeitong · 2024-08-27T18:13:48Z

> Can you share the value of ulimit?

It's the Ubuntu default 1024.

youkaichao · 2024-08-27T21:39:15Z

Does this mean zmq still opens one file (fortunately it's not a socket) for every connection?

robertgshaw2-neuralmagic · 2024-08-27T21:42:07Z

@youkaichao I need to dig in more. There is only one IPC socket created. The rest are inproc, so I don’t get what’s going on. I’m going to get something up on an Ubuntu server to try to repro.

In my testing, I manually verified that socket usage was not growing by monitoring system stats, so I’m not sure what is going on here.
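One way to check this is to break a process's open descriptors down by kind, since eventfd descriptors like the ones in the strace output would not show up in socket statistics. A minimal, Linux-only sketch that reads /proc; `classify_fds` is a hypothetical helper, and inspecting another process's /proc/<pid>/fd requires suitable permissions.

```python
import os
from collections import Counter


def classify_fds(pid: str = "self") -> Counter:
    """Count a process's open file descriptors by kind via /proc (Linux only)."""
    kinds = Counter()
    fd_dir = f"/proc/{pid}/fd"
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue  # descriptor was closed between listing and readlink
        if target.startswith("socket:"):
            kinds["socket"] += 1
        elif target.startswith("pipe:"):
            kinds["pipe"] += 1
        elif target.startswith("anon_inode:"):
            # e.g. "anon_inode:[eventfd]" is counted under "eventfd"
            kinds[target.split(":", 1)[1].strip("[]")] += 1
        else:
            kinds["file"] += 1
    return kinds


if __name__ == "__main__":
    print(classify_fds())  # pass the server's PID as a string to inspect it
```

Sampling this for the API server process while the reproducer runs would show whether the growth is in the eventfd count rather than the socket count.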

linpan · 2024-08-28T09:09:54Z

zmq.error.ZMQError: Too many open files

linpan · 2024-08-28T09:10:00Z

+1

robertgshaw2-neuralmagic · 2024-09-02T21:48:50Z

We have a new design that should resolve this issue:

WIP PR: [Core][Bugfix][Perf] Refactor Server to Avoid AsyncLLMEngine #8092

zifeitong added the bug label Aug 27, 2024

This was referenced Sep 2, 2024

[Core][Bugfix][Perf] Refactor Server to Avoid AsyncLLMEngine #8092

Closed

[Core][Bugfix][Perf] Introduce MQLLMEngine to avoid asyncio OH #8157

Merged

robertgshaw2-neuralmagic closed this as completed in #8157 Sep 18, 2024
