
[Bug]: OpenAI server errors out with "ZMQError Too many open files" under heavy load #7920

Closed
zifeitong opened this issue Aug 27, 2024 · 8 comments · Fixed by #8157
Labels
bug Something isn't working


@zifeitong
Contributor

zifeitong commented Aug 27, 2024

🐛 Describe the bug

@robertgshaw2-neuralmagic, @njhill

I am running vllm @ 6653040 which includes #7394.

Reproducer:

# Server: vllm serve meta-llama/Meta-Llama-3-8B-Instruct --disable-log-requests
# Client (Python 3.11+, required for asyncio.TaskGroup):

import asyncio

import openai

N = 800  # number of concurrent streaming requests

client = openai.AsyncOpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")


async def generate_streaming(prompt: str):
    # Stream a completion and yield the text of each chunk as it arrives.
    stream = await client.completions.create(
        model="meta-llama/Meta-Llama-3-8B-Instruct",
        prompt=prompt,
        stream=True,
    )
    async for req_output in stream:
        yield req_output.choices[0].text


async def generate_output(prompt: str) -> str:
    # Drain the stream and keep only the last chunk of text.
    final_output = ""
    async for output in generate_streaming(prompt):
        final_output = output
    return final_output


async def main():
    prompts = [str(i) for i in range(N)]
    # Fire all N requests at once; the TaskGroup waits for every task to finish.
    async with asyncio.TaskGroup() as tg:
        for prompt in prompts:
            tg.create_task(generate_output(prompt))


asyncio.run(main())

Error message:

    | Traceback (most recent call last):
    |   File ".venv/lib/python3.11/site-packages/starlette/responses.py", line 261, in wrap
    |     await func()
    |   File ".venv/lib/python3.11/site-packages/starlette/responses.py", line 250, in stream_response
    |     async for chunk in self.body_iterator:
    |   File "vllm/vllm/entrypoints/openai/serving_completion.py", line 231, in completion_stream_generator
    |     async for prompt_idx, res in result_generator:
    |   File "vllm/vllm/utils.py", line 468, in merge_async_iterators
    |     item = await d
    |            ^^^^^^^
    |   File "vllm/vllm/entrypoints/openai/rpc/client.py", line 424, in generate
    |     await self.abort(request_id)
    |   File "vllm/vllm/entrypoints/openai/rpc/client.py", line 350, in abort
    |     await self._send_one_way_rpc_request(
    |   File "vllm/vllm/entrypoints/openai/rpc/client.py", line 256, in _send_one_way_rpc_request
    |     with self.to_proxy_socket() as socket:
    |   File "/usr/lib/python3.11/contextlib.py", line 137, in __enter__
    |     return next(self.gen)
    |            ^^^^^^^^^^^^^^
    |   File "vllm/vllm/entrypoints/openai/rpc/client.py", line 195, in to_proxy_socket
    |     socket = self.context.socket(zmq.constants.DEALER)
    |              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |   File ".venv/lib/python3.11/site-packages/zmq/sugar/context.py", line 354, in socket
    |     socket_class(  # set PYTHONTRACEMALLOC=2 to get the calling frame
    |   File ".venv/lib/python3.11/site-packages/zmq/_future.py", line 218, in __init__
    |     super().__init__(context, socket_type, **kwargs)  # type: ignore
    |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |   File ".venv/lib/python3.11/site-packages/zmq/sugar/socket.py", line 156, in __init__
    |     super().__init__(
    |   File "_zmq.py", line 690, in zmq.backend.cython._zmq.Socket.__init__
    | zmq.error.ZMQError: Too many open files

This is arguably not normal online-serving traffic. That said, with --disable-frontend-multiprocessing enabled, the server handles N=8192 with no issue.

strace shows lots of eventfd2 calls, which might be related to https://www.mail-archive.com/zeromq-dev@lists.zeromq.org/msg31244.html

730059 eventfd2(0, EFD_CLOEXEC)         = 976
730059 eventfd2(0, EFD_CLOEXEC)         = 977
730059 eventfd2(0, EFD_CLOEXEC)         = 978
730059 eventfd2(0, EFD_CLOEXEC)         = 979
730059 eventfd2(0, EFD_CLOEXEC)         = 980
730059 eventfd2(0, EFD_CLOEXEC)         = 981
730059 eventfd2(0, EFD_CLOEXEC)         = 982
730059 eventfd2(0, EFD_CLOEXEC)         = 983
730059 eventfd2(0, EFD_CLOEXEC)         = 984
730059 eventfd2(0, EFD_CLOEXEC)         = 985
730059 eventfd2(0, EFD_CLOEXEC)         = 986
730059 eventfd2(0, EFD_CLOEXEC)         = 987
730059 eventfd2(0, EFD_CLOEXEC)         = 988
730059 eventfd2(0, EFD_CLOEXEC)         = 989
730059 eventfd2(0, EFD_CLOEXEC)         = 990
730059 eventfd2(0, EFD_CLOEXEC)         = 991
730059 eventfd2(0, EFD_CLOEXEC)         = 992
730059 eventfd2(0, EFD_CLOEXEC)         = 993
730059 eventfd2(0, EFD_CLOEXEC)         = 994
730059 eventfd2(0, EFD_CLOEXEC)         = 995
730059 eventfd2(0, EFD_CLOEXEC)         = 996
730059 eventfd2(0, EFD_CLOEXEC)         = 997
730059 eventfd2(0, EFD_CLOEXEC)         = 998
730059 eventfd2(0, EFD_CLOEXEC)         = 999
730059 eventfd2(0, EFD_CLOEXEC)         = 1000
730059 eventfd2(0, EFD_CLOEXEC)         = 1001
730059 eventfd2(0, EFD_CLOEXEC)         = 1002
730059 eventfd2(0, EFD_CLOEXEC)         = 1003
730059 eventfd2(0, EFD_CLOEXEC <unfinished ...>
730059 <... eventfd2 resumed>)          = 1004
730059 eventfd2(0, EFD_CLOEXEC)         = 1005
730059 eventfd2(0, EFD_CLOEXEC)         = 1006
730059 eventfd2(0, EFD_CLOEXEC)         = 1007
730059 eventfd2(0, EFD_CLOEXEC)         = 1008
730059 eventfd2(0, EFD_CLOEXEC)         = 1009
730059 eventfd2(0, EFD_CLOEXEC)         = 1010
730059 eventfd2(0, EFD_CLOEXEC)         = 1011
730059 eventfd2(0, EFD_CLOEXEC)         = 1012
730059 eventfd2(0, EFD_CLOEXEC)         = 1013
730059 eventfd2(0, EFD_CLOEXEC)         = 1014
730059 eventfd2(0, EFD_CLOEXEC)         = 1015
730059 eventfd2(0, EFD_CLOEXEC)         = 1016
730059 eventfd2(0, EFD_CLOEXEC)         = 1017
730059 eventfd2(0, EFD_CLOEXEC)         = 1018
730059 eventfd2(0, EFD_CLOEXEC)         = 1019
730059 eventfd2(0, EFD_CLOEXEC)         = 1020
730059 eventfd2(0, EFD_CLOEXEC)         = 1021
730059 eventfd2(0, EFD_CLOEXEC)         = 1022
730059 eventfd2(0, EFD_CLOEXEC)         = 1023
730059 eventfd2(0, EFD_CLOEXEC)         = -1 EMFILE (Too many open files)

@zifeitong added the bug label on Aug 27, 2024
@Br1tBreaker

Right, so it's a different issue then, bruv. "Too many open files", eh? That's a classic. It means the process has run out of file descriptors, so your system's getting overwhelmed with all those requests.

First things first, check your system's open file limit. It's probably set too low. You can bump it up with ulimit -n; give it a generous number, like 65536 or even higher (without root you can only raise it up to the hard limit shown by ulimit -Hn).
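ulimit -n only changes the limit for the shell and whatever is launched from it. The same soft limit can also be raised from inside a Python process at startup with the standard-library resource module. A minimal Unix-only sketch; the function name and the 65536 default are illustrative, not a vLLM option:

```python
import resource


def raise_nofile_limit(target: int = 65536) -> tuple:
    """Raise this process's soft RLIMIT_NOFILE toward `target`, never lowering it.

    Without privileges a process may only raise its soft limit up to the hard
    limit, so the soft limit ends up at min(target, hard). Returns (soft, hard).
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return soft, hard  # already unlimited; nothing to do
    new_soft = target if hard == resource.RLIM_INFINITY else min(target, hard)
    if new_soft > soft:
        # Only the soft limit moves; the hard limit is passed back unchanged.
        resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)
```

A higher limit buys headroom, but if descriptors keep growing with the number of in-flight requests (as the climbing eventfd2 numbers in the strace output suggest), it only postpones the EMFILE error.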

Next, have a look at how you're handling those asyncio tasks. Are you creating too many at once? Try limiting the number of concurrent requests with asyncio.Semaphore.
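The reproducer's TaskGroup starts all N requests at once. A minimal sketch of capping how many are in flight with asyncio.Semaphore; gather_bounded and the limit value are hypothetical, and this only throttles the client without changing how many descriptors the server allocates per request:

```python
import asyncio
from typing import Awaitable, Callable, Iterable, TypeVar

T = TypeVar("T")


async def gather_bounded(
    make_call: Callable[[str], Awaitable[T]],
    prompts: Iterable[str],
    limit: int,
) -> list:
    """Run make_call(prompt) for every prompt, with at most `limit` in flight.

    Results come back in the same order as `prompts`.
    """
    sem = asyncio.Semaphore(limit)

    async def guarded(prompt: str) -> T:
        async with sem:  # waits here while `limit` calls are already running
            return await make_call(prompt)

    return await asyncio.gather(*(guarded(p) for p in prompts))
```

Inside the reproducer's main(), this would replace the TaskGroup with something like `await gather_bounded(generate_output, prompts, limit=256)`.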

If that doesn't do the trick, you might need to get a bit more creative. Consider using a connection pool to manage your requests. That way, you can reuse connections instead of opening new ones all the time.
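The traceback shows the RPC client creating a fresh DEALER socket in to_proxy_socket() for each abort, so the socket count (and plausibly the eventfds in the strace) scales with in-flight requests. A generic asyncio sketch of the pooling idea, which caps how many connection objects ever exist and reuses them; ConnectionPool and the factory callable are hypothetical names, not part of vLLM or pyzmq, and this is not necessarily what the eventual fix does:

```python
import asyncio
from contextlib import asynccontextmanager
from typing import AsyncIterator, Callable, Generic, TypeVar

C = TypeVar("C")


class ConnectionPool(Generic[C]):
    """A fixed-size pool handing out reusable connection objects.

    At most `size` connections are ever created; callers wait when all are busy.
    Each connection is used by one task at a time, which matters for objects
    like zmq sockets that must not be used concurrently.
    """

    def __init__(self, factory: Callable[[], C], size: int) -> None:
        self._factory = factory
        self._size = size
        self._created = 0
        self._idle: asyncio.Queue = asyncio.Queue()

    @property
    def created(self) -> int:
        return self._created

    @asynccontextmanager
    async def connection(self) -> AsyncIterator[C]:
        if self._idle.empty() and self._created < self._size:
            # Lazily create a new connection while still under the cap.
            conn = self._factory()
            self._created += 1
        else:
            # Otherwise wait until another caller returns one.
            conn = await self._idle.get()
        try:
            yield conn
        finally:
            self._idle.put_nowait(conn)  # hand it back for reuse instead of closing
```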

And if you're still hitting that limit, it might be time to look at your system architecture. Are you running vllm on a machine with enough resources? Maybe it's time for an upgrade, mate.

Don't let this get you down, bruv. We'll get those files under control.

@robertgshaw2-neuralmagic
Collaborator

Can you share the value of ulimit?

@zifeitong
Contributor Author

Can you share the value of ulimit?

It's the Ubuntu default 1024.

@youkaichao
Member

Does this mean zmq still opens one file (fortunately not a socket) for every connection?

@robertgshaw2-neuralmagic
Collaborator

robertgshaw2-neuralmagic commented Aug 27, 2024

@youkaichao I need to dig in more. There is only 1 IPC socket created; the rest are inproc, so I don’t get what’s going on. I’m going to get something up on an Ubuntu server to try to repro.

In my testing, I manually verified that socket usage was not growing by monitoring system stats, so I’m not sure what is going on here.
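Watching only socket counts can miss this kind of growth, since the strace output shows the descriptors being consumed are eventfds rather than sockets. A Linux-only sketch that groups a process's open descriptors by type by reading /proc/<pid>/fd; fd_summary is a hypothetical helper, and reading another process's entry requires the same user or root:

```python
import os
from collections import Counter


def fd_summary(pid: int) -> Counter:
    """Count a process's open file descriptors, grouped by what they point to.

    Linux-only: reads /proc/<pid>/fd. Eventfds appear as 'anon_inode:[eventfd]',
    sockets as 'socket:[<inode>]'; pipes and regular files are lumped as 'other'.
    """
    counts: Counter = Counter()
    fd_dir = f"/proc/{pid}/fd"
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # Closed between listdir() and readlink(), e.g. the descriptor
            # listdir itself used to read the directory; skip it.
            continue
        if target.startswith("socket:"):
            counts["socket"] += 1
        elif target.startswith("anon_inode:"):
            counts[target] += 1  # e.g. 'anon_inode:[eventfd]'
        else:
            counts["other"] += 1
    return counts
```

Sampling fd_summary(<API server pid>) while the reproducer runs should show whether the 'anon_inode:[eventfd]' count tracks the number of in-flight requests while the socket count stays flat.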

@linpan

linpan commented Aug 28, 2024

zmq.error.ZMQError: Too many open files

@linpan

linpan commented Aug 28, 2024

+1

@robertgshaw2-neuralmagic
Collaborator

robertgshaw2-neuralmagic commented Sep 2, 2024

We have a new design that should resolve this issue:
