[Core] cleanup zmq ipc sockets on exit #11115
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can do one of these:
Nice, this `atexit` function is handy!
It sure is, but it must be used with care when we're dealing with multiprocessing! I'm pretty sure a forked process will inherit |
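The comment above is cut off, so what exactly it says is inherited is not shown here. Assuming the concern is that a child created with `os.fork()` carries a copy of the parent's registered `atexit` handlers (and runs them if it exits through normal interpreter shutdown), one common guard is to remember the PID that registered the handler and skip the cleanup in any other process. This is only a sketch; `OwnerOnlyCleanup` is a hypothetical name and is not part of vLLM.

```python
import atexit
import os


class OwnerOnlyCleanup:
    """Run a cleanup callback at exit, but only in the process that created it.

    Hypothetical helper for illustration; not taken from the PR.
    """

    def __init__(self, callback):
        # Remember which process registered the handler.
        self._owner_pid = os.getpid()
        self._callback = callback
        atexit.register(self)

    def __call__(self):
        # A forked child has a different PID, so it skips the cleanup
        # and leaves the files for the owning process to remove.
        if os.getpid() != self._owner_pid:
            return False
        self._callback()
        return True
```

Children started by `multiprocessing` normally exit without running `atexit` handlers at all, so the guard matters mostly for code that forks directly and then exits through `sys.exit()` or by returning from the main module.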
active_procs = [p for p in active_procs if p.is_alive()]
for p in active_procs:
    p.kill()
self._cleanup_sockets()
Maybe we clean up the sockets before we terminate the workers, in case the client process gets impatient and sends a SIGKILL to this one while this process waits for the workers to gracefully terminate?
Cleaning up before terminating workers is more likely to fail because the workers may still have the socket open.
I think anywhere we use SIGKILL is giving up the illusion of proper cleanup. Hopefully that's rare?
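The ordering argued for in this reply (stop the workers first, then remove the socket files) can be sketched as follows. This is an illustration under stated assumptions, not the PR's code: it uses `subprocess.Popen` as a stand-in for the worker processes, and `shutdown_then_cleanup` and `grace_seconds` are hypothetical names.

```python
import os
import subprocess
import sys
import tempfile


def shutdown_then_cleanup(procs, socket_paths, grace_seconds=5.0):
    """Stop worker processes first, then remove their socket files."""
    # Ask each live worker to exit gracefully.
    for p in procs:
        if p.poll() is None:
            p.terminate()
    # Wait for each worker; escalate to kill() only if it overstays.
    for p in procs:
        try:
            p.wait(timeout=grace_seconds)
        except subprocess.TimeoutExpired:
            p.kill()
            p.wait()
    # No worker holds the sockets any more, so the files can be removed.
    for path in socket_paths:
        try:
            os.remove(path)
        except FileNotFoundError:
            pass


if __name__ == "__main__":
    # Stand-in worker and stand-in socket file, for demonstration only.
    worker = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"])
    fd, fake_socket = tempfile.mkstemp()
    os.close(fd)
    shutdown_then_cleanup([worker], [fake_socket])
```

If the parent itself receives SIGKILL during the wait, none of this runs, which is the trade-off the two comments above are weighing.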
This pull request has merge conflicts that must be resolved before it can be |
I noticed that my dev machine had several hundred orphaned files from old zmq ipc sockets that vLLM didn't clean up. This change uses `atexit` to ensure that these files are cleaned up. I tested this using `vllm serve` with `--tensor-parallel-size 4`, `VLLM_USE_V1=1`, and `VLLM_ENABLE_V1_MULTIPROCESSING=1` to ensure that all of these code paths were executed. I saw all sockets created and cleaned up when I stopped vLLM. Signed-off-by: Russell Bryant <rbryant@redhat.com>
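As a rough illustration of the approach described in this commit message, the sketch below registers an `atexit` handler that removes the file behind each `ipc://` address. It is a minimal sketch and not the PR's implementation: `make_ipc_path` and `cleanup_ipc_paths` are hypothetical helpers, and the real change may track and remove its sockets differently.

```python
import atexit
import os
import tempfile
import uuid


def make_ipc_path(base_dir):
    """Return a unique ipc:// address under base_dir (hypothetical helper)."""
    return "ipc://" + os.path.join(base_dir, str(uuid.uuid4()))


def cleanup_ipc_paths(addresses):
    """Remove the socket file behind each ipc:// address."""
    prefix = "ipc://"
    for addr in addresses:
        if not addr.startswith(prefix):
            continue  # Other transports (e.g. tcp://) leave no file behind.
        try:
            os.remove(addr[len(prefix):])
        except FileNotFoundError:
            pass  # Already removed, or the socket was never bound.


# Register once; the handler runs on normal interpreter exit.
addresses = [make_ipc_path(tempfile.gettempdir())]
atexit.register(cleanup_ipc_paths, addresses)
```

`atexit` handlers run on normal interpreter shutdown, but not when the process is killed by an unhandled signal such as SIGKILL or exits through `os._exit()`, so orphaned files remain possible in those cases.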
Head branch was pushed to by a user without write access
0135152 to e51d9b9
@tlrmchlsmth you'll have to enable auto-merge again. I had to resolve some conflicts.
Signed-off-by: Russell Bryant <rbryant@redhat.com> Signed-off-by: Akshat Tripathi <akshat@krai.ai>