[Bug]: Port binding failure when using pp > 1 after commit 7c7714d856eee6fa94aade729b67f00584f72a4c #8791
Comments
Looks strange. @robertgshaw2-neuralmagic do you have any ideas? @dengminhao did you try --disable-frontend-multiprocessing? |
Nothing changed after --disable-frontend-multiprocessing. |
I also tried reuse_port=True when calling loop.create_server in uvicorn/server.py. It doesn't help. |
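For reference, a minimal standalone sketch of that experiment (illustrative only, not the actual uvicorn code; the host, port, and no-op protocol are placeholders). On Linux, SO_REUSEPORT only allows sharing a port when every socket bound to it set the flag, which would explain why this doesn't help when the port is already held by a socket created without it:

```python
import asyncio

async def main() -> None:
    loop = asyncio.get_running_loop()
    # reuse_port=True makes asyncio set SO_REUSEPORT before bind().
    server = await loop.create_server(
        asyncio.Protocol,   # no-op protocol, placeholder only
        host="0.0.0.0",
        port=18004,         # example port from this issue
        reuse_port=True,
    )
    async with server:
        await server.serve_forever()

asyncio.run(main())
```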
Oh, another finding. |
After this try, I had an idea and then found a quick fix. |
We use the socket to hold the port so that the engine does not take it; your change would essentially disable that functionality (a sketch of the pattern follows below).
did you try to set |
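A rough sketch of the pattern being described: bind a socket early to reserve the port, then hand the already-bound fd to uvicorn instead of letting it bind again. The helper name and the trivial ASGI app are illustrative assumptions, not the exact vLLM code:

```python
import socket
import uvicorn

async def app(scope, receive, send):
    # Trivial ASGI app standing in for the real API server.
    if scope["type"] == "http":
        await send({"type": "http.response.start", "status": 200, "headers": []})
        await send({"type": "http.response.body", "body": b"ok"})

def reserve_port(host: str, port: int) -> socket.socket:
    # Bind early and keep the socket open so no other component
    # (e.g. an engine picking a free port) can grab it.
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    return sock

held = reserve_port("0.0.0.0", 18004)
# ... engine processes would start here; port 18004 stays reserved ...
# Pass the already-bound fd so uvicorn does not bind a second time.
config = uvicorn.Config(app, fd=held.fileno())
uvicorn.Server(config).run()
```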
I know what you expect; that's why I said it was a quick fix. I rolled back my change, and then: |
I upgraded to 0.6.2; the bug still exists. |
Is it because of port resource management after fork? |
I guess in my case the temp socket resource was duplicated on fork, but I don't know what the difference between tp and pp is in this case. |
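That hypothesis can be demonstrated in isolation (a hedged sketch, Unix-only, reusing the example port from this issue):

```python
import os
import socket
import time

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.bind(("0.0.0.0", 18004))
sock.listen()

pid = os.fork()
if pid == 0:
    # The child inherits a duplicate of the listening fd and,
    # like a forked engine worker, never closes it.
    time.sleep(30)
    os._exit(0)

sock.close()   # the parent's copy is gone...
time.sleep(1)

retry = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# ...but this still fails with OSError: [Errno 98] Address already in use,
# because the child's inherited copy keeps the port open.
retry.bind(("0.0.0.0", 18004))
```

If that is what happens in the pp worker processes, closing the inherited socket in each child right after fork would be the natural fix.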
Same issue here, using 0.6.2. |
@HelloCard can you test whether #8537 solves this issue? You can follow https://docs.vllm.ai/en/latest/getting_started/installation.html#install-the-latest-code to install the latest wheel. |
I get failures whether I use the --disable-frontend-multiprocessing argument or not, so I'm not sure my environment will test the results you expect. Anyway, I tried installing the latest version:
Successfully installed opencv-python-headless-4.10.0.84 vllm-0.6.3.dev144+gdc4aea67.d20241009
Unfortunately, it seems that I cannot use the nightly wheel; maybe because my environment is WSL2? |
You are trying to run a quantized model. |
My GPU obviously supports W8A8 quantization, because I have run the model in W8A8 format many times with version 0.6.2. |
Successfully installed vllm-0.6.3.dev172+ge808156f.d20241011
Loading microsoft/Phi-3-medium-4k-instruct, everything went well. |
(base) root@DESKTOP-PEPA2G9:/mnt/c/Windows/system32# python3 -m vllm.entrypoints.openai.api_server --model /mnt/e/Code/models/Phi-3-medium-4k-instruct --max-model-len 4096 --gpu-memory-utilization 0.7 --swap_space=0 --tensor-parallel-size 2 --dtype=half --max-num-seqs=1 --disable-frontend-multiprocessing
|
This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you! |
Your current environment
The output of `python collect_env.py`
Model Input Dumps
No response
🐛 Describe the bug
After a binary search, I found that since commit 7c7714d, binding the main port fails when pp > 1. But if we only set tp > 1, the binding succeeds.
For example:
vllm serve /home/ai/ai/model/Qwen2.5-3B-Instruct/ --served-model-name qwen2.5-3B -pp 2 --trust-remote-code --max-model-len 4096 --enforce-eager --port 18004 --gpu-memory-utilization 1 --preemption-mode swap
fails with:
ERROR: [Errno 98] error while attempting to bind on address ('0.0.0.0', 18004): address already in use
But
vllm serve /home/ai/ai/model/Qwen2.5-3B-Instruct/ --served-model-name qwen2.5-3B -tp 2 --trust-remote-code --max-model-len 4096 --enforce-eager --port 18004 --gpu-memory-utilization 1 --preemption-mode swap
runs successfully with:
INFO: Uvicorn running on http://0.0.0.0:18004 (Press CTRL+C to quit)
If we check out commit 9d104b5 on the main branch, we can launch successfully with pp > 1.