prefill is ok, but decode is stuck #8
Comments
You can try to check the connectivity between the two nodes (different config files in TCP & RDMA mode). The "KV send DONE" message only means the KVCache entry has been submitted, not that it has been delivered to the remote side. |
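A quick way to verify basic TCP reachability of an endpoint (e.g. the etcd metadata server or the peer's URL) is a plain socket probe. This is a hypothetical helper for debugging, not part of Mooncake or vLLM; the address below is a placeholder:

```python
import socket

def tcp_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Placeholder: probe the etcd metadata server from your config file.
print(tcp_reachable("127.0.0.1", 2379))
```

If this returns False for the metadata server or the peer URL, the hang is likely a connectivity or configuration problem rather than a KVCache transfer bug.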
Thanks for the reply, but I run on a single node, so could this mean that etcd is not running well? |
The problem may be caused by incorrect confs (e.g., the
If the above steps cannot solve your problem, you can try to run our Transfer Engine Bench with |
I run on a single node; can you help me correct it? |
And the Transfer Engine Bench works well. |
I encountered a similar issue and found that this code (https://github.com/kvcache-ai/vllm/blob/9c319eee04652df9be39377378fb569a6762935e/vllm/distributed/kv_transfer/kv_pipe/mooncake_distributed_pipe.py#L86) was causing the prefill and decode sender and receiver to fail to connect properly on a single node. I modified it as follows:
Additionally, I configured mooncake.json as follows (prefill_port = decode_port + 5):
The above changes resolved the issue. |
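As an illustration of the single-node port layout described above, a mooncake.json following the prefill_port = decode_port + 5 convention might look like the sketch below. The port numbers are placeholders, not the commenter's exact values; the keys match the config shown later in this thread:

```json
{
  "prefill_url": "127.0.0.1:14005",
  "decode_url": "127.0.0.1:14000",
  "metadata_server": "127.0.0.1:2379",
  "protocol": "tcp",
  "device_name": ""
}
```

The point of the offset is simply that the two instances on the same host must not bind the same ports.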
Thanks for digging into this. This code was originally used to run the inter-node disaggregated prefill demo by default, so we did not consider the port occupation problem of multiple instances on the same node; this will be solved in the future. FYI, if you want to run a disaggregated prefill demo on the same node, you can try vllm-project/vllm#10502, which has already been merged into the main branch of vLLM. |
Hello, could you help me check the correctness of the configuration? When I run the following configuration on a single machine, the decode process gets blocked at the step "Initializing an LLM engine":

```json
{
  "prefill_url": "127.0.0.1:31287",
  "decode_url": "127.0.0.1:31282",
  "metadata_server": "127.0.0.19:2379",
  "protocol": "tcp",
  "device_name": ""
}
```
Could you please tell me whether the VLLM_PORT settings for prefill and decode on a single machine need to be different? |
The VLLM_PORT and VLLM_HOST_IP need to be the same, because they are used to initialize the process group; otherwise you will get stuck. Please also add more debug logging as I listed below. |
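A sketch of what "the same VLLM_PORT and VLLM_HOST_IP" means in practice when launching both instances on one machine. The address, port, and script names are placeholders for your actual prefill/decode entry points, not exact commands from this thread:

```shell
# Both prefill and decode must see identical values here, because vLLM
# uses them to initialize the distributed process group; mismatched
# values make one side hang waiting for the other.
export VLLM_HOST_IP=127.0.0.1   # placeholder address
export VLLM_PORT=12345          # placeholder port

# Launch both instances in this same environment (placeholder scripts):
#   MOONCAKE_CONFIG_PATH=./mooncake.json python prefill_instance.py
#   MOONCAKE_CONFIG_PATH=./mooncake.json python decode_instance.py
echo "$VLLM_HOST_IP:$VLLM_PORT"
```

Per-instance ports (the mooncake.json prefill_url/decode_url) still differ; only the process-group variables are shared.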
Thank you, the service can be started now, but I encountered another issue: after serving N requests (input_len=512, output_len=128, qps=0.5), a bug occurred on the decode node; the prefill node is ok.
My startup parameters are set as follows:
|
Hello, thank you for trying this PR. We have released a nightly version based on the main branch of vLLM, which also addresses the port conflict and supports TP. You can give it a try to see if this problem still occurs. Also, I have noticed that you use a strange prompt:
Maybe there is a bytes-encoding problem in your proxy server, which triggers a pickle failure. |
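To illustrate how a bytes-encoding problem in a proxy can break downstream deserialization (this is an illustration, not the actual Mooncake/vLLM code path): if a proxy splits or truncates a request body on a raw byte boundary, a multi-byte UTF-8 character can be cut in half, and the receiver can no longer decode (or unpickle) the payload:

```python
# "你" and "好" are each 3 bytes in UTF-8, so slicing the raw bytes
# at an arbitrary offset can cut a character in half.
prompt = "你好, world"
raw = prompt.encode("utf-8")

truncated = raw[:4]  # 3 bytes of "你" plus 1 dangling byte of "好"
try:
    truncated.decode("utf-8")
except UnicodeDecodeError as e:
    print("decode failed:", e.reason)
```

The same kind of mid-character corruption inside a pickled payload would surface as a deserialization error on the receiving side.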
Thank you for your reply; you are right. |
Hi, do you have a plan to support XpYd based on the current version? As we can see, the vLLM PR's roadmap includes it. |
Yes, we are working on XpYd and the scheduler in an inner version. Also, we have found some teams working on this too. I think the vLLM community will welcome various XpYd implementations, which will help the community find the most efficient and practical way to support this. |
So, when will you release the XpYd version? |
This feature is still under development and testing, and there is no clear release date yet. If you are interested, you can follow the progress of this project and the vLLM community. |
Since v0.2 has been released, which addresses the port conflict issues, I think we can close this issue for now. Feel free to reopen it or raise a new issue if needed. |
the prefill already sent the KV
but the decoder is stuck in drop_select
did I miss something?