[Bug] Running “Qwen2-VL-2B” on two GPUs: one GPU goes to full load right after startup, before any request is sent #2582
Comments
@wangaocheng I've seen this issue. I suggest you either switch to the 2B model first, or skip Docker and run directly on the host system to see whether you can get more detailed error output.
Triton cannot be installed on Windows; it fails with a "test failed!" error and the server will not start.
That's expected behaviour: those are synchronization kernels such as NCCL's.
@grimoire Is this a limitation of the current PyTorch backend? Is there any way to solve it?
An NCCL kernel that is just spinning should have negligible overhead, and it doesn't prevent kernels on other streams from running; it only makes the reported utilization look higher, so it shouldn't be a big deal.
It's not just slightly higher utilization, it stays at 100% the whole time...
In theory a waiting kernel uses very few CUDA cores, so even at 100% utilization other kernels can still run concurrently on other streams. #2607 (review) adds a barrier there, so it should no longer sit at 100%.
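For readers unfamiliar with the behaviour described above, here is a minimal sketch (my own illustration, not code from this project) that reproduces the effect with plain torch.distributed: the rank that reaches an NCCL collective first busy-waits inside the communication kernel, which nvidia-smi reports as ~100% utilization, while compute kernels queued on a separate CUDA stream of the same GPU still make progress. The process count, sleep duration, and matrix sizes are all hypothetical.

```python
# Sketch only: assumes 2 GPUs and a CUDA build of PyTorch with NCCL.
# After a warm-up collective, rank 1 sleeps before the second all_reduce,
# so rank 0's NCCL kernel spins on the device and nvidia-smi shows ~100%
# utilization on GPU 0, even though matmuls queued on a side stream of the
# same GPU keep running during the wait.
import os
import time

import torch
import torch.distributed as dist
import torch.multiprocessing as mp


def worker(rank: int, world_size: int) -> None:
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)

    x = torch.ones(1, device=f"cuda:{rank}")
    dist.all_reduce(x)            # warm-up: builds the NCCL communicator
    torch.cuda.synchronize()

    if rank == 0:
        # Queue real compute on a side stream before blocking in NCCL.
        side = torch.cuda.Stream()
        with torch.cuda.stream(side):
            a = torch.randn(4096, 4096, device="cuda:0")
            for _ in range(200):
                a = a @ a         # overlaps with the spinning NCCL kernel
    else:
        time.sleep(30)            # rank 1 is late: watch nvidia-smi on GPU 0

    dist.all_reduce(x)            # rank 0 busy-waits here until rank 1 joins
    torch.cuda.synchronize()
    dist.destroy_process_group()


if __name__ == "__main__":
    mp.spawn(worker, args=(2,), nprocs=2)
```

This is also why the barrier mentioned above helps: if ranks synchronize on the host side before launching the collective, the device-side spin window shrinks and the idle GPU no longer reports sustained 100% utilization.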
This issue is marked as stale because it has been marked as invalid or awaiting response for 7 days without any further response. It will be closed in 5 days if the stale label is not removed or if there is no further response.
This issue is closed because it has been stale for 5 days. Please open a new issue if you have similar issues or you have any new updates now.
Checklist
Describe the bug
Running Qwen/Qwen2-VL-2B-Instruct on two GPUs: right after startup, before any request is sent, one GPU automatically runs at full load.
Reproduction
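(The actual reproduction command is not included in this excerpt. For context only, a two-GPU tensor-parallel launch on the PyTorch backend might look like the sketch below; the use of LMDeploy's pipeline API, the project itself, and the parameter values are my assumptions, not taken from the report.)

```python
# Hypothetical sketch of a tp=2 PyTorch-backend deployment; the API choice
# (LMDeploy pipeline) and all values are assumed, not quoted from the report.
from lmdeploy import pipeline, PytorchEngineConfig

pipe = pipeline(
    "Qwen/Qwen2-VL-2B-Instruct",
    backend_config=PytorchEngineConfig(tp=2),  # tensor parallelism over 2 GPUs
)

# At this point, with no requests sent yet, nvidia-smi already reports one
# GPU at 100% utilization, which is the behaviour described in this issue.
print(pipe("Describe this image in one sentence."))
```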
Environment
Error traceback