How to run Transformer-XL with parallel experts on a single GPU? #211

Open
HudashiNeo opened this issue Sep 10, 2024 · 6 comments

Comments

@HudashiNeo

It seems FastMoE still cannot achieve running multiple experts in parallel on a single GPU card?

@laekov
Owner

laekov commented Sep 10, 2024

It is supported using multiple CUDA streams. Refer to the class FMoELinear for details.

@HudashiNeo
Author

Thanks for the reply~
I see that both FMoE and FMoETransformerMLP are built around a single expert (num_expert inside FMoELinear is set to 1). If I don't set num_expert=1, will the code still run?

@laekov
Owner

laekov commented Sep 11, 2024

You can set it to a larger number; it runs.
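
A minimal sketch of what that might look like, assuming fastmoe's FMoETransformerMLP accepts num_expert, d_model, d_hidden, and top_k keyword arguments (argument names may differ across fastmoe versions; check fmoe/transformer.py in your installation):

```python
import torch
from fmoe import FMoETransformerMLP

d_model = 512

# More than one expert on a single GPU; no model-parallel world is needed.
moe_layer = FMoETransformerMLP(
    num_expert=8,      # > 1, all experts live on the same card
    d_model=d_model,
    d_hidden=2048,
    top_k=2,           # assumed keyword, forwarded to the gate in recent versions
).cuda()

x = torch.randn(16, 128, d_model, device="cuda")  # (batch, seq_len, d_model)
y = moe_layer(x)
print(y.shape)  # expected: torch.Size([16, 128, 512])
```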

@HudashiNeo
Author

It works now, and it is indeed very powerful. On my setup the code is more than 3x faster than the for-loop version, and top-k hardly affects the speed. 👍🏻
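
For context, a hedged sketch of the kind of "for-loop" baseline being compared against: plain PyTorch that dispatches tokens to their top-k experts one expert at a time. This is not fastmoe code, only an illustration of the slower reference approach.

```python
import torch
import torch.nn as nn

class ForLoopMoE(nn.Module):
    def __init__(self, num_expert, d_model, d_hidden, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(num_expert)
        )
        self.gate = nn.Linear(d_model, num_expert)
        self.top_k = top_k

    def forward(self, x):                              # x: (tokens, d_model)
        scores = self.gate(x).softmax(dim=-1)
        topv, topi = scores.topk(self.top_k, dim=-1)
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):      # one launch chain per expert
            for k in range(self.top_k):
                mask = topi[:, k] == e
                if mask.any():
                    out[mask] += topv[mask, k, None] * expert(x[mask])
        return out
```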

@HudashiNeo
Author

@laekov If I set the GPU to cuda:1, I get this error:

  File "/usr/local/lib/python3.10/dist-packages/fastmoe-1.1.0-py3.10-linux-x86_64.egg/fmoe/linear.py", line 20, in forward
    global_output_buf = fmoe_cuda.linear_forward(
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

@laekov
Owner

laekov commented Sep 12, 2024

Perhaps some of the data is still on cuda:0?
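
A minimal sketch of the likely fix: create (or move) the MoE layer and its inputs on the same device, and set the current CUDA device before the forward pass, since custom CUDA extensions typically launch kernels on the current device. The constructor arguments follow the hedged example above.

```python
import torch
from fmoe import FMoETransformerMLP

device = torch.device("cuda:1")
torch.cuda.set_device(device)   # make cuda:1 the current device for kernel launches

moe_layer = FMoETransformerMLP(num_expert=8, d_model=512, d_hidden=2048).to(device)
x = torch.randn(16, 128, 512, device=device)
y = moe_layer(x)                # runs once every tensor and parameter sits on cuda:1
```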
