ModuleNotFoundError: No module named 'fmoe_cuda' #177
Comments
You are supposed to compile and install the cuda module of fastmoe using |
I'm getting when attempting to use fmoefy. I did install the cuda module using
Are there specific CUDA-related requirements that I may be missing/needing to downgrade? |
I have not tried to compile the fmoe_cuda module with a different nvcc, so I am not sure if you should do the downgrade. I think you should first check whether the |
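One way to narrow this down is to ask the Python interpreter that runs the training script where it would import fmoe and fmoe_cuda from. The snippet below is only a minimal sketch using the standard library. The helper name describe_module is hypothetical and is not part of FastMoE or Megatron-LM.

```python
import importlib.util
import sys


def describe_module(name):
    """Report where Python would import the top-level module `name` from."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        # Nothing on sys.path provides this module for this interpreter.
        return "not found"
    # Namespace packages (a plain directory without __init__.py) have no origin.
    return spec.origin or "namespace package (no file)"


if __name__ == "__main__":
    # Check the exact interpreter the launch script uses, not just any python.
    print("interpreter:", sys.executable)
    for name in ("fmoe", "fmoe_cuda"):
        print(name, "->", describe_module(name))
```

If fmoe resolves but fmoe_cuda is reported as not found, the Python package is visible while the compiled extension is not installed in this environment. That would be consistent with the error above. Run the check with the same interpreter and environment as pretrain_gpt.sh.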
Describe the bug
I adapted fmoe into Megatron following the tutorial and want to run a script to train GPT. But when I run pretrain_gpt.sh, it raises the error "ModuleNotFoundError: No module named 'fmoe_cuda'". In detail, I cloned the Megatron-LM repository and modified the functions mentioned in fastmoe/examples/megatron/fmoefy-v2.2.patch. Then I cloned fastmoe and put it inside the Megatron folder, as "./Megatron-LM/fastmoe", to avoid a possible ModuleNotFoundError. But when I run pretrain_gpt.sh, it still raises the error. I don't know much about module compilation, so I'm asking here for help. Thanks a lot!
To Reproduce
Steps to reproduce the behavior:
Expected behavior
I expect the fmoefy-patched Megatron to train without errors.
Logs