APIs
```python
import torch.nn as nn

from pipegoose.nn.expert_parallel import ExpertParallel, ExpertLoss
# ParallelContext import path assumed from pipegoose's distributed module
from pipegoose.distributed import ParallelContext

parallel_context = ParallelContext.from_torch(expert_parallel_size=8)

mlp = CustomExpert()
router = CustomRouter()
noise_policy = CustomNoisePolicy()
loss_func = nn.CrossEntropyLoss()  # fixed: PyTorch has no nn.CrossEntropy

model = ExpertParallel(
    model,
    expert=mlp,
    router=router,
    noise_policy=noise_policy,
    enable_tensor_parallelism=True,
    parallel_context=parallel_context,
).parallelize()
loss_func = ExpertLoss(loss_func, aux_weight=0.1)
```
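`CustomExpert`, `CustomRouter`, and `CustomNoisePolicy` above are user-supplied modules. A minimal sketch of what they could look like (the layer sizes, the single-linear gate, and the noise scale are illustrative assumptions, not pipegoose interfaces):

```python
import torch
import torch.nn as nn

class CustomExpert(nn.Module):
    """A single feed-forward expert (the block that gets replicated per expert)."""
    def __init__(self, d_model=768, d_ff=3072):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.GELU(),
            nn.Linear(d_ff, d_model),
        )

    def forward(self, x):
        return self.net(x)

class CustomRouter(nn.Module):
    """Maps each token to a vector of logits over experts (the gate)."""
    def __init__(self, d_model=768, num_experts=8):
        super().__init__()
        self.gate = nn.Linear(d_model, num_experts, bias=False)

    def forward(self, x):
        return self.gate(x)  # (batch, seq, num_experts) routing logits

class CustomNoisePolicy:
    """Adds exploration noise to the router logits during training."""
    def __call__(self, logits):
        return logits + torch.randn_like(logits) * 0.1
```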
TODOs
- Top-1, Top-2 router (see the routing sketch after this list)
- `ExpertParallel` (turn a 🤗 `transformers` model into a MoE automatically)
- Does the expert embedding need to be multiplied by its corresponding router probability?
- Make `ExpertParallel` work with data parallelism
- Optionally apply tensor parallelism to an expert layer
- Make `ExpertParallel` work with pipeline parallelism
- Make `ExpertParallel` work with ZeRO-1
- Loss function (including aux and z losses; see the loss sketch after this list)
- Move inputs to the target expert's device
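The top-1/top-2 router item and the router-probability question are easiest to pin down with a concrete sketch. Below is a minimal illustration of top-k routing in the Switch Transformer/GShard style: the router picks k experts per token, and each selected expert's output is weighted by its (re-normalized) router probability, which is also how the gate receives gradients. Function names and tensor shapes here are assumptions for illustration, not pipegoose APIs.

```python
import torch
import torch.nn.functional as F

def top_k_route(router_logits, k=2):
    """Select the top-k experts per token and return their normalized probabilities.

    router_logits: (num_tokens, num_experts)
    Returns (top_k_probs, top_k_indices), each of shape (num_tokens, k).
    """
    probs = F.softmax(router_logits, dim=-1)
    top_k_probs, top_k_indices = probs.topk(k, dim=-1)
    # Re-normalize so the selected experts' weights sum to 1 per token.
    top_k_probs = top_k_probs / top_k_probs.sum(dim=-1, keepdim=True)
    return top_k_probs, top_k_indices

def combine_expert_outputs(expert_outputs, top_k_probs):
    """Weight each selected expert's output by its router probability and sum.

    expert_outputs: (num_tokens, k, d_model)
    top_k_probs:    (num_tokens, k)
    """
    return (expert_outputs * top_k_probs.unsqueeze(-1)).sum(dim=1)
```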
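For the loss-function item, here is a hedged sketch of the two standard MoE regularizers the TODO refers to: the Switch-Transformer-style load-balancing auxiliary loss and the ST-MoE router z-loss. The helper names and shapes are assumptions; how `ExpertLoss` actually combines them with `aux_weight` is up to the implementation.

```python
import torch
import torch.nn.functional as F

def load_balancing_aux_loss(router_logits, expert_indices, num_experts):
    """Switch-Transformer-style auxiliary loss that encourages balanced expert usage.

    router_logits:  (num_tokens, num_experts)
    expert_indices: (num_tokens,) top-1 expert assignment per token
    """
    probs = F.softmax(router_logits, dim=-1)
    # Fraction of tokens dispatched to each expert.
    dispatch_fraction = F.one_hot(expert_indices, num_experts).float().mean(dim=0)
    # Mean router probability assigned to each expert.
    prob_fraction = probs.mean(dim=0)
    return num_experts * torch.sum(dispatch_fraction * prob_fraction)

def router_z_loss(router_logits):
    """ST-MoE z-loss: penalizes large router logits to keep the gate numerically stable."""
    return torch.logsumexp(router_logits, dim=-1).pow(2).mean()

# total_loss = task_loss + aux_weight * aux_loss + z_weight * z_loss
```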
Engineering Reading
MoE Reading