[MODEL] Add qwen moe support #24
Conversation
@bozheng-hit We have refactored your code to be more generic so that the same …
TEST PASSED: Native PPL: 6.4653. Note we only made a rudimentary quant for testing, not a full-quality quant; the PPL value here is only a regression/sanity check.
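The sanity metric above is just perplexity: the exponential of the mean per-token negative log-likelihood over a held-out text. A minimal sketch of that calculation (the function name and inputs are ours, not the repo's `perplexity.py`):

```python
import math

def perplexity(token_nlls, n_tokens):
    """PPL = exp(mean negative log-likelihood per token).

    `token_nlls` is a list of per-token NLLs (natural log) collected
    from a causal LM over an evaluation corpus; `n_tokens` is the
    total token count they cover.
    """
    return math.exp(sum(token_nlls) / n_tokens)

# If every token had probability 1/2, mean NLL is log(2) and PPL is 2.
ppl = perplexity([math.log(2.0)] * 4, 4)
```

A regression check like the one in this PR compares such a PPL value against a known-good baseline for the same (rudimentary) quant config, flagging large drifts rather than asserting absolute quality.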
* support Qwen2MoE
* add description
* change order
* add type and type hint
* fix wrong args name and order
* fix need Qwen2MoeGPTQ
* update README.md
* add require_true_sequential property
* add dynamic_expert_layer_index property
* add todo
* shorten name
* use getattr()
* reduce torch/triton requirements to torch/triton 2.0.0
* rename inside_layer_modules to layer_modules
* print layers log
* Update perplexity.py
* Update model.py
* rename
* remove unused log

---------

Co-authored-by: bozheng-hit <dsoul0621@gmail.com>
Co-authored-by: Bo Zheng <368586905@qq.com>
Co-authored-by: LRL-ModelCloud <lrl@modelcloud.ai>
Co-authored-by: ZX-ModelCloud <zx@modelcloud.ai>
Co-authored-by: diegomontoya <xing@fictionpress.com>
Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai>
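The commits above hint at the MoE-specific wrinkle: unlike a dense model, the quantizable projections in a Qwen2MoE block live under a variable number of experts, so module paths must be templated on an expert index and resolved dynamically (hence `dynamic_expert_layer_index` and the `getattr()` commit). A hypothetical sketch of that idea; the path lists, names, and helpers below are illustrative, not GPTQModel's actual definitions:

```python
from types import SimpleNamespace

# Illustrative quantization-order groups for one decoder layer.
# `{i}` marks an expert-indexed path that must be expanded at
# runtime once the expert count is known from the model config.
LAYER_MODULES = [
    ["self_attn.q_proj", "self_attn.k_proj", "self_attn.v_proj"],
    ["self_attn.o_proj"],
    ["mlp.experts.{i}.gate_proj", "mlp.experts.{i}.up_proj"],
    ["mlp.experts.{i}.down_proj"],
]

def expand_expert_paths(groups, num_experts):
    """Expand `{i}` templates into one concrete path per expert."""
    expanded = []
    for group in groups:
        paths = []
        for path in group:
            if "{i}" in path:
                paths.extend(path.format(i=i) for i in range(num_experts))
            else:
                paths.append(path)
        expanded.append(paths)
    return expanded

def resolve_module(root, dotted_path):
    """Walk a dotted path with getattr(), indexing sequences by number."""
    obj = root
    for part in dotted_path.split("."):
        obj = obj[int(part)] if part.isdigit() else getattr(obj, part)
    return obj
```

Keeping the templated lists declarative per model class, and expanding them generically, is what lets the same quantization loop serve both dense and MoE architectures.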
No description provided.