-
-
Notifications
You must be signed in to change notification settings - Fork 958
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(model): apply gate fp32 only for mixtral #1241
Conversation
src/axolotl/utils/models.py
Outdated
@@ -671,7 +671,7 @@ def load_model( | |||
if not cfg.fsdp: | |||
# FSDP doesn't like mixed Float and BFloat16 | |||
for name, module in model.named_modules(): | |||
if any(m in name for m in ["norm", "gate"]): | |||
if "norm" in name or (model_config.model_type == "mixtral" and "gate" in name): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if "norm" in name or (model_config.model_type == "mixtral" and "gate" in name): | |
if "norm" in name or name == "gate": |
there are other MoE's out there too. also, "gate" in name
will still still catch the gate_proj
module in mixtral.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do all MoE's strictly use the same name gate
? Should we specifically check all those model types?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe gate in name and proj not in name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think name == "gate"
would fail as:
for name, module in model.named_modules():
if "gate" in name:
print(name)
# model.layers.0.block_sparse_moe.gate
# model.layers.1.block_sparse_moe.gate
..
I have also checked, and I could not find gate_proj
in either the modeling code or after initiating the model.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unresolving back as I think you resolved this while I was midtyping
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The other suggestion of yours may work: "gate" in name and "proj" not in name
.
858575d
to
218b892
Compare
yep. that works! |
* fix(model): apply gate fp32 only for mixtral * Update src/axolotl/utils/models.py * fix gate layer check --------- Co-authored-by: Wing Lian <wing.lian@gmail.com>
Gate was accidentally moved to fp32 for non-mixtral models.
This has been discovered by Nafnlaus on discord: https://discord.com/channels/1104757954588196865/1104757955204743201/1202445059438542899
Description
Details on negative effect: https://discord.com/channels/1104757954588196865/1104757955204743201/1202287752486604853
Worse eval loss.
Social Handles (Optional)
His discord handle is
@nafnlaus00
. Github: @enn-nafnlaus