
fix(model): apply gate fp32 only for mixtral #1241

Merged · 3 commits merged into main from NanoCode012-fix-gate-fp32 on Feb 1, 2024

Conversation

NanoCode012 (Collaborator) commented Feb 1, 2024

The gate was accidentally being moved to fp32 for non-Mixtral models as well.

This was discovered by Nafnlaus on Discord: https://discord.com/channels/1104757954588196865/1104757955204743201/1202445059438542899

Description

Details on the negative effect (worse eval loss): https://discord.com/channels/1104757954588196865/1104757955204743201/1202287752486604853

Social Handles (Optional)

His Discord handle is @nafnlaus00; GitHub: @enn-nafnlaus.

@@ -671,7 +671,7 @@ def load_model(
     if not cfg.fsdp:
         # FSDP doesn't like mixed Float and BFloat16
         for name, module in model.named_modules():
-            if any(m in name for m in ["norm", "gate"]):
+            if "norm" in name or (model_config.model_type == "mixtral" and "gate" in name):
Collaborator

Suggested change
-            if "norm" in name or (model_config.model_type == "mixtral" and "gate" in name):
+            if "norm" in name or name == "gate":

There are other MoEs out there too. Also, "gate" in name will still catch the gate_proj module in mixtral.

Collaborator Author

Do all MoEs strictly use the same name "gate"? Should we specifically check all those model types?

Collaborator

Maybe "gate" in name and "proj" not in name?

Collaborator Author

I think name == "gate" would fail, as:

for name, module in model.named_modules():
    if "gate" in name:
        print(name)

# model.layers.0.block_sparse_moe.gate
# model.layers.1.block_sparse_moe.gate
# ...

I have also checked, and I could not find gate_proj in either the modeling code or after instantiating the model.
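A hypothetical way to double-check this (assumes a transformers release with Mixtral support; the tiny config values below are arbitrary, just to build the module structure locally):

from transformers import MixtralConfig, MixtralForCausalLM

# Tiny, arbitrary config: we only want the module names, not real weights.
config = MixtralConfig(
    vocab_size=128,
    hidden_size=64,
    intermediate_size=128,
    num_hidden_layers=2,
    num_attention_heads=4,
    num_key_value_heads=2,
    num_local_experts=4,
    num_experts_per_tok=2,
)
model = MixtralForCausalLM(config)
print([n for n, _ in model.named_modules() if "gate" in n])
# ['model.layers.0.block_sparse_moe.gate', 'model.layers.1.block_sparse_moe.gate']
# No gate_proj here: Mixtral's experts use w1/w2/w3 rather than Llama's gate_proj.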

Collaborator Author

Unresolving, as I think you resolved this while I was mid-typing.

NanoCode012 (Collaborator Author) commented Feb 1, 2024

Your other suggestion may work: "gate" in name and "proj" not in name.
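A quick sketch of how that alternative check would behave (module names illustrative):

# The Mixtral router gate matches; a Llama-style gate_proj does not.
names = [
    "model.layers.0.block_sparse_moe.gate",
    "model.layers.0.mlp.gate_proj",
]
for name in names:
    upcast = "gate" in name and "proj" not in name
    print(name, "->", "fp32" if upcast else "model dtype")
# model.layers.0.block_sparse_moe.gate -> fp32
# model.layers.0.mlp.gate_proj -> model dtype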

@winglian force-pushed the NanoCode012-fix-gate-fp32 branch from 858575d to 218b892 on February 1, 2024 17:36
NanoCode012 (Collaborator Author)

Yep, that works!

@winglian merged commit 2d65f47 into main on Feb 1, 2024 (7 checks passed).
@winglian deleted the NanoCode012-fix-gate-fp32 branch on February 1, 2024 18:55.
djsaunde pushed a commit that referenced this pull request Dec 17, 2024
* fix(model): apply gate fp32 only for mixtral

* Update src/axolotl/utils/models.py

* fix gate layer check

---------

Co-authored-by: Wing Lian <wing.lian@gmail.com>