[fix] Deepnorm self-attention value proj init scaling missing #284

Merged
blefaudeux merged 2 commits into main from deepnorm_vit on Apr 28, 2022

Conversation

@blefaudeux (Contributor) commented Apr 28, 2022

What does this PR do?

Hotfix: the init of the value projection matrix was being skipped when using deepnorm with self-attention. See the bottom of #219 for context.
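For reference, a minimal sketch of the deepnorm-style init involved (helper and argument names are illustrative, not the actual xformers API): following the DeepNet paper, the value and output projections take a beta-scaled xavier gain while the query and key projections keep a gain of 1; the bug was that the value projection never received this scaling in the self-attention path.

```python
# Minimal sketch, assuming the encoder-only deepnorm setting; names are illustrative.
import torch.nn as nn


def deepnorm_init_attention(q_proj, k_proj, v_proj, out_proj, num_layers: int) -> None:
    # DeepNet, encoder-only: beta = (8 * N) ** -1/4
    beta = (8.0 * num_layers) ** -0.25

    # Query/key projections keep a plain xavier init
    nn.init.xavier_normal_(q_proj.weight, gain=1.0)
    nn.init.xavier_normal_(k_proj.weight, gain=1.0)

    # The fix: the value projection must also get the beta-scaled gain,
    # which was skipped for self-attention before this PR
    nn.init.xavier_normal_(v_proj.weight, gain=beta)
    nn.init.xavier_normal_(out_proj.weight, gain=beta)
```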

TODO:

  • Init separately the Q, K, V weights when using a common buffer (see the sketch after this list)
  • Add a unit test to catch faulty weight inits with deepnorm
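
A minimal sketch of what the first TODO item could look like (assuming a fused in-projection buffer of shape (3 * embed_dim, embed_dim); names are illustrative): each slice of the shared buffer is initialized on its own, so that only the value slice gets the beta gain.

```python
# Minimal sketch, assuming Q, K and V live in one fused (3 * embed_dim, embed_dim) buffer.
import torch.nn as nn


def init_fused_qkv(in_proj_weight, embed_dim: int, beta: float) -> None:
    # Split into query/key/value slices; these are views and are filled in place below
    q_w, k_w, v_w = in_proj_weight.split(embed_dim, dim=0)

    nn.init.xavier_normal_(q_w, gain=1.0)
    nn.init.xavier_normal_(k_w, gain=1.0)
    # The value slice is the part that was previously left with the default init
    nn.init.xavier_normal_(v_w, gain=beta)
```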

Before submitting

  • Did you have fun?
    • Make sure you had fun coding 🙃
  • Did you read the contributor guideline?
  • Was this discussed/approved via a GitHub issue? (not needed for typos or doc improvements)
    • N/A
  • Did you make sure to update the docs?
    • N/A
  • Did you write any new necessary tests?
    • N/A
  • Did you update the changelog? (if needed)
    • N/A

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.

@facebook-github-bot added the "CLA Signed" label (this label is managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) Apr 28, 2022
@blefaudeux blefaudeux marked this pull request as draft April 28, 2022 05:30
@blefaudeux blefaudeux changed the title Possible fix [fix] Possible fix for deepnorm convergence issues with ViT & ImageNet Apr 28, 2022
@blefaudeux (Contributor, Author) commented Apr 28, 2022

cc @jramapuram, this may not be enough to fix the issue, but sorry for this bug in any case; I should have added a test when writing this option.

import pytorch_lightning as pl

# detect_anomaly surfaces NaNs in the backward pass but adds significant overhead (see comment below)
trainer = pl.Trainer(
    gpus=GPUS,
    max_epochs=MAX_EPOCHS,
    detect_anomaly=True,
)
@blefaudeux (Contributor, Author) commented on this snippet:
@SeanNaren I think we discussed this at some point, but this flag completely destroys the speed, even on a single GPU (so it's not due to communication). It may be worth a warning or an explanation?
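
For illustration only (DEBUG_NANS is a hypothetical switch, not something introduced by this PR), the kind of guard such a warning could point to: keep anomaly detection off by default and flip it on only while chasing NaNs.

```python
# Sketch: gate the expensive flag behind a debug switch (hypothetical name and values)
import pytorch_lightning as pl

DEBUG_NANS = False  # detect_anomaly adds a large overhead, enable only when debugging
GPUS = 1            # illustrative values, mirroring the snippet above
MAX_EPOCHS = 10

trainer = pl.Trainer(
    gpus=GPUS,
    max_epochs=MAX_EPOCHS,
    detect_anomaly=DEBUG_NANS,
)
```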

@blefaudeux (Contributor, Author) commented:
Nothing to write home about on ViT/CIFAR, but this is only 6 layers (blue is pre-norm, orange is deepnorm).

[Screenshot from 2022-04-27: ViT/CIFAR training curves, blue = pre-norm, orange = deepnorm]

@codecov-commenter commented:
Codecov Report

Merging #284 (c69b98e) into main (ac94252) will increase coverage by 0.02%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main     #284      +/-   ##
==========================================
+ Coverage   92.72%   92.74%   +0.02%     
==========================================
  Files          61       61              
  Lines        3407     3417      +10     
==========================================
+ Hits         3159     3169      +10     
  Misses        248      248              
Flag      Coverage Δ
Python    92.74% <100.00%> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown.

Impacted Files                        Coverage Δ
xformers/components/residual.py      97.10% <ø> (ø)
xformers/factory/model_factory.py    96.12% <100.00%> (+0.32%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ac94252...c69b98e.

@blefaudeux blefaudeux changed the title [fix] Possible fix for deepnorm convergence issues with ViT & ImageNet [fix] Deepnorm self-attention value proj init missing Apr 28, 2022
@blefaudeux blefaudeux changed the title [fix] Deepnorm self-attention value proj init missing [fix] Deepnorm self-attention value proj init scaling missing Apr 28, 2022
@blefaudeux blefaudeux assigned fmassa and unassigned fmassa Apr 28, 2022
@blefaudeux blefaudeux requested a review from fmassa April 28, 2022 17:05
@blefaudeux blefaudeux marked this pull request as ready for review April 28, 2022 17:05
@blefaudeux blefaudeux merged commit 0102fb7 into main Apr 28, 2022
@blefaudeux blefaudeux deleted the deepnorm_vit branch April 28, 2022 17:44