[fix] Deepnorm self-attention value proj init scaling missing #284
Conversation
Force-pushed from 7275539 to a0a6c44
cc @jramapuram, this may not be enough to fix it, but sorry about this bug in the first place; I should have added a test when writing this option.
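For reference, a sketch of the shape such a regression test could take. Everything here is illustrative: the xavier init with a DeepNorm-style gain is a stand-in for whatever the deepnorm code path actually does, and the dimensions, layer count, and separate `nn.Linear` projections are assumptions rather than xFormers' real module layout.

```python
import torch
import torch.nn as nn

def test_value_proj_gets_deepnorm_gain():
    """Sketch: the value projection should end up tighter than the query one
    when the DeepNorm gain beta = (8 * N) ** -0.25 is actually applied."""
    torch.manual_seed(0)
    num_layers, dim = 12, 256
    beta = (8 * num_layers) ** -0.25  # ~0.32 for a 12-layer encoder-only stack

    q_proj, v_proj = nn.Linear(dim, dim), nn.Linear(dim, dim)
    nn.init.xavier_normal_(q_proj.weight, gain=1.0)
    nn.init.xavier_normal_(v_proj.weight, gain=beta)  # the scaling this PR restores

    # If the beta scaling were skipped, both stds would be roughly equal.
    assert v_proj.weight.std() < 0.5 * q_proj.weight.std()
```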
Force-pushed from a0a6c44 to c69b98e
)
trainer = pl.Trainer(
    gpus=GPUS,
    max_epochs=MAX_EPOCHS,
    detect_anomaly=True,
@SeanNaren we discussed this at some point I think, but this flag completely destroys the speed, even on a single GPU (so it's not a communication issue). Might be worth a warning/explanation?
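One low-cost option, sketched below, is to make the flag opt-in and log a warning when it is enabled. This assumes a plain PyTorch Lightning `Trainer`; the `DEBUG_ANOMALY` env var and the constant values are made up for illustration.

```python
import logging
import os

import pytorch_lightning as pl

GPUS, MAX_EPOCHS = 1, 2  # illustrative values standing in for the test config

# Hypothetical opt-in switch: anomaly detection re-checks autograd on every
# backward pass and can slow training down dramatically, even on one GPU.
debug_anomaly = os.environ.get("DEBUG_ANOMALY", "0") == "1"
if debug_anomaly:
    logging.warning(
        "detect_anomaly=True is enabled: expect a large slowdown, "
        "use it only to track down NaNs/Infs."
    )

trainer = pl.Trainer(
    gpus=GPUS,
    max_epochs=MAX_EPOCHS,
    detect_anomaly=debug_anomaly,
)
```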
Codecov Report
@@            Coverage Diff             @@
##             main     #284      +/-   ##
==========================================
+ Coverage   92.72%   92.74%   +0.02%
==========================================
  Files          61       61
  Lines        3407     3417      +10
==========================================
+ Hits         3159     3169      +10
  Misses        248      248
Force-pushed from a9ad400 to c0dde74
Force-pushed from c0dde74 to d968d21
What does this PR do?
Hotfix: the init of the value projection matrix was being skipped when using DeepNorm with self-attention, so the value projection missed its DeepNorm scaling. See the bottom of #219 for context.
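For context, the DeepNorm recipe scales the value and output projections by a gain beta (for an encoder-only stack of N layers, beta = (8N)^(-1/4)); the skipped step was the value-projection part of that. Below is a minimal sketch of the intended init, assuming separate q/k/v/out `nn.Linear` projections rather than xFormers' actual module layout; names and the usage snippet are illustrative only.

```python
import torch.nn as nn

def deepnorm_init_self_attn(q_proj, k_proj, v_proj, out_proj, num_layers):
    # Encoder-only DeepNorm gain; the exact formula depends on the architecture.
    beta = (8.0 * num_layers) ** -0.25

    nn.init.xavier_normal_(q_proj.weight, gain=1.0)
    nn.init.xavier_normal_(k_proj.weight, gain=1.0)
    nn.init.xavier_normal_(v_proj.weight, gain=beta)    # the step that was skipped
    nn.init.xavier_normal_(out_proj.weight, gain=beta)

    for proj in (q_proj, k_proj, v_proj, out_proj):
        if proj.bias is not None:
            nn.init.zeros_(proj.bias)

# Illustrative usage with standalone projections
dim, num_layers = 256, 12
q, k, v, out = (nn.Linear(dim, dim) for _ in range(4))
deepnorm_init_self_attn(q, k, v, out, num_layers)
```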
TODO:
Before submitting
PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.