[fix] Deepnorm self-attention value proj init scaling missing #284
Conversation
Force-pushed from 7275539 to a0a6c44
cc @jramapuram, this may not be enough to fix it, but sorry about this bug in the first place; I should have added a test when writing this option.
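For reference, a sketch of the shape such a regression test could take. Everything here is illustrative: the xavier init with a DeepNorm-style gain is a stand-in for whatever the deepnorm code path actually does, and the dimensions, layer count, and separate `nn.Linear` projections are assumptions rather than xFormers' real module layout.

```python
import torch
import torch.nn as nn

def test_value_proj_gets_deepnorm_gain():
    """Sketch: the value projection should end up tighter than the query one
    when the DeepNorm gain beta = (8 * N) ** -0.25 is actually applied."""
    torch.manual_seed(0)
    num_layers, dim = 12, 256
    beta = (8 * num_layers) ** -0.25  # ~0.32 for a 12-layer encoder-only stack

    q_proj, v_proj = nn.Linear(dim, dim), nn.Linear(dim, dim)
    nn.init.xavier_normal_(q_proj.weight, gain=1.0)
    nn.init.xavier_normal_(v_proj.weight, gain=beta)  # the scaling this PR restores

    # If the beta scaling were skipped, both stds would be roughly equal.
    assert v_proj.weight.std() < 0.5 * q_proj.weight.std()
```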
Force-pushed from a0a6c44 to c69b98e
)
trainer = pl.Trainer(
    gpus=GPUS,
    max_epochs=MAX_EPOCHS,
    detect_anomaly=True,
@SeanNaren we discussed this at some point I think, but this flag completely destroys the speed, even on a single GPU (so it's not a communication issue). Might be worth a warning/explanation?
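One low-cost option, sketched below, is to make the flag opt-in and log a warning when it is enabled. This assumes a plain PyTorch Lightning `Trainer`; the `DEBUG_ANOMALY` env var and the constant values are made up for illustration.

```python
import logging
import os

import pytorch_lightning as pl

GPUS, MAX_EPOCHS = 1, 2  # illustrative values standing in for the test config

# Hypothetical opt-in switch: anomaly detection re-checks autograd on every
# backward pass and can slow training down dramatically, even on one GPU.
debug_anomaly = os.environ.get("DEBUG_ANOMALY", "0") == "1"
if debug_anomaly:
    logging.warning(
        "detect_anomaly=True is enabled: expect a large slowdown, "
        "use it only to track down NaNs/Infs."
    )

trainer = pl.Trainer(
    gpus=GPUS,
    max_epochs=MAX_EPOCHS,
    detect_anomaly=debug_anomaly,
)
```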
Codecov Report
@@            Coverage Diff             @@
##             main     #284      +/-   ##
==========================================
+ Coverage   92.72%   92.74%   +0.02%
==========================================
  Files          61       61
  Lines        3407     3417      +10
==========================================
+ Hits         3159     3169      +10
  Misses        248      248
Force-pushed from a9ad400 to c0dde74
Force-pushed from c0dde74 to d968d21
What does this PR do?
Hotfix: the init of the value projection matrix was being skipped when using DeepNorm with self-attention, so the value projection missed its DeepNorm scaling. See the bottom of #219 for context.
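For context, the DeepNorm recipe scales the value and output projections by a gain beta (for an encoder-only stack of N layers, beta = (8N)^(-1/4)); the skipped step was the value-projection part of that. Below is a minimal sketch of the intended init, assuming separate q/k/v/out `nn.Linear` projections rather than xFormers' actual module layout; names and the usage snippet are illustrative only.

```python
import torch.nn as nn

def deepnorm_init_self_attn(q_proj, k_proj, v_proj, out_proj, num_layers):
    # Encoder-only DeepNorm gain; the exact formula depends on the architecture.
    beta = (8.0 * num_layers) ** -0.25

    nn.init.xavier_normal_(q_proj.weight, gain=1.0)
    nn.init.xavier_normal_(k_proj.weight, gain=1.0)
    nn.init.xavier_normal_(v_proj.weight, gain=beta)    # the step that was skipped
    nn.init.xavier_normal_(out_proj.weight, gain=beta)

    for proj in (q_proj, k_proj, v_proj, out_proj):
        if proj.bias is not None:
            nn.init.zeros_(proj.bias)

# Illustrative usage with standalone projections
dim, num_layers = 256, 12
q, k, v, out = (nn.Linear(dim, dim) for _ in range(4))
deepnorm_init_self_attn(q, k, v, out, num_layers)
```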
TODO:
Before submitting
PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.