[fix] Layer positions labelling & layernorm #348
Conversation
@@ -149,10 +149,12 @@ def __init__(
    for i in range(config.num_layers):
        # Label where this layer is in the stack
        # (for instance useful for the positional encoding, or late layer norm)
        if i > 0:
this would count within the repeated layers, not the overall layer stack
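The distinction matters: when a block of layers is repeated, an index taken inside the repeated block resets on every repeat, so `is_first()`/`is_last()` checks fire multiple times. A minimal sketch of the fix's idea, labelling positions against the overall stack (the names `LayerPosition`, `label_layers`, `layers_per_block`, and `num_repeats` are illustrative, not the actual xformers API):

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LayerPosition:
    index: int  # absolute index in the full stack, across all repeats
    total: int  # total number of layers in the stack

    def is_first(self) -> bool:
        return self.index == 0

    def is_last(self) -> bool:
        return self.index == self.total - 1


def label_layers(layers_per_block: int, num_repeats: int) -> List[LayerPosition]:
    # Count over the whole stack, not within one repeated block,
    # so is_first()/is_last() are each true exactly once.
    total = layers_per_block * num_repeats
    return [LayerPosition(i, total) for i in range(total)]
```

With `layers_per_block=2, num_repeats=3`, only `positions[0].is_first()` and `positions[5].is_last()` hold; a per-block counter would instead report three "first" layers.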
Force-pushed from e5f22e4 to ebc4f6f
Force-pushed from 67ba72f to f8f4972
@@ -175,10 +175,7 @@ def __init__(self, config: xFormerEncoderConfig, **kwargs):
    # Optional patch embedding
    self.patch_emb: Optional[nn.Module] = None

    if (
this was not respecting the config while trying to do the right thing: if the config asked for a patch embedding but the layer was not first, the embedding would not be instantiated. In retrospect I think that's risky, since the code does not do what the API says it will do, and it only worked in practice because is_first() was often wrong. I now think it's better to respect the config no matter what, and not silently diverge
Codecov Report
@@ Coverage Diff @@
## hierachical_models_improvement #348 +/- ##
===============================================================
Coverage 93.95% 93.95%
===============================================================
Files 70 70
Lines 3984 3984
===============================================================
Hits 3743 3743
Misses 241 241
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
Nice catch!
* handling different normalizations + layer repetition
* bugfix localizing the layers in the stack (#348)
* renaming the layer_norm_style param when building from config

Co-authored-by: Benjamin Lefaudeux <lefaudeux@Benjamins-MacBook-Pro.local>
What does this PR do?
Fixes #347
Before submitting
PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.