facebook/opt layer norm #24243
Comments
Hi @CompressTeam, thanks for raising this issue! I believe this behaviour likely comes from how the layer norm layers are instantiated. Playing quickly with the snippet provided, I can see that the biases all have different values, so it would seem that either only the biases were updated when training the model, there was an error in weight conversion, or there is an issue with weight saving. I'll hand over to @younesbelkada, who added the model and is most familiar with the layer norm related logic.
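The default initialization alluded to above can be checked directly in PyTorch, independently of the OPT checkpoints: a LayerNorm constructed with elementwise affine parameters (the default) starts with weights of all ones and biases of all zeros, which is exactly what an untrained weight tensor would look like. A minimal sketch:

```python
import torch

# nn.LayerNorm with elementwise_affine=True (the default) initializes
# weight to all ones and bias to all zeros; a weight tensor that is
# still all ones is therefore indistinguishable from its initial state.
ln = torch.nn.LayerNorm(768)  # 768 is an arbitrary hidden size for illustration
weight_all_ones = bool((ln.weight == 1).all())
bias_all_zeros = bool((ln.bias == 0).all())
print(weight_all_ones, bias_all_zeros)  # True True at initialization
```

This is why weights of all ones alone cannot distinguish "never trained" from "trained but converged back to 1"; the differing biases are the evidence that these parameters were touched during training.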
Hi @CompressTeam, I think this is expected; see this interesting thread from the authors: #17653, and in particular these two messages: #17653 (comment) / #17653 (comment). From what I have understood, the models somehow learned to keep the layer norm weights at 1.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
System Info
transformers version 4.28.1.
I notice that in the facebook/opt-X models the LayerNorm weight is equal to 1 in all layers, meaning these parameters appear never to have changed during training.
I checked the 125m, 1.3b, 2.7b, 6.7b, and 13b sizes.
Who can help?
No response
Reproduction
from transformers import OPTModel
import torch

model = OPTModel.from_pretrained("facebook/opt-13b")
for m in model.modules():
    if isinstance(m, torch.nn.LayerNorm):
        # Print so the result is visible when run as a script,
        # not only in an interactive session
        print((m.weight == 1).all())
Expected behavior
I get the following (I expected the weights to differ from 1):
tensor(True)
(repeated 27 times, once per LayerNorm module)