facebook\opt layer norm #24243

ghost · 2023-06-13T12:06:44Z

System Info

transformers version 4.28.1.

I noticed that in the facebook\optX models the LayerNorm weight is equal to 1 in every layer, which suggests these parameters never changed from their initial values.

I checked the sizes 125m, 1.3b, 2.7b, 6.7b, 13b

Who can help?

No response

Information

The official example scripts
My own modified scripts

Tasks

An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
My own task or dataset (give details below)

Reproduction

from transformers import OPTModel
import torch

model = OPTModel.from_pretrained("facebook/opt-13b")
# For every LayerNorm module, print whether all of its weights equal 1.
for m in model.modules():
    if isinstance(m, torch.nn.LayerNorm):
        print((m.weight == 1).all())

Expected behavior

I expected to get different values, but I get:

tensor(True)
tensor(True)
tensor(True)
tensor(True)
tensor(True)
tensor(True)
tensor(True)
tensor(True)
tensor(True)
tensor(True)
tensor(True)
tensor(True)
tensor(True)
tensor(True)
tensor(True)
tensor(True)
tensor(True)
tensor(True)
tensor(True)
tensor(True)
tensor(True)
tensor(True)
tensor(True)
tensor(True)
tensor(True)
tensor(True)
tensor(True)


amyeroberts · 2023-06-13T13:18:00Z

Hi @CompressTeam, thanks for raising this issue!

I believe this behaviour is likely coming from the fact that the layer norm layers are instantiated with elementwise_affine=True e.g. here (as default config value is True). This instantiates the layer with all weight values as 1, and biases as 0.
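As a quick check of the default initialisation described above, here is a minimal sketch that builds a fresh torch.nn.LayerNorm and inspects its parameters. The hidden size of 768 is an arbitrary value chosen for illustration and is not taken from any OPT config.

```python
import torch

# Freshly constructed LayerNorm; 768 is an arbitrary hidden size for illustration.
ln = torch.nn.LayerNorm(768, elementwise_affine=True)

# With elementwise_affine=True the layer owns a learnable weight and bias,
# initialised to ones and zeros respectively.
weight_all_ones = bool((ln.weight == 1).all())
bias_all_zeros = bool((ln.bias == 0).all())
print(weight_all_ones, bias_all_zeros)  # True True
```

So an all-ones weight on its own cannot distinguish "never loaded or never trained" from "trained and ended up at 1"; the bias has to be looked at as well.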

Playing quickly with the snippet provided, I can see that the biases all have different values, so it would seem that either only the biases were updated when training the model, there's been an error in weight conversion, or there's an issue with weight saving.
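For anyone who wants to repeat the weight-versus-bias comparison, a small helper along these lines might be useful. The name layer_norm_report is hypothetical (it is not part of transformers or torch), and the toy model only stands in for a real checkpoint so that the sketch runs without downloading anything.

```python
import torch
from torch import nn


def layer_norm_report(model: nn.Module):
    """Return (name, weight_is_all_ones, bias_is_all_zeros) for each affine LayerNorm."""
    report = []
    for name, module in model.named_modules():
        if isinstance(module, nn.LayerNorm) and module.elementwise_affine:
            weight_ones = bool((module.weight == 1).all())
            # A LayerNorm may have no bias; treat that case as "still at init".
            bias_zeros = module.bias is None or bool((module.bias == 0).all())
            report.append((name, weight_ones, bias_zeros))
    return report


# Toy stand-in for a checkpoint: the weight stays at its initial value of 1,
# while the bias is shifted to mimic a parameter that was actually updated.
toy = nn.Sequential(nn.Linear(8, 8), nn.LayerNorm(8))
with torch.no_grad():
    toy[1].bias.add_(0.5)

toy_report = layer_norm_report(toy)
print(toy_report)  # [('1', True, False)]
```

The same function can be called on the model loaded in the reproduction snippet, e.g. layer_norm_report(model), to list which layer norms have weights at 1 but non-zero biases.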

I'll hand over to @younesbelkada, who added the model and is most familiar with layer norm related logic like config._remove_final_layer_norm.

younesbelkada · 2023-06-13T16:35:26Z

Hi @CompressTeam

I think that this is expected; see this interesting thread from the authors: #17653, and in particular these 2 messages: #17653 (comment) / #17653 (comment). From what I have understood, the models somehow learned layer norm weights of 1.

github-actions · 2023-07-13T15:02:27Z

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions bot closed this as completed Jul 21, 2023
