change layernorm code to pytorch's native layer norm #1089
Conversation
Codecov Report
@@            Coverage Diff             @@
##           master    #1089      +/-   ##
==========================================
+ Coverage   79.61%   79.66%   +0.04%
==========================================
  Files          42       42
  Lines        6898     6898
==========================================
+ Hits         5492     5495       +3
+ Misses      1406     1403        -3
Continue to review full report at Codecov.
I think your PR misses the point though? The models need to be 100% accurate reproductions of the TensorFlow code, right down to differences in eps values. Otherwise, if you run the activations and get different results, you don't know whether there's a bug. You also can't reason about different results and whether they matter.
@honnibal but looking at the code, every call of
Oh right! Fair point, sorry.
Yes, @dhpollack is right, we can switch to PyTorch's official LayerNorm. What made me reimplement the LayerNorm when I was working on Bert last year was actually a typo in PyTorch's doc formula for computing the LayerNorm, which indicated, at that time, that the epsilon was added to the square root of the variance instead of being added to the variance itself. This typo is now corrected in pytorch/pytorch#8545. Everything is right and we can drop these custom LayerNorms.
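For concreteness, a minimal sketch of the distinction discussed above (function names and shapes are illustrative, not the library's code; 1e-12 is BERT's epsilon):

```python
import torch

def layer_norm_eps_in_variance(x, weight, bias, eps=1e-12):
    # epsilon added to the variance *inside* the square root:
    # this is what both the custom BertLayerNorm and torch.nn.LayerNorm compute
    mean = x.mean(-1, keepdim=True)
    var = (x - mean).pow(2).mean(-1, keepdim=True)
    return weight * (x - mean) / torch.sqrt(var + eps) + bias

def layer_norm_eps_after_sqrt(x, weight, bias, eps=1e-12):
    # epsilon added *after* the square root: what the old, incorrect doc
    # formula appeared to describe before the fix
    mean = x.mean(-1, keepdim=True)
    var = (x - mean).pow(2).mean(-1, keepdim=True)
    return weight * (x - mean) / (torch.sqrt(var) + eps) + bias
```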
Are we sure the names of the parameters are the same though? (
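(As a quick check, a sketch of the native module's parameter names; the hidden size of 768 is illustrative:)

```python
import torch

ln = torch.nn.LayerNorm(768, eps=1e-12)
# the native module registers its affine parameters as "weight" and "bias",
# so checkpoints saved with a custom module using the same names load unchanged
print(list(ln.state_dict().keys()))  # ['weight', 'bias']
```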
…etting it on the model. Instead we correctly store it on the config (regenerating the hosted config files). cc @LysandreJik
* [RAG] Bumping up transformers version to 3.3.x
* Use Pytorch's native LayerNorm code, with default eps as 1e-12. Refer huggingface/transformers#1089. Signed-off-by: lalitpagaria <pagaria.lalit@gmail.com>
* Using apex's FusedLayerNorm if available instead of Pytorch LayerNorm
* Remove pooling layer before converting to transformers

Co-authored-by: Bogdan Kostić <bogdankostic@web.de>
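The commit above falls back to apex's FusedLayerNorm when apex is installed; a minimal sketch of that pattern (the hidden size and eps are illustrative, and the apex import assumes its documented path):

```python
import torch

try:
    # prefer apex's fused CUDA kernel when apex is installed
    from apex.normalization import FusedLayerNorm as LayerNormImpl
except ImportError:
    # otherwise fall back to PyTorch's native implementation
    LayerNormImpl = torch.nn.LayerNorm

layer_norm = LayerNormImpl(768, eps=1e-12)
```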
The current code basically recreates PyTorch's native LayerNorm code. The only difference is that the default eps in the PyTorch function is 1e-5 instead of 1e-12. PyTorch's native version is optimized for cuDNN, so it should be faster than this version.
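A minimal sketch of the switch described above, assuming a BERT-style hidden size of 768:

```python
import torch.nn as nn

# nn.LayerNorm defaults to eps=1e-5; BERT's 1e-12 must be passed explicitly so
# the activations match the custom implementation being removed
layer_norm = nn.LayerNorm(768, eps=1e-12)

# usage: normalizes over the last (hidden) dimension of a [batch, seq, hidden] tensor
# hidden_states = layer_norm(hidden_states)
```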