
change layernorm code to pytorch's native layer norm #1089

Merged
merged 1 commit into huggingface:master on Aug 30, 2019

Conversation

dhpollack (Contributor)

The current code essentially recreates PyTorch's native LayerNorm. The only difference is that the default eps in the PyTorch version is 1e-5 rather than the 1e-12 used here. PyTorch's native implementation is optimized (cuDNN), so it should be faster than this reimplementation.
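For illustration, here is roughly what the swap amounts to; this is a sketch (the hidden size and names are illustrative), not the exact diff:

```python
import torch
import torch.nn as nn

hidden_size = 768  # illustrative, e.g. bert-base

# Roughly what the custom BertLayerNorm computes:
class BertLayerNorm(nn.Module):
    def __init__(self, hidden_size, eps=1e-12):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.bias = nn.Parameter(torch.zeros(hidden_size))
        self.variance_epsilon = eps

    def forward(self, x):
        mean = x.mean(-1, keepdim=True)
        var = (x - mean).pow(2).mean(-1, keepdim=True)
        x = (x - mean) / torch.sqrt(var + self.variance_epsilon)
        return self.weight * x + self.bias

# The equivalent native module; eps must be passed explicitly because
# nn.LayerNorm's default is 1e-5, not 1e-12:
native_layer_norm = nn.LayerNorm(hidden_size, eps=1e-12)
```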

codecov-io commented Aug 23, 2019

Codecov Report

Merging #1089 into master will increase coverage by 0.04%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #1089      +/-   ##
==========================================
+ Coverage   79.61%   79.66%   +0.04%     
==========================================
  Files          42       42              
  Lines        6898     6898              
==========================================
+ Hits         5492     5495       +3     
+ Misses       1406     1403       -3
Impacted Files Coverage Δ
pytorch_transformers/modeling_bert.py 87.98% <ø> (ø) ⬆️
pytorch_transformers/file_utils.py 73.94% <0%> (+2.11%) ⬆️

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data

honnibal commented Aug 29, 2019

I think your PR misses the point though? The models need to be 100% accurate reproductions of the TensorFlow code, right down to differences in eps values. Otherwise, if you run the activations and get different results, you don't know whether there's a bug. You also can't reason about different results and whether they matter.

dhpollack (Contributor, Author)

@honnibal But looking at the code, every call to BertLayerNorm explicitly sets the eps, so the actual values used in the BERT models do not change; only the default changes, and that default is never used. Additionally, if APEX is available you use FusedLayerNorm, which has the same default eps of 1e-5 as PyTorch's native LayerNorm. So there is already an inconsistency, but it is resolved by explicitly setting the eps every time the layer is used.
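A sketch of the pattern described here (the try/except fallback and the explicit eps are illustrative of the repo's approach, not an exact copy):

```python
import torch.nn as nn

try:
    # When APEX is installed, its fused kernel is used (default eps 1e-5).
    from apex.normalization.fused_layer_norm import FusedLayerNorm as BertLayerNorm
except ImportError:
    BertLayerNorm = nn.LayerNorm  # default eps is also 1e-5

# Every call site passes eps explicitly, so the default never takes effect:
layer_norm = BertLayerNorm(768, eps=1e-12)
```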

honnibal

Oh right! Fair point, sorry.

thomwolf (Member)

Yes, @dhpollack is right; we can switch to PyTorch's official LayerNorm.

What made me reimplement LayerNorm when I was working on BERT last year was actually a typo in PyTorch's documentation formula for LayerNorm, which indicated, at that time, that the epsilon was added to the square root of the variance instead of to the variance itself. This typo was corrected in pytorch/pytorch#8545.
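For reference, the difference in question, written out as a small numeric sketch (not code from the repo):

```python
import torch

x = torch.randn(4, 768)
eps = 1e-12
mean = x.mean(-1, keepdim=True)
var = x.var(-1, unbiased=False, keepdim=True)

# What LayerNorm actually computes (eps added to the variance, inside the sqrt):
y_actual = (x - mean) / torch.sqrt(var + eps)

# What the old documentation formula described (eps added to the square root):
y_old_doc = (x - mean) / (torch.sqrt(var) + eps)
```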

Everything is right and we can drop these custom LayerNorms.

@thomwolf thomwolf merged commit 41f35d0 into huggingface:master Aug 30, 2019
julien-c (Member)

Are we sure the names of the parameters are the same though? (eps vs. variance_epsilon)
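One quick way to check, as a sketch: nn.LayerNorm takes and stores the value as eps, while the custom module kept it under variance_epsilon, so any code passing or reading that name would need adjusting.

```python
import torch.nn as nn

ln = nn.LayerNorm(768, eps=1e-12)
print(ln.eps)                           # 1e-12 on the native module
print(hasattr(ln, "variance_epsilon"))  # False; the old attribute name is gone
```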

julien-c referenced this pull request Aug 31, 2019
…etting it on the model

Instead we correctly store it on the config

(regenerating the hosted config files)

cc @LysandreJik
tholor pushed a commit to deepset-ai/FARM that referenced this pull request Oct 19, 2020
* [RAG] Bumping up transformers version to 3.3.x

* Use Pytorch's native LayerNorm code, with default eps as 1e-12. Refer huggingface/transformers#1089

Signed-off-by: lalitpagaria <pagaria.lalit@gmail.com>

* Using apex's FusedLayerNorm if available instead of Pytorch LayerNorm

* Remove pooling layer before converting to transformers

Co-authored-by: Bogdan Kostić <bogdankostic@web.de>