change layernorm code to pytorch's native layer norm #1089
Conversation
Codecov Report
@@            Coverage Diff             @@
##           master    #1089      +/-   ##
==========================================
+ Coverage   79.61%   79.66%   +0.04%
==========================================
  Files          42       42
  Lines        6898     6898
==========================================
+ Hits         5492     5495       +3
+ Misses      1406     1403        -3
Continue to review full report at Codecov.
I think your PR misses the point though? The models need to be 100% accurate reproductions of the TensorFlow code, right down to differences in eps values. Otherwise, if you run the activations and get different results, you don't know whether there's a bug. You also can't reason about different results and whether they matter.
@honnibal but looking at the code, every call of
Oh right! Fair point, sorry.
Yes, @dhpollack is right, we can switch to PyTorch's official LayerNorm. What made me reimplement the LayerNorm when I was working on Bert last year was actually a typo in PyTorch's doc formula for computing the LayerNorm, which indicated, at that time, that the epsilon was added to the square root of the variance instead of being added to the variance itself. This typo is now corrected in pytorch/pytorch#8545. Everything is right and we can drop these custom LayerNorms.
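For concreteness, a minimal sketch of the distinction discussed above (function names and shapes are illustrative, not the library's code; 1e-12 is BERT's epsilon):

```python
import torch

def layer_norm_eps_in_variance(x, weight, bias, eps=1e-12):
    # epsilon added to the variance *inside* the square root:
    # this is what both the custom BertLayerNorm and torch.nn.LayerNorm compute
    mean = x.mean(-1, keepdim=True)
    var = (x - mean).pow(2).mean(-1, keepdim=True)
    return weight * (x - mean) / torch.sqrt(var + eps) + bias

def layer_norm_eps_after_sqrt(x, weight, bias, eps=1e-12):
    # epsilon added *after* the square root: what the old, incorrect doc
    # formula appeared to describe before the fix
    mean = x.mean(-1, keepdim=True)
    var = (x - mean).pow(2).mean(-1, keepdim=True)
    return weight * (x - mean) / (torch.sqrt(var) + eps) + bias
```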
Are we sure the names of the parameters are the same though? (
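(As a quick check, a sketch of the native module's parameter names; the hidden size of 768 is illustrative:)

```python
import torch

ln = torch.nn.LayerNorm(768, eps=1e-12)
# the native module registers its affine parameters as "weight" and "bias",
# so checkpoints saved with a custom module using the same names load unchanged
print(list(ln.state_dict().keys()))  # ['weight', 'bias']
```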
…etting it on the model. Instead we correctly store it on the config (regenerating the hosted config files). cc @LysandreJik
* [RAG] Bumping up transformers version to 3.3.x
* Use Pytorch's native LayerNorm code, with default eps as 1e-12. Refer huggingface/transformers#1089. Signed-off-by: lalitpagaria <pagaria.lalit@gmail.com>
* Using apex's FusedLayerNorm if available instead of Pytorch LayerNorm
* Remove pooling layer before converting to transformers

Co-authored-by: Bogdan Kostić <bogdankostic@web.de>
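The commit above falls back to apex's FusedLayerNorm when apex is installed; a minimal sketch of that pattern (the hidden size and eps are illustrative, and the apex import assumes its documented path):

```python
import torch

try:
    # prefer apex's fused CUDA kernel when apex is installed
    from apex.normalization import FusedLayerNorm as LayerNormImpl
except ImportError:
    # otherwise fall back to PyTorch's native implementation
    LayerNormImpl = torch.nn.LayerNorm

layer_norm = LayerNormImpl(768, eps=1e-12)
```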
The current code basically recreates PyTorch's native LayerNorm code. The only difference is that the default eps in the PyTorch function is 1e-5 instead of 1e-12. PyTorch's native version is optimized for cuDNN, so it should be faster than this version.
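A minimal sketch of the switch described above, assuming a BERT-style hidden size of 768:

```python
import torch.nn as nn

# nn.LayerNorm defaults to eps=1e-5; BERT's 1e-12 must be passed explicitly so
# the activations match the custom implementation being removed
layer_norm = nn.LayerNorm(768, eps=1e-12)

# usage: normalizes over the last (hidden) dimension of a [batch, seq, hidden] tensor
# hidden_states = layer_norm(hidden_states)
```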