Multi-Head Attention Upgrade solution 1 #51
Dongbin-Lee-git
started this conversation in tip
A paper came out last year describing a way to remove layer norm from our transformers while improving performance and shortening training time 👍
https://arxiv.org/abs/2003.04887
It doesn't look too hard — it seems like a fairly simple change — so if anyone is interested, please open an issue and try applying it!
I'm busy looking into weight initialization at the moment..
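For reference, here is a minimal sketch of how I understand the idea in the linked paper (I'm assuming it's the ReZero-style approach, where each residual branch is scaled by a learnable scalar initialized to zero instead of being wrapped in LayerNorm; the class and attribute names like `ReZeroTransformerBlock` and `resweight` below are just illustrative, not from our codebase):

```python
import torch
import torch.nn as nn

class ReZeroTransformerBlock(nn.Module):
    """Transformer encoder block without LayerNorm (ReZero-style sketch).

    Each sublayer output is scaled by a learnable scalar initialized to 0,
    so the block starts out as the identity function and training is claimed
    to stay stable without LayerNorm or a warmup schedule.
    """

    def __init__(self, d_model=512, nhead=8, dim_ff=2048, dropout=0.1):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, nhead, dropout=dropout)
        self.ff = nn.Sequential(
            nn.Linear(d_model, dim_ff),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(dim_ff, d_model),
        )
        self.dropout = nn.Dropout(dropout)
        # single learnable residual weight, initialized to zero
        self.resweight = nn.Parameter(torch.zeros(1))

    def forward(self, x, attn_mask=None):
        # x: (seq_len, batch, d_model), the default layout for nn.MultiheadAttention
        attn_out, _ = self.attn(x, x, x, attn_mask=attn_mask)
        x = x + self.resweight * self.dropout(attn_out)
        x = x + self.resweight * self.dropout(self.ff(x))
        return x
```

If this reading is right, the change to our existing blocks would mostly be dropping the LayerNorm calls and adding the zero-initialized residual scalar, so it should be a small diff.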