Multi-Head Attention Upgrade solution 1 #51
Dongbin-Lee-git
started this conversation in tip
A paper came out last year describing a way to remove layer norm from our transformers while improving performance and shortening training time 👍
https://arxiv.org/abs/2003.04887
It doesn't look too hard — it seems like a fairly simple change — so if anyone is interested, please open an issue and try applying it!
I'm busy looking into weight initialization at the moment..
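For reference, here is a minimal sketch of how I understand the idea in the linked paper (I'm assuming it's the ReZero-style approach, where each residual branch is scaled by a learnable scalar initialized to zero instead of being wrapped in LayerNorm; the class and attribute names like `ReZeroTransformerBlock` and `resweight` below are just illustrative, not from our codebase):

```python
import torch
import torch.nn as nn

class ReZeroTransformerBlock(nn.Module):
    """Transformer encoder block without LayerNorm (ReZero-style sketch).

    Each sublayer output is scaled by a learnable scalar initialized to 0,
    so the block starts out as the identity function and training is claimed
    to stay stable without LayerNorm or a warmup schedule.
    """

    def __init__(self, d_model=512, nhead=8, dim_ff=2048, dropout=0.1):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, nhead, dropout=dropout)
        self.ff = nn.Sequential(
            nn.Linear(d_model, dim_ff),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(dim_ff, d_model),
        )
        self.dropout = nn.Dropout(dropout)
        # single learnable residual weight, initialized to zero
        self.resweight = nn.Parameter(torch.zeros(1))

    def forward(self, x, attn_mask=None):
        # x: (seq_len, batch, d_model), the default layout for nn.MultiheadAttention
        attn_out, _ = self.attn(x, x, x, attn_mask=attn_mask)
        x = x + self.resweight * self.dropout(attn_out)
        x = x + self.resweight * self.dropout(self.ff(x))
        return x
```

If this reading is right, the change to our existing blocks would mostly be dropping the LayerNorm calls and adding the zero-initialized residual scalar, so it should be a small diff.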