Why not use nn.MultiheadAttention in vit? #283

Answered by rwightman
ZhiyuanChen asked this question in Q&A

@ZhiyuanChen When I started, I wasn't quite sure how the official version would look with respect to the attention module, or how close it'd be to the PyTorch implementation. Plus, it was pretty straightforward to just implement it as is. I don't think the current PyTorch impl is much faster. The Apex one likely would be, but it's harder to work with.
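For context, a minimal sketch of the kind of hand-rolled ViT attention block being discussed (a fused qkv projection plus scaled dot-product attention, similar in spirit to timm's module; names, defaults, and shapes here are illustrative, not the exact library code):

```python
import torch
import torch.nn as nn

class Attention(nn.Module):
    """Minimal ViT-style multi-head self-attention sketch."""
    def __init__(self, dim, num_heads=8):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.scale = (dim // num_heads) ** -0.5
        self.qkv = nn.Linear(dim, dim * 3)  # fused query/key/value projection
        self.proj = nn.Linear(dim, dim)     # output projection

    def forward(self, x):
        B, N, C = x.shape
        # (B, N, 3*C) -> (B, N, 3, heads, head_dim) -> (3, B, heads, N, head_dim)
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, C // self.num_heads)
        q, k, v = qkv.permute(2, 0, 3, 1, 4).unbind(0)
        attn = (q @ k.transpose(-2, -1)) * self.scale  # scaled dot-product scores
        attn = attn.softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, N, C)  # merge heads
        return self.proj(out)

x = torch.randn(2, 16, 64)          # (batch, tokens, dim)
y = Attention(64, num_heads=8)(x)
print(y.shape)                      # torch.Size([2, 16, 64])
```

Writing it out this way keeps the full q/k/v tensors exposed, which makes it easy to modify (e.g. for attention visualization) compared to the more opaque nn.MultiheadAttention interface.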

Answer selected by ZhiyuanChen