This repository has been archived by the owner on Oct 25, 2024. It is now read-only.
Fused Attention
#724
Is Fused Attention as described here also planned to be implemented for other ISAs, like AVX_VNNI?
Answered by DDEle on Nov 21, 2023
Thank you for your interest in the Fused Attention optimization. We are planning on this, but it will take some time: Fused Attention operates only on activations, which are more prone to accuracy loss under quantization than weights are. In addition, we need to be more careful about performance, since there is no zero-cost quantization for activations the way there is for weights (weights can be quantized once, offline; activations must be quantized at runtime).
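
To illustrate the distinction, here is a minimal sketch in Python/NumPy of symmetric per-tensor int8 quantization. It is not the repository's actual code, and all names in it are hypothetical; it only shows why weight quantization is effectively free at inference time while activation quantization adds per-call work and accuracy risk.

```python
# Hypothetical sketch contrasting offline weight quantization with
# runtime activation quantization. Not the library's actual API.
import numpy as np

def quantize_sym_int8(x: np.ndarray):
    """Symmetric per-tensor int8 quantization: returns (q, scale)."""
    scale = max(np.abs(x).max(), 1e-12) / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

# Weights are known ahead of time, so they are quantized once, offline:
# effectively "zero-cost" at inference.
W = np.random.randn(256, 256).astype(np.float32)
W_q, w_scale = quantize_sym_int8(W)  # done once, before serving

def quantize_activation(x: np.ndarray):
    # Activations only exist at runtime, so their scales must be computed
    # on every call (dynamic quantization) or estimated from calibration
    # data. Either way, accuracy is more sensitive than for weights, and
    # the scale computation itself costs time on each inference.
    return quantize_sym_int8(x)

x = np.random.randn(8, 256).astype(np.float32)  # a runtime activation
x_q, x_scale = quantize_activation(x)

# int8 matmul accumulates in int32; the float result is recovered by
# rescaling with the product of the two scales.
y = (x_q.astype(np.int32) @ W_q.astype(np.int32).T) * (x_scale * w_scale)
```

Since an attention block consumes only such runtime tensors (Q, K, V), a quantized Fused Attention kernel pays this per-call quantization cost everywhere, which is the performance concern mentioned above.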