This repository has been archived by the owner on Oct 25, 2024. It is now read-only.
Fused Attention
#724
Is Fused Attention as described here also planned to be implemented for other ISAs, like AVX_VNNI?
Answered by DDEle on Nov 21, 2023
Thank you for your interest in the Fused Attention optimization. We are planning on this, but it will take some time: Fused Attention operates only on activations, which are more prone to accuracy loss under quantization than weights are. In addition, we need to be more careful about performance, since there is no zero-cost quantization for activations the way there is for weights (weights can be quantized once, offline; activations must be quantized at runtime).
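
To illustrate the distinction, here is a minimal sketch in Python/NumPy of symmetric per-tensor int8 quantization. It is not the repository's actual code, and all names in it are hypothetical; it only shows why weight quantization is effectively free at inference time while activation quantization adds per-call work and accuracy risk.

```python
# Hypothetical sketch contrasting offline weight quantization with
# runtime activation quantization. Not the library's actual API.
import numpy as np

def quantize_sym_int8(x: np.ndarray):
    """Symmetric per-tensor int8 quantization: returns (q, scale)."""
    scale = max(np.abs(x).max(), 1e-12) / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

# Weights are known ahead of time, so they are quantized once, offline:
# effectively "zero-cost" at inference.
W = np.random.randn(256, 256).astype(np.float32)
W_q, w_scale = quantize_sym_int8(W)  # done once, before serving

def quantize_activation(x: np.ndarray):
    # Activations only exist at runtime, so their scales must be computed
    # on every call (dynamic quantization) or estimated from calibration
    # data. Either way, accuracy is more sensitive than for weights, and
    # the scale computation itself costs time on each inference.
    return quantize_sym_int8(x)

x = np.random.randn(8, 256).astype(np.float32)  # a runtime activation
x_q, x_scale = quantize_activation(x)

# int8 matmul accumulates in int32; the float result is recovered by
# rescaling with the product of the two scales.
y = (x_q.astype(np.int32) @ W_q.astype(np.int32).T) * (x_scale * w_scale)
```

Since an attention block consumes only such runtime tensors (Q, K, V), a quantized Fused Attention kernel pays this per-call quantization cost everywhere, which is the performance concern mentioned above.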