Combining Triton with flash-attn #2685
zhuchen1109
started this conversation in
General
Replies: 2 comments 2 replies
-
The impact on accuracy should be small; it only changes how the attention is implemented.
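To illustrate why a different implementation should leave the results essentially unchanged, here is a minimal pure-Python sketch of the online-softmax tiling idea that FlashAttention-style kernels rely on. It is not lmdeploy's kernel or the flash-attn code: the function names, the single-query setup, and the block size are assumptions made for illustration, and the usual 1/sqrt(d) scaling is omitted for brevity.

```python
import math

def naive_attention(q, keys, values):
    """Reference: softmax over all scores at once, then a weighted sum of values."""
    scores = [sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(dim)]

def tiled_attention(q, keys, values, block=2):
    """Same quantity, computed block by block with a running max and running sum."""
    dim = len(values[0])
    running_max = float("-inf")
    running_sum = 0.0
    acc = [0.0] * dim
    for start in range(0, len(keys), block):
        k_blk = keys[start:start + block]
        v_blk = values[start:start + block]
        scores = [sum(a * b for a, b in zip(q, k)) for k in k_blk]
        new_max = max(running_max, max(scores))
        # Rescale what was accumulated so far to the new max
        # (the factor is 0.0 on the first block, since exp(-inf) == 0.0).
        scale = math.exp(running_max - new_max)
        weights = [math.exp(s - new_max) for s in scores]
        running_sum = running_sum * scale + sum(weights)
        acc = [a * scale + sum(w * v[d] for w, v in zip(weights, v_blk))
               for d, a in enumerate(acc)]
        running_max = new_max
    return [a / running_sum for a in acc]
```

Both functions compute the same mathematical result; any difference comes only from floating-point rounding order, which is why accuracy is expected to be barely affected.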
-
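I'd like to ask: if the Triton kernels in lmdeploy supported a flash-attn implementation, would that improve inference performance? I have seen flash-attn supported in Triton's examples.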