
Add flash attention 2 #4200

Closed
adhikjoshi opened this issue Jul 21, 2023 · 3 comments

@adhikjoshi

Flash Attention 2 gives faster attention and thus faster inference. Here is Birch Labs' implementation for diffusers:

https://gist.github.com/Birch-san/4315701264b72bb72e8eac5a529ee93a

@patrickvonplaten

@sayakpaul (Member) commented Jul 24, 2023

Can we have some inference timing comparison of diffusers with

  • SDPA
  • xformers
  • Flash attention 2?

Without a good handle on how much of a gain we can expect from Flash Attention 2, it's hard for us to decide here.

Cc: @williamberman
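For reference, a minimal timing sketch along these lines (not from this thread; the model id, prompt, and step counts are assumptions) could look like the following, comparing the PyTorch 2.0 SDPA processor against xformers memory-efficient attention end-to-end over a full denoising loop:

```python
# Rough, hypothetical benchmark sketch: times one SD 1.5 pipeline run with the
# PyTorch 2.0 SDPA attention processor vs. xformers memory-efficient attention.
import time

import torch
from diffusers import StableDiffusionPipeline
from diffusers.models.attention_processor import AttnProcessor2_0

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.set_progress_bar_config(disable=True)

def time_backend(label, configure):
    configure(pipe)
    # Warmup run so kernel selection / autotuning is excluded from the timing.
    pipe("a photo of an astronaut riding a horse", num_inference_steps=5)
    torch.cuda.synchronize()
    start = time.perf_counter()
    pipe("a photo of an astronaut riding a horse", num_inference_steps=50)
    torch.cuda.synchronize()
    print(f"{label}: {time.perf_counter() - start:.2f} s")

# PyTorch 2.0 scaled_dot_product_attention (SDPA).
time_backend("SDPA", lambda p: p.unet.set_attn_processor(AttnProcessor2_0()))
# xformers memory-efficient attention (would dispatch to Flash Attention 2
# kernels once an xformers release ships them).
time_backend("xformers", lambda p: p.enable_xformers_memory_efficient_attention())
```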

@patrickvonplaten (Contributor)

Very cool! I think we should support this as soon as it's in xformers. I think we're close to getting a new PyPI release of xformers: facebookresearch/xformers#795 (comment)

@williamberman (Contributor)

Yeah, once the xformers release is cut, you should have access to it. The API is the same, so we shouldn't have to update the diffusers code. I'm going to close this issue since I don't think we need to make any changes to the diffusers source :)
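To illustrate the "no diffusers change needed" point (a sketch, not part of the original comment; the model id is an assumption): after installing an xformers build that ships Flash Attention 2 kernels, the existing diffusers one-liner stays exactly the same, and kernel selection happens inside xformers.

```python
# Hypothetical sketch: with a newer xformers wheel installed (e.g. via
# `pip install -U xformers`), the diffusers call below is unchanged.
import torch
import xformers
from diffusers import StableDiffusionPipeline

print("xformers version:", xformers.__version__)

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
# Same API as before; the fastest available attention kernel is picked
# internally by xformers.
pipe.enable_xformers_memory_efficient_attention()
image = pipe("a photo of an astronaut riding a horse").images[0]
```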
