
Add flash attention 2 #4200

Closed
adhikjoshi opened this issue Jul 21, 2023 · 3 comments

@adhikjoshi

Flash Attention 2 gives faster attention and thus faster inference. Here is Birch Labs' implementation for diffusers:

https://gist.github.com/Birch-san/4315701264b72bb72e8eac5a529ee93a

@patrickvonplaten

@sayakpaul (Member) commented Jul 24, 2023

Can we have some inference timing comparison of diffusers with

  • SDPA
  • xformers
  • Flash attention 2?

Without a good handle on how much of a gain we can expect from Flash Attention 2, it's hard for us to decide here.

Cc: @williamberman
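For reference, a minimal timing sketch along these lines (not from this thread; the model id, prompt, and step counts are assumptions) could look like the following, comparing the PyTorch 2.0 SDPA processor against xformers memory-efficient attention end-to-end over a full denoising loop:

```python
# Rough, hypothetical benchmark sketch: times one SD 1.5 pipeline run with the
# PyTorch 2.0 SDPA attention processor vs. xformers memory-efficient attention.
import time

import torch
from diffusers import StableDiffusionPipeline
from diffusers.models.attention_processor import AttnProcessor2_0

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.set_progress_bar_config(disable=True)

def time_backend(label, configure):
    configure(pipe)
    # Warmup run so kernel selection / autotuning is excluded from the timing.
    pipe("a photo of an astronaut riding a horse", num_inference_steps=5)
    torch.cuda.synchronize()
    start = time.perf_counter()
    pipe("a photo of an astronaut riding a horse", num_inference_steps=50)
    torch.cuda.synchronize()
    print(f"{label}: {time.perf_counter() - start:.2f} s")

# PyTorch 2.0 scaled_dot_product_attention (SDPA).
time_backend("SDPA", lambda p: p.unet.set_attn_processor(AttnProcessor2_0()))
# xformers memory-efficient attention (would dispatch to Flash Attention 2
# kernels once an xformers release ships them).
time_backend("xformers", lambda p: p.enable_xformers_memory_efficient_attention())
```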

@patrickvonplaten (Contributor)

Very cool! I think we should support this as soon as it's in xformers. I think we're close to getting a new PyPI release of xformers: facebookresearch/xformers#795 (comment)

@williamberman (Contributor)

Yeah, once the xformers release is cut, you should have access to it. The API is the same, so we shouldn't have to update the diffusers code. I'm going to close this issue since I don't think we need to make any changes to the diffusers source :)
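To illustrate the "no diffusers change needed" point (a sketch, not part of the original comment; the model id is an assumption): after installing an xformers build that ships Flash Attention 2 kernels, the existing diffusers one-liner stays exactly the same, and kernel selection happens inside xformers.

```python
# Hypothetical sketch: with a newer xformers wheel installed (e.g. via
# `pip install -U xformers`), the diffusers call below is unchanged.
import torch
import xformers
from diffusers import StableDiffusionPipeline

print("xformers version:", xformers.__version__)

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
# Same API as before; the fastest available attention kernel is picked
# internally by xformers.
pipe.enable_xformers_memory_efficient_attention()
image = pipe("a photo of an astronaut riding a horse").images[0]
```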
