Add Flash Attention v2 to Ops #1970

Merged: 6 commits into triton-lang:main on Jul 23, 2023

Conversation

@IzzyPutterman (Contributor)

I also dropped do_scaled, as it is no longer needed (no scaling is applied to dO in v2).
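For context, here is a minimal sketch (not the PR's code; the kernel and argument names are illustrative) of why the separate dO rescaling disappears: in a v1-style backward, dO was divided by the stored softmax denominators in a preprocess kernel, whereas a v2-style forward already writes a normalized O and a per-row logsumexp, so the preprocess only needs the row-wise sums delta = rowsum(O * dO).

```python
import triton
import triton.language as tl

@triton.jit
def _bwd_preprocess_sketch(O, DO, Delta, stride_row, N_CTX,
                           BLOCK_M: tl.constexpr, D_HEAD: tl.constexpr):
    # One program handles BLOCK_M rows of O/dO for a single (batch, head) slice.
    off_m = tl.program_id(0) * BLOCK_M + tl.arange(0, BLOCK_M)
    off_d = tl.arange(0, D_HEAD)
    mask = off_m < N_CTX
    ptrs = off_m[:, None] * stride_row + off_d[None, :]
    o = tl.load(O + ptrs, mask=mask[:, None], other=0.0).to(tl.float32)
    do = tl.load(DO + ptrs, mask=mask[:, None], other=0.0).to(tl.float32)
    # No division of dO by the softmax denominator here -- that was the v1-only step.
    delta = tl.sum(o * do, axis=1)   # delta_i = sum_d O[i, d] * dO[i, d]
    tl.store(Delta + off_m, delta, mask=mask)
```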

@ptillet (Collaborator) commented Jul 20, 2023

It seems like the performance significantly decreased; not sure why 🤔. It may be worth comparing the TTIR and TTGIR of the tutorial vs. what the ops generate.
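One way to do that comparison, assuming the compiled-kernel handle returned by a Triton 2.x launch exposes an `asm` dict with `ttir`/`ttgir` entries as in the tutorials of that period (`attn_fwd_kernel`, `grid`, and `args` are placeholders):

```python
# Hedged sketch: dump both IRs for the ops kernel, do the same for the tutorial
# kernel, then diff the files to spot scheduling/layout differences.
handle = attn_fwd_kernel[grid](*args)          # placeholder launch
for stage in ("ttir", "ttgir"):
    with open(f"ops_fwd.{stage}", "w") as f:
        f.write(handle.asm[stage])
# then e.g. `diff ops_fwd.ttgir tutorial_fwd.ttgir`
```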

@IzzyPutterman (Contributor Author)

I think the main change in the forward pass was the switch back from the MODEs to CAUSAL in one pass. Converting MODE == 3 to FA v2 might require a little work.

I'll take a look at the TTIR/TTGIR as well as the MODEs.
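To illustrate the difference (a pure-Python sketch; the helper name and parameters are placeholders, not the PR's code): a single-pass causal forward simply bounds the K/V loop per query block instead of dispatching on a MODE value, and only the tiles on the diagonal need an explicit causal mask.

```python
def kv_tile_starts(query_block_idx: int, n_ctx: int,
                   BLOCK_M: int, BLOCK_N: int, is_causal: bool) -> list[int]:
    """Start columns of the K/V tiles one query block visits in a single pass."""
    hi = min(n_ctx, (query_block_idx + 1) * BLOCK_M) if is_causal else n_ctx
    return list(range(0, hi, BLOCK_N))

# A causal query block only visits keys up to its own diagonal:
print(kv_tile_starts(query_block_idx=2, n_ctx=1024, BLOCK_M=128, BLOCK_N=64, is_causal=True))
# [0, 64, 128, 192, 256, 320]
```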

@ptillet (Collaborator) commented Jul 20, 2023

Eh, I think this PR forgets to update the block size and num_stages for the fwd pass.

@IzzyPutterman (Contributor Author)

Yep, you are correct. I adjusted the blocks and stages; locally on my A6000 the forward perf is much improved.
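What such an adjustment typically looks like (a hedged sketch; these config values are illustrative, not the ones landed in this PR):

```python
import triton

# v2-style forward kernels generally favor larger query tiles and more
# software-pipelining stages than the v1 defaults.
fwd_configs = [
    triton.Config({"BLOCK_M": 128, "BLOCK_N": 64}, num_stages=4, num_warps=4),
    triton.Config({"BLOCK_M": 128, "BLOCK_N": 128}, num_stages=3, num_warps=8),
]
# e.g. applied with @triton.autotune(configs=fwd_configs, key=["N_CTX"]) on the fwd kernel
```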

@IzzyPutterman (Contributor Author)

Updated the forward-pass numbers for FA based on CI.

ptillet merged commit de6f053 into triton-lang:main on Jul 23, 2023
5 of 6 checks passed
pingzhuu pushed a commit to siliconflow/triton that referenced this pull request on Apr 2, 2024

Co-authored-by: Philippe Tillet <phil@openai.com>