[Backport] Make a FlashAttention Wrapper #6827

alanwaketan · 2024-03-27T00:52:32Z

Summary:
This pull request introduces a FlashAttention wrapper that aims to:

Test Plan:
PJRT_DEVICE=TPU python test/test_pallas.py -v -k test_flash_attention_wrapper
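For readers without a TPU, the kind of check a flash attention test usually makes can be sketched in plain Python: compute attention the direct way, compute it again block by block with a running max and running sum (the idea behind FlashAttention), and confirm the two agree. This is only an illustration, not the code in test/test_pallas.py. The names `reference_attention` and `blockwise_attention` are hypothetical, the sketch covers a single head, and it assumes no softmax scaling and no masking.

```python
import math


def reference_attention(q, k, v):
    """Direct softmax(q @ k^T) @ v for one head; q, k, v are lists of rows."""
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) for kj in k]
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([
            sum(w * vj[d] for w, vj in zip(weights, v)) / total
            for d in range(len(v[0]))
        ])
    return out


def blockwise_attention(q, k, v, block_size=2):
    """Same result, but k/v are visited in blocks (hypothetical helper)."""
    out = []
    for qi in q:
        running_max = -math.inf
        running_sum = 0.0
        acc = [0.0] * len(v[0])
        for start in range(0, len(k), block_size):
            k_blk = k[start:start + block_size]
            v_blk = v[start:start + block_size]
            scores = [sum(a * b for a, b in zip(qi, kj)) for kj in k_blk]
            new_max = max(running_max, max(scores))
            # Rescale what was accumulated so far to the new max.
            # On the first block this is exp(-inf) == 0.0.
            scale = math.exp(running_max - new_max)
            running_sum *= scale
            acc = [a * scale for a in acc]
            for s, vj in zip(scores, v_blk):
                w = math.exp(s - new_max)
                running_sum += w
                acc = [a + w * x for a, x in zip(acc, vj)]
            running_max = new_max
        out.append([a / running_sum for a in acc])
    return out


# Small made-up inputs: 3 queries, 5 keys/values, head dimension 2.
q = [[1.0, 0.0], [0.5, 0.5], [0.0, 2.0]]
k = [[1.0, 2.0], [0.0, 1.0], [3.0, -1.0], [0.5, 0.5], [-2.0, 1.0]]
v = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [-1.0, 3.0], [0.5, 0.5]]

expected = reference_attention(q, k, v)
actual = blockwise_attention(q, k, v, block_size=2)
max_diff = max(
    abs(a - e) for ra, re in zip(actual, expected) for a, e in zip(ra, re)
)
```

The blockwise version never holds the full row of scores at once, which is why a kernel can trade memory for a small amount of rescaling work. A real test would make the same comparison on device tensors with a tolerance.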

tmp
introduce flash_attention
Add test case
Fix the test
Fix linters

JackCaoG

How many more Pallas-related PRs do you need to backport? Ideally we should not backport features to the 2.3 branch anymore.

alanwaketan · 2024-03-27T00:58:53Z

Most of them landed last week. They all depend on each other, so I have to backport them one by one...

alanwaketan · 2024-03-27T01:00:15Z

The flash attention forward-related features are done. I will only backport fixes from now on.

I won't backport the backward and distributed work to 2.3.

tmp

ed5254e


alanwaketan requested review from lsy323 and JackCaoG March 27, 2024 00:52

alanwaketan mentioned this pull request Mar 27, 2024

2.3 backport PR request list #6676

Closed

JackCaoG approved these changes Mar 27, 2024


lsy323 approved these changes Mar 27, 2024


lsy323 merged commit db7112a into r2.3 Mar 27, 2024
17 checks passed
