
Bump flash-attn to v2.0.4 #816

Merged: 4 commits merged into facebookresearch:main on Aug 11, 2023
Conversation

@tmm1 (Contributor) commented Aug 3, 2023

What does this PR do?

Fixes #805

see https://github.com/Dao-AILab/flash-attention/commits/main for recent fixes

cc #712 Dao-AILab/flash-attention#359 Dao-AILab/flash-attention#334
cc #795 @danthe3rd
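
To confirm which FlashAttention build an installed xformers actually picked up, one quick check (assuming a reasonably recent xformers) is:

$ python -m xformers.info

which lists the available memory_efficient_attention operators, including the flash forward/backward ops.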

Before submitting

  • Did you have fun?
    • Make sure you had fun coding 🙃
  • Did you read the contributor guideline?
  • Was this discussed/approved via a GitHub issue? (no need for typos, doc improvements)
    • N/A
  • Did you make sure to update the docs?
    • N/A
  • Did you write any new necessary tests?
    • N/A
  • Did you update the changelog? (if needed)
    • N/A

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.

@facebook-github-bot added the CLA Signed label on Aug 3, 2023 (this label is managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed).
@codecov-commenter commented Aug 4, 2023

Codecov Report

Patch coverage: 96.15% and project coverage change: +0.12% 🎉

Comparison is base (f525106) 81.73% compared to head (e115d8e) 81.85%.
Report is 3 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #816      +/-   ##
==========================================
+ Coverage   81.73%   81.85%   +0.12%     
==========================================
  Files          96       96              
  Lines        6401     6427      +26     
==========================================
+ Hits         5232     5261      +29     
+ Misses       1169     1166       -3     
Flag      Coverage Δ
Python    81.85% <96.15%> (+0.12%) ⬆️

Flags with carried forward coverage won't be shown.

Files Changed                   Coverage Δ
xformers/ops/fmha/flash.py      48.46% <80.00%> (-1.17%) ⬇️
xformers/ops/fmha/triton.py     70.37% <100.00%> (+14.67%) ⬆️


@ghunkins commented Aug 8, 2023

Would love to see this added!

@danthe3rd (Contributor) left a comment

Hi,
Thanks for opening this PR!
Happy to merge once you have reverted the changes to nvcc_flags in setup.py.

setup.py (Outdated)
Comment on lines 255 to 270

 ]
-extra_compile_args["nvcc"] = nvcc_flags
-
-ext_modules += get_flash_attention_extensions(
-    cuda_version=cuda_version, extra_compile_args=extra_compile_args
-)
-
 # NOTE: This should not be applied to Flash-Attention
 # see https://github.com/Dao-AILab/flash-attention/issues/359
-extra_compile_args["nvcc"] += [
+nvcc_flags += [
     # Workaround for a regression with nvcc > 11.6
     # See https://github.com/facebookresearch/xformers/issues/712
     "--ptxas-options=-O2",
     "--ptxas-options=-allow-expensive-optimizations=true",
 ]
+extra_compile_args["nvcc"] = nvcc_flags
+
+ext_modules += get_flash_attention_extensions(
+    cuda_version=cuda_version, extra_compile_args=extra_compile_args
+)

@danthe3rd (Contributor)

Can we revert these changes? I don't think we need to use O2 for Flash-Attention (and it might also make performance worse).
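
For context, a minimal sketch of the arrangement being asked for here, reusing nvcc_flags, extra_compile_args, and get_flash_attention_extensions from the quoted setup.py excerpt (flash_compile_args is an illustrative name, not from the file): build the Flash-Attention extensions with the plain nvcc flags, and apply the ptxas workaround only to the remaining xformers kernels.

# Sketch only; names other than flash_compile_args follow the setup.py excerpt above.
# Build Flash-Attention with its own copy of the plain nvcc flags ...
flash_compile_args = dict(extra_compile_args, nvcc=list(nvcc_flags))
ext_modules += get_flash_attention_extensions(
    cuda_version=cuda_version, extra_compile_args=flash_compile_args
)

# ... then append the ptxas workaround for nvcc > 11.6 (xformers#712)
# only for the remaining extensions, keeping it away from Flash-Attention
# (see Dao-AILab/flash-attention#359).
extra_compile_args["nvcc"] = nvcc_flags + [
    "--ptxas-options=-O2",
    "--ptxas-options=-allow-expensive-optimizations=true",
]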

@tmm1 (Contributor Author)

Sure, I pulled out that commit. Thanks for the feedback.

@danthe3rd (Contributor) left a comment

Awesome! Thank you for your contribution.

@danthe3rd merged commit eadc8c6 into facebookresearch:main on Aug 11, 2023
1 check passed
@fmassa (Contributor) commented Aug 11, 2023

Looks like this PR gives wrong results for the flashattn backend. We need to investigate, but in the meantime we might revert it.

@tmm1 (Contributor Author) commented Aug 11, 2023

Is there a failing test somewhere, or what exactly is the issue?

@fmassa (Contributor) commented Aug 11, 2023

There are some failing tests (which run on internal infra), but not as many as I originally thought.

Here they are: [screenshot of the failing internal tests]

@danthe3rd (Contributor)

Here is a simple repro (I get the failures on A100 at least):

$ python -m pytest tests/test_mem_eff_attention.py -k "test_backward[flshattBv2-cuda-torch.float16-BlockDiagonalCausalMask"

===================================================================================== short test summary info ======================================================================================
FAILED tests/test_mem_eff_attention.py::test_backward[flshattBv2-cuda-torch.float16-BlockDiagonalCausalMask-1-256-2-1-32-32-False-BMHK] - AssertionError: cutlassF+flshattBv2:query: out=nan and ref=0.0 (diff=nan > 0) at (0, 174, 0, 0) of shape (1, 256, 1, 32) / atol=0.09, rtol=0.02/ total failing elements: 0, percentage=0.0
FAILED tests/test_mem_eff_attention.py::test_backward[flshattBv2-cuda-torch.float16-BlockDiagonalCausalMask-1-256-2-1-32-32-True-BMHK] - AssertionError: cutlassF+flshattBv2:query: out=nan and ref=0.0 (diff=nan > 0) at (0, 174, 0, 0) of shape (1, 256, 1, 32) / atol=0.09, rtol=0.02/ total failing elements: 0, percentage=0.0
FAILED tests/test_mem_eff_attention.py::test_backward[flshattBv2-cuda-torch.float16-BlockDiagonalCausalMask-1-256-15-1-32-32-False-BMHK] - AssertionError: flshattFv2+flshattBv2:query: out=nan and ref=0.0 (diff=nan > 0) at (0, 245, 0, 0) of shape (1, 256, 1, 32) / atol=0.09, rtol=0.02/ total failing elements: 0, percentage=0.0
FAILED tests/test_mem_eff_attention.py::test_backward[flshattBv2-cuda-torch.float16-BlockDiagonalCausalMask-1-256-15-1-32-32-True-BMHK] - AssertionError: flshattFv2+flshattBv2:query: out=nan and ref=0.0 (diff=nan > 0) at (0, 245, 0, 0) of shape (1, 256, 1, 32) / atol=0.09, rtol=0.02/ total failing elements: 0, percentage=0.0
==================================================================== 4 failed, 120 passed, 8 skipped, 14123 deselected in 8.11s ====================================================================

I'll investigate this and send a minimal repro to Tri if it's a bug in Flash-Attention.
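
For reference, a self-contained sketch of what such a minimal repro might look like. The shapes mirror the failing parametrization above (batch 1, total length 256, 1 head, head dim 32, fp16, BMHK layout); the per-sequence split, the use of memory_efficient_attention, and the explicit flash op pair are assumptions rather than the exact internal test.

import torch
import xformers.ops as xops
from xformers.ops import fmha

torch.manual_seed(0)

# Two variable-length sequences packed into one batch-1 BMHK tensor;
# the split is arbitrary, only the total length (256) matches the test.
seqlens = [174, 82]
q = torch.randn(1, sum(seqlens), 1, 32, device="cuda", dtype=torch.float16, requires_grad=True)
k = torch.randn_like(q).requires_grad_(True)
v = torch.randn_like(q).requires_grad_(True)

attn_bias = fmha.BlockDiagonalMask.from_seqlens(seqlens).make_causal()

# Force the FlashAttention v2 forward/backward ops (flshattFv2 / flshattBv2).
out = xops.memory_efficient_attention(
    q, k, v, attn_bias=attn_bias, op=(fmha.flash.FwOp, fmha.flash.BwOp)
)
out.backward(torch.ones_like(out))

assert not torch.isnan(q.grad).any(), "NaN in query gradient"

On an affected build, the assert above should trip on the NaN query gradients reported in the test output.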

@danthe3rd (Contributor)

This is indeed a bug in Flash-Attention - I opened an issue in Dao-AILab/flash-attention#443

@tmm1 (Contributor Author) commented Aug 11, 2023

@danthe3rd in your bug repro it still fails with bfloat16 for me, but the tests for bfloat16 pass here?

pytest tests/test_mem_eff_attention.py -vk "test_backward and flshattBv2-cuda and torch.bfloat16 and BlockDiagonalCausalMask"

@tmm1 (Contributor Author) commented Aug 11, 2023

Tests pass again if we revert 698532d as well.

@danthe3rd (Contributor)

Yes indeed. I was hoping that Tri might have some insight into what is causing this bug, or whether there is a narrower condition. It might also be specific to variable sequence lengths (BlockDiagonalCausalMask).
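
One way to narrow that down (a sketch reusing the tensors and op pair from the repro sketch above) is to run the same shapes with a fixed-length causal mask, which involves no variable sequence lengths:

# Same q/k/v and flash op pair as in the repro sketch, but with a plain
# causal mask instead of BlockDiagonalCausalMask.
q.grad = k.grad = v.grad = None  # reset gradients from the earlier backward
out_fixed = xops.memory_efficient_attention(
    q, k, v, attn_bias=xops.LowerTriangularMask(),
    op=(fmha.flash.FwOp, fmha.flash.BwOp),
)
out_fixed.backward(torch.ones_like(out_fixed))
assert not torch.isnan(q.grad).any()  # expected to pass if the bug is mask-specific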

@jinqiua commented Aug 17, 2023

I tried changing xops.fmha.cutlass.FwOp() to xops.fmha.flash.FwOp(), but I don't see any speedup (I'm using xformers 0.0.21+ba5b449.d20230817). @tmm1
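
For anyone comparing the two backends, a rough timing sketch (the entry point and op classes are the public xformers ones; the shapes and iteration counts here are arbitrary, and whether flash is faster depends heavily on shape, dtype, and GPU):

import time
import torch
import xformers.ops as xops
from xformers.ops import fmha

q = torch.randn(8, 1024, 16, 64, device="cuda", dtype=torch.float16)
k, v = torch.randn_like(q), torch.randn_like(q)

def bench(op, iters=50):
    # Warm up, then time the forward pass with the given (FwOp, BwOp) pair.
    for _ in range(5):
        xops.memory_efficient_attention(q, k, v, op=op)
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        xops.memory_efficient_attention(q, k, v, op=op)
    torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters

print("cutlass:", bench((fmha.cutlass.FwOp, fmha.cutlass.BwOp)))
print("flash:  ", bench((fmha.flash.FwOp, fmha.flash.BwOp)))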

Labels
CLA Signed: This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.

Development
Successfully merging this pull request may close these issues:
Rebuild latest wheels on main for FlashAttention 2

7 participants