[fix] Nystrom + microGPT + some additive masking #75
Conversation
Force-pushed from 5eddcff to b52b135
Force-pushed from b52b135 to 6f626d7
seq = q.shape[-2]

if not att_mask.is_sparse:
    att_mask = att_mask[:seq, :seq]
We were testing for smaller sequences, but when the mask was sparse it was not adjusted -> possible memory error, which showed up on CI later on.
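For illustration, a minimal sketch of the failure mode described here, assuming a plain PyTorch mask (`trim_mask` is a hypothetical helper, not the xformers code):

```python
import torch

def trim_mask(att_mask: torch.Tensor, seq: int) -> torch.Tensor:
    if not att_mask.is_sparse:
        return att_mask[:seq, :seq]  # plain slicing works on dense masks
    # sparse COO tensors do not support slicing, so the mask kept its full size;
    # a round-trip through dense is purely illustrative of what "adjusting" means
    return att_mask.to_dense()[:seq, :seq].to_sparse()
```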
Converting back to draft, I think that the key padding mask is not handled correctly with my changes.
mask = (
    key_padding_mask
    if mask is None
    else mask.logical_and(key_padding_mask)
)
@dianaml0 I'm not sure how that worked, since mask and key_padding_mask had different dimensions here, no?
It should automatically broadcast key_padding_mask along the mismatched dimension, similar to https://github.com/pytorch/pytorch/blob/4262c8913c2bddb8d91565888b4871790301faba/torch/nn/functional.py#L5189
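A minimal example of that broadcasting behaviour, with shapes assumed for the sketch rather than taken from the attention code:

```python
import torch

B, S = 2, 4
# (S, S) boolean attention mask: True means "attend"
att_mask = torch.ones(S, S, dtype=torch.bool)
# (B, 1, S) key padding mask, padding out the last key of sample 0
key_padding_mask = torch.ones(B, 1, S, dtype=torch.bool)
key_padding_mask[0, 0, -1] = False

# logical_and broadcasts (S, S) against (B, 1, S) -> (B, S, S)
combined = att_mask.logical_and(key_padding_mask)
print(combined.shape)  # torch.Size([2, 4, 4])
```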
Force-pushed from d2ba0e3 to 013a927
Codecov Report
@@            Coverage Diff             @@
##             main      #75      +/-   ##
==========================================
+ Coverage   87.10%   87.12%   +0.01%
==========================================
  Files          50       50
  Lines        2428     2447      +19
==========================================
+ Hits         2115     2132      +17
- Misses        313      315       +2
Thanks!! Really helpful changes! Makes masking easier to work with!
@@ -30,7 +30,7 @@ def test_core_attention():
 def test_core_attention_mask_types():

     b, s, d = 8, 900, 32
-    prob = 0.5
+    prob = 0.8  # make sure that we trigger the sparse kernels
Oh oops, thanks for catching that!
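As a rough sketch of why the value matters (how the test actually builds its mask, and whether prob counts kept or dropped entries, is assumed here rather than taken from the test):

```python
import torch

b, s = 8, 900
prob = 0.8  # assumed: fraction of attention entries dropped from the random mask

# a mask this sparse should route through the sparse attention kernels
# rather than the dense fallback, which is what the test wants to exercise
mask = torch.rand(b, s, s) > prob
print(f"density: {mask.float().mean().item():.2f}")  # ~0.20
```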
mask = (
    key_padding_mask
    if mask is None
    else mask.logical_and(key_padding_mask)
)
key_padding_mask = bool_mask_to_additive(key_padding_mask)

assert key_padding_mask is not None  # mypy is drunk
I think if a return type is added to bool_mask_to_additive it may fix the mypy error.
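A sketch of that suggestion, assuming a simplified bool_mask_to_additive rather than the exact xformers helper; with an explicit return annotation mypy knows the result is a Tensor, so the assert above becomes unnecessary:

```python
import torch

def bool_mask_to_additive(
    mask: torch.Tensor, dtype: torch.dtype = torch.float32
) -> torch.Tensor:
    # True -> 0.0 (keep), False -> -inf (masked out), the usual additive convention
    additive = torch.zeros_like(mask, dtype=dtype)
    additive.masked_fill_(~mask, float("-inf"))
    return additive
```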
Ahh good point, I'll fix that, thank you!
…the same thing, but it's not correct
Force-pushed from 72defc3 to 11f9d38
* Add some 2d-specific attention patterns
* Add notebook with examples
What does this PR do?
Before submitting
PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.