[feat] Adding Visual Attention #329
Conversation
cc @mannatsingh if you're interested in these things
@@ -121,8 +121,8 @@ def forward(self, x):

# Adjust batch depending on the available memory on your machine.
# You can also use reversible layers to save memory
REF_BATCH = 512
BATCH = 512  # lower if not enough GPU memory
REF_BATCH = 768
Looks like a classic default for CIFAR10.
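For context, a REF_BATCH/BATCH pair like this is usually consumed as a gradient-accumulation factor, so the effective batch stays close to REF_BATCH even when BATCH is lowered for memory. A minimal sketch of that wiring, assuming the constants are used this way (the Trainer call is illustrative, not taken from this hunk):

```python
# Illustrative only: derive a gradient-accumulation factor from the two
# constants so the effective batch size tracks REF_BATCH.
REF_BATCH = 768
BATCH = 512  # lower if not enough GPU memory

accumulate_grad_batches = max(1, REF_BATCH // BATCH)

# e.g. with PyTorch Lightning (assumed, not shown in this diff):
# trainer = pl.Trainer(accumulate_grad_batches=accumulate_grad_batches, ...)
```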
@@ -31,6 +31,7 @@ def __init__(
    num_classes=10,
    dim=384,
    attention="scaled_dot_product",
    feedforward="MLP",
I'm not sure about the defaults here; how do we show that you can use these to reproduce "Visual Attention", for instance? Should we show different presets?
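One option would be named presets that map a paper to constructor defaults. A rough sketch, where the registry strings ("visual", "Conv2DFeedforward") and the VisionTransformer class name are assumptions rather than the exact API in this repo:

```python
# Hypothetical presets; the attention/feedforward strings and the
# VisionTransformer class name are illustrative, not the repo's exact API.
PRESETS = {
    "vit": dict(attention="scaled_dot_product", feedforward="MLP"),
    "visual_attention_network": dict(attention="visual", feedforward="Conv2DFeedforward"),
}

# model = VisionTransformer(num_classes=10, dim=384, **PRESETS["visual_attention_network"])
```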
@@ -62,6 +62,10 @@ def __init__(
    # This operator does not really handle q,k,v
    self.requires_same_k_q_dimensions = True

    # This attention requires the 2d structure out of the context,
    # implicitly assumed to be a squared length
    self.requires_squared_context = True
This was already true before, but not formalized like this; I think it's cleaner? "pooling" (PoolingFormer) and "visual" both recover the 2d structure of the context and assume a squared context length for that.
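To show what the flag buys the caller, a minimal sketch of the check a wrapper could run before handing a flattened sequence to such an operator; the helper name is illustrative:

```python
import math

def context_side(attention, seq_len: int) -> int:
    # Operators that recover a 2d structure need a perfect-square context
    # length; return the recovered side H such that H * H == seq_len.
    if getattr(attention, "requires_squared_context", False):
        h = int(math.sqrt(seq_len))
        assert h * h == seq_len, f"context length {seq_len} is not a perfect square"
        return h
    return seq_len
```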
H = int(math.sqrt(HW))
assert H * H == HW

x = q.transpose(-2, -1).reshape(B, C, H, H)
I've not benchmarked that, but it might be beneficial to call .contiguous() here, depending on the Conv2D kernels.
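If someone wants to check, a quick micro-benchmark sketch; the shapes and the depthwise conv below are made up, the point is simply to time the reshape together with the Conv2D that consumes it:

```python
import torch
import torch.nn.functional as F
import torch.utils.benchmark as benchmark

# Made-up shapes; the depthwise conv stands in for whatever Conv2D consumes x.
B, C, H = 8, 384, 32
q = torch.randn(B, H * H, C, device="cuda")
w = torch.randn(C, 1, 5, 5, device="cuda")

def run(make_contiguous: bool):
    x = q.transpose(-2, -1)
    if make_contiguous:
        x = x.contiguous()  # the variant suggested above
    x = x.reshape(B, C, H, H)
    return F.conv2d(x, w, padding=2, groups=C)

for flag in (False, True):
    timer = benchmark.Timer(stmt="run(flag)", globals={"run": run, "flag": flag})
    print("contiguous =", flag, timer.timeit(50))
```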
What does this PR do?
Fixes #319. Note that to reproduce the paper you need the Conv2DFeedforward introduced in #321, and a metaformer-like structure.
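For readers following that reproduction note, a rough sketch of what "metaformer-like" means here: a pre-norm residual block whose token mixer is the visual attention and whose channel mixer is the Conv2DFeedforward. The block below is illustrative, not the exact xformers code:

```python
import torch.nn as nn

class MetaformerBlock(nn.Module):
    # Illustrative only: pre-norm residual block with a swappable token mixer
    # (e.g. visual attention) and channel mixer (e.g. Conv2DFeedforward).
    def __init__(self, dim: int, token_mixer: nn.Module, channel_mixer: nn.Module):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.token_mixer = token_mixer
        self.channel_mixer = channel_mixer

    def forward(self, x):
        x = x + self.token_mixer(self.norm1(x))
        x = x + self.channel_mixer(self.norm2(x))
        return x
```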
Before submitting
PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.