
Implement Quiet Softmax (Attention Is Off By One) #692

Merged: 3 commits into tracel-ai:main on Nov 30, 2023

Conversation

wbrickner (Contributor)

Pull Request Template

Checklist

  • Confirm that run-checks script has been executed.

Related Issues/PRs

Changes

  • I propose a quiet_softmax activation function (motivated by Attention Is Off By One); a rough numerical sketch is shown after this list.
  • I propose a quiet_softmax configuration field for MultiHeadAttention (on MultiHeadAttentionConfig), and to expose this field as well on all the layers that use MultiHeadAttention internally (such as TransformerEncoder).
  • There is a case to be made that this should be the default softmax implementation (it is not enabled by default in this PR).
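Conceptually, the only change is the denominator: quiet softmax computes exp(x_i) / (1 + Σ_k exp(x_k)), so an attention head can assign near-zero weight everywhere when no key matches, instead of being forced to emit a probability distribution. Below is a minimal plain-Rust sketch of the numerics (illustrative only; the function name and the max-subtraction stability trick are assumptions, not the burn Tensor implementation):

```rust
/// Quiet ("off by one") softmax over a slice: exp(x_i) / (1 + sum_k exp(x_k)).
/// Outputs may sum to less than 1; very negative logits push every output
/// toward 0 rather than forcing a distribution. Illustrative sketch only.
fn quiet_softmax(x: &[f32]) -> Vec<f32> {
    // Subtract m = max(0, max(x)) for numerical stability; the "+1" in the
    // denominator then becomes exp(-m).
    let m = x.iter().cloned().fold(0.0_f32, f32::max);
    let exps: Vec<f32> = x.iter().map(|&v| (v - m).exp()).collect();
    let denom = (-m).exp() + exps.iter().sum::<f32>();
    exps.iter().map(|&e| e / denom).collect()
}

fn main() {
    // All-negative logits: outputs near zero (standard softmax would still sum to 1).
    println!("{:?}", quiet_softmax(&[-10.0, -12.0, -11.0]));
    // Ordinary logits: close to standard softmax, since the +1 is small
    // relative to the sum of exponentials.
    println!("{:?}", quiet_softmax(&[2.0, 1.0, 0.5]));
}
```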

Testing

Run checks.

@nathanielsimard (Member) left a comment


Just a small change, but thanks a lot for implementing this.

The paper is indeed very interesting, and it's nice to provide that option as well. Similarly to using normalization first, it's not the default, but it's probably the better option.

[Review thread on burn-tensor/src/tensor/activation/base.rs (outdated, resolved)]
@louisfd (Member) left a comment


Hi @wbrickner
I'm happy to see how quickly we as a community can respond to new developments in our field. Thanks a lot!
I have only one request: that the new function be tested; see my comment.

[Review thread on burn-tensor/src/tensor/activation/base.rs (outdated, resolved)]
[Review thread on burn-tensor/src/tensor/activation/base.rs (resolved)]
@nathanielsimard (Member)

@wbrickner Can you fix the comments and the merge conflicts? I think it would be ready to be merged :)

@antimora (Collaborator)

@wbrickner, let us know if you need help with this. We do not wish to lose such an important addition to Burn.

@wbrickner (Contributor, Author)

Got busy; I will implement this again today.

@wbrickner (Contributor, Author)

Hey, so it turns out I do need help! Sorry for messing up the issue format by accidentally closing it. I am having a lot of trouble getting the burn checks to pass locally after pulling a fresh copy of burn. The implementation has been rewritten in the new copy (as of a few days ago); it's fairly simple. My other roadblock is computing the derivatives by hand to get correct values to test against, since the expressions get quite complicated. I suppose I should just write it out by hand and refresh on the differentiation rules, haha. Sorry this was left open for so many months! I would like to move forward with it!
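A note on the derivative roadblock above (an illustrative aside, not part of the original thread): writing quiet softmax as s_i = exp(x_i) / (1 + Σ_k exp(x_k)), the Jacobian works out to the same form as for ordinary softmax, only with s computed using the "+1" denominator (so s no longer sums to 1):

```latex
s_i = \frac{e^{x_i}}{1 + \sum_k e^{x_k}}, \qquad
\frac{\partial s_i}{\partial x_j}
  = \frac{\delta_{ij}\, e^{x_i}\left(1 + \sum_k e^{x_k}\right) - e^{x_i} e^{x_j}}
         {\left(1 + \sum_k e^{x_k}\right)^2}
  = s_i \left(\delta_{ij} - s_j\right).
```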

@wbrickner reopened this on Nov 22, 2023
@antimora (Collaborator)

Sounds good! We will look into this.

@nathanielsimard (Member) left a comment


I think you need to run the correct formatting: cargo fmt --all

@wbrickner (Contributor, Author)

> I think you need to run the correct formatting: cargo fmt --all

To clarify, I should format my changes and update the remote repo attached to this PR?
The checks pass in GitHub Actions; is this blocked from merging?

@antimora (Collaborator)

> I think you need to run the correct formatting: cargo fmt --all
>
> To clarify, I should format my changes and update the remote repo attached to this PR? The checks pass in GitHub Actions; is this blocked from merging?

OK, I have rebased it and run the formatting. Now it's passing the checks. We probably need an approval for it to be merged.

@nathanielsimard (Member) left a comment


There are formatting issues in burn-derive; cargo fmt has a hard time fixing them because of macros, but no changes should actually be made in burn-derive.

Comment on lines +75 to +79
let variant_name = &variant.ident;
let (variant_input, variant_output) = self.gen_variant_field(variant);

quote! { Self::#variant_name #variant_input => #enum_name::#variant_name #variant_output }
});
Member


That formatting seems odd to me and somehow fmt can't update it 🤔

Contributor (Author)


What would you like me to do to resolve this issue and merge the PR? cargo fmt --all results in a bit-identical repository. Is the issue these two lines of whitespace? Should I modify them manually? You mention that no changes should be made to burn-derive. I would like to get this closed out; apologies for the silliness of these problems.

Member


You can just reset all the changes under the burn-derive directory to origin/main. :)

Contributor


It's due to the quote! macro. rust-lang/rustfmt#8

If you comment out quote!, format, and uncomment, it'll do the right thing.

@AlexErrant (Contributor) commented on Nov 27, 2023


aaaaaactshully it's due to the default max_width = 100. If you add a file called rustfmt.toml containing max_width = 110 and then format, it works. Related. I'll open another PR about this tomorrow since it causes changes elsewhere and deserves its own discussion.

@AlexErrant mentioned this pull request on Nov 29, 2023.
@louisfd (Member) left a comment


This has been pending long enough 😅 I'll merge it; I think the formatting is fine.

@louisfd merged commit 03af140 into tracel-ai:main on Nov 30, 2023 (6 checks passed).