[feat] Dropout partial bw fusion (second take) #164
Conversation
- using fewer seeds
- tiling + vertical seeds
- computing the FW and BW per tile over M
- better scheduling defaults, improves across the board
- good enough perf
- catching the slow case and diverting to PyTorch in that case (see the sketch below)
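A minimal sketch of the "divert to PyTorch for the slow case" idea from the last item above. The threshold and function names are hypothetical, not the actual xformers dispatch code; the 2048 figure is borrowed from the 3080 observation further down this thread.

```python
import torch

# Hypothetical cutoff: buffers below this size did not perform well with the
# Triton path on a desktop 3080, so we fall back to plain PyTorch there.
_SMALL_BUFFER_THRESHOLD = 2048

def fused_bias_act_dropout(x, bias, p, activation, triton_op=None):
    if triton_op is None or x.shape[-1] < _SMALL_BUFFER_THRESHOLD:
        # Slow/small case: eager PyTorch keeps correctness and avoids the perf cliff
        return torch.nn.functional.dropout(activation(x + bias), p=p)
    # Fast case: hand off to the fused Triton kernel
    return triton_op(x, bias, p)
```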
cc @ptillet @suchenzang, this should finally be ready and perf is still good. Turns out randint4x from Triton works fine as long as you don't request too many numbers.
Codecov Report
@@ Coverage Diff @@
## main #164 +/- ##
==========================================
- Coverage 90.61% 90.56% -0.06%
==========================================
Files 56 56
Lines 2824 2829 +5
==========================================
+ Hits 2559 2562 +3
- Misses 265 267 +2
Hmm, testing on a desktop 3080, the perf for < 2048 buffers is not very good, and it's not just hyperparameters. The perf above that is a lot better than before, but it would be nice to have the best of both worlds...

Edit: fixed with bigger tiles. It turns out the accuracy error before was not due to tile size, but to not creating the exact same mask between FW and BW (in this layer we don't save the mask but only the seeds; the mask is re-created on the fly for each pass). Choosing different scheduling params led to a different use of the seeds, meaning different masks, and that's why the training was not as good in the previous PR (and in the layer currently checked in, actually). TL;DR: this is fixed, the perf is even above the curves displayed in previous comments, and I checked with a full training that the accuracy is maintained.
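To illustrate the constraint above with a minimal PyTorch sketch (hypothetical helper, not the actual xformers kernel): only per-tile seeds are stored, and the mask is rebuilt from them in both passes, so FW and BW must map seeds to elements in exactly the same way.

```python
import torch

def _tile_mask(seed: int, tile_shape, p: float) -> torch.Tensor:
    # Deterministically rebuild a tile's dropout mask from its seed
    g = torch.Generator().manual_seed(seed)
    return torch.rand(tile_shape, generator=g) > p

# Forward: generate the mask from the tile's seed on the fly, apply it, discard it.
# Backward: rebuild it with the SAME seed and the SAME tiling. Changing the
# scheduling (tile size, seed-to-tile assignment) between passes silently changes
# the mask, which is exactly the accuracy issue described above.
seed, p = 1234, 0.1
mask_fw = _tile_mask(seed, (16, 128), p)
mask_bw = _tile_mask(seed, (16, 128), p)
assert torch.equal(mask_fw, mask_bw)  # same seed + same tiling -> same mask
```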
Looks great! :)
…Triton jit can be a little picky
* Orthoformer attention. Author: Mandela Patrick et al.
* only select landmarks amid the queries
What does this PR do?
TL;DR:
Dropout(activation(x+bias)) layer, now with a partially fused BW and much better FW perf (mostly bandwidth-bound).
Previously broken in #144 since the end accuracy was not as good as PyTorch's. Now fixed.
The longer version:
Re-opening #144, I finally got the time to look into this (being blocked on the mem-efficient attention by a Triton bug :) ). The issue was in the random number generation, which calls tl.randint4x from Triton starting from a random seed that PyTorch generates; this seems to break if too many numbers are generated per seed. This PR introduces processing per tile, each tile being covered by one seed plus the random numbers derived from it. The fix was simply to limit the size of the tiles / use more seeds, and to adjust the other hyperparameters (number of warps) accordingly; the perf is not really affected.
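For illustration, a minimal Triton sketch in the spirit of this approach (not the actual xformers kernel): the mask is never materialized, it is recomputed from a seed plus per-element offsets, so the number of random values requested per seed stays bounded by the tile size. The kernel below takes a single seed argument; the layer described here goes one step further and stores one seed per tile.

```python
import triton
import triton.language as tl

@triton.jit
def _seeded_dropout_tile(x_ptr, out_ptr, n_elements, p, seed, BLOCK: tl.constexpr):
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK + tl.arange(0, BLOCK)
    valid = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=valid)
    # Recreate the dropout mask from seed + offsets: at most BLOCK random
    # numbers are drawn per program, so no single seed is over-requested
    r = tl.rand(seed, offsets)
    keep = r > p
    out = tl.where(keep, x / (1 - p), 0.0)
    tl.store(out_ptr + offsets, out, mask=valid)
```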
Training on minGPT (orange is the previous take, red is pure PyTorch MLP, green is MLP with this layer, log scale). Plots for one epoch of minGPT:
-- training
-- forward pass only (still with dropout)
Before submitting
PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.