Improve build time by ~30% #539

danthe3rd · 2022-11-24T20:30:15Z

Stack from ghstack (oldest at bottom):

... by reducing the number of ATen imports, and skipping them altogether when building the actual kernels

13 min -> 9 min on Sm61 (measured in CI, which does not build flash)
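As a rough illustration of the idea in the description (not the actual xformers code), the sketch below keeps the kernel side free of tensor-library headers by passing it a plain parameter struct. Only a thin binding layer would need to include ATen. All names here (`ScaleParams`, `scale_kernel`, `scale_binding`) are hypothetical, and `std::vector` stands in for `at::Tensor` so the example compiles on its own.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Plain-data parameter struct (hypothetical). It holds raw pointers and sizes
// only, so any file that includes it pulls in no ATen headers.
struct ScaleParams {
  const float* in;
  float* out;
  std::size_t n;
  float scale;
};

// "Kernel" side: sees only ScaleParams. In a layout like this, the expensive
// kernel translation units would compile without parsing ATen at all.
inline void scale_kernel(const ScaleParams& p) {
  assert(p.n == 0 || (p.in != nullptr && p.out != nullptr));
  for (std::size_t i = 0; i < p.n; ++i) {
    p.out[i] = p.in[i] * p.scale;
  }
}

// "Binding" side: assumed to be the only place that would include ATen and
// unpack tensors into ScaleParams. std::vector is a stand-in for at::Tensor.
inline std::vector<float> scale_binding(const std::vector<float>& x, float scale) {
  std::vector<float> y(x.size());
  ScaleParams p{x.data(), y.data(), x.size(), scale};
  scale_kernel(p);
  return y;
}
```

Calling `scale_binding({1.0f, 2.0f}, 3.0f)` exercises both layers. The point of the split is that the heavy header is parsed once in the binding file instead of once per kernel instantiation file.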

[ghstack-poisoned]

ghstack-source-id: 504b6d0a01cf4df9daa82264544e01da466f83a3 Pull Request resolved: #539

[ghstack-poisoned]

ghstack-source-id: 4b2445fe3c83eef3282643862ed83cef85dc5997 Pull Request resolved: #539

[ghstack-poisoned]

ghstack-source-id: eee2e10144ca54bb23e34d6ff5988faa3308c08d Pull Request resolved: #539

[ghstack-poisoned]

ghstack-source-id: bb993fcd50536e112fcb478e7128d76da3b1f195 Pull Request resolved: #539

codecov-commenter · 2022-11-24T21:57:40Z

Codecov Report

Base: 89.79% // Head: 89.79% // No change to project coverage 👍

Coverage data is based on head (dfd494a) compared to base (dfd494a).
Patch has no changes to coverable lines.

❗ Current head dfd494a differs from pull request most recent head c7f5164. Consider uploading reports for the commit c7f5164 to get more accurate results

Additional details and impacted files

@@                  Coverage Diff                  @@
##           gh/danthe3rd/56/base     #539   +/-   ##
=====================================================
  Coverage                 89.79%   89.79%           
=====================================================
  Files                        80       80           
  Lines                      4839     4839           
=====================================================
  Hits                       4345     4345           
  Misses                      494      494

| Flag | Coverage Δ |
| --- | --- |
| Python | `89.79% <0.00%> (ø)` |

fmassa

LGTM, thanks!

fmassa · 2022-11-25T15:27:26Z

xformers/components/attention/csrc/cuda/mem_eff_attention/mma_from_smem.h

@@ -384,7 +384,7 @@ class MmaPipelinedFromSharedMemory : public MmaBaseFromSharedMemory<
  // but not supported as it worsens perf: older gpus < sm80 don't
  // support async tranfers and have to waste registers
  CUTLASS_DEVICE
-  bool set_prologue_done(bool value) {}
+  void set_prologue_done(bool value) {}


Those changes are unrelated, right?

fmassa · 2022-11-25T15:27:55Z

xformers/components/attention/csrc/cuda/mem_eff_attention/kernel_backward.h

@@ -676,18 +671,19 @@ struct AttentionBackwardKernel {
    }
  };

-  static void __host__ check_supported(Params const& p) {
+  static bool __host__ check_supported(Params const& p) {


This change is also unrelated, right?
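The thread does not record an answer to this question. One reading that would tie the change to the PR description, offered only as an assumption, is that a `bool`-returning check lets the kernel header report unsupported parameters without depending on an error macro from the tensor library, leaving the caller to raise the error. The sketch below illustrates that pattern with hypothetical fields and a hypothetical `launch_or_throw` helper; it is not the code from `kernel_backward.h`.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical subset of kernel parameters, for illustration only.
struct Params {
  std::int32_t head_dim;
  std::int32_t num_queries;
};

// Kernel-header side: reports support through the return value, so this file
// needs no exception or ATen machinery. The specific rules are made up.
inline bool check_supported(const Params& p) {
  if (p.head_dim <= 0 || p.head_dim % 8 != 0) {
    return false;
  }
  if (p.num_queries <= 0) {
    return false;
  }
  return true;
}

// Binding side: converts a failed check into an error. Real PyTorch extension
// code would more likely use TORCH_CHECK here (assumption); a standard
// exception keeps this sketch self-contained.
inline void launch_or_throw(const Params& p) {
  if (!check_supported(p)) {
    throw std::invalid_argument("unsupported kernel parameters");
  }
  // The kernel launch would go here.
}
```

With these made-up rules, `check_supported(Params{64, 128})` passes while `Params{7, 128}` is rejected and `launch_or_throw` throws.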

ghstack-source-id: a083a9494486298191eea001ff480a82af6966c7 Pull Request resolved: #539

Improve build time

c021839

[ghstack-poisoned]

danthe3rd pushed a commit that referenced this pull request Nov 24, 2022

Improve build time

adade41

ghstack-source-id: 504b6d0a01cf4df9daa82264544e01da466f83a3 Pull Request resolved: #539

facebook-github-bot added the CLA Signed label Nov 24, 2022

Update on "Improve build time"

c8c11fa

[ghstack-poisoned]

danthe3rd pushed a commit that referenced this pull request Nov 24, 2022

Improve build time

1e679df

ghstack-source-id: 4b2445fe3c83eef3282643862ed83cef85dc5997 Pull Request resolved: #539

Update on "Improve build time"

167584a

[ghstack-poisoned]

danthe3rd pushed a commit that referenced this pull request Nov 24, 2022

Improve build time

ba07d33

ghstack-source-id: eee2e10144ca54bb23e34d6ff5988faa3308c08d Pull Request resolved: #539

Update on "Improve build time"

9fcb9bf

[ghstack-poisoned]

danthe3rd pushed a commit that referenced this pull request Nov 24, 2022

Improve build time

2707c23

ghstack-source-id: bb993fcd50536e112fcb478e7128d76da3b1f195 Pull Request resolved: #539

danthe3rd requested a review from fmassa November 24, 2022 22:03

danthe3rd marked this pull request as ready for review November 24, 2022 22:03

danthe3rd changed the title ~~Improve build time~~ Improve build time by ~30% Nov 24, 2022

Update on "Improve build time by ~30%"

80bd22d

danthe3rd mentioned this pull request Nov 25, 2022

MmaFromSmem[A100]: Accept transposed operand A #540

Merged

fmassa approved these changes Nov 25, 2022

Update on "Improve build time by ~30%"

c7f5164

danthe3rd mentioned this pull request Nov 28, 2022

MemEff: Accumulate in f32 for bw #467

Merged

Update on "Improve build time by ~30%"

928b0df

danthe3rd mentioned this pull request Nov 29, 2022

Reduce binary size - no debug info for binaries #549

Merged

danthe3rd merged commit 928b0df into gh/danthe3rd/56/base Nov 29, 2022

danthe3rd pushed a commit that referenced this pull request Nov 29, 2022

Improve build time

0bad001

ghstack-source-id: a083a9494486298191eea001ff480a82af6966c7 Pull Request resolved: #539

danthe3rd deleted the gh/danthe3rd/56/head branch November 29, 2022 17:34

danthe3rd mentioned this pull request Nov 30, 2022

Use a newer version of flash attention that compiles faster #550

Closed
