Improve build time by ~30% #539
[ghstack-poisoned]
ghstack-source-id: 504b6d0a01cf4df9daa82264544e01da466f83a3 Pull Request resolved: #539
ghstack-source-id: 4b2445fe3c83eef3282643862ed83cef85dc5997 Pull Request resolved: #539
ghstack-source-id: eee2e10144ca54bb23e34d6ff5988faa3308c08d Pull Request resolved: #539
ghstack-source-id: bb993fcd50536e112fcb478e7128d76da3b1f195 Pull Request resolved: #539
Codecov Report
Base: 89.79% // Head: 89.79% // No change to project coverage 👍

Additional details and impacted files

@@                Coverage Diff                 @@
##           gh/danthe3rd/56/base   #539   +/- ##
=================================================
  Coverage                 89.79%   89.79%
=================================================
  Files                        80       80
  Lines                      4839     4839
=================================================
  Hits                       4345     4345
  Misses                      494      494
... by reducing the number of ATen includes, and skipping them altogether when building the actual kernels.
Build time: 13 min -> 9 min on Sm61 (CI, which does not build flash).
LGTM, thanks!
@@ -384,7 +384,7 @@ class MmaPipelinedFromSharedMemory : public MmaBaseFromSharedMemory<
   // but not supported as it worsens perf: older gpus < sm80 don't
   // support async tranfers and have to waste registers
   CUTLASS_DEVICE
-  bool set_prologue_done(bool value) {}
+  void set_prologue_done(bool value) {}
Those changes are unrelated, right?
@@ -676,18 +671,19 @@ struct AttentionBackwardKernel {
   }
 };

-  static void __host__ check_supported(Params const& p) {
+  static bool __host__ check_supported(Params const& p) {
This change is also unrelated, right?
ghstack-source-id: a083a9494486298191eea001ff480a82af6966c7 Pull Request resolved: #539
Stack from ghstack (oldest at bottom):
... by reducing the number of ATen includes, and skipping them altogether when building the actual kernels.
Build time: 13 min -> 9 min on Sm61 (CI, which does not build flash).
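
For readers unfamiliar with the approach, here is a minimal sketch of the general idea of keeping a heavy tensor header out of kernel translation units. It is not the code from this PR: `KernelParams`, `run_scale_kernel`, and `scale` are hypothetical names, the file split is shown only in comments, and `std::vector` stands in for a tensor type so the example compiles without ATen.

```cpp
#include <cstddef>
#include <vector>

// --- kernel_params.h (hypothetical) ---
// Plain struct of raw pointers and sizes. It needs no tensor-library header,
// so every kernel file that includes it stays cheap to compile.
struct KernelParams {
  const float* in;
  float* out;
  std::size_t n;
  float scale;
};

// --- kernel.cpp (hypothetical) ---
// The kernel only sees KernelParams. In a real project this file would
// include kernel_params.h and nothing from the tensor library.
void run_scale_kernel(const KernelParams& p) {
  for (std::size_t i = 0; i < p.n; ++i) {
    p.out[i] = p.in[i] * p.scale;
  }
}

// --- wrapper.cpp (hypothetical) ---
// The only file that would include the heavy tensor header. Here std::vector
// is an assumed stand-in for the tensor type. The wrapper unpacks the
// container into KernelParams and calls the kernel.
std::vector<float> scale(const std::vector<float>& x, float s) {
  std::vector<float> y(x.size());
  KernelParams p{x.data(), y.data(), x.size(), s};
  run_scale_kernel(p);
  return y;
}
```

With this layout, only the wrapper pays the cost of parsing the tensor library's headers, while the (typically many and heavily templated) kernel files are compiled against the small params header alone.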