Use cublas<t>matinvBatched() for N <= 32 #739

Merged
merged 1 commit into from
Aug 27, 2024

tbensonatl
Collaborator

Use the cublas<t>matinvBatched() family of functions to invert linear systems of size N <= 32. This has two advantages over the more general pair of getrfBatched() and getriBatched() functions:

  1. Higher performance with the single kernel than with split kernels.
  2. The matinv functions support in-place transforms and do not modify the input in the case of out-of-place transforms, so we do not need a temporary input work buffer if the input is a tensor view.
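
For readers skimming the change, here is a minimal host-side sketch of the dispatch decision described above. It is not the code from this PR: `InversePath`, `InversePlan`, `plan_inverse`, and `kMatinvMaxN` are hypothetical names chosen for illustration, and the sketch only models which cuBLAS path would be taken and whether a temporary copy of the input would be needed. It assumes the caller states whether the input must be left unmodified, which is the situation when the input is a tensor view.

```cpp
#include <cassert>

// Hypothetical names for illustration only; not taken from the actual change.
enum class InversePath {
  MatinvBatched,  // one cublas<t>matinvBatched() call
  GetrfGetri      // cublas<t>getrfBatched() followed by cublas<t>getriBatched()
};

// cublas<t>matinvBatched() only handles N <= 32.
constexpr int kMatinvMaxN = 32;

struct InversePlan {
  InversePath path;
  bool needs_input_copy;  // true if a temporary input work buffer is required
};

// Choose the inversion path for N x N matrices.
// preserve_input: the caller's input must not be modified (e.g. a tensor view).
inline InversePlan plan_inverse(int n, bool preserve_input) {
  assert(n > 0 && "matrix dimension must be positive");
  if (n <= kMatinvMaxN) {
    // Out-of-place matinv leaves the input untouched, so no copy is needed.
    return {InversePath::MatinvBatched, false};
  }
  // getrfBatched() overwrites its input with the LU factors, so the input
  // has to be copied first whenever it must be preserved.
  return {InversePath::GetrfGetri, preserve_input};
}
```

Under these assumptions, `plan_inverse(32, true)` selects the single-kernel path with no work buffer, while `plan_inverse(33, true)` falls back to the getrf/getri pair and requests a copy of the input.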

@tbensonatl tbensonatl self-assigned this Aug 27, 2024
@tbensonatl
Collaborator Author

/build

@coveralls

Coverage Status

coverage: 93.386% (-0.02%) from 93.406%
when pulling c6ae9fa on optimize-inv-operator-for-small-systems
into 77f2901 on main.

@cliffburdick cliffburdick merged commit d9053d6 into main Aug 27, 2024
1 check passed
@cliffburdick cliffburdick deleted the optimize-inv-operator-for-small-systems branch August 27, 2024 18:29
@cliffburdick
Collaborator

/build

3 participants