
cublas sgemm parallelization #492

Closed
coreylowman opened this issue Feb 25, 2023 · 2 comments · Fixed by #610
Labels
gpu Related to GPU support optimization

Comments

@coreylowman
Owner

Use the cudarc::driver::result::stream API and cublas::result::set_stream to parallelize sgemm operations for conv2d and 4d batched matmul.
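The intent — running independent GEMMs concurrently rather than serially — can be sketched with a CPU-side analogy, using plain threads in place of CUDA streams and a naive placeholder in place of cuBLAS sgemm (illustration only, not cudarc code):

```rust
use std::thread;

// Naive single-precision GEMM standing in for cublas sgemm:
// c = a (m x k) * b (k x n), row-major.
fn sgemm(m: usize, k: usize, n: usize, a: &[f32], b: &[f32]) -> Vec<f32> {
    let mut c = vec![0.0f32; m * n];
    for i in 0..m {
        for p in 0..k {
            let av = a[i * k + p];
            for j in 0..n {
                c[i * n + j] += av * b[p * n + j];
            }
        }
    }
    c
}

fn main() {
    // A "4d batched matmul" reduces to many independent 2d GEMMs; each
    // batch element can be issued on its own stream (here: its own thread).
    let (m, k, n) = (2, 3, 2);
    let batches: Vec<(Vec<f32>, Vec<f32>)> = (0..4)
        .map(|b| (vec![b as f32 + 1.0; m * k], vec![1.0f32; k * n]))
        .collect();

    let handles: Vec<_> = batches
        .into_iter()
        .map(|(a, b)| thread::spawn(move || sgemm(m, k, n, &a, &b)))
        .collect();

    // Joining each worker is the analogue of making the null stream wait
    // on every side stream before continuing.
    for h in handles {
        let c = h.join().unwrap();
        println!("{:?}", c); // batch b produces entries k * (b + 1)
    }
}
```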

@coreylowman added the gpu (Related to GPU support) and optimization labels Feb 25, 2023
@coreylowman
Owner Author

coreylowman commented Feb 25, 2023

Might be useful to create some Stream object that is easy to create and forces the null stream to wait for it to finish before continuing. Could add this to cudarc.
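The "forces the null stream to wait" idea is essentially join-on-drop (RAII). A minimal sketch of that pattern with a plain thread — the `AutoJoin` wrapper here is hypothetical, not the cudarc API:

```rust
use std::thread::{self, JoinHandle};

// Hypothetical wrapper: the "stream" (here, a thread) is joined when the
// wrapper is dropped, so the caller can never forget to synchronize.
struct AutoJoin<T>(Option<JoinHandle<T>>);

impl<T: Send + 'static> AutoJoin<T> {
    fn spawn(f: impl FnOnce() -> T + Send + 'static) -> Self {
        AutoJoin(Some(thread::spawn(f)))
    }
}

impl<T> Drop for AutoJoin<T> {
    fn drop(&mut self) {
        // Block until the work finishes, mirroring how a dropped stream
        // would make the null stream wait for it.
        if let Some(h) = self.0.take() {
            let _ = h.join();
        }
    }
}

fn main() {
    use std::sync::atomic::{AtomicUsize, Ordering};
    use std::sync::Arc;

    let done = Arc::new(AtomicUsize::new(0));
    {
        let d = done.clone();
        let _work = AutoJoin::spawn(move || d.store(1, Ordering::SeqCst));
        // End of scope: Drop joins the worker before execution continues.
    }
    assert_eq!(done.load(Ordering::SeqCst), 1);
    println!("joined");
}
```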

@coreylowman
Owner Author

The latest cudarc added a CudaStream object that we can use for this:

```rust
let stream = self.dev.auto_joining_stream()?;

self.blas.set_stream(Some(&stream))?;
// call kernel
self.blas.set_stream(None)?;

self.dev.join_async(stream)?; // or you can just `drop(stream)`
```
