
cublas sgemm parallelization #492

Closed
coreylowman opened this issue Feb 25, 2023 · 2 comments · Fixed by #610
Labels
gpu Related to GPU support optimization

Comments

@coreylowman
Owner

Use the cudarc::driver::result::stream API and cublas::result::set_stream to parallelize sgemm operations for conv2d and 4d batched matmul.
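The intent — running independent GEMMs concurrently rather than serially — can be sketched with a CPU-side analogy, using plain threads in place of CUDA streams and a naive placeholder in place of cuBLAS sgemm (illustration only, not cudarc code):

```rust
use std::thread;

// Naive single-precision GEMM standing in for cublas sgemm:
// c = a (m x k) * b (k x n), row-major.
fn sgemm(m: usize, k: usize, n: usize, a: &[f32], b: &[f32]) -> Vec<f32> {
    let mut c = vec![0.0f32; m * n];
    for i in 0..m {
        for p in 0..k {
            let av = a[i * k + p];
            for j in 0..n {
                c[i * n + j] += av * b[p * n + j];
            }
        }
    }
    c
}

fn main() {
    // A "4d batched matmul" reduces to many independent 2d GEMMs; each
    // batch element can be issued on its own stream (here: its own thread).
    let (m, k, n) = (2, 3, 2);
    let batches: Vec<(Vec<f32>, Vec<f32>)> = (0..4)
        .map(|b| (vec![b as f32 + 1.0; m * k], vec![1.0f32; k * n]))
        .collect();

    let handles: Vec<_> = batches
        .into_iter()
        .map(|(a, b)| thread::spawn(move || sgemm(m, k, n, &a, &b)))
        .collect();

    // Joining each worker is the analogue of making the null stream wait
    // on every side stream before continuing.
    for h in handles {
        let c = h.join().unwrap();
        println!("{:?}", c); // batch b produces entries k * (b + 1)
    }
}
```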

@coreylowman added the gpu (Related to GPU support) and optimization labels Feb 25, 2023
@coreylowman
Owner Author

coreylowman commented Feb 25, 2023

Might be useful to create some Stream object that is easy to create and forces the null stream to wait for it to finish before continuing. Could add this to cudarc.
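The "forces the null stream to wait" idea is essentially join-on-drop (RAII). A minimal sketch of that pattern with a plain thread — the `AutoJoin` wrapper here is hypothetical, not the cudarc API:

```rust
use std::thread::{self, JoinHandle};

// Hypothetical wrapper: the "stream" (here, a thread) is joined when the
// wrapper is dropped, so the caller can never forget to synchronize.
struct AutoJoin<T>(Option<JoinHandle<T>>);

impl<T: Send + 'static> AutoJoin<T> {
    fn spawn(f: impl FnOnce() -> T + Send + 'static) -> Self {
        AutoJoin(Some(thread::spawn(f)))
    }
}

impl<T> Drop for AutoJoin<T> {
    fn drop(&mut self) {
        // Block until the work finishes, mirroring how a dropped stream
        // would make the null stream wait for it.
        if let Some(h) = self.0.take() {
            let _ = h.join();
        }
    }
}

fn main() {
    use std::sync::atomic::{AtomicUsize, Ordering};
    use std::sync::Arc;

    let done = Arc::new(AtomicUsize::new(0));
    {
        let d = done.clone();
        let _work = AutoJoin::spawn(move || d.store(1, Ordering::SeqCst));
        // End of scope: Drop joins the worker before execution continues.
    }
    assert_eq!(done.load(Ordering::SeqCst), 1);
    println!("joined");
}
```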

@coreylowman
Owner Author

The latest cudarc added a CudaStream object that we can use for this:

```rust
let stream = self.dev.auto_joining_stream()?;

self.blas.set_stream(Some(&stream))?;
// call kernel
self.blas.set_stream(None)?;

self.dev.join_async(stream)?; // or you can just `drop(stream)`
```
