Commit

csrsm bugfix (#475) (#325)
* csrsm bugfix

* same fix for bsrsm
ntrost57 authored Jun 20, 2023
1 parent 8252a37 commit 690915f
Showing 3 changed files with 16 additions and 15 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -5,6 +5,7 @@ Full documentation for rocSPARSE is available at [rocsparse.readthedocs.io](http
## rocSPARSE 2.5.2 for ROCm 5.6.0
### Improved
- Fixed a memory leak in csritsv
+- Fixed a bug in csrsm and bsrsm

## rocSPARSE 2.5.1 for ROCm 5.5.0
### Added
24 changes: 12 additions & 12 deletions library/src/level3/bsrsm_device_large.h
@@ -105,12 +105,12 @@ __launch_bounds__(BLOCKSIZE) ROCSPARSE_KERNEL
}
}

-// Wait for spin looping thread to finish as the whole block depends on this row
-__syncthreads();
-
// Make sure updated X is visible globally
__threadfence();

+// Wait for spin looping thread to finish as the whole block depends on this row
+__syncthreads();
+
// Local sum computation

// Do not run out of bounds
@@ -170,12 +170,12 @@ __launch_bounds__(BLOCKSIZE) ROCSPARSE_KERNEL
}
}

-// Wait for all threads to finish writing into global memory before we mark the row "done"
-__syncthreads();
-
// Make sure X is written to global memory before setting row is done flag
__threadfence();

+// Wait for all threads to finish the threadfence before we mark the row "done"
+__syncthreads();
+
if(row < mb && threadIdx.x == 0)
{
// Write "row is done" flag
@@ -267,12 +267,12 @@ __launch_bounds__(BLOCKSIZE) ROCSPARSE_KERNEL
}
}

-// Wait for spin looping thread to finish as the whole block depends on this row
-__syncthreads();
-
// Make sure updated X is visible globally
__threadfence();

+// Wait for spin looping thread to finish as the whole block depends on this row
+__syncthreads();
+
// Local sum computation

// Do not run out of bounds
@@ -332,12 +332,12 @@ __launch_bounds__(BLOCKSIZE) ROCSPARSE_KERNEL
}
}

-// Wait for all threads to finish writing into global memory before we mark the row "done"
-__syncthreads();
-
// Make sure X is written to global memory before setting row is done flag
__threadfence();

+// Wait for all threads to finish the threadfence before we mark the row "done"
+__syncthreads();
+
if(row < mb && threadIdx.x == 0)
{
// Write "row is done" flag
6 changes: 3 additions & 3 deletions library/src/level3/csrsm_device.h
@@ -231,12 +231,12 @@ __device__ void csrsm_device(rocsparse_operation transB,
B[idx_B] = local_sum;
}

-// Wait for all threads to finish writing into global memory before we mark the row "done"
-__syncthreads();
-
// Make sure B is written to global memory before setting row is done flag
__threadfence();

+// Wait for all threads to finish the threadfence before we mark the row "done"
+__syncthreads();
+
if(hipThreadIdx_x == 0)
{
// Write the "row is done" flag
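
Note on the reordering: every hunk above moves `__threadfence()` ahead of `__syncthreads()`, so that, as the new comment puts it, all threads have finished the threadfence before thread 0 marks the row "done". The sketch below is a rough host-side C++ analogy of that ordering, not rocSPARSE code. All names in it (`writer`, `consume`, `run_demo`, `x`, `arrived`, `done`) are hypothetical, a relaxed atomic counter stands in for the block-wide barrier, and `std::atomic_thread_fence` stands in for `__threadfence()`. The C++ host memory model is not the same as the GPU one, so treat this only as an illustration of "fence first, then rendezvous, then publish the flag".

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical CPU analogy of the kernels' "row is done" protocol.
constexpr int kWriters = 4;

int              x[kWriters]; // stands in for the row's entries of X (or B)
std::atomic<int> arrived{0};  // stands in for the block-wide barrier
std::atomic<int> done{0};     // stands in for the "row is done" flag

void writer(int tid)
{
    // Each thread writes its part of the result.
    x[tid] = 10 * (tid + 1);

    // Fence first, so this thread's write is ordered before its arrival.
    std::atomic_thread_fence(std::memory_order_release);

    // Then arrive at the rendezvous point.
    arrived.fetch_add(1, std::memory_order_relaxed);

    if(tid == 0)
    {
        // Thread 0 waits until every writer has fenced and arrived.
        while(arrived.load(std::memory_order_relaxed) != kWriters)
        {
            std::this_thread::yield();
        }
        std::atomic_thread_fence(std::memory_order_acquire);

        // Only now publish the flag that other rows spin on.
        done.store(1, std::memory_order_release);
    }
}

int consume()
{
    // A dependent row spins until the flag is set, then reads the results.
    while(done.load(std::memory_order_acquire) == 0)
    {
        std::this_thread::yield();
    }

    int sum = 0;
    for(int v : x)
    {
        sum += v;
    }
    return sum;
}

int run_demo()
{
    std::vector<std::thread> pool;
    for(int t = 0; t < kWriters; ++t)
    {
        pool.emplace_back(writer, t);
    }

    int sum = consume();

    for(auto& th : pool)
    {
        th.join();
    }
    return sum;
}
```

In this sketch, calling `run_demo()` once returns the sum of all four entries, because the flag can only be observed after every writer's fence. If a writer arrived before fencing, mirroring the old barrier-then-fence order, thread 0 could publish the flag while that writer's store was not yet ordered before it.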