This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

minor bug fix in warp synchronous code (#7029)
stefanhenneking authored and piiswrong committed Jul 13, 2017
1 parent fa2c0a3 commit b5615f5
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion src/operator/tensor/dot-inl.cuh
@@ -71,7 +71,7 @@ struct DotCsrDnsDnsVectorKernel {
 for (int j = low+lane; j < high; j+=32) {
   sum += data_l[j] * data_r[col_idx_l[j]*num_cols_r + kcol];
 }
-vals[threadIdx.x] = sum;
+vals[threadIdx.x] = sum; __syncwarp();
 
 // Parallel reduction in shared memory
 if (lane < 16) {vals[threadIdx.x] += vals[threadIdx.x+16];} __syncwarp();
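
The one-line change above adds a `__syncwarp()` after each lane stores its partial sum, so that the store is visible before another lane reads it in the first reduction step. The following is a minimal standalone sketch of that pattern, not the MXNet kernel itself: `WarpSumKernel` and `RunWarpSumDemo` are hypothetical names, the input is made-up test data, and error checking is omitted for brevity. It assumes a single warp of 32 threads and CUDA 9 or newer, where `__syncwarp()` is available.

```cuda
#include <cassert>
#include <cuda_runtime.h>

constexpr int kWarpSize = 32;

// Hypothetical demo kernel: a single warp sums 32 floats via shared memory.
__global__ void WarpSumKernel(const float* in, float* out) {
  __shared__ float vals[kWarpSize];
  const int lane = threadIdx.x & (kWarpSize - 1);

  // Each lane stores its value (the per-lane partial sum in the real kernel).
  vals[threadIdx.x] = in[threadIdx.x];
  // Without this barrier, a lane could read vals[threadIdx.x + 16]
  // before the owning lane has written it.
  __syncwarp();

  // Tree reduction in shared memory, with a warp barrier after every step.
  for (int offset = kWarpSize / 2; offset > 0; offset /= 2) {
    if (lane < offset) {
      vals[threadIdx.x] += vals[threadIdx.x + offset];
    }
    __syncwarp();
  }

  if (lane == 0) {
    *out = vals[0];
  }
}

// Hypothetical host helper: sums the values 0..31 on the GPU.
float RunWarpSumDemo() {
  float h_in[kWarpSize];
  for (int i = 0; i < kWarpSize; ++i) {
    h_in[i] = static_cast<float>(i);
  }

  float* d_in = nullptr;
  float* d_out = nullptr;
  cudaMalloc(&d_in, sizeof(h_in));
  cudaMalloc(&d_out, sizeof(float));
  cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);

  WarpSumKernel<<<1, kWarpSize>>>(d_in, d_out);

  float h_out = 0.0f;
  // cudaMemcpy on the default stream waits for the kernel to finish.
  cudaMemcpy(&h_out, d_out, sizeof(float), cudaMemcpyDeviceToHost);

  cudaFree(d_in);
  cudaFree(d_out);
  return h_out;
}
```

On a working CUDA device, `RunWarpSumDemo()` should return 496, the sum of 0 through 31. The loop form is only a compact way of writing the unrolled `lane < 16`, `lane < 8`, ... steps seen in the diff.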
