
CUTLASS 1.0.1

@kerrmudgeon kerrmudgeon released this 26 Jun 21:00
cf0301e

CUTLASS 1.0.1.

Intra-threadblock reduction added for small threadblock tile sizes:

  • sgemm_64x128x16, sgemm_128x128x16, sgemm_128x64x16, sgemm_128x32x16, sgemm_64x64x16, sgemm_64x32x16
  • igemm_32x32x128

GEMM K residue handled during prologue prior to mainloop
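
As a rough illustration of the K residue handling noted above, the following host-side C++ sketch processes the partial K tile once in a prologue so that every mainloop iteration covers a full tile. It is not CUTLASS code: the names `kTileK`, `accumulate_k_range`, and `gemm_residue_first` are hypothetical, the tile depth of 16 is assumed from the "x16" suffix of the sgemm kernel names, and the real kernels run on the GPU with tiled loads rather than plain loops.

```cpp
#include <vector>

// Hypothetical tile depth along K (assumed from the "x16" kernel-name suffix).
constexpr int kTileK = 16;

// Adds the contribution of k in [k_begin, k_end) to C = A * B.
// A is M x K, B is K x N, C is M x N, all row-major.
void accumulate_k_range(const std::vector<float>& A, const std::vector<float>& B,
                        std::vector<float>& C, int M, int N, int K,
                        int k_begin, int k_end) {
    for (int m = 0; m < M; ++m) {
        for (int n = 0; n < N; ++n) {
            float acc = 0.0f;
            for (int k = k_begin; k < k_end; ++k) {
                acc += A[m * K + k] * B[k * N + n];
            }
            C[m * N + n] += acc;
        }
    }
}

// Computes C = A * B, handling the K residue before the mainloop.
std::vector<float> gemm_residue_first(const std::vector<float>& A,
                                      const std::vector<float>& B,
                                      int M, int N, int K) {
    std::vector<float> C(static_cast<std::size_t>(M) * N, 0.0f);

    // Prologue: the only step whose extent along K is smaller than kTileK.
    const int residue = K % kTileK;
    if (residue != 0) {
        accumulate_k_range(A, B, C, M, N, K, 0, residue);
    }

    // Mainloop: every remaining step spans exactly kTileK elements,
    // so no per-iteration bounds handling is needed here.
    for (int k = residue; k < K; k += kTileK) {
        accumulate_k_range(A, B, C, M, N, K, k, k + kTileK);
    }
    return C;
}
```

With K = 19, for example, the prologue covers k = 0..2 and the mainloop runs a single full tile over k = 3..18.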

Replaced the bundled Google Test copy with a git submodule. Use git submodule init followed by git submodule update (or git submodule update --init) to fetch it.