CuTe, a new core library and backend for CUTLASS 3.0 that defines a single Layout vocabulary type and an associated algebra of layouts, providing a much more expressive and composable abstraction for tensors, sets of parallel agents, and the operations those agents perform on tensors.
Updates to the Functionality documentation, which tells users which kernels are supported via CUTLASS-2 and CUTLASS-3.
Updates to the Compatibility section regarding supported compilers, operating systems, CUDA Toolkits, hardware architectures, and target architecture.
New warp-specialized GEMM kernel schedules and mainloops targeting the Hopper architecture that achieve high performance using TMA, WGMMA, and threadblock clusters.
Extensions to the CUTLASS profiler to support threadblock cluster shapes in library and profiler tile configurations.
CUTLASS library integration for 3.x API kernels built through the new CollectiveBuilder API, enabling their use in the CUTLASS profiler.
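To make the Layout idea from the CuTe item more concrete, here is a minimal sketch of a layout as a shape paired with strides, mapping a logical coordinate to a linear offset. It is an illustration only and does not use CuTe: `ToyLayout`, `make_col_major`, and `make_row_major` are hypothetical names invented for this example, and the sketch leaves out the hierarchical (nested) shapes, compile-time integers, and layout algebra that CuTe itself provides.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical illustration type; this is NOT CuTe's cute::Layout.
// A layout pairs a shape with strides and maps a coordinate to an offset.
template <std::size_t Rank>
struct ToyLayout {
    std::array<int, Rank> shape;
    std::array<int, Rank> stride;

    // Offset = sum over all modes of coord[i] * stride[i].
    int operator()(const std::array<int, Rank>& coord) const {
        int offset = 0;
        for (std::size_t i = 0; i < Rank; ++i) {
            assert(coord[i] >= 0 && coord[i] < shape[i]);  // stay inside the shape
            offset += coord[i] * stride[i];
        }
        return offset;
    }

    // Number of logical elements: the product of the shape extents.
    int size() const {
        int n = 1;
        for (std::size_t i = 0; i < Rank; ++i) n *= shape[i];
        return n;
    }
};

// Column-major rows x cols: consecutive rows are adjacent in memory.
inline ToyLayout<2> make_col_major(int rows, int cols) {
    return ToyLayout<2>{{rows, cols}, {1, rows}};
}

// Row-major rows x cols: consecutive columns are adjacent in memory.
inline ToyLayout<2> make_row_major(int rows, int cols) {
    return ToyLayout<2>{{rows, cols}, {cols, 1}};
}
```

With a 4 x 8 shape, the same logical coordinate (2, 3) lands at a different offset depending only on the strides. The sketch is meant to show this separation of logical coordinates from memory placement.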