FAv3, profiler update & AMD

@danthe3rd released this 12 Sep 15:49
· 29 commits to main since this release

Pre-built binary wheels require PyTorch 2.4.1

Added

  • Added wheels for CUDA 12.4
  • Added conda builds for Python 3.11
  • Added wheels for ROCm 6.1

Improved

  • Profiler: Fix FLOPs computation for attention when using xFormers
  • Profiler: Fix MFU/HFU calculation when multiple dtypes are used
  • Profiler: Trace analysis to compute MFU & HFU is now much faster
  • fMHA/splitK: Fixed NaNs in the output when using a torch.Tensor bias in which many consecutive keys are masked with -inf
  • Update Flash-Attention version to v2.6.3 when building from scratch
  • With the most recent Flash-Attention version, mixing it with the cutlass backend is no longer supported: the cutlass forward can no longer be paired with the Flash-Attention backward
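As background for the fMHA/splitK fix above: when an attention bias masks every key in a row with -inf, a naive softmax produces NaN for that row, because the stabilizing subtraction computes (-inf) - (-inf). The sketch below illustrates the failure mode only; `naive_softmax` is a hypothetical stand-in, not the xFormers kernel.

```python
import math

def naive_softmax(scores):
    # Numerically-stabilized softmax over one row of attention scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# A bias row where every key is masked with -inf: the row maximum is -inf,
# so each (score - max) is (-inf) - (-inf) = nan, and the whole row is nan.
fully_masked = [float("-inf")] * 4
row = naive_softmax(fully_masked)
print(row)  # every entry is nan
```

Fixed kernels typically detect fully-masked rows and emit zeros (or another defined value) instead of propagating NaN.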

Removed

  • fMHA: Removed the decoder and small_k backends
  • Profiler: Removed the DetectSlowOpsProfiler
  • Removed compatibility with PyTorch < 2.4
  • Removed conda builds for Python 3.9
  • Removed Windows pip wheels for CUDA 12.1 and 11.8