FAv3, profiler update & AMD
Pre-built binary wheels require PyTorch 2.4.1
Added
- Added wheels for cuda 12.4
- Added conda builds for python 3.11
- Added wheels for rocm 6.1 (a quick environment check follows this list)
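
A quick way to confirm that an installed wheel matches these requirements, as a minimal sketch rather than an official snippet (`python -m xformers.info` is the library's fuller diagnostic):

```python
# Sanity-check the installed wheel against this release's requirements.
import torch
import xformers

print(torch.__version__)    # pre-built wheels require PyTorch 2.4.1
print(torch.version.cuda)   # e.g. "12.4" on the new cuda 12.4 wheels
print(torch.version.hip)    # set instead of cuda on the rocm 6.1 wheels
print(xformers.__version__)
```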
Improved
- Profiler: Fix computation of FLOPs for attention when using xFormers
- Profiler: Fix MFU/HFU calculation when multiple dtypes are used (see the MFU sketch after this list)
- Profiler: Trace analysis to compute MFU & HFU is now much faster
- fMHA/splitK: Fixed `nan` in the output when using a `torch.Tensor` bias where a lot of consecutive keys are masked with `-inf` (see the attention-bias sketch after this list)
- Update Flash-Attention version to `v2.6.3` when building from scratch
- When using the most recent version of Flash-Attention, it is no longer possible to mix it with the cutlass backend. In other words, it is no longer possible to use the cutlass Fw with the flash Bw (see the last sketch after this list).
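
For context on the MFU/HFU fix: MFU is achieved model FLOP/s divided by the hardware's peak FLOP/s for the dtype in use (HFU additionally counts hardware FLOPs such as recomputation), so a run that mixes dtypes has to weight each dtype against its own peak. A back-of-the-envelope sketch, with all numbers made up:

```python
# Illustrative arithmetic only -- not xFormers profiler internals.
model_flops_per_step = 2.8e15   # FLOPs one training step should take
step_time_s = 10.0              # measured step duration
peak_flops_fp16 = 312e12        # e.g. A100 fp16 tensor-core peak

mfu = (model_flops_per_step / step_time_s) / peak_flops_fp16
print(f"MFU: {mfu:.1%}")        # ~89.7% with these numbers
```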
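
The splitK fix concerns attention biases like the one below. This is a minimal sketch of the triggering pattern, not a reproducer from the changelog; it needs a CUDA GPU, the shapes are illustrative, and whether the splitK kernel is actually selected depends on the dispatcher:

```python
import torch
import xformers.ops as xops

B, Mq, Mkv, H, K = 1, 1, 128, 8, 64  # decode-style shapes (illustrative)
q = torch.randn(B, Mq, H, K, device="cuda", dtype=torch.float16)
k = torch.randn(B, Mkv, H, K, device="cuda", dtype=torch.float16)
v = torch.randn(B, Mkv, H, K, device="cuda", dtype=torch.float16)

# Dense torch.Tensor bias with a long run of consecutive keys masked out.
bias = torch.zeros(B, H, Mq, Mkv, device="cuda", dtype=torch.float16)
bias[..., 64:] = float("-inf")

out = xops.memory_efficient_attention(q, k, v, attn_bias=bias)
assert not out.isnan().any()  # previously could fail on the splitK path
```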
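
And a hedged sketch of the backend-mixing pattern that the Flash-Attention bump rules out; `xops.fmha.cutlass` and `xops.fmha.flash` are the backend modules, the shapes are illustrative:

```python
import torch
import xformers.ops as xops

q = torch.randn(1, 128, 8, 64, device="cuda", dtype=torch.float16)
k, v = torch.randn_like(q), torch.randn_like(q)

# No longer possible with recent Flash-Attention: cutlass Fw + flash Bw.
# out = xops.memory_efficient_attention(
#     q, k, v, op=(xops.fmha.cutlass.FwOp, xops.fmha.flash.BwOp))

# Use a single backend end-to-end, or let the dispatcher pick one.
out = xops.memory_efficient_attention(q, k, v)
```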
Removed
- fMHA: Removed `decoder` and `small_k` backends (a migration sketch follows this list)
- profiler: Removed `DetectSlowOpsProfiler` profiler
- Removed compatibility with PyTorch < 2.4
- Removed conda builds for python 3.9
- Removed windows pip wheels for cuda 12.1 and 11.8
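
For code that explicitly pinned one of the removed backends, a hypothetical migration sketch (`small_k` shown; `decoder` is analogous):

```python
import torch
import xformers.ops as xops

q = torch.randn(1, 32, 8, 64, device="cuda", dtype=torch.float16)
k, v = torch.randn_like(q), torch.randn_like(q)

# Before (removed in this release):
# out = xops.memory_efficient_attention(
#     q, k, v, op=(xops.fmha.small_k.FwOp, xops.fmha.small_k.BwOp))

# After: drop the pin and let the dispatcher choose a remaining backend.
out = xops.memory_efficient_attention(q, k, v)
```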