Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

[0.0.21] - TBD

Fixed

Added

[0.0.20] - 2023-05-23

Improved

  • fMHA/cutlass (backward): Massive performance improvements when batch_size * num_heads is low (10x+)
  • fMHA/cutlass: Further performance improvements for both the forward & backward kernels
  • fMHA (backward): Now dispatching to cutlass when embed_dim>64
  • fMHA: Updated Flash-Attention to v1.0.5

Added

  • fMHA now runs on H100 (support is experimental)

[0.0.19] - 2023-04-28

Added

  • Display nvcc version used to compile xformers in python -m xformers.info

Fixed

  • Fixed performance regression with nvcc>11.6 (facebookresearch#712)
  • fMHA/cutlass: Fixed nan in the output when using a torch.Tensor with -inf prefixes as attn_bias (facebookresearch#722)
  • fMHA/cutlass: Fixed nan in the output when the sequence length is larger than 2 ** 15 (facebookresearch#719)
  • fMHA/cutlass: Significant performance improvements (up to 2x) for both the forward pass and backward pass
  • fMHA/cutlass: The kernels are now deterministic
  • fMHA/cutlass: Fixed backward pass correctness when using dropout (facebookresearch#724)
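
One plausible way leading -inf bias entries can turn into nan is sketched below in plain Python. This is an illustration, not code from the CUTLASS kernel: a blockwise (online) softmax keeps a running maximum, and while every score seen so far is -inf, the rescale step computes exp(-inf - (-inf)), which is nan. The name `online_softmax_normalizer` and the `guard` flag are hypothetical, and the guard is only an assumed style of fix, not necessarily the change made in facebookresearch#722.

```python
import math

NEG_INF = float("-inf")


def online_softmax_normalizer(scores, block_size, guard):
    """Compute the softmax running max and running sum one block at a time."""
    running_max = NEG_INF
    running_sum = 0.0
    for start in range(0, len(scores), block_size):
        block = scores[start:start + block_size]
        new_max = max(running_max, max(block))
        if guard and new_max == NEG_INF:
            # Everything seen so far is masked out: nothing to accumulate yet.
            continue
        # Rescale the previous sum to the new maximum.
        # Without the guard, -inf - (-inf) evaluates to nan here.
        rescale = math.exp(running_max - new_max)
        running_sum = running_sum * rescale + sum(
            math.exp(s - new_max) for s in block
        )
        running_max = new_max
    return running_max, running_sum
```

With scores `[-inf, -inf, 0.0, 1.0]` and a block size of 2, the unguarded version returns a nan sum because the first block is entirely -inf, while the guarded version skips that block and returns a finite normalizer.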

[0.0.18] - 2023-03-31

Added

  • Added xformers.ops.index_select_cat and xformers.ops.scaled_index_add. These experimental functions only work with a few shapes and can be used, for instance, to write efficient stochastic depth in transformer architectures

Fixed

  • fMHA: memory_efficient_attention now accepts torch.Tensor as attention bias for any seqlen, although there are still requirements on the alignment of the bias tensor (see facebookresearch#683)
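
As a rough reference for what a tensor attention bias does, the sketch below adds the bias to the scaled query-key scores before the softmax. It is plain Python for a single head on nested lists, not the xformers kernel. `attention_with_bias` is a hypothetical helper, and the 1/sqrt(K) scaling is assumed as the default.

```python
import math


def attention_with_bias(q, k, v, bias):
    """Single-head attention: softmax(q @ k^T / sqrt(K) + bias) @ v."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for i, q_row in enumerate(q):
        # Scaled dot-product scores for query i, plus the additive bias.
        scores = [
            scale * sum(a * b for a, b in zip(q_row, k_row)) + bias[i][j]
            for j, k_row in enumerate(k)
        ]
        # Numerically stable softmax over the keys.
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        # Weighted average of the value rows.
        out.append([
            sum(w * v_row[d] for w, v_row in zip(weights, v)) / total
            for d in range(len(v[0]))
        ])
    return out
```

A bias entry of `float("-inf")` removes the corresponding key from the softmax for that query, which is how masks are usually expressed as an additive bias.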

[0.0.17] - 2023-03-28

Fixed

  • fMHA: Fixed backward pass on Sm86/Sm89 GPUs when K > 64 (RTX 3090, RTX 4090, A6000, ...) [facebookresearch#631]

Added

[0.0.16] - 2023-01-31

Fixed

Added

[0.0.15] - Skipped

[0.0.14] - 2022-11-10

Fixed

  • fMHA/CUTLASS: The current CUDA stream is now used by the kernel [facebookresearch#491]
  • fMHA/CUTLASS: Improved overall performance

Added

  • SwiGLU: Added xformers.ops.SwiGLU and its functional counterpart (xformers.ops.swiglu) [facebookresearch#490]
  • fMHA: It is now possible to combine CUTLASS's forward with flash-attention's backward pass [facebookresearch#469] - improves performance on A100 for K = 128
  • fMHA: Added custom xformers.ops.unbind operator to avoid a cat in the attention block [facebookresearch#458]
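
For readers unfamiliar with SwiGLU, here is a minimal pure-Python sketch of the computation a SwiGLU block performs: two linear projections, a SiLU gate on the first, an elementwise product, then an output projection. It is a reference for the math only, not the fused xformers implementation. `swiglu_reference`, `linear`, and `silu` are hypothetical helpers, and weights are assumed to be stored as (out_features, in_features).

```python
import math


def linear(x, weight, bias):
    """y = x @ weight^T + bias, with weight stored as (out_features, in_features)."""
    return [
        [sum(a * w for a, w in zip(row, w_row)) + b
         for w_row, b in zip(weight, bias)]
        for row in x
    ]


def silu(t):
    """SiLU (swish) activation: t * sigmoid(t)."""
    return t / (1.0 + math.exp(-t))


def swiglu_reference(x, w1, b1, w2, b2, w3, b3):
    """Gate the second projection with SiLU of the first, then project out."""
    x1 = linear(x, w1, b1)
    x2 = linear(x, w2, b2)
    hidden = [
        [silu(a) * b for a, b in zip(r1, r2)]
        for r1, r2 in zip(x1, x2)
    ]
    return linear(hidden, w3, b3)
```

With identity matrices for `w1` and `w2`, zero biases, and `w3 = [[1.0, 1.0]]`, the input `[[1.0, 2.0]]` yields `silu(1.0) * 1.0 + silu(2.0) * 2.0` as the single output value.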

[0.0.13] - 2022-09-26

Added

  • fMHA: Added CUTLASS-based kernel for xformers.ops.memory_efficient_attention. This kernel is automatically selected depending on the inputs, and works on any GPU after P100 [facebookresearch#362]

[0.0.12] - 2022-08-08

Fixed

Added

[0.0.11] - 2022-05-30

Fixed

Added

[0.0.10] - 2022-03-14

Fixed

Added

[0.0.9] - 2022-02-09

Added

Fixed

[0.0.8] - 2022-01-07

Fixed

Added

[0.0.7] - 2021-11-30

Fixed

[0.0.6] - 2021-11-24

Fixed

Added

[0.0.4] - 2021-11-16

Fixed

Added

[0.0.3] - 2021-11-01

Fixed

[0.0.2] - 2021-11-01

Fixed

Added