A ComfyUI node that lets you select a Triton implementation of Flash Attention as the attention method used during sampling. A sketch of the patching approach is shown at the end of this section.
In the benchmark below, this implementation is roughly 23% slower per iteration than sub-quadratic attention, but uses about 1.2 GB less VRAM.
Tested on an RX 6700 XT, generating a 1024x1024 image with FLUX.1-dev q4_K-S:
| Attention Method | VRAM Usage (GB) | Speed (s/it) |
|---|---|---|
| Flash Attention Triton | 8.2 | 28.60 |
| Sub-Quadratic | 9.4 | 23.23 |
- Currently implemented only for sampling, not for the VAE.
- Only compatible with FLUX models at the moment.
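
For reference, the sketch below shows one way a node like this can hook into ComfyUI: a model-patch node that swaps the module-level `optimized_attention` function used during sampling. This is a minimal illustration under stated assumptions, not this node's actual source; the Triton kernel import (`flash_attn_triton`) is a hypothetical placeholder, and the real node may patch attention through a different mechanism.

```python
# Minimal sketch of an attention-override node.
# Assumption: flash_attn_triton is a local wrapper around the Triton
# kernel, taking the same (q, k, v, heads) arguments ComfyUI's
# built-in attention functions use.
import comfy.ldm.modules.attention as comfy_attention

from .flash_attn_triton import flash_attn_triton  # assumed kernel wrapper

# Keep a handle to the attention function ComfyUI selected at startup,
# so the node can fall back to it when disabled or when a mask is given.
_original_attention = comfy_attention.optimized_attention

def _flash_attention(q, k, v, heads, mask=None, **kwargs):
    if mask is not None:
        # Assumed limitation: the Triton kernel here does not take a mask.
        return _original_attention(q, k, v, heads, mask=mask, **kwargs)
    return flash_attn_triton(q, k, v, heads)

class FlashAttentionTriton:
    @classmethod
    def INPUT_TYPES(cls):
        return {"required": {"model": ("MODEL",),
                             "enabled": ("BOOLEAN", {"default": True})}}

    RETURN_TYPES = ("MODEL",)
    FUNCTION = "patch"
    CATEGORY = "model_patches"

    def patch(self, model, enabled):
        # Swap the module-level attention function used during sampling.
        comfy_attention.optimized_attention = (
            _flash_attention if enabled else _original_attention
        )
        return (model,)

NODE_CLASS_MAPPINGS = {"FlashAttentionTriton": FlashAttentionTriton}
```

Note that patching at module level affects every sampling run after the node executes rather than only the model passing through it; a per-model patch via ComfyUI's ModelPatcher would be a more surgical alternative, at the cost of wiring the override into each attention block.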