This repository integrates all the tricks I know to speed up Flux inference:
- Use `TeaCache`, `FBCache`, or `MBCache`;
- Skip some unnecessary blocks;
- Compile and quantize the model;
- Use fast cuDNN attention kernels;
- Use SageAttention;
- Fix `AttributeError: 'SymInt' object has no attribute 'size'` to speed up recompilation after a resolution change.
`MBCache` extends `FBCache` to cache multiple blocks. The code is modified from SageAttention, ComfyUI-TeaCache, comfyui-flux-accelerator, and Comfy-WaveSpeed; see those repositories for more details.
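To illustrate the caching idea behind FBCache/MBCache, here is a minimal, hypothetical sketch: at each denoising step, run only the first transformer block, measure how much its residual changed relative to the previous step, and if the change is below a threshold, reuse the cached residual of the remaining blocks instead of running them. The class and threshold below are illustrative assumptions, not the repository's actual implementation, which hooks into Flux's blocks inside ComfyUI.

```python
import torch


class CachedTransformer(torch.nn.Module):
    """Toy first-block-cache model (illustrative, not the real Flux hook)."""

    def __init__(self, blocks, threshold=0.1):
        super().__init__()
        self.blocks = torch.nn.ModuleList(blocks)
        self.threshold = threshold
        self.prev_first_residual = None   # block-0 residual from the last step
        self.cached_tail_residual = None  # combined residual of blocks 1..N-1

    def forward(self, x):
        first_out = self.blocks[0](x)
        first_residual = first_out - x
        if self.prev_first_residual is not None and self.cached_tail_residual is not None:
            # Relative L1 change of the first block's residual between steps.
            change = ((first_residual - self.prev_first_residual).abs().mean()
                      / self.prev_first_residual.abs().mean().clamp_min(1e-8))
            if change < self.threshold:
                # First block barely changed: reuse the cached tail residual
                # instead of running the remaining blocks.
                self.prev_first_residual = first_residual
                return first_out + self.cached_tail_residual
        # Change too large (or no cache yet): run the full model and refresh the cache.
        h = first_out
        for block in self.blocks[1:]:
            h = block(h)
        self.prev_first_residual = first_residual
        self.cached_tail_residual = h - first_out
        return h
```

TeaCache follows the same pattern but estimates the change from the timestep embedding; MBCache generalizes the split point so that more than one leading block is always executed.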
- [2025/1/24] Sana is now supported: generate 1024×1024 images in about 2 s. The code is adapted from Sana.
- Download the Sana diffusion model from the Model Zoo and put the `.pth` file into `models/diffusion_models`;
- Download the Gemma text encoder from google/gemma-2-2b-it, unsloth/gemma-2b-it-bnb-4bit, or Efficient-Large-Model/gemma-2-2b-it and put the whole folder into `models/text_encoders`;
- Download the DCAE image decoder from mit-han-lab/dc-ae-f32c32-sana-1.0 and put the `.safetensors` file into `models/vae`;
- Run the example workflow.
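The download steps above can be scripted with `huggingface-cli` from the `huggingface_hub` package. The exact repo IDs and file names below are examples and may differ from the checkpoint you pick in the Model Zoo, so verify them before running; downloaded files may also land in subfolders mirroring the repo layout and need to be moved into place.

```shell
# Example only: repo IDs and file names are assumptions; check the Model Zoo.
mkdir -p models/diffusion_models models/text_encoders models/vae

# Sana diffusion model (.pth goes into models/diffusion_models)
huggingface-cli download Efficient-Large-Model/Sana_1600M_1024px \
    --local-dir models/diffusion_models

# Gemma text encoder (the whole folder goes into models/text_encoders)
huggingface-cli download google/gemma-2-2b-it \
    --local-dir models/text_encoders/gemma-2-2b-it

# DCAE image decoder (.safetensors goes into models/vae)
huggingface-cli download mit-han-lab/dc-ae-f32c32-sana-1.0 \
    --local-dir models/vae
```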