Troubleshooting

Huge Precision Loss

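If you see a large precision loss, try disabling PyTorch's reduced-precision matmul options (reduced-precision fp16 reductions and TF32):

import torch

torch.backends.cuda.matmul.allow_fp16_reduced_precision_reduction = False
torch.backends.cuda.matmul.allow_tf32 = False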
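The sketch below shows why reduced-precision accumulation can hurt. It is a pure-Python simulation with no GPU involved. It runs a long reduction and rounds the running sum to fp16 after every addition, which loosely models what `allow_fp16_reduced_precision_reduction` permits inside a matmul. This is a simplified model for illustration only, not how cuBLAS actually schedules its reductions. `to_fp16` and `accumulate` are hypothetical helpers.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def accumulate(values, fp16_accumulator: bool) -> float:
    """Sum values; optionally round the running total to fp16 after each add."""
    total = 0.0
    for v in values:
        total += v
        if fp16_accumulator:
            total = to_fp16(total)  # simulate a half-precision accumulator
    return total


# 10,000 half-precision inputs of about 1e-4; the exact sum is about 1.0002.
values = [to_fp16(1e-4)] * 10_000

print(accumulate(values, fp16_accumulator=False))  # about 1.0002 (fp64 accumulator)
print(accumulate(values, fp16_accumulator=True))   # 0.25 (fp16 accumulator stalls)
```

Once the running total reaches 0.25, adjacent fp16 values are 2^-12 (about 2.4e-4) apart. Adding about 1e-4, which is less than half that gap, rounds straight back to 0.25, so the sum stops growing.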

Compilation Is SO SLOW. How To Improve It?

Dynamic code generation is usually the cause of slow compilation. Disabling the features that rely on it speeds up compilation, but it might slow down inference.

Disable JIT optimized execution (fusion). This can significantly speed up compilation.

import torch

# Wrap your code in this context manager
with torch.jit.optimized_execution(False):
    # Do your things (e.g. compile and run the model) here
    ...

Or disable it globally.

torch.jit.set_fusion_strategy([('STATIC', 0), ('DYNAMIC', 0)])

Disable Triton (not recommended).

config.enable_triton = False
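Each of these switches trades compile time against inference speed, so it is worth measuring both before keeping one. Below is a minimal, framework-agnostic sketch. `time_first_and_steady` is a hypothetical helper and is not part of stable-fast or PyTorch. It assumes compilation happens lazily on the first call, which is typical for tracing-based compilers. For GPU models, pass `sync=torch.cuda.synchronize` so that asynchronous kernels are included in the timings.

```python
import statistics
import time
from typing import Any, Callable, Optional, Tuple


def time_first_and_steady(
    fn: Callable[..., Any],
    *args: Any,
    runs: int = 5,
    sync: Optional[Callable[[], None]] = None,
) -> Tuple[float, float]:
    """Return (seconds for the first call, median seconds of the next `runs` calls)."""
    if runs < 1:
        raise ValueError("runs must be at least 1")

    def timed_call() -> float:
        if sync is not None:
            sync()  # flush previously queued work, e.g. torch.cuda.synchronize
        start = time.perf_counter()
        fn(*args)
        if sync is not None:
            sync()  # wait for asynchronous GPU kernels launched by fn
        return time.perf_counter() - start

    first = timed_call()  # includes lazy tracing/compilation, if any
    steady = [timed_call() for _ in range(runs)]
    return first, statistics.median(steady)


# Usage with a stand-in "model" whose first call is slow, like a lazy compile.
calls = []


def fake_model(x):
    time.sleep(0.05 if not calls else 0.001)
    calls.append(x)
    return x


first, steady = time_first_and_steady(fake_model, 1, runs=3)
print(f"first call: {first:.3f}s, steady state: {steady:.3f}s")
```

Run it before and after changing a setting. If the first-call time drops while the steady-state time rises, the setting is trading inference speed for faster compilation.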

Inference Is SO SLOW. What's Wrong?

When GPU VRAM is tight or the image resolution is high, CUDA Graph can use VRAM less efficiently and slow down inference. Try disabling it:

config.enable_cuda_graph = False

Triton Does Not Work

Triton caches compiled kernels on disk, and a stale cache can stop it from working properly, especially right after you upgrade stable-fast or Triton. Try clearing the cache to fix it:

rm -rf ~/.triton

Crashes, Invalid Memory Access Or Segmentation Fault

Even with PyTorch's own torch.compile, I have encountered crashes and segmentation faults. They are usually caused by Triton, CUDA Graph or cudaMallocAsync, because these are not yet stable enough. To work around them, remove the PYTORCH_CUDA_ALLOC_CONF=backend:cudaMallocAsync environment variable and disable Triton, CUDA Graph, or both.

config.enable_triton = False
# and/or
config.enable_cuda_graph = False
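If you cannot easily change how the process is launched, you can strip the allocator backend option from inside Python. The variable PyTorch reads is `PYTORCH_CUDA_ALLOC_CONF`, which holds comma-separated `key:value` options. This minimal sketch removes only `backend:cudaMallocAsync` and keeps any other options. `drop_cuda_malloc_async` is a hypothetical helper. Call it before `import torch`, because it only edits the current process's environment and the allocator must not have been configured yet.

```python
import os
from typing import MutableMapping

VAR = "PYTORCH_CUDA_ALLOC_CONF"


def drop_cuda_malloc_async(env: MutableMapping[str, str] = os.environ) -> None:
    """Remove backend:cudaMallocAsync from PYTORCH_CUDA_ALLOC_CONF, keeping other options."""
    conf = env.get(VAR)
    if conf is None:
        return  # variable not set, nothing to do
    kept = [
        opt.strip()
        for opt in conf.split(",")
        if opt.strip() and opt.strip() != "backend:cudaMallocAsync"
    ]
    if kept:
        env[VAR] = ",".join(kept)
    else:
        del env[VAR]  # nothing left, fall back to PyTorch's default allocator


drop_cuda_malloc_async()
# import torch  # only import torch after the environment has been cleaned
```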

Import Error On Windows

ImportError: DLL load failed while importing _C:  The specified module could not be found

Make sure you have installed a CUDA-enabled build of torch, and that its version is compatible with your Python and CUDA versions.
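This minimal diagnostic sketch prints the versions that need to match. `describe_torch_install` is a hypothetical helper and is not part of stable-fast. `torch.version.cuda` reports the CUDA version torch was built with and is `None` for CPU-only builds. If importing torch itself fails, the error is captured instead of crashing the script.

```python
import importlib
import platform
from typing import Dict, Optional


def describe_torch_install() -> Dict[str, Optional[str]]:
    """Collect Python, OS and torch build details relevant to DLL load errors."""
    info: Dict[str, Optional[str]] = {
        "python": platform.python_version(),
        "os": platform.platform(),
        "torch": None,
        "torch_cuda": None,  # CUDA version torch was built with; None if CPU-only
        "cuda_available": None,
        "error": None,
    }
    try:
        torch = importlib.import_module("torch")
    except (ImportError, OSError) as exc:  # e.g. "DLL load failed" on Windows
        info["error"] = f"{type(exc).__name__}: {exc}"
        return info
    info["torch"] = str(torch.__version__)
    info["torch_cuda"] = torch.version.cuda
    info["cuda_available"] = str(torch.cuda.is_available())
    return info


for key, value in describe_torch_install().items():
    print(f"{key:>15}: {value}")
```

If `torch_cuda` is `None` or `cuda_available` is `False`, reinstall a CUDA-enabled torch build before investigating stable-fast itself.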
