OOM when training with flux_train.py on multiple GPUs (8×A800 80 GB) #1930

Open
godwenbin opened this issue Feb 13, 2025 · 2 comments
Comments

@godwenbin

godwenbin commented Feb 13, 2025

When I use flux_train.py for a full fine-tune of FLUX with --optimizer_type adamw8bit and --batch_size 1, multi-GPU training always hits OOM. By contrast, single-GPU training works with --optimizer_type adamw8bit and --batch_size 8, using almost 79 GB of VRAM. How can I fix the multi-GPU OOM problem? Thanks for your reply.
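
(For reference, a run along these lines is typically launched roughly as sketched below; the model and dataset paths and the --train_batch_size spelling are assumptions for illustration, not the exact command from this report.)

```bash
# Rough sketch of an 8-GPU FLUX full fine-tune launch as described above;
# all paths are placeholders and some flag choices are assumptions.
accelerate launch --multi_gpu --num_processes 8 --mixed_precision bf16 \
  flux_train.py \
  --pretrained_model_name_or_path /path/to/flux1-dev.safetensors \
  --clip_l /path/to/clip_l.safetensors \
  --t5xxl /path/to/t5xxl.safetensors \
  --ae /path/to/ae.safetensors \
  --dataset_config /path/to/dataset.toml \
  --optimizer_type adamw8bit \
  --train_batch_size 1 \
  --output_dir /path/to/output
```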

@Ice-YY

Ice-YY commented Feb 13, 2025

Some of the VRAM-optimization features do not work well with multi-GPU setups, for example --fused_backward_pass. However, 80 GB of VRAM should at least be enough for a batch size of 1. Have you tried --gradient_checkpointing?
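
A minimal sketch of the suggested adjustment, reusing the placeholder paths from the sketch above: enable gradient checkpointing and leave out single-GPU-oriented options such as --fused_backward_pass.

```bash
# Sketch only: same run, but with --gradient_checkpointing enabled and
# without --fused_backward_pass; paths remain placeholders.
accelerate launch --multi_gpu --num_processes 8 --mixed_precision bf16 \
  flux_train.py \
  --pretrained_model_name_or_path /path/to/flux1-dev.safetensors \
  --clip_l /path/to/clip_l.safetensors \
  --t5xxl /path/to/t5xxl.safetensors \
  --ae /path/to/ae.safetensors \
  --dataset_config /path/to/dataset.toml \
  --optimizer_type adamw8bit \
  --train_batch_size 1 \
  --gradient_checkpointing \
  --output_dir /path/to/output
```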

@godwenbin
Author

godwenbin commented Feb 14, 2025

> Some of the VRAM-optimization features do not work well with multi-GPU setups, for example --fused_backward_pass. However, 80 GB of VRAM should at least be enough for a batch size of 1. Have you tried --gradient_checkpointing?

Yes, I have already used --gradient_checkpointing. I also tried several other configurations to get it to work, but they all failed.
