OOM when training with flux_train.py on multiple GPUs (8×A800 80 GB) #1930

Open
godwenbin opened this issue Feb 13, 2025 · 2 comments
Comments

@godwenbin

godwenbin commented Feb 13, 2025

When I use flux_train.py for a full fine-tune of FLUX with --optimizer_type adamw8bit and --batch_size 1, multi-GPU training always hits OOM. By contrast, single-GPU training works with --optimizer_type adamw8bit and --batch_size 8, using almost 79 GB of VRAM. How can I fix the multi-GPU OOM problem? Thanks for your reply.
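
(For reference, a run along these lines is typically launched roughly as sketched below; the model and dataset paths and the --train_batch_size spelling are assumptions for illustration, not the exact command from this report.)

```bash
# Rough sketch of an 8-GPU FLUX full fine-tune launch as described above;
# all paths are placeholders and some flag choices are assumptions.
accelerate launch --multi_gpu --num_processes 8 --mixed_precision bf16 \
  flux_train.py \
  --pretrained_model_name_or_path /path/to/flux1-dev.safetensors \
  --clip_l /path/to/clip_l.safetensors \
  --t5xxl /path/to/t5xxl.safetensors \
  --ae /path/to/ae.safetensors \
  --dataset_config /path/to/dataset.toml \
  --optimizer_type adamw8bit \
  --train_batch_size 1 \
  --output_dir /path/to/output
```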

@Ice-YY

Ice-YY commented Feb 13, 2025

Some of the VRAM-optimization features do not work well with multi-GPU setups, for example --fused_backward_pass. However, 80 GB of VRAM should at least be enough for a batch size of 1. Have you tried --gradient_checkpointing?
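
A minimal sketch of the suggested adjustment, reusing the placeholder paths from the sketch above: enable gradient checkpointing and leave out single-GPU-oriented options such as --fused_backward_pass.

```bash
# Sketch only: same run, but with --gradient_checkpointing enabled and
# without --fused_backward_pass; paths remain placeholders.
accelerate launch --multi_gpu --num_processes 8 --mixed_precision bf16 \
  flux_train.py \
  --pretrained_model_name_or_path /path/to/flux1-dev.safetensors \
  --clip_l /path/to/clip_l.safetensors \
  --t5xxl /path/to/t5xxl.safetensors \
  --ae /path/to/ae.safetensors \
  --dataset_config /path/to/dataset.toml \
  --optimizer_type adamw8bit \
  --train_batch_size 1 \
  --gradient_checkpointing \
  --output_dir /path/to/output
```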

@godwenbin
Author

godwenbin commented Feb 14, 2025

> Some of the VRAM-optimization features do not work well with multi-GPU setups, for example --fused_backward_pass. However, 80 GB of VRAM should at least be enough for a batch size of 1. Have you tried --gradient_checkpointing?

Yes, I have already used --gradient_checkpointing. I also tried several other configurations to get it to work, but they all failed.
