-
Use the HuggingFace ControlNet training script, which has more optimizations built in. I wrote an article about ControlNet training based on this script here: https://civitai.com/articles/2078
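For reference, a hedged sketch of the kind of invocation that script expects (flag names follow the diffusers ControlNet example and may differ between versions; the model and dataset names below are placeholders, not taken from this thread):

```shell
# Sketch only: launching the diffusers ControlNet training example.
# --mixed_precision and the xformers flag are the kinds of optimizations
# referred to above; model/dataset/output paths are placeholders.
accelerate launch train_controlnet.py \
  --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" \
  --output_dir="./controlnet-out" \
  --dataset_name="fusing/fill50k" \
  --resolution=512 \
  --train_batch_size=4 \
  --learning_rate=1e-5 \
  --mixed_precision="fp16" \
  --enable_xformers_memory_efficient_attention
```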
-
@lllyasviel
I saw you said that training on the circle dataset is fast: after 4000 steps (batch size 4, learning rate 1e-5, about 50 minutes on an A100 PCIe 40G), it converges.
That's around 1.33 steps/sec.
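The arithmetic behind those numbers, as a quick sanity check (nothing here is specific to the repo):

```python
# Quoted figures: 4000 steps in ~50 minutes on an A100.
steps = 4000
minutes = 50
rate = steps / (minutes * 60)           # steps per second
print(f"{rate:.2f} steps/sec")          # ~1.33

# The same run at the observed 0.6 steps/sec instead:
hours_at_observed = steps / 0.6 / 3600
print(f"{hours_at_observed:.1f} h")     # ~1.9 h instead of ~50 min
```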
I tried running the same program on RunPod using an A40, A6000, or A100 GPU, and the speed is much lower (0.55-0.7 steps/sec).
I also installed xformers (and triton) but got an error like in #218. He suggested trying float16, but #265 (comment) said that SD doesn't work that well with float16.
I tried float16 with xformers and the iteration speed roughly triples (~1.5 steps/sec), but training doesn't converge. It's the same issue mentioned in #265.
I eventually had to uninstall xformers, go back to float32, and tolerate the 0.55-0.7 steps/sec speed. My problem is that I can't replicate the training speed you reported on the same GPU (A100), which confuses me.
I wonder whether you used the xformers (and/or triton) packages to accelerate training. Does environment.yaml fully list the packages used to train the model?