
Question regarding the gradient accumulation #7

Closed
lehduong opened this issue Oct 18, 2023 · 5 comments

lehduong commented Oct 18, 2023

Hi, thanks for your implementation.

I noticed that you accumulate the gradients of two models with two separate context managers (here). Could you let me know whether you have verified your implementation with a gradient accumulation step different from 1? Apparently, this approach can be erroneous according to this and the follow-up comments. I believe newer versions of HF's accelerate already allow the context manager to receive multiple models, as in here.
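
For concreteness, here is a minimal, self-contained sketch of the nested two-context-manager pattern in question. The modules, loss, and data are dummies (only the names `unet` and `cc_projection` are taken from this repo), so this is an illustration of the pattern rather than the repo's actual training code:

```python
import torch
from accelerate import Accelerator

accelerator = Accelerator(gradient_accumulation_steps=8)

# Stand-in modules for the two trained parts (in the repo: the UNet and the camera-pose projection).
unet = torch.nn.Linear(4, 4)
cc_projection = torch.nn.Linear(4, 4)
optimizer = torch.optim.AdamW(
    list(unet.parameters()) + list(cc_projection.parameters()), lr=1e-4
)
dataloader = torch.utils.data.DataLoader(torch.randn(64, 4), batch_size=8)

unet, cc_projection, optimizer, dataloader = accelerator.prepare(
    unet, cc_projection, optimizer, dataloader
)

for batch in dataloader:
    # Each model gets its own accumulate() context, nested one inside the other.
    # This is the pattern whose correctness for accumulation steps > 1 is questioned above.
    with accelerator.accumulate(unet):
        with accelerator.accumulate(cc_projection):
            loss = (cc_projection(unet(batch)) ** 2).mean()  # dummy loss
            accelerator.backward(loss)
            optimizer.step()
            optimizer.zero_grad()
```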

kxhit (Owner) commented Oct 19, 2023

Hi @lehduong, thanks for pointing this out! Yes, in Zero123 the gradient accumulation step is 1, so this doesn't matter there. I know people are working on fixing it for more than one model. According to your link, the correct way to handle multiple models is with accelerator.accumulate(model1, model2)? I would be very happy if you want to open a PR! Thank you!

lehduong (Author) commented Oct 20, 2023

Yes, you only need to change this line to with accelerator.accumulate(unet, cc_projection) and use accelerate >= 0.23 (I'm not sure that is the earliest version supporting this feature, but it is the one I'm using). I ran a quick experiment and attached the image below comparing the training loss of the two accumulation approaches. I used a resolution of 512, a (per-device) batch size of 24, and gradient_accumulation_steps set to 8 (an effective batch size of 1536, as in the original implementation).

[Screenshot (2023-10-20): training-loss curves comparing the two gradient accumulation approaches]
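
Concretely, the fix described above amounts to a one-line change in the training loop (continuing the dummy setup from the sketch in the first comment; per the discussion, it needs an accelerate version with multi-model support, reportedly >= 0.23):

```python
# Same setup as the earlier sketch; only the context manager changes.
for batch in dataloader:
    # One accumulate() call wrapping both models, as suggested above.
    with accelerator.accumulate(unet, cc_projection):
        loss = (cc_projection(unet(batch)) ** 2).mean()  # dummy loss
        accelerator.backward(loss)
        optimizer.step()
        optimizer.zero_grad()
```

With the per-device batch size of 24, 8 GPUs, and gradient_accumulation_steps = 8, the effective batch size is 24 × 8 × 8 = 1536, matching the number quoted above.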

kxhit added a commit that referenced this issue Oct 21, 2023

kxhit (Owner) commented Oct 21, 2023

Thanks a lot for your contribution! I've just pushed a fix; please reopen the issue if anything needs further changes!

kxhit closed this as completed Oct 21, 2023

cfeng16 commented Oct 29, 2023

Hi @lehduong, may I ask whether you used gradient accumulation for multiple models in a distributed training setting (say, multiple GPUs)?

lehduong (Author) commented Oct 29, 2023

I trained the model on a single machine (8 GPUs).
