
feat(train): add optional accumulate_grad_batches config param #306

Merged

merged 2 commits into voicepaw:main on Apr 13, 2023

Conversation

guranon (Contributor) commented Apr 12, 2023

Add an optional `accumulate_grad_batches` param to the `train` part of the config to allow for gradient accumulation (Lightning docs). This updates the model parameters once every `accumulate_grad_batches` batches, with a default value of 1 so that existing configs are not broken.

I went with the same name (`accumulate_grad_batches`) as the corresponding `Trainer` argument, even though we're not actually using that argument because we do manual optimization. I realize that might be confusing, so I'm open to suggestions.
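As an illustration only (this is not the project's actual config-loading code; the file name and key layout are assumptions), reading such an optional value with a default of 1 might look like:

```python
import json

# Hypothetical config path and layout; shown only to illustrate the
# "optional with a default of 1" behavior described above.
with open("config.json") as f:
    config = json.load(f)

train_cfg = config.get("train", {})
accumulate_grad_batches = train_cfg.get("accumulate_grad_batches", 1)  # defaults to 1
```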

I'm relatively new to ML/AI work, so just to double-check my understanding:

- `manual_backward` backpropagates and accumulates gradients
- `step` updates model parameters
- `zero_grad` clears the gradients

So calling `manual_backward`, `step`, `zero_grad` should be equivalent to calling `zero_grad`, `manual_backward`, `step`.
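A minimal sketch of that idea in a `LightningModule` with manual optimization (illustrative only, not the project's training code; the model, loss, and class name are placeholders):

```python
import pytorch_lightning as pl
import torch
from torch import nn


class AccumulatingModule(pl.LightningModule):
    """Toy module showing gradient accumulation under manual optimization."""

    def __init__(self, accumulate_grad_batches: int = 1):
        super().__init__()
        self.automatic_optimization = False  # manual optimization, as in this project
        self.accumulate_grad_batches = accumulate_grad_batches
        self.model = nn.Linear(10, 1)

    def configure_optimizers(self):
        return torch.optim.AdamW(self.model.parameters(), lr=1e-3)

    def training_step(self, batch, batch_idx):
        opt = self.optimizers()
        x, y = batch
        loss = nn.functional.mse_loss(self.model(x), y)

        # manual_backward backpropagates and accumulates gradients on every batch.
        self.manual_backward(loss)

        # step applies the accumulated gradients and zero_grad clears them,
        # but only once every `accumulate_grad_batches` batches.
        if (batch_idx + 1) % self.accumulate_grad_batches == 0:
            opt.step()
            opt.zero_grad()
```

With `accumulate_grad_batches=1` this reduces to the usual `manual_backward`/`step`/`zero_grad` on every batch, which is why the default of 1 leaves existing configs unchanged.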

Other considerations:

guranon added 2 commits April 12, 2023 09:06
Add an `accumulate_grad_batches` param to the `train` part
of the config to allow for gradient accumulation.
This updates the gradients once every `accumulate_grad_batches`
batches, with a default value of 1 to not break any existing configs.
BlueAmulet (Collaborator) commented Apr 12, 2023

One thing to consider is where gradient clipping should occur. My understanding is that with gradient accumulation, `clip_grad_value_` should also only be called once every N batches. On a side note, that is probably broken since the training refactor: `clip_grad_value_` is no longer called after backward and before the optimizer step.

Edit: never mind, since `clip_grad_value_` is only ever called with `None`, it never actually performs gradient clipping. Still, it should come after `.backward` to log the gradients properly.
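For reference, a sketch of where clipping would sit in the accumulation loop above if it were enabled, continuing the toy `training_step` from the earlier sketch (the clip value here is illustrative; in the project the call receives `None`, so it is effectively a no-op):

```python
    def training_step(self, batch, batch_idx):
        opt = self.optimizers()
        x, y = batch
        loss = nn.functional.mse_loss(self.model(x), y)
        self.manual_backward(loss)  # gradients keep accumulating every batch

        # Clip after backward and before the optimizer step, and only on the
        # batches where the step actually happens.
        if (batch_idx + 1) % self.accumulate_grad_batches == 0:
            torch.nn.utils.clip_grad_value_(self.model.parameters(), clip_value=1.0)
            opt.step()
            opt.zero_grad()
```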

@34j 34j merged commit 1172b23 into voicepaw:main Apr 13, 2023
34j (Collaborator) commented Apr 20, 2023

@allcontributors add guranon bug, ideas, code

allcontributors (Contributor) commented

@34j

I've put up a pull request to add @guranon! 🎉

34j (Collaborator) commented Apr 20, 2023

@allcontributors add BlueAmulet maintenance

allcontributors (Contributor) commented

@34j

I've put up a pull request to add @BlueAmulet! 🎉

34j (Collaborator) commented Apr 20, 2023

@allcontributors add GarrettConway review

allcontributors (Contributor) commented

@34j

I've put up a pull request to add @GarrettConway! 🎉
