
feat(train): add optional accumulate_grad_batches config param #306

Merged

merged 2 commits into voicepaw:main on Apr 13, 2023

Conversation

guranon (Contributor) commented Apr 12, 2023

Add an optional `accumulate_grad_batches` param to the `train` part of the config to allow for gradient accumulation (Lightning docs). This updates the model parameters once every `accumulate_grad_batches` batches, with a default value of 1 so that existing configs are not broken.

I went with the same name (`accumulate_grad_batches`) as the corresponding `Trainer` argument, even though we're not actually using that argument because we do manual optimization. I realize that might be confusing, so I'm open to suggestions.
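As an illustration only (this is not the project's actual config-loading code; the file name and key layout are assumptions), reading such an optional value with a default of 1 might look like:

```python
import json

# Hypothetical config path and layout; shown only to illustrate the
# "optional with a default of 1" behavior described above.
with open("config.json") as f:
    config = json.load(f)

train_cfg = config.get("train", {})
accumulate_grad_batches = train_cfg.get("accumulate_grad_batches", 1)  # defaults to 1
```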

I'm relatively new to ML/AI work, so just to double-check my understanding:

- `manual_backward` backpropagates and accumulates gradients
- `step` updates model parameters
- `zero_grad` clears the gradients

So calling `manual_backward`, `step`, `zero_grad` should be equivalent to calling `zero_grad`, `manual_backward`, `step`.
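A minimal sketch of that idea in a `LightningModule` with manual optimization (illustrative only, not the project's training code; the model, loss, and class name are placeholders):

```python
import pytorch_lightning as pl
import torch
from torch import nn


class AccumulatingModule(pl.LightningModule):
    """Toy module showing gradient accumulation under manual optimization."""

    def __init__(self, accumulate_grad_batches: int = 1):
        super().__init__()
        self.automatic_optimization = False  # manual optimization, as in this project
        self.accumulate_grad_batches = accumulate_grad_batches
        self.model = nn.Linear(10, 1)

    def configure_optimizers(self):
        return torch.optim.AdamW(self.model.parameters(), lr=1e-3)

    def training_step(self, batch, batch_idx):
        opt = self.optimizers()
        x, y = batch
        loss = nn.functional.mse_loss(self.model(x), y)

        # manual_backward backpropagates and accumulates gradients on every batch.
        self.manual_backward(loss)

        # step applies the accumulated gradients and zero_grad clears them,
        # but only once every `accumulate_grad_batches` batches.
        if (batch_idx + 1) % self.accumulate_grad_batches == 0:
            opt.step()
            opt.zero_grad()
```

With `accumulate_grad_batches=1` this reduces to the usual `manual_backward`/`step`/`zero_grad` on every batch, which is why the default of 1 leaves existing configs unchanged.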

Other considerations:

guranon added 2 commits April 12, 2023 09:06
Add an `accumulate_grad_batches` param to the `train` part
of the config to allow for gradient accumulation.
This updates the gradients once every `accumulate_grad_batches`
batches, with a default value of 1 to not break any existing configs.
BlueAmulet (Collaborator) commented Apr 12, 2023

One thing to consider is where gradient clipping should occur. My understanding is that with gradient accumulation, `clip_grad_value_` should also only be called once every N batches. On a side note, that is probably broken since the training refactor: `clip_grad_value_` is no longer called after backward and before the optimizer step.

Edit: never mind, since `clip_grad_value_` is only ever called with `None`, it never actually performs gradient clipping. Still, it should come after `.backward` to log the gradients properly.
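For reference, a sketch of where clipping would sit in the accumulation loop above if it were enabled, continuing the toy `training_step` from the earlier sketch (the clip value here is illustrative; in the project the call receives `None`, so it is effectively a no-op):

```python
    def training_step(self, batch, batch_idx):
        opt = self.optimizers()
        x, y = batch
        loss = nn.functional.mse_loss(self.model(x), y)
        self.manual_backward(loss)  # gradients keep accumulating every batch

        # Clip after backward and before the optimizer step, and only on the
        # batches where the step actually happens.
        if (batch_idx + 1) % self.accumulate_grad_batches == 0:
            torch.nn.utils.clip_grad_value_(self.model.parameters(), clip_value=1.0)
            opt.step()
            opt.zero_grad()
```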

@34j 34j merged commit 1172b23 into voicepaw:main Apr 13, 2023
34j (Collaborator) commented Apr 20, 2023

@allcontributors add guranon bug, ideas, code

allcontributors (Contributor) commented

@34j

I've put up a pull request to add @guranon! 🎉

34j (Collaborator) commented Apr 20, 2023

@allcontributors add BlueAmulet maintenance

allcontributors (Contributor) commented

@34j

I've put up a pull request to add @BlueAmulet! 🎉

34j (Collaborator) commented Apr 20, 2023

@allcontributors add GarrettConway review

allcontributors (Contributor) commented

@34j

I've put up a pull request to add @GarrettConway! 🎉
