Gradient Accumulation #400

Closed
Tracked by #278
jafioti opened this issue Jan 25, 2023 · 5 comments · Fixed by #519

Comments

@jafioti (Contributor) commented Jan 25, 2023

It is often desirable to train with a larger batch size than fits on the GPU / in memory at once. Accumulating gradients across mini-batches is a solution: it effectively simulates the larger batch size, albeit without the parallelism advantage.

A straightforward way to do this would be to impl Add<Gradients<D>> for Gradients<D>, so that gradients on the same device can be added together.

I'm not sure how this would be handled, since the grads seem to be stored in a Box, and I don't see how you could add to that.
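
For illustration, a standalone toy of what the accumulation itself means numerically (plain Rust, not dfdx code; the numbers and the SGD step are made up):

```rust
// Toy illustration of gradient accumulation (not dfdx code): sum the
// gradients of several mini-batches, then take a single optimizer step,
// as if one larger batch had been used.
fn main() {
    let mini_batch_grads: Vec<Vec<f32>> =
        vec![vec![0.1, -0.2], vec![0.3, 0.0], vec![-0.1, 0.4]];

    // accumulate instead of updating after every mini-batch
    let mut accumulated = vec![0.0f32; 2];
    for grads in &mini_batch_grads {
        for (acc, g) in accumulated.iter_mut().zip(grads) {
            *acc += g;
        }
    }

    // one SGD step with the averaged gradient
    let lr = 0.01;
    let n = mini_batch_grads.len() as f32;
    let mut params = vec![1.0f32, 2.0];
    for (p, acc) in params.iter_mut().zip(&accumulated) {
        *p -= lr * *acc / n;
    }
    println!("updated params: {params:?}");
}
```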

@coreylowman (Owner) commented:

I think if we change the underlying storage to Box<dyn std::ops::AddAssign>, we could do this. I'd probably prefer AddAssign over Add so we aren't cloning things.

We would need to add implementations of AddAssign for the raw device storage types.
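
For concreteness, a hypothetical shape of one of those impls on a CPU-side storage type (CpuStorage is an illustrative stand-in, not an actual dfdx type):

```rust
use std::ops::AddAssign;

// Illustrative stand-in for a device's raw storage buffer; not a real dfdx type.
struct CpuStorage(Vec<f32>);

impl AddAssign<&CpuStorage> for CpuStorage {
    // elementwise accumulation of another gradient buffer into this one
    fn add_assign(&mut self, rhs: &CpuStorage) {
        assert_eq!(self.0.len(), rhs.0.len(), "gradient buffers must match in length");
        for (a, b) in self.0.iter_mut().zip(rhs.0.iter()) {
            *a += *b;
        }
    }
}
```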

@coreylowman (Owner) commented:

So I was working on this a bit because I thought I had a pretty clever solution. It turns out Box<dyn Any> makes this pretty hard due to object safety rules.

Notably, a trait whose methods take or return Self by value is not object safe, so we can't use AddAssign or something like:

```rust
trait AddSelf {
    fn add_self(self, rhs: Self) -> Self;
}
```

If we can't do this with the Gradients object, we may need to add a separate gradient accumulator object that does lazy addition:

```rust
struct GradientAccumulator {
    gradients: Vec<Gradients>,
}
```

and then add some abstraction layer for the optimizers to use:

```rust
trait HasGradients {
    fn remove<T>(&mut self, t: &T) -> Option<T::Gradient>
    where
        T: HasUniqueId + AllocGrad;
}
```

We'd impl this for both Gradients and GradientAccumulator, and then change the optimizer's update to accept this trait:

```rust
fn update<G: HasGradients>(
    &mut self,
    module: &mut M,
    gradients: G,
) -> Result<(), OptimizerUpdateError<D>>;
```
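
For what it's worth, a simplified, self-contained sketch of that lazy-addition design, with toy stand-ins (a usize id and Vec<f32> gradient instead of the real HasUniqueId / AllocGrad machinery) just to show the shape of the abstraction:

```rust
use std::collections::HashMap;

// Toy stand-ins for dfdx's tensor ids and gradient storage.
type TensorId = usize;
type Grad = Vec<f32>;

struct Gradients {
    grads: HashMap<TensorId, Grad>,
}

struct GradientAccumulator {
    gradients: Vec<Gradients>,
}

// Simplified, non-generic version of the HasGradients abstraction above.
trait HasGradients {
    fn remove(&mut self, id: TensorId) -> Option<Grad>;
}

impl HasGradients for Gradients {
    fn remove(&mut self, id: TensorId) -> Option<Grad> {
        self.grads.remove(&id)
    }
}

impl HasGradients for GradientAccumulator {
    // Lazy addition: pull the gradient for `id` out of every stored Gradients
    // and sum them elementwise only when the optimizer asks for it.
    fn remove(&mut self, id: TensorId) -> Option<Grad> {
        let mut sum: Option<Grad> = None;
        for grads in self.gradients.iter_mut() {
            if let Some(g) = grads.remove(id) {
                match sum.as_mut() {
                    None => sum = Some(g),
                    Some(s) => s.iter_mut().zip(&g).for_each(|(a, b)| *a += b),
                }
            }
        }
        sum
    }
}
```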

@jafioti (Contributor, Author) commented Feb 7, 2023

@coreylowman Why can't there be a trait that accepts / returns Self? That first trait you had looks acceptable to me

@coreylowman (Owner) commented:

You can't use that kind of trait as a trait object, i.e. you can't have a Box<dyn AddSelf>, because the AddSelf trait is not object safe. See https://doc.rust-lang.org/reference/items/traits.html#object-safety
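
For reference, a minimal illustration (the last line is commented out because it is exactly what the compiler rejects):

```rust
trait AddSelf {
    fn add_self(self, rhs: Self) -> Self;
}

struct Grad(Vec<f32>);

impl AddSelf for Grad {
    // fine for a concrete type...
    fn add_self(mut self, rhs: Self) -> Self {
        for (a, b) in self.0.iter_mut().zip(&rhs.0) {
            *a += b;
        }
        self
    }
}

// ...but this does not compile: error[E0038], `AddSelf` cannot be made into an
// object, because `add_self` takes and returns `Self` without a `Self: Sized` bound.
// fn accumulate(gradients: Vec<Box<dyn AddSelf>>) {}
```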

@coreylowman (Owner) commented Feb 26, 2023

Another option for this: allow passing an existing Gradients object into .trace() (a rough usage sketch follows the pros below).

Pros:

  • this is possible with minimal modifications
  • this could help reduce allocations
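
A rough usage sketch of how that could look from the training loop's side (this is pseudocode: alloc_grads and the trace(grads)/update signatures are assumptions for illustration, not the actual API):

```rust
// Pseudocode sketch: reuse one Gradients object across mini-batches so that
// backward() keeps adding into it, and only step the optimizer every
// ACCUM_STEPS mini-batches. Helper names here are assumptions, not real API.
let mut grads = model.alloc_grads(); // hypothetical helper
for (i, mini_batch) in batches.iter().enumerate() {
    let loss = loss_fn(model.forward(mini_batch.clone().trace(grads)));
    grads = loss.backward(); // gradients accumulate across iterations
    if (i + 1) % ACCUM_STEPS == 0 {
        opt.update(&mut model, &grads)?;
        grads = model.alloc_grads(); // start a fresh accumulation window
    }
}
```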
