Add Adapt.adapt_structure method for Optimisers.Leaf #180

Open
wants to merge 5 commits into master
Conversation

vpuri3 commented Oct 3, 2024

Fix #179

PR Checklist

  • Tests are added
  • Documentation, if applicable

vpuri3 changed the title from "Add Adapt.adapt_storage method for Optimisers.Leaf" to "Add Adapt.adapt_structure method for Optimisers.Leaf" on Oct 3, 2024
vpuri3 added a commit to vpuri3/NeuralROMs.jl that referenced this pull request Oct 3, 2024
mcabbott commented Oct 3, 2024

I think this won't preserve identity? That's the big difference between Functors and Adapt

vpuri3 commented Oct 3, 2024

@mcabbott can you explain? I'm not sure what "identity" refers to here.

@CarloLucibello addressing your comment from #179

> Since Leaf is a functor, it will move to GPU when using Flux.gpu or MLDataDevices.jl.

That is true, and the MWE works with MLDataDevices. However, we still need the Adapt functionality. Consider the case where a Leaf is stored inside a user struct: then MLDataDevices.gpu_device() doesn't move the optimiser state to the GPU, even though Adapt.adapt_structure is defined for that struct.

using Optimisers, CUDA, LuxCUDA, MLDataDevices, Adapt

struct TrainState{Tp, To}
  p::Tp
  opt_st::To
end

Adapt.@adapt_structure TrainState

p = rand(2)
opt_st = Optimisers.setup(Optimisers.Adam(), p)
ts = TrainState(p, opt_st)
device = gpu_device()
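# the Leaf inside opt_st is not moved to the GPU: the transfer goes through
# Adapt, which has no rule for Leaf (see the output below)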
device(ts).opt_st.state[1]

2-element Vector{Float64}:
 0.0
 0.0

So there is a need to define Adapt.adapt_structure for Leaf.

mcabbott commented Oct 3, 2024

Functors keeps an IdDict so that if the same array appears twice in a model, that sharing is preserved by fmap. Optimisers.jl follows that too, and will use (and expect, IIRC) the same Leaf in such cases. So I don't see an easy way to make cu work for this.
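
For illustration (my own sketch, not code from this PR), here is what that identity preservation looks like: fmap moves a tied array once and reuses the result, whereas calling adapt on each field independently produces two distinct GPU arrays.

using Functors, Adapt, CUDA

w = rand(Float32, 2)
tied = (a = w, b = w)                        # the same array appears twice ("tied")
@assert tied.a === tied.b

# fmap keeps an IdDict cache, so the sharing survives the transfer:
gpu_tied = fmap(x -> adapt(CuArray, x), tied)
@assert gpu_tied.a === gpu_tied.b

# adapting each field separately loses the sharing:
naive = (a = adapt(CuArray, tied.a), b = adapt(CuArray, tied.b))
@assert naive.a !== naive.b                  # two independent CuArrays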

vpuri3 commented Oct 3, 2024

Would something like this solve the problem?

function Adapt.adapt_storage(to, leaf::Leaf)
    return fmap(x -> Adapt.adapt_storage(to, x), leaf)
end

ToucheSir commented:
It would not, because the IdDict needs to be shared between Leafs. This is why Flux.gpu is a standalone function right now, FWIW.
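
To make that concrete, here is a hedged sketch (not Flux's actual implementation) of why a standalone function helps: one top-level fmap call uses a single shared cache for the whole tree, so a Leaf attached to tied parameters stays one shared Leaf after the move.

using Functors, Adapt, CUDA, Optimisers

# illustrative stand-in for a Flux.gpu-style function; one fmap call over the
# whole tree means one shared IdDict cache, so shared Leafs map to one new Leaf
move_to_gpu(tree) = fmap(x -> adapt(CuArray, x), tree)

p = rand(Float32, 2)
model = (layer1 = p, layer2 = p)             # tied parameters
opt_st = Optimisers.setup(Optimisers.Adam(), model)
@assert opt_st.layer1 === opt_st.layer2      # setup reuses one Leaf for tied arrays

model_gpu, opt_st_gpu = move_to_gpu((model, opt_st))
@assert opt_st_gpu.layer1 === opt_st_gpu.layer2   # sharing is preserved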

vpuri3 commented Oct 3, 2024

> It would not, because the IdDict needs to be shared between Leafs. This is why Flux.gpu is a standalone function right now, FWIW.

Maybe it's possible to grab the IdDict and bind it to the new Leaf object? Where is it defined?

BTW, the fix in this PR (Adapt.@adapt_structure Optimisers.Leaf) is working fine in my training runs.

ToucheSir commented Oct 4, 2024

> BTW, the fix in this PR (Adapt.@adapt_structure Optimisers.Leaf) is working fine in my training runs.

That's because your model doesn't have any shared/"tied" parameters, e.g. model.layer1.W === model.layer2.W. Which is fine, but libraries like Optimisers have to support all use cases.

> Maybe it's possible to grab the IdDict and bind it to the new Leaf object? Where is it defined?

It's created at the top level in Functors.fmap and threaded down through the state tree. I'm not sure what it means to "grab the IdDict" in the context of overriding adapt_structure. Flux and MLDataDevices only ever call adapt using fmap, and adapt doesn't take a cache argument.

…handled by functors. So we add a warning referring the user to Flux.gpu or MLDataDevices.gpu_device()
vpuri3 commented Oct 4, 2024

@ToucheSir, thanks for explaining. I added a warning to the adapt_structure method that points the user to Flux.gpu, and moved it all to a package extension. Now cu(opt_st) no longer silently does something other than what the user expects.
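
For reference, a minimal sketch of what such an extension method could look like (my own illustration of the stated design, assuming Leaf's (rule, state) fields; not the exact code in ext/OptimisersAdaptExt.jl):

module OptimisersAdaptExt

using Adapt, Optimisers

function Adapt.adapt_structure(to, leaf::Optimisers.Leaf)
    @warn """`Optimisers.Leaf` object does not support device transfer via `Adapt.jl`.
             Avoid this by calling `Flux.gpu/cpu` or
             `MLDataDevices.cpu_device()/gpu_device()` on the optimiser state object.
             See https://github.com/FluxML/Optimisers.jl/issues/179 for details."""
    # adapt the rule and the state arrays of this single Leaf; sharing between
    # Leafs tied to the same parameter is NOT preserved, hence the warning
    return Optimisers.Leaf(Adapt.adapt(to, leaf.rule), Adapt.adapt(to, leaf.state))
end

end # module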

The behavior is as follows:

julia> using Optimisers, CUDA, LuxCUDA

julia> opt_st = Optimisers.setup(Optimisers.Adam(), zeros(2))
Leaf(Adam(0.001, (0.9, 0.999), 1.0e-8), ([0.0, 0.0], [0.0, 0.0], (0.9, 0.999)))

julia> cu(opt_st)
┌ Warning: `Optimisers.Leaf` object does not support device transfer via
│ `Adapt.jl`. Avoid this by calling `Flux.gpu/cpu` or
│ `MLDataDevices.cpu_device()/gpu_device()` on the optimiser state object.
│ See below GitHub issue for more details.
│ https://github.com/FluxML/Optimisers.jl/issues/179 
└ @ OptimisersAdaptExt ~/.julia/dev/Optimisers.jl/ext/OptimisersAdaptExt.jl:7
Leaf(Adam(0.001, (0.9, 0.999), 1.0e-8), (Float32[0.0, 0.0], Float32[0.0, 0.0], (0.9, 0.999)))

julia> cu(opt_st).state[1] |> typeof
CuArray{Float32, 1, CUDA.DeviceMemory}

julia> using MLDataDevices

julia> gpu_device()(opt_st)
Leaf(Adam(0.001, (0.9, 0.999), 1.0e-8), (Float32[0.0, 0.0], Float32[0.0, 0.0], (0.9, 0.999)))

julia> gpu_device()(opt_st).state[1]
2-element CuArray{Float32, 1, CUDA.DeviceMemory}:
 0.0
 0.0
