
Fix Conv transfer to AMDGPU #2235

Merged · 3 commits · Apr 23, 2023
Conversation

@pxl-th (Member) commented Apr 21, 2023

Previously, only this form had its conv weights flipped on transfer:

Conv((3, 3), 3 => 3) |> gpu

but not this (or anything more complex):

Chain(Conv((3, 3), 3 => 3)) |> gpu

To fix this, I modified the `exclude` function passed to `fmap`, for both the CPU -> GPU and the GPU -> CPU transfer (so that on the way back the weights are flipped again).
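A minimal sketch of the mechanism described above, using the `fmap`/`exclude` machinery from Functors.jl. The helper and adaptor names mirror identifiers visible in this PR's diff, but the definitions here are illustrative, not the exact Flux source:

```julia
using Flux, Functors  # Flux for Conv/ConvTranspose; Functors for fmap/isleaf

# Illustrative stand-in for the real adaptor type used in this PR.
struct FluxAMDAdaptor end

# By default, fmap recurses until it reaches leaves (arrays, numbers, ...).
# A Conv nested inside a Chain is therefore decomposed into bare weight
# arrays, so a Conv-specific adapt method is never called -- which is why
# only a top-level Conv had its weights flipped before this fix.

# The fix: tell fmap to also stop at Conv/ConvTranspose layers, so the
# whole layer reaches the adaptor, which can then flip the weights itself.
_isleaf(x) = Functors.isleaf(x)
_isleaf(::Union{Flux.Conv, Flux.ConvTranspose}) = true

# Hypothetical transfer entry point built on the predicate above:
# amd_gpu(m) = fmap(x -> Adapt.adapt(FluxAMDAdaptor(), x), m; exclude = _isleaf)
```

With this `exclude`, both `Conv((3, 3), 3 => 3) |> gpu` and `Chain(Conv((3, 3), 3 => 3)) |> gpu` route the full layer through the adaptor.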

PR Checklist

  • Tests are added
  • Entry in NEWS.md
  • Documentation, if applicable

src/functor.jl (review thread, outdated)
ext/AMDGPUExt/functor.jl (review thread, outdated)
ext/AMDGPUExt/functor.jl (review thread, outdated)
@ToucheSir (Member) left a comment

A couple of small touch-ups, but otherwise LGTM


_conv_basetype(c::Type{C}) where C <: Conv = Conv
_conv_basetype(c::Type{C}) where C <: ConvTranspose = ConvTranspose
Flux._isleaf(::AMD_CONV) = return true

Suggested change:
- Flux._isleaf(::AMD_CONV) = return true
+ Flux._isleaf(::AMD_CONV) = true


_amd(m::Union{Conv, ConvTranspose}) = adapt_storage(FluxAMDAdaptor(), m)
Adapt.adapt_structure(to::FluxAMDAdaptor, m::AMD_CONV) = return m
Suggested change:
- Adapt.adapt_structure(to::FluxAMDAdaptor, m::AMD_CONV) = return m
+ Adapt.adapt_structure(to::FluxAMDAdaptor, m::AMD_CONV) = m

Comment on lines 40 to 41
_conv_basetype(c::C) where C <: Conv = Conv
_conv_basetype(c::C) where C <: ConvTranspose = ConvTranspose
Suggested change:
- _conv_basetype(c::C) where C <: Conv = Conv
- _conv_basetype(c::C) where C <: ConvTranspose = ConvTranspose
+ _conv_basetype(::Conv) = Conv
+ _conv_basetype(::ConvTranspose) = ConvTranspose
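Both forms are equivalent at call sites; the suggestion simply replaces a redundant `where` type parameter with plain value dispatch. A hypothetical use (the reconstruction helper below is illustrative, not this PR's exact code):

```julia
# Dispatch on the argument's type directly; the value itself is unused,
# so it needs no binding and no `where` clause.
_conv_basetype(::Conv) = Conv
_conv_basetype(::ConvTranspose) = ConvTranspose

# The returned constructor lets one code path rebuild either layer family
# after adapting its fields, e.g. (hypothetically):
# rebuild(layer, fields...) = _conv_basetype(layer)(fields...)
```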

@pxl-th (Member, Author) commented Apr 23, 2023

Done! Once this is merged, can we also tag a release?
I'd like to specify a compat bound for one of the packages.

@ToucheSir merged commit 3392a02 into FluxML:master Apr 23, 2023
@ToucheSir (Member) commented:
I want to see if the Julia v1 CUDA failures on this PR propagate to the main branch. They shouldn't be caused by this test suite because it never runs on CI (we should add that), so it'd be good to figure out what's going on before tagging.

@pxl-th deleted the amd-transfer branch Apr 24, 2023 05:29
@pxl-th (Member, Author) commented Apr 24, 2023

The one with Float16 or the one at the very end?

@ToucheSir (Member) commented:
The Float16 one. I'm not sure why it's seemingly spread from just the 1.6 job to the 1.x one as well...

@pxl-th (Member, Author) commented Apr 24, 2023

This could be some kind of synchronization issue, probably unrelated to Flux.
Running the tests multiple times either fails or finishes successfully (mostly successfully)...

@ToucheSir (Member) commented:
You mean running the CUDA tests locally works? That's odd, I wonder why it's failing consistently on CI then. CUDNN_STATUS_BAD_PARAM is an input validation error, so I would have expected it to be deterministic.

@pxl-th (Member, Author) commented Apr 24, 2023

> You mean running the CUDA tests locally works?

Yes. And now it always succeeds. Can't reproduce the error for some reason...

@ToucheSir (Member) commented:
I was able to repro consistently yesterday on 1.6 and 1.8. Will try to look into it over the next couple of days.
