Incorrect gradients of batchnorm in testmode #548

Open
phaim opened this issue Oct 27, 2023 · 3 comments

@phaim commented Oct 27, 2023

I have been trying to take the derivative of a Flux model in test mode, and noticed that the BatchNorm layer behaves incorrectly for 4D and 5D CUDA arrays.
Here is a minimal working example of this behaviour, computing the gradient of the BatchNorm layer for inputs reshaped to different numbers of dimensions:

using Flux, CUDA, Zygote

# Move the model to the given device, put it in test mode, reshape the input
# to `n_dims` singleton dimensions, and return the gradient w.r.t. the input.
function gradient_varying_shape(m, x, n_dims, device)
    m = m |> device
    Flux.testmode!(m)

    x = reshape(x, ntuple(i -> 1, n_dims)) |> device
    return gradient(input -> sum(m(input).^2), x)[1] |> cpu
end

model = BatchNorm(1)
x = [1f0]

for i=2:7
    cpu_gradient = gradient_varying_shape(model, x, i, cpu) 
    gpu_gradient = gradient_varying_shape(model, x, i, gpu) 
    println("n_dim=$i, cpu: $(cpu_gradient[1]), gpu: $(gpu_gradient[1])")
end

This gives the following output for me:

n_dim=2, cpu: 1.99998, gpu: 0.0
n_dim=3, cpu: 1.99998, gpu: 1.99998
n_dim=4, cpu: 1.99998, gpu: 0.0
n_dim=5, cpu: 1.99998, gpu: 0.0
n_dim=6, cpu: 1.99998, gpu: 1.99998
n_dim=7, cpu: 1.99998, gpu: 1.99998
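
For reference, the CPU value is what I would expect: in test mode a freshly constructed BatchNorm(1) normalises with its running statistics (mean 0, variance 1 by default) and the default affine parameters γ = 1, β = 0, so the layer is just an affine map of the input. A quick hand check, assuming Flux's default ϵ = 1f-5 (a sketch, not part of the reproduction):

# Test-mode BatchNorm(1) with default parameters reduces to
# y = γ * (x - μ) / sqrt(σ² + ϵ) + β with μ = 0, σ² = 1, γ = 1, β = 0.
x  = 1f0
ϵ  = 1f-5
y  = (x - 0f0) / sqrt(1f0 + ϵ)   # ≈ 0.999995
dx = 2f0 * y / sqrt(1f0 + ϵ)     # gradient of sum(y^2) w.r.t. x ≈ 1.99998

The GPU value of 0.0 would be consistent with the training-mode computation being used instead: with a single element per channel, the batch mean equals x, so the normalised output, and hence the gradient with respect to x, collapses to zero.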

Looking through the code, I found that the implementation of the CUDA backward batchnorm here ignores the training argument. Could this be the origin of this behaviour?

I'm using Julia 1.9.3 with NNlib version 0.9.7 and this environment:

[052768ef] CUDA v5.0.0
[587475ba] Flux v0.14.6
[e88e6eb3] Zygote v0.6.66
[02a925ec] cuDNN v1.2.0
@ToucheSir
Member

It's quite possible. That whole method is a bit of a kludge and hasn't been changed in years, so I'm not surprised it has edge cases.

@phaim
Author

phaim commented Nov 14, 2023

I have been looking into fixing this issue, but I have a hard time understanding the function signature of cudnnBNBackward! and which variable it is supposed to act on. Is there any additional documentation or information on that?

@ToucheSir
Member

No, but the good news is that it's merely a thin wrapper over cudnnBatchNormalizationBackward, which is documented by Nvidia. If you have any questions during your effort, I'd be happy to answer them.
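
For orientation, and as far as I understand it, cudnnBatchNormalizationBackward computes the gradient for the training-mode forward pass, i.e. it assumes the statistics came from the current batch. In test mode the layer normalises with fixed running statistics, so it is affine in x and the pullback with respect to x is just a per-channel rescaling that doesn't need cuDNN at all. A rough sketch of that maths, with purely illustrative names (not NNlib's API):

# Hypothetical test-mode pullback: with fixed running mean/variance the layer
# is affine in x, so dx = dy .* γ ./ sqrt.(σ² .+ ϵ), broadcast over channels.
# Flux convention: channels live in the second-to-last dimension.
function testmode_batchnorm_pullback(dy, γ, σ², ϵ; channel_dim = ndims(dy) - 1)
    shape = ntuple(i -> i == channel_dim ? length(γ) : 1, ndims(dy))
    scale = reshape(γ ./ sqrt.(σ² .+ ϵ), shape)
    return dy .* scale
end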
