Null gradient when turning free_graph off #98

Open
eliemichel opened this issue Jun 29, 2020 · 3 comments


@eliemichel
Contributor

eliemichel commented Jun 29, 2020

The following snippet prints zero gradients, whereas calling backward(c, true) instead yields the correct values (5.0, 2.0):

using FloatD = DiffArray<float>;
FloatD a = 2.0f;
FloatD b = 5.0f;
set_requires_gradient(a);
set_requires_gradient(b);

FloatD c = a * b;

backward(c, false);
LOG << "dc/da = " << gradient(a);
LOG << "dc/db = " << gradient(b);

Output:

dc/da = 0
dc/db = 0

Expected:

dc/da = 5.0
dc/db = 2.0
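
For reference, the expected values follow from c = a * b, so dc/da = b = 5 and dc/db = a = 2. The sketch below is a minimal, hypothetical reverse-mode tape in plain Python (the names Var, backward and grads are made up for illustration and are not Enoki's API or implementation). It only shows the behavior one would expect: keeping the graph alive after the traversal should not change the resulting gradients.

```python
class Var:
    """Scalar node of a tiny reverse-mode tape (illustrative only, not Enoki)."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # tuple of (parent Var, local partial derivative)
        self.grad = 0.0

    def __mul__(self, other):
        # d(a*b)/da = b and d(a*b)/db = a
        return Var(self.value * other.value,
                   parents=((self, other.value), (other, self.value)))


def backward(out, free_graph=True):
    """Accumulate d(out)/d(node) into node.grad for every ancestor of out."""
    order, seen = [], set()

    def visit(node):
        # Post-order DFS: a node is appended after all of its inputs.
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent, _ in node.parents:
            visit(parent)
        order.append(node)

    visit(out)
    out.grad = 1.0
    # Reversed post-order visits each node after all of its consumers.
    for node in reversed(order):
        for parent, local in node.parents:
            parent.grad += node.grad * local
        if free_graph:
            node.parents = ()  # drop edges; gradients already propagated


def grads(free_graph):
    a, b = Var(2.0), Var(5.0)
    c = a * b
    backward(c, free_graph=free_graph)
    return a.grad, b.grad


print(grads(True))
print(grads(False))
```

In this toy model, free_graph only controls whether the edges are dropped after propagation, so both calls return the same pair of gradients.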

Built with MSVC16, without CUDA, commit e240a4b

Edit: the line that zeroes the gradients is this one: https://github.com/mitsuba-renderer/enoki/blob/master/src/autodiff/autodiff.cpp#L896. I am not sure what this reference counter is, but shouldn't the condition be if (target.ref_count_int == 0) rather than > 0?

@Speierers
Member

Hi @eliemichel ,

I doubt that DiffArray<float> is supported. IIRC, automatic differentiation in Enoki is only supported for CUDAArray. Do you have the same issue when using DiffArray<CUDAArray<float>> instead?

@eliemichel
Contributor Author

Regarding the other issue, I don't have a precise idea, but for this one, what do you think of the suggested fix of changing line 896 of autodiff.cpp? Did I misunderstand the meaning of ref_count_int?

@stefanjp

stefanjp commented Oct 28, 2021

I could reproduce the problem with the "Interfacing with PyTorch" example from the documentation, changing only FloatD.backward() to FloatD.backward(free_graph=False). I also added imports for FloatC and FloatD, as the example did not run out of the box, but I guess that is unrelated. I ended up here after trying to modify the Mitsuba autodiff function render_torch so that it does not wipe the AD graph.

import torch
import enoki
from enoki.cuda_autodiff import Float32 as FloatD
from enoki.cuda import Float32 as FloatC
class EnokiAtan2(torch.autograd.Function):
    @staticmethod
    def forward(ctx, arg1, arg2):
        # Convert input parameters to Enoki arrays
        ctx.in1 = FloatD(arg1)
        ctx.in2 = FloatD(arg2)

        # Inform Enoki if PyTorch wants gradients for one/both of them
        enoki.set_requires_gradient(ctx.in1, arg1.requires_grad)
        enoki.set_requires_gradient(ctx.in2, arg2.requires_grad)

        # Perform a differentiable computation in Enoki
        ctx.out = enoki.atan2(ctx.in1, ctx.in2)

        # Convert the result back into a PyTorch array
        out_torch = ctx.out.torch()

        # Optional: release any cached memory from Enoki back to PyTorch
        enoki.cuda_malloc_trim()

        return out_torch

    @staticmethod
    def backward(ctx, grad_out):
        # Attach gradients received from PyTorch to the output
        # variable of the forward pass
        enoki.set_gradient(ctx.out, FloatC(grad_out))

        # Perform a reverse-mode traversal. Note that the static
        # version of the backward() function is being used, see
        # the following subsection for details on this
        FloatD.backward(free_graph=False)

        # Fetch gradients from the input variables and pass them on
        result = (enoki.gradient(ctx.in1).torch()
                  if enoki.requires_gradient(ctx.in1) else None,
                  enoki.gradient(ctx.in2).torch()
                  if enoki.requires_gradient(ctx.in2) else None)

        # Garbage-collect Enoki arrays that are now no longer needed
        del ctx.out, ctx.in1, ctx.in2

        # Optional: release any cached memory from Enoki back to PyTorch
        enoki.cuda_malloc_trim()

        return result

# Create enoki_atan2(y, x) function
enoki_atan2 = EnokiAtan2.apply

# Let's try it!
y = torch.tensor(1.0, device='cuda')
x = torch.tensor(2.0, device='cuda')
y.requires_grad_()
x.requires_grad_()

o = enoki_atan2(y, x)
print(o)

o.backward()
print(y.grad)
print(x.grad)

The modified example prints:

tensor([0.4636], device='cuda:0', grad_fn=<EnokiAtan2Backward>)
tensor(0., device='cuda:0')
tensor(0., device='cuda:0')

Whereas the unmodified example prints:

tensor([0.4636], device='cuda:0', grad_fn=<EnokiAtan2Backward>)
tensor(0.4000, device='cuda:0')
tensor(-0.2000, device='cuda:0')
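
As a sanity check on which of the two outputs is correct, the gradients of atan2(y, x) can be computed by hand: d/dy = x / (x^2 + y^2) and d/dx = -y / (x^2 + y^2). The snippet below is a small standalone sketch using only the Python standard library (the helper names atan2_grads and finite_difference are made up for illustration). It evaluates these formulas at y = 1, x = 2 and compares them against central finite differences.

```python
import math


def atan2_grads(y, x):
    """Analytic partial derivatives of atan2(y, x) with respect to y and x."""
    r2 = x * x + y * y
    return x / r2, -y / r2


def finite_difference(f, y, x, h=1e-6):
    """Central finite-difference estimate of (df/dy, df/dx)."""
    d_dy = (f(y + h, x) - f(y - h, x)) / (2 * h)
    d_dx = (f(y, x + h) - f(y, x - h)) / (2 * h)
    return d_dy, d_dx


y, x = 1.0, 2.0
value = math.atan2(y, x)
analytic = atan2_grads(y, x)
numeric = finite_difference(math.atan2, y, x)

print("atan2(y, x) =", round(value, 4))
print("analytic    =", analytic)
print("numeric     =", numeric)
```

The analytic values are 0.4 and -0.2, which match the output of the unmodified example, so the zeros printed with free_graph=False are the incorrect result.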
