
why is detach necessary #116

Closed
rogertrullo opened this issue Mar 20, 2017 · 17 comments
Comments

@rogertrullo

Hi, I am wondering why is detach necessary in this line:

output = netD(fake.detach())

I understand that we want to update the gradients of netD without changing the ones of netG. But if the optimizer is only using the parameters of netD, then only its weights will be updated. Am I missing something here?
Thanks in advance!
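A minimal sketch of the question's setup (using hypothetical `nn.Linear` stand-ins for the real DCGAN nets, just to show the mechanics of `detach()` in the D update):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins for the DCGAN nets, for illustration only
netG = nn.Linear(4, 8)   # "generator": noise -> fake sample
netD = nn.Linear(8, 1)   # "discriminator": sample -> score

noise = torch.randn(2, 4)
fake = netG(noise)

# D update: detach so the backward pass stops at `fake` and never
# computes gradients w.r.t. netG's weights
output = netD(fake.detach())
loss_d = output.mean()
loss_d.backward()

print(netD.weight.grad is not None)  # True - D got gradients
print(netG.weight.grad is None)      # True - G's gradient work was skipped
```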

@soumith
Member

soumith commented Mar 20, 2017

you are correct. it is done for speed, not correctness. The computation of gradients wrt the weights of netG can be fully avoided in the backward pass if the graph is detached where it is.

@soumith soumith closed this as completed Mar 20, 2017
@rogertrullo
Author

Thanks for the quick reply @soumith!

@sunshineatnoon

I missed detach when implementing DCGAN in PyTorch, and it gave me this error:

RuntimeError: Trying to backward through the graph second time, but the buffers have already been freed. Please specify retain_variables=True when calling backward for the first time.

@dantp-ai

dantp-ai commented Jul 1, 2018

@soumith This is not true. Detaching fake from the graph is necessary to avoid having to forward-pass the noise through G again when we actually update the generator. If we do not detach, then although fake is not needed for D's gradient update, it is still part of the computational graph, and because the backward pass frees the graph's intermediate buffers (retain_graph=False by default), fake's graph won't be available when G is updated.
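The mechanism described above can be reproduced in a few lines (hypothetical stand-in nets, not the actual DCGAN models): without detach, D's backward frees G's part of the graph, so the G step's backward fails.

```python
import torch
import torch.nn as nn

netG = nn.Linear(2, 3)   # hypothetical stand-in generator
netD = nn.Linear(3, 1)   # hypothetical stand-in discriminator
fake = netG(torch.randn(1, 2))

# D step WITHOUT detach: backward frees G's part of the graph too
netD(fake).sum().backward()

# G step reuses `fake`, whose saved buffers have already been freed
try:
    netD(fake).sum().backward()
except RuntimeError as err:
    print(type(err).__name__)  # RuntimeError - backward through the graph a second time
```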

@tusharkr

What @plopd said is absolutely right. Detaching fake from the graph is necessary, and not doing so will lead to an error.

@soumith
Member

soumith commented Dec 13, 2018

ah yes, what I said above is only true if we also retain_graph=True. My bad, I stand corrected.

@MeirGavish

Hello,
I am a little confused by this and would greatly appreciate help in understanding.
As I understand it, detach() prevents further computations from being tracked. (I suppose it also prevents previous computations from being taken into account in the backward pass?)

Either way, wouldn't you want to track the next computation, the operation of D over fake, for the backward pass of D?
If you wanted to prevent tracking of the generator, wouldn't it make sense to detach before applying G, and then restore tracking for D right at the point where detach is now called (with requires_grad_(True))?
Thank you
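A small sketch that may help with this confusion: detach() only severs the history *behind* the detached tensor. Operations applied *after* the detach are still tracked whenever they involve parameters that require grad, so D's backward pass works as usual. (The names below are hypothetical stand-ins, not the DCGAN code.)

```python
import torch
import torch.nn as nn

w = nn.Parameter(torch.ones(3))          # stands in for netD's weights
x = torch.randn(3, requires_grad=True)   # stands in for netG's output

out = (w * x.detach()).sum()  # detach cuts the history BEHIND x only
out.backward()

print(w.grad is not None)  # True - the op AFTER detach is still tracked
print(x.grad is None)      # True - nothing flows back past the detach point
```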

@colinshenc

@plopd what you are saying doesn't make any sense to me

@Einstellung

Let me explain. The role of detach is to freeze part of the gradient computation. Whether we are updating the discriminator or the generator, both updates go through log D(G(z)). For the discriminator, freezing G does not affect its gradient update: the inner function G(z) is treated as a constant, which does not change the gradient of the outer function D. Conversely, if D were frozen, gradients could not flow back to G at all, so we do not freeze D when training the generator. Instead, during the generator step we do compute gradients through D, but we never update D's weights (only optimizerG.step() is called), so the discriminator is unchanged while the generator trains.
You may ask: if that is so, why add detach when training the discriminator? Isn't it redundant? It is not: freezing G's gradient computation speeds up training, so we use it wherever we can. When training the generator, because of log D(G(z)) there is no way to freeze D's gradients, so no detach is written there.
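The two-step training logic described above can be sketched as follows (hypothetical stand-in nets and losses, to show only the detach/step mechanics):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
netG = nn.Linear(2, 3)   # hypothetical stand-in generator
netD = nn.Linear(3, 1)   # hypothetical stand-in discriminator
optD = torch.optim.SGD(netD.parameters(), lr=0.1)
optG = torch.optim.SGD(netG.parameters(), lr=0.1)

fake = netG(torch.randn(4, 2))

# D step: detach treats G's output as a constant, skipping G's gradient work
optD.zero_grad()
netD(fake.detach()).mean().backward()
optD.step()

# G step: no detach - gradients must flow THROUGH D back into G.
# D's gradients are computed, but only optG.step() is called,
# so D's weights do not move.
optG.zero_grad()
d_weights_before = netD.weight.detach().clone()
netD(fake).mean().backward()
optG.step()

print(torch.equal(netD.weight, d_weights_before))  # True - D unchanged
print(netG.weight.grad is not None)                # True - G got gradients
```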

@shiyuanyin

@Einstellung
hi
I have a question about the G model's gradient update.
First, there are three independent networks: the discriminator network D, the generator network G, and a source network S.
1. The discriminator first takes the merged feature values as input and outputs LogSoftmax(); the loss is a binary classification loss with labels 1 and 0, and the gradients are updated with backward().
2. Then the discriminator is applied again, taking the generator G's output features as input and producing LogSoftmax().
What I don't understand is how this gets connected to the generator G. They are two independent networks, so how does the discriminator's gradient update get linked to G???
The G network's input is the target data and labels, and its output is an fc feature.
The D network's input is G's features and the concatenation of G's and S's features; its output is LogSoftmax().

@shiyuanyin

S feature ->
G feature ->
[image]

@Einstellung

@shiyuanyin
I don't really understand what you are saying. You should use G's result to update D.

@shiyuanyin

@Einstellung

> I don't really know what you said. You should use G result to update D

    ############################
    # (2) Update G network: maximize log(D(G(z)))
    ###########################
    netG.zero_grad()
    label.data.fill_(real_label)  # fake labels are real for generator cost
    output = netD(fake)
    errG = criterion(output, label)
    errG.backward()
    D_G_z2 = output.data.mean()
    optimizerG.step()

########
output = netD(fake), output is related to D model , not G model, when backward(D get the input G gradient) ,the gadient how to give the G, because is not connect
##I think the process, netG.backward(the model D grad,to input G feature)
this is right???

@disanda

disanda commented Apr 19, 2020

detach just reduces the work of computing G()'s gradients during the training step of D(), because G() will be trained in the next step anyway.

@tjysdsg

tjysdsg commented Sep 26, 2021

> @soumith This is not true. Detaching fake from the graph is necessary to avoid forward-passing the noise through G when we actually update the generator. If we do not detach, then, although fake is not needed for gradient update of D, it will still be added to the computational graph and as a consequence of backward pass which clears all the variables in the graph (retain_graph=False by default), fake won't be available when G is updated.

If I understand correctly, then if i created a new noise input for G, there's no need for the detach() call?

@Shanmukh0028

> @soumith This is not true. Detaching fake from the graph is necessary to avoid forward-passing the noise through G when we actually update the generator. If we do not detach, then, although fake is not needed for gradient update of D, it will still be added to the computational graph and as a consequence of backward pass which clears all the variables in the graph (retain_graph=False by default), fake won't be available when G is updated.
>
> If I understand correctly, then if i created a new noise input for G, there's no need for the detach() call?

Do you mean creating fake1 = netG(noise), which is the same as fake before it was detached? I have the same doubt; can someone please clarify this?
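A sketch suggesting the answer is yes: a fresh forward pass rebuilds the graph, so no detach() is needed for the G step (though you pay for recomputing G's gradients in the D step). Hypothetical stand-in nets, for illustration only:

```python
import torch
import torch.nn as nn

netG = nn.Linear(2, 3)   # hypothetical stand-in generator
netD = nn.Linear(3, 1)   # hypothetical stand-in discriminator
noise = torch.randn(1, 2)

# D step without detach: wasteful (it also computes gradients w.r.t. netG),
# and backward() frees this graph's buffers
netD(netG(noise)).sum().backward()

# G step with a FRESH forward pass: fake1 carries a brand-new graph,
# so neither detach() nor retain_graph=True is needed
netG.zero_grad()
fake1 = netG(noise)
netD(fake1).sum().backward()
print(netG.weight.grad is not None)  # True - no "backward a second time" error
```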

@batman47steam

I think that's because if you don't use fake.detach() in output = netD(fake.detach()).view(-1), then fake is just an intermediate variable in the whole computational graph, which is tracked from netG() through netD(). When errD.backward() is called, all gradient buffers except those of leaf nodes are released, which means no gradient information about netG() remains in the computational graph. Then when you call errG.backward(), it raises an error.
