why is detach necessary #116
Comments
You are correct. It is done for speed, not correctness. The computation of gradients w.r.t. the weights of netG can be fully avoided in the backward pass if the graph is detached where it is.
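For illustration, a minimal sketch of that discriminator step, using assumed toy stand-ins for netG/netD (the real example uses convolutional nets; criterion/optimizerD follow the example's naming):

```python
import torch
import torch.nn as nn

# Toy stand-ins (assumed); the real DCGAN example uses conv nets with the same names.
netG = nn.Linear(10, 3)
netD = nn.Sequential(nn.Linear(3, 1), nn.Sigmoid())
criterion = nn.BCELoss()
optimizerD = torch.optim.Adam(netD.parameters(), lr=2e-4)

noise = torch.randn(4, 10)
fake = netG(noise)                 # this graph still reaches back into netG

netD.zero_grad()
output = netD(fake.detach())       # detach: backward() below never traverses netG
errD_fake = criterion(output, torch.zeros(4, 1))   # fake label = 0
errD_fake.backward()               # netD's parameters get .grad; netG's do not
optimizerD.step()

print(all(p.grad is None for p in netG.parameters()))      # True: no wasted work in G
print(all(p.grad is not None for p in netD.parameters()))  # True: D's forward is still tracked
```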
Thanks for the quick reply @soumith!
Missed detach when implementing DCGAN in PyTorch, and it gives me this error:
(error output omitted in this copy)
@soumith This is not true. Detaching …
What @plopd has said is absolutely right. Detaching …
Ah yes, what I said above is only true if we also …
Hello. Either way, wouldn't you want to track the next computation, the operation of D over fake, for the backward pass of D?
@plopd what you are saying doesn't make any sense to me.
Let me explain. The role of detach is to stop the flow of gradients. Whether we are updating the discriminator or the generator, the quantity involved is log D(G(z)). For the discriminator update, freezing G does not affect the gradient update at all: the inner function is treated as a constant, which does not change the gradient of the outer function with respect to D's parameters. Conversely, if D were frozen, there would be no way to compute the generator's gradient at all. That is why we do not freeze (detach) D when training the generator: we do compute gradients through D, but we never update D's weights (only optimizer_g.step() is called), so the discriminator is left unchanged while the generator trains. You may then ask: if that is the case, why do we need to add detach when training the discriminator? Isn't that an extra, unnecessary step?
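A minimal sketch of that generator step, again with assumed toy modules standing in for the example's netG/netD: gradients flow through D, but only the generator's optimizer takes a step, so D's weights are unchanged.

```python
import torch
import torch.nn as nn

# Toy stand-ins (assumed); same shape of code as the DCGAN example's generator update.
netG = nn.Linear(10, 3)
netD = nn.Sequential(nn.Linear(3, 1), nn.Sigmoid())
criterion = nn.BCELoss()
optimizerG = torch.optim.Adam(netG.parameters(), lr=2e-4)

d_weights_before = [p.detach().clone() for p in netD.parameters()]

netG.zero_grad()
fake = netG(torch.randn(4, 10))
output = netD(fake)                          # no detach here: gradients must flow through D
errG = criterion(output, torch.ones(4, 1))   # generator wants D to call the fakes "real" (1)
errG.backward()                              # this also fills .grad on netD's parameters...
optimizerG.step()                            # ...but only netG's weights are updated

print(any(p.grad is not None for p in netD.parameters()))   # True: D received gradients
print(all(torch.equal(a, b) for a, b in
          zip(d_weights_before, netD.parameters())))         # True: D's weights untouched
```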
@Einstellung …
@shiyuanyin …
detach just reduces the work: it skips computing gradients for G() during D()'s training step, because G() will be trained in the next step anyway.
If I understand correctly, then if I created a new noise input for G, there's no need for the detach() call?
Do you mean creating fake1 = netG(noise), which is the same as the fake from before the detach? I have the same doubt; can someone please clarify this?
I think that's because if you don't use …
Hi, I am wondering why detach is necessary in this line:
examples/dcgan/main.py, line 230 at a60bd4e
I understand that we want to update the gradients of netD without changing the ones of netG. But if the optimizer is only using the parameters of netD, then only its weights will be updated. Am I missing something here?
Thanks in advance!
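For reference, a minimal sketch (with assumed toy modules in place of the example's conv nets) of what happens if the detach() is omitted: the discriminator step is still correct, since the optimizer only touches netD, but the backward pass does the extra work of filling gradients for netG as well.

```python
import torch
import torch.nn as nn

# Assumed toy modules; the point is only about which gradients get computed.
netG = nn.Linear(10, 3)
netD = nn.Sequential(nn.Linear(3, 1), nn.Sigmoid())
criterion = nn.BCELoss()
optimizerD = torch.optim.Adam(netD.parameters(), lr=2e-4)

g_weights_before = [p.detach().clone() for p in netG.parameters()]

fake = netG(torch.randn(4, 10))
errD_fake = criterion(netD(fake), torch.zeros(4, 1))   # no .detach() on fake
errD_fake.backward()
optimizerD.step()

# Correctness is unaffected: netG's weights did not change...
print(all(torch.equal(a, b) for a, b in zip(g_weights_before, netG.parameters())))  # True
# ...but backward() did the extra work of computing netG's gradients:
print(all(p.grad is not None for p in netG.parameters()))                           # True
```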