Odd behavior when training #657
Is there a way you can replace some of the newer features with more proven ones? Swapping PReLU for ReLU should be easy enough, and if you still see the same behavior, that rules PReLU out. I can take a look at upscale.
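Part of why swapping PReLU for ReLU is a useful test: ReLU has no parameters, while PReLU has a learnable slope shared across elements, so its gradient has to be accumulated over every sample in the batch. A batching bug in that accumulation would only show up at batch sizes above 1. Below is a minimal std-only Rust sketch, not dfdx's implementation; it assumes a single slope `alpha` shared by all elements (a real PReLU may use per-channel slopes), and `prelu_alpha_grad` is a hypothetical helper.

```rust
/// ReLU: max(0, x). No learnable parameters, so nothing to accumulate.
fn relu(x: f32) -> f32 {
    if x > 0.0 { x } else { 0.0 }
}

/// PReLU with a single learnable slope `alpha` (assumed shared by all elements).
fn prelu(x: f32, alpha: f32) -> f32 {
    if x > 0.0 { x } else { alpha * x }
}

/// Gradient of the loss w.r.t. the shared `alpha`, given upstream grads `dys`
/// for every element of the flattened batch. Because `alpha` is shared, its
/// gradient is the SUM of contributions from every element of every sample.
fn prelu_alpha_grad(xs: &[f32], dys: &[f32]) -> f32 {
    xs.iter()
        .zip(dys)
        .map(|(&x, &dy)| if x > 0.0 { 0.0 } else { x * dy })
        .sum()
}

fn main() {
    // Two samples of two elements each, flattened as [sample0.., sample1..].
    let xs = [1.0_f32, -2.0, -1.0, 3.0];
    let dys = [0.5_f32, 1.0, 2.0, 1.0];
    let alpha = 0.25;

    let y: Vec<f32> = xs.iter().map(|&x| prelu(x, alpha)).collect();
    let r: Vec<f32> = xs.iter().map(|&x| relu(x)).collect();
    println!("prelu = {:?}", y);
    println!("relu  = {:?}", r);

    // The batched gradient must equal the sum of the per-sample gradients.
    let batched = prelu_alpha_grad(&xs, &dys);
    let per_sample =
        prelu_alpha_grad(&xs[..2], &dys[..2]) + prelu_alpha_grad(&xs[2..], &dys[2..]);
    assert_eq!(batched, per_sample);
    println!("d(loss)/d(alpha) = {}", batched);
}
```

If the model trains normally with ReLU at batch size 16, the shared-parameter gradient path is a likely suspect; if not, PReLU is probably not the culprit.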
There may be some issues with bilinear interpolation; torch is giving me different results.
You have to use torch's upsample with
Ahh, okay, I'll make sure to document that. That's an important piece of info!
Yeah, it's currently only documented in a test.
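One common reason two bilinear implementations disagree is the coordinate convention. PyTorch's bilinear upsampling maps each output pixel to a fractional input coordinate in one of two ways, chosen by its `align_corners` flag. The std-only Rust sketch below reproduces both mappings in 1-D (bilinear is the same interpolation applied along H and then along W). Which convention dfdx's upscale uses is not stated in this thread, so this is an illustration of the discrepancy, not dfdx's code; `source_coord` and `upsample_linear` are hypothetical names.

```rust
/// Map an output index to a fractional input coordinate, following the two
/// conventions PyTorch selects with `align_corners`.
fn source_coord(dst: usize, in_len: usize, out_len: usize, align_corners: bool) -> f32 {
    if align_corners {
        // First and last pixels of input and output line up exactly.
        if out_len > 1 {
            dst as f32 * (in_len - 1) as f32 / (out_len - 1) as f32
        } else {
            0.0
        }
    } else {
        // Half-pixel centers; negative coordinates are clamped to 0.
        let scale = in_len as f32 / out_len as f32;
        ((dst as f32 + 0.5) * scale - 0.5).max(0.0)
    }
}

/// 1-D linear upsampling of a non-empty `input` to `out_len` samples.
fn upsample_linear(input: &[f32], out_len: usize, align_corners: bool) -> Vec<f32> {
    let n = input.len();
    (0..out_len)
        .map(|i| {
            let x = source_coord(i, n, out_len, align_corners);
            let x0 = (x.floor() as usize).min(n - 1);
            let x1 = (x0 + 1).min(n - 1);
            let t = x - x0 as f32;
            input[x0] * (1.0 - t) + input[x1] * t
        })
        .collect()
}

fn main() {
    let input = [0.0_f32, 10.0];
    // Same input, same output size, two coordinate conventions.
    println!("align_corners = true:  {:?}", upsample_linear(&input, 4, true));
    println!("align_corners = false: {:?}", upsample_linear(&input, 4, false));
}
```

For `[0.0, 10.0]` upsampled to length 4, the first convention gives roughly `[0, 3.33, 6.67, 10]` and the second `[0, 2.5, 7.5, 10]`, so a comparison test against torch has to pin the flag explicitly.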
Okay, I think the issue is actually batched convtrans2d: I have a test that passes on CPU and fails on CUDA. The corresponding test for conv2d passes on both.
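A CPU reference plus a "batched equals per-sample" check is the kind of test that catches this class of bug, since a wrong batch offset only matters when the batch index is above 0. Below is a naive std-only Rust sketch of a 2-D transposed convolution on flat NCHW buffers. It assumes PyTorch's weight layout `[c_in, c_out, k, k]`, a square kernel, no output padding, and dilation 1; dfdx's actual layout and kernels may differ, and `Dims`, `conv_trans2d`, and `batched_matches_unbatched` are hypothetical names.

```rust
/// Shapes for a 2-D transposed convolution with a square kernel.
#[derive(Clone, Copy)]
struct Dims { c_in: usize, c_out: usize, h: usize, w: usize, k: usize, stride: usize, pad: usize }

impl Dims {
    /// Output size: (H - 1) * stride + K - 2 * pad.
    fn out_hw(&self) -> (usize, usize) {
        (
            (self.h - 1) * self.stride + self.k - 2 * self.pad,
            (self.w - 1) * self.stride + self.k - 2 * self.pad,
        )
    }
}

/// Naive batched ConvTranspose2d. x: [batch, c_in, h, w]; weight: [c_in, c_out, k, k].
fn conv_trans2d(x: &[f32], weight: &[f32], batch: usize, d: Dims) -> Vec<f32> {
    let (oh_len, ow_len) = d.out_hw();
    let mut out = vec![0.0f32; batch * d.c_out * oh_len * ow_len];
    for b in 0..batch {
        for ci in 0..d.c_in {
            for ih in 0..d.h {
                for iw in 0..d.w {
                    // The batch offset must span a whole sample (c_in * h * w).
                    let xv = x[((b * d.c_in + ci) * d.h + ih) * d.w + iw];
                    for co in 0..d.c_out {
                        for kh in 0..d.k {
                            for kw in 0..d.k {
                                // Scatter input (ih, iw) to output
                                // (ih * stride + kh - pad, iw * stride + kw - pad).
                                let oh = (ih * d.stride + kh) as isize - d.pad as isize;
                                let ow = (iw * d.stride + kw) as isize - d.pad as isize;
                                if oh < 0 || ow < 0 || oh as usize >= oh_len || ow as usize >= ow_len {
                                    continue;
                                }
                                let wv = weight[((ci * d.c_out + co) * d.k + kh) * d.k + kw];
                                let o = ((b * d.c_out + co) * oh_len + oh as usize) * ow_len + ow as usize;
                                out[o] += xv * wv;
                            }
                        }
                    }
                }
            }
        }
    }
    out
}

/// True if running the whole batch matches running each sample on its own.
fn batched_matches_unbatched(x: &[f32], weight: &[f32], batch: usize, d: Dims) -> bool {
    let (oh, ow) = d.out_hw();
    let in_len = d.c_in * d.h * d.w;
    let out_len = d.c_out * oh * ow;
    let batched = conv_trans2d(x, weight, batch, d);
    (0..batch).all(|b| {
        let single = conv_trans2d(&x[b * in_len..(b + 1) * in_len], weight, 1, d);
        single
            .iter()
            .zip(&batched[b * out_len..(b + 1) * out_len])
            .all(|(a, c)| (a - c).abs() < 1e-5)
    })
}

fn main() {
    let d = Dims { c_in: 2, c_out: 3, h: 4, w: 4, k: 3, stride: 2, pad: 1 };
    let batch = 4;
    // Deterministic, non-trivial inputs (no RNG crate needed).
    let x: Vec<f32> = (0..batch * d.c_in * d.h * d.w)
        .map(|i| ((i * 7 % 11) as f32 - 5.0) / 5.0)
        .collect();
    let w: Vec<f32> = (0..d.c_in * d.c_out * d.k * d.k)
        .map(|i| ((i * 3 % 7) as f32 - 3.0) / 3.0)
        .collect();
    let (oh, ow) = d.out_hw();
    println!("output shape = [{}, {}, {}, {}]", batch, d.c_out, oh, ow);
    assert!(batched_matches_unbatched(&x, &w, batch, d));
}
```

The same comparison can be pointed at a GPU kernel: feed it the same `x` and weights and check each sample's slice of its output against this reference.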
@opfromthestart let me know if you still encounter issues. Upscale and prelu seemed fine to me, so now that convtrans2d is fixed you should be good 👍
I am building a CNN-based autoencoder, and it does not train properly with larger batches. With a batch size of 16, the error either stays flat or increases over time; with a batch size of 1, it decreases as expected. I am using some of the new/experimental features (Conv2D, ConvTrans2d, Upscale, PReLU), so I suspect one of them computes gradients incorrectly for batched inputs.
My code is at https://github.com/opfromthestart/touhou-ai, though it probably only runs on my machine at the moment. The model structure is at https://github.com/opfromthestart/touhou-ai/blob/master/src/net/dfdx.rs#L155 (lines 13-153 define the encoder, decoder, and actor models).
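One generic way to confirm which layer computes wrong gradients for batched inputs is a finite-difference gradient check, run once at batch size 1 and once at the full batch size. The std-only Rust sketch below applies it to a hypothetical one-parameter-pair "layer" `y = w * x + b` with a batch-mean squared-error loss; it is not dfdx code, and `loss`, `analytic_grad`, and `numeric_grad` are illustrative names. In practice the same check would be repeated layer by layer.

```rust
/// Batch-mean squared error of a tiny layer y = w * x + b.
fn loss(w: f64, b: f64, xs: &[f64], ts: &[f64]) -> f64 {
    let n = xs.len() as f64;
    xs.iter().zip(ts).map(|(&x, &t)| (w * x + b - t).powi(2)).sum::<f64>() / n
}

/// Analytic (dL/dw, dL/db): accumulate every sample, then divide by batch size.
fn analytic_grad(w: f64, b: f64, xs: &[f64], ts: &[f64]) -> (f64, f64) {
    let n = xs.len() as f64;
    let (mut gw, mut gb) = (0.0, 0.0);
    for (&x, &t) in xs.iter().zip(ts) {
        let r = w * x + b - t;
        gw += 2.0 * r * x;
        gb += 2.0 * r;
    }
    (gw / n, gb / n)
}

/// Central finite differences: (L(p + eps) - L(p - eps)) / (2 * eps).
fn numeric_grad(w: f64, b: f64, xs: &[f64], ts: &[f64]) -> (f64, f64) {
    let eps = 1e-6;
    let gw = (loss(w + eps, b, xs, ts) - loss(w - eps, b, xs, ts)) / (2.0 * eps);
    let gb = (loss(w, b + eps, xs, ts) - loss(w, b - eps, xs, ts)) / (2.0 * eps);
    (gw, gb)
}

fn main() {
    let xs = [0.5, -1.0, 2.0, 1.5];
    let ts = [1.0, 0.0, 3.0, -1.0];
    let (w, b) = (0.8, -0.2);
    // A layer whose gradient is only wrong for batched inputs would pass the
    // batch-size-1 check and fail the full-batch one.
    for batch in [1, xs.len()] {
        let (aw, ab) = analytic_grad(w, b, &xs[..batch], &ts[..batch]);
        let (nw, nb) = numeric_grad(w, b, &xs[..batch], &ts[..batch]);
        println!("batch {batch}: analytic ({aw:.6}, {ab:.6}) numeric ({nw:.6}, {nb:.6})");
        assert!((aw - nw).abs() < 1e-5 && (ab - nb).abs() < 1e-5);
    }
}
```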