What does the comment mean? #1

The comment of pseudo_cost() says: "This cost should have the same gradient but hopefully theano will use a more stable implementation of it." What does this actually mean? Is this implementation not stable for now?
Comments
Not entirely sure. You'll have to ask the original author mentioned in the docs.
As far as I remember, the only difference is normalization. So one can get the gradients from […].
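The reference at the end of this comment is lost in the page text, but the general point it seems to make — a cost that differs from another only by a normalization constant yields a gradient scaled by that same constant, so either one can be used for training — can be illustrated with a small Theano sketch. This is my own illustration, not code from this repository, and the variable names are made up:

```python
import numpy as np
import theano
import theano.tensor as T

a = T.matrix('a')   # pre-softmax activations
t = T.matrix('t')   # one-hot targets
n = a.shape[0].astype(theano.config.floatX)

cost_sum = T.nnet.categorical_crossentropy(T.nnet.softmax(a), t).sum()
cost_mean = cost_sum / n   # same cost up to normalization by the batch size

g_sum = T.grad(cost_sum, a)
g_mean = T.grad(cost_mean, a)
f = theano.function([a, t], [g_sum, g_mean])

A = np.random.randn(4, 3).astype(theano.config.floatX)
Tgt = np.eye(3, dtype=theano.config.floatX)[[0, 1, 2, 0]]
gs, gm = f(A, Tgt)
print(np.allclose(gs, 4.0 * gm))   # True: the gradients differ only by the 1/N factor
```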
Honestly I can't understand the purpose of that function. I couldn't find any equivalent in the original code. As far as I understand, […].
Hey, sorry for the unclear comment. I think I wrote that more as a note to myself somehow, and it refers to the fact that I feared the gradient might still be unstable without using the […].

As an example, the gradient of the categorical cross entropy is something like […].

This is what […].
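To make the numerical issue concrete, here is a small NumPy sketch of my own (not code from this repository): once the softmax underflows to exact zeros, the naive chain rule for the cross-entropy gradient runs into 0 * inf, while the algebraically simplified gradient softmax(a) - t stays finite. A "pseudo cost" whose gradient equals that simplified expression sidesteps the problem.

```python
import numpy as np

a = np.array([1000.0, 0.0, 500.0])   # pre-softmax activations with a large spread
t = np.array([0.0, 1.0, 0.0])        # one-hot target

p = np.exp(a - a.max())
p /= p.sum()                          # softmax(a); p[1] underflows to exactly 0.0

with np.errstate(divide='ignore', invalid='ignore'):
    loss_naive = -np.sum(t * np.log(p))     # inf, because log(0) for the target class
    dL_dp = -t / p                          # -inf where p is exactly 0 and t == 1
    J = np.diag(p) - np.outer(p, p)         # Jacobian of the softmax
    grad_naive = J.T @ dL_dp                # 0 * inf terms turn everything into nan

# log-sum-exp form of the same loss, and the simplified gradient: both stay finite
loss_stable = (a.max() + np.log(np.sum(np.exp(a - a.max())))) - a[t.argmax()]
grad_stable = p - t

print(loss_naive, grad_naive)    # inf [nan nan nan]
print(loss_stable, grad_stable)  # 1000.0 and roughly [ 1., -1., 0.]
```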
It makes much more sense now, thank you. However, I don't see how it is specific to the CTC cost. If it is related only to softmax/cross-entropy, it must be a problem for almost any convnet implementation. Are you suggesting that Theano's backpropagation of the categorical cross-entropy is numerically unstable in general?
I remember some implementations of it being more reliable than others. The one taking indices seems more stable than the one that expects one-hot encoding, if I remember correctly. The problem is also that our batch version of CTC needs to propagate zeros in the log domain, which involves computations that can produce things like inf - inf or inf * 0.
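As a concrete illustration of why exact zeros are awkward in the log domain (my own sketch, not code from this repository): probability 0 becomes log(0) = -inf, and otherwise ordinary operations then hit indeterminate forms.

```python
import numpy as np

with np.errstate(divide='ignore'):
    log_zero = np.log(0.0)            # -inf

# "multiplying" two zero probabilities in log space is harmless:
print(log_zero + log_zero)            # -inf

# but differences and mixtures with exp() are not:
print(log_zero - log_zero)            # nan  (inf - inf)
print(np.exp(log_zero) * log_zero)    # nan  (0 * inf)

# one common workaround is to use a large negative constant instead of -inf,
# so that every intermediate value stays finite:
LOG_ZERO = -1e30
print(LOG_ZERO - LOG_ZERO)            # 0.0
```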
Well, I have done some experiments on my tasks. I agree that […].

Anyway, I suggest that this should be solved at the Theano level. As I said, log(softmax(.)) is a very common function; it has to be handled correctly. I have done some googling, and this problem has been noticed and reported upstream in Theano a few times already: Theano/Theano#2944, Theano/Theano#2781, mila-iqia/blocks#654. It also seems that Theano contains a related optimization, but I don't understand its semantics (it is buried in cuDNN). Somebody mentioned very different stability depending on […].

What I took away from these discussions is that Theano can optimize log(softmax(.)) (on the CPU as well), but sometimes doesn't, presumably because of […].
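For what it's worth, one way to avoid depending on whether Theano's graph optimizer rewrites log(softmax(.)) is to write the log-softmax in the explicit log-sum-exp form yourself. This is a sketch of my own, not code from this repository:

```python
import numpy as np
import theano
import theano.tensor as T

x = T.matrix('x')   # rows of pre-softmax activations

# naive form: numerically stable only if Theano rewrites it internally
log_softmax_naive = T.log(T.nnet.softmax(x))

# explicit log-sum-exp form: stable regardless of which optimizations fire
x_shift = x - x.max(axis=1, keepdims=True)
log_softmax_stable = x_shift - T.log(T.exp(x_shift).sum(axis=1, keepdims=True))

f = theano.function([x], [log_softmax_naive, log_softmax_stable])

A = np.array([[1000.0, 0.0, -1000.0]], dtype=theano.config.floatX)
naive, stable = f(A)
print(naive)    # may contain -inf, depending on whether the rewrite was applied
print(stable)   # roughly [[ 0., -1000., -2000.]]
```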