
Stability of log(softmax) and its gradient in RNN #2944

Open
lamblin opened this issue May 21, 2015 · 2 comments

Comments

lamblin (Member) commented May 21, 2015

In some cases, SoftmaxGrad is not optimized away and remains part of the final graph. This happens in particular in RNNs (and similar models), presumably when the softmax is computed inside the scan.
A related problem is that the computation of the log-probability is not numerically stabilized, which is why we have to compute log(prob + eps) in the LSTM deep learning tutorial (https://github.com/lisa-lab/DeepLearningTutorials/blob/master/code/lstm.py#L336).
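For reference, the stabilization in question is the standard log-sum-exp trick: log(softmax(x)) can be rewritten so no exponential overflows or underflows before the log is taken. A minimal NumPy sketch (the `log_softmax` helper is named here only for illustration, it is not a Theano API):

```python
import numpy as np

def log_softmax(x, axis=-1):
    # Stable log-softmax via the log-sum-exp trick:
    # log(softmax(x)) = (x - max(x)) - log(sum(exp(x - max(x))))
    x_shift = x - x.max(axis=axis, keepdims=True)
    return x_shift - np.log(np.exp(x_shift).sum(axis=axis, keepdims=True))

# Naive log(softmax(x)) breaks down for large logits because exp()
# overflows; the stabilized form stays finite:
x = np.array([[0.0, 1000.0]], dtype='float32')
naive = np.log(np.exp(x) / np.exp(x).sum(axis=-1, keepdims=True))
print(naive)           # contains -inf / nan
print(log_softmax(x))  # [[-1000., 0.]]
```

This is the transformation one would want the optimizer to apply automatically whenever log is composed with softmax.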

lamblin (Member, Author) commented May 22, 2015

See also #2781.

justheuristic commented Jan 15, 2017

The issue seems to persist even for simple transformations (reshape, repeat, etc.):

import numpy as np
import theano
import theano.tensor as T
print(theano.__version__)

logits = T.tensor3('logits [batch,tick,classes]')

# Softmax over the class axis: flatten to 2D, apply softmax, reshape back.
probs = T.nnet.softmax(logits.reshape([-1, 2])).reshape(logits.shape)

logp = T.log(probs)

f = theano.function([logits], logp)

theano.printing.debugprint(f)
print(f(np.arange(30).astype('float32').reshape([3, 5, 2]) * 1000))

This yields the following output:

0.9.0dev4.dev-RELEASE
Elemwise{Log}[(0, 0)] [id A] ''   7
 |Reshape{2} [id B] ''   6
   |Alloc [id C] ''   5
   | |InplaceDimShuffle{0,x,1} [id D] ''   4
   | | |Softmax [id E] ''   3
   | |   |InplaceDimShuffle{x,0} [id F] ''   2
   | |     |MakeVector{dtype='float32'} [id G] ''   1
   | |       |<TensorType(float32, scalar)> [id H]
   | |       |Elemwise{neg,no_inplace} [id I] ''   0
   | |         |<TensorType(float32, scalar)> [id H]
   | |TensorConstant{1} [id J]
   | |TensorConstant{3} [id K]
   | |TensorConstant{2} [id L]
   |TensorConstant{[3 2]} [id M]
[[-inf   0.]
 [-inf   0.]
 [-inf   0.]]
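The -inf entries above show the unstabilized log(softmax) underflowing. A common workaround, of the kind used in the LSTM tutorial referenced earlier, is to add a small epsilon before taking the log. A minimal NumPy sketch (the eps value is an assumption, not taken from the tutorial):

```python
import numpy as np

eps = 1e-8  # assumed small constant
# A softmax output saturated to exactly 0/1 by large logits:
probs = np.array([[0.0, 1.0]], dtype='float32')

print(np.log(probs))        # first entry is -inf
print(np.log(probs + eps))  # finite everywhere, at the cost of a small bias
```

This clips the log at log(eps) rather than fixing the graph itself; the cleaner fix is the log-sum-exp reformulation, applied before the probabilities are ever materialized.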
