
Stability of log(softmax) and its gradient in RNN #2944

Open
lamblin opened this issue May 21, 2015 · 2 comments

Comments

lamblin (Member) commented May 21, 2015

In some cases, SoftmaxGrad is not optimized away and remains part of the final graph. This happens in particular in RNNs (and similar models), presumably when the softmax is computed inside the scan.
A related problem is that the computation of the log-probability is not numerically stabilized, which is why we have to compute log(prob + eps) in the LSTM deep learning tutorial (https://github.com/lisa-lab/DeepLearningTutorials/blob/master/code/lstm.py#L336).
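For reference, the stabilization in question is the standard log-sum-exp trick: log(softmax(x)) can be rewritten so no exponential overflows or underflows before the log is taken. A minimal NumPy sketch (the `log_softmax` helper is named here only for illustration, it is not a Theano API):

```python
import numpy as np

def log_softmax(x, axis=-1):
    # Stable log-softmax via the log-sum-exp trick:
    # log(softmax(x)) = (x - max(x)) - log(sum(exp(x - max(x))))
    x_shift = x - x.max(axis=axis, keepdims=True)
    return x_shift - np.log(np.exp(x_shift).sum(axis=axis, keepdims=True))

# Naive log(softmax(x)) breaks down for large logits because exp()
# overflows; the stabilized form stays finite:
x = np.array([[0.0, 1000.0]], dtype='float32')
naive = np.log(np.exp(x) / np.exp(x).sum(axis=-1, keepdims=True))
print(naive)           # contains -inf / nan
print(log_softmax(x))  # [[-1000., 0.]]
```

This is the transformation one would want the optimizer to apply automatically whenever log is composed with softmax.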

lamblin (Member, Author) commented May 22, 2015

See also #2781.

justheuristic commented Jan 15, 2017

The issue seems to persist even for simple transformations (reshape, repeat, etc.):

import numpy as np
import theano
import theano.tensor as T
print(theano.__version__)

logits = T.tensor3('logits [batch,tick,classes]')

# Softmax over the class axis: flatten to 2D, apply softmax, reshape back.
probs = T.nnet.softmax(logits.reshape([-1, 2])).reshape(logits.shape)

logp = T.log(probs)

f = theano.function([logits], logp)

theano.printing.debugprint(f)
print(f(np.arange(30).astype('float32').reshape([3, 5, 2]) * 1000))

This yields the following output:

0.9.0dev4.dev-RELEASE
Elemwise{Log}[(0, 0)] [id A] ''   7
 |Reshape{2} [id B] ''   6
   |Alloc [id C] ''   5
   | |InplaceDimShuffle{0,x,1} [id D] ''   4
   | | |Softmax [id E] ''   3
   | |   |InplaceDimShuffle{x,0} [id F] ''   2
   | |     |MakeVector{dtype='float32'} [id G] ''   1
   | |       |<TensorType(float32, scalar)> [id H]
   | |       |Elemwise{neg,no_inplace} [id I] ''   0
   | |         |<TensorType(float32, scalar)> [id H]
   | |TensorConstant{1} [id J]
   | |TensorConstant{3} [id K]
   | |TensorConstant{2} [id L]
   |TensorConstant{[3 2]} [id M]
[[-inf   0.]
 [-inf   0.]
 [-inf   0.]]
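The -inf entries above show the unstabilized log(softmax) underflowing. A common workaround, of the kind used in the LSTM tutorial referenced earlier, is to add a small epsilon before taking the log. A minimal NumPy sketch (the eps value is an assumption, not taken from the tutorial):

```python
import numpy as np

eps = 1e-8  # assumed small constant
# A softmax output saturated to exactly 0/1 by large logits:
probs = np.array([[0.0, 1.0]], dtype='float32')

print(np.log(probs))        # first entry is -inf
print(np.log(probs + eps))  # finite everywhere, at the cost of a small bias
```

This clips the log at log(eps) rather than fixing the graph itself; the cleaner fix is the log-sum-exp reformulation, applied before the probabilities are ever materialized.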
