
Torch Tutorial

Make sure you have warp-ctc installed by running luarocks make torch_binding/rocks/warp-ctc-scm-1.rockspec in the top-level directory.

Using the torch bindings, it is easy to experiment with CTC interactively.

If you have compiled without GPU support, replace torch.Tensor(...):cuda() with torch.Tensor(...):float() and calls to gpu_ctc with cpu_ctc.

The CTC algorithm computes the loss between input sequences and target output sequences. Since CTC is commonly used with neural networks, we will call the input sequences activation sequences. The target output sequences are drawn from a fixed alphabet; for the discussion here we choose the four characters a, b, c, d. The algorithm requires a <BLANK> symbol distinct from the alphabet, which means the activation sequences are sequences of vectors of dimension five (the size of our alphabet together with <BLANK>). Each vector is converted to a probability distribution over the alphabet and the <BLANK> with a SoftMax function. So, for example, an activation sequence of length seven would be (the components of the vectors here are arbitrary)

{<2,0,0,0,0>, <1,3,0,0,0>, <1,4,1,0,0>, <1,1,5,6,0>, <1,1,1,1,1>, <1,1,7,1,1>, <9,1,1,1,1>}

and a valid target output sequence would be daceba.

To start we are going to use a very simple example: an activation sequence of length one and a target output sequence of length one. To specify the activation sequence, then, we only have to write down the components of a single five-dimensional vector. We are going to use <0,0,0,0,0> as the single vector in the activation sequence, so the resulting probabilities will be 0.2, 0.2, 0.2, 0.2, 0.2.

For the targets, we are going to have a single label a.

Firstly, how do we present the data to the algorithm? As usual in Torch, the activations are put into rows of a two-dimensional Tensor. The target labels are put into a lua table of tables, with one table for each sequence of target labels. We only have one sequence (of one label), so the table is {{1}}, as the label a has index 1 (the index 0 is reserved for the blank symbol). Since we allow activation sequences of different lengths, we also have to specify the length of each input activation sequence; here that is done with the lua table {1}.

To calculate the value of the CTC loss for the above problem just observe that with a one element input sequence and a single output label, there is only one possible alignment and so the symbol must be emitted at the first time step. The probability of emitting the symbol is 0.2. The algorithm returns the negative log likelihood which is -ln(0.2)=1.6094.
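Before running warp-ctc we can reproduce these two numbers by hand. This quick check only uses the nn package's SoftMax module and Lua's math.log; it is not part of warp-ctc.

th>require 'nn'
th>nn.SoftMax():forward(torch.Tensor({0,0,0,0,0}))  -- each of the five entries is 0.2
th>-math.log(0.2)                                   -- 1.6094...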

Now we want to use the code to do the calculation. Start with a Torch session and require the libraries.

If you have GPU support

th>require 'cutorch'
th>require 'warp_ctc'

for CPU only

th>require 'warp_ctc'  

We need to put the activations in rows - so note the double braces.

th>acts = torch.Tensor({{0,0,0,0,0}}):cuda()

If an empty grad Tensor is passed, the gradient calculation will not be done.

th>grads = torch.Tensor():cuda()

For the target labels and sizes of the input sequence,

th>labels = {{1}}
th>sizes ={1}

If you have CUDA support, use gpu_ctc otherwise use cpu_ctc

th> gpu_ctc(acts, grads, labels, sizes)

{
  1 : 1.6094379425049
}

The function returns a lua table containing the CTC loss for each pair of activation sequence and target label sequence.

Now for a slightly more interesting example. Suppose we have an input sequence of length three, with activations

<1,2,3,4,5>, <6,7,8,9,10> and <11,12,13,14,15>.

The corresponding probabilities for the frames are then

0.0117, 0.0317, 0.0861, 0.2341, 0.6364

(the probabilities are the same for each frame in this special case).
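If you want to check these numbers yourself, one way (again using the nn package as in the earlier check, rather than warp-ctc) is:

th>nn.SoftMax():forward(torch.Tensor({1,2,3,4,5}))  -- 0.0117, 0.0317, 0.0861, 0.2341, 0.6364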

For target symbols we will use the sequence c,c.

th>acts = torch.Tensor({{1,2,3,4,5},{6,7,8,9,10},{11,12,13,14,15}}):cuda()
th>labels = {{3,3}}
th>sizes = {3}

CTC calculates the probability of all the possible alignments. Note that the targets contain the repeated symbol c. CTC cannot emit a repeated symbol on consecutive timesteps (for more details consult http://www.cs.toronto.edu/~graves/icml_2006.pdf); it must separate the repeated symbol with a blank, and so the only possible aligned sequence is

c <BLANK> c.

CTC assumes the label probabilities are conditionally independent given the data, so we expect the answer to be Pr(c at frame 1)*Pr(<BLANK> at frame 2)*Pr(c at frame 3) = 0.2341*0.0117*0.2341, and the loss should be -ln(0.2341*0.0117*0.2341) = 7.3522.
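The hand calculation can be checked directly in the interpreter:

th>-math.log(0.2341*0.0117*0.2341)  -- 7.3522, using the rounded probabilities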

th> gpu_ctc(acts, grads, labels, sizes)

{
  1 : 7.355742931366
}

The small numerical difference comes from rounding the probabilities to four decimal places in the hand calculation.

Suppose the target sequence is b,c and the activations are

<-5,-4,-3,-2,-1>, <-10,-9,-8,-7,-6> and <-15,-14,-13,-12,-11>.

The corresponding probabilities for the frames are then again

0.0117, 0.0317, 0.0861, 0.2341, 0.6364.

Now there are five possible alignments as repeated symbols are collapsed and blanks are removed: <BLANK> b c, b <BLANK> c, b c <BLANK>, b b c and b c c.

The result should be -ln(3*0.0117*0.0861*0.2341 + 0.0861*0.0861*0.2341 + 0.0861*0.2341*0.2341) = 4.9390
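Checking the hand calculation in the interpreter:

th>-math.log(3*0.0117*0.0861*0.2341 + 0.0861*0.0861*0.2341 + 0.0861*0.2341*0.2341)  -- 4.9390 (rounded)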

th>acts = torch.Tensor({{-5,-4,-3,-2,-1},{-10,-9,-8,-7,-6},{-15,-14,-13,-12,-11}}):cuda()
th>labels = {{2,3}}
th>sizes = {3}
th>gpu_ctc(acts, grads, labels, sizes)

{
  1 : 4.938850402832
}

So far we have three separate examples. The final example shows how to do all three at once, which is what we want when putting minibatches through the algorithm. The labels are now {{1}, {3,3}, {2,3}} and the lengths of the input sequences are {1,3,3}. We have to put all of the input sequences into a single two-dimensional matrix. This is done by interleaving the input sequence elements, so that the input matrix looks like the tables below. For clarity we start with the first two input sequences.

entries        col1  col2  col3  col4  col5
seq1 item 1       0     0     0     0     0
seq2 item 1       1     2     3     4     5
seq1 item 2       P     P     P     P     P
seq2 item 2       6     7     8     9    10
seq1 item 3       P     P     P     P     P
seq2 item 3      11    12    13    14    15

Since the first sequence has no second or third elements, we pad the matrix with zeros (which appear as P in the above table). Now we put the third sequence in

entries        col1  col2  col3  col4  col5
seq1 item 1       0     0     0     0     0
seq2 item 1       1     2     3     4     5
seq3 item 1      -5    -4    -3    -2    -1
seq1 item 2       P     P     P     P     P
seq2 item 2       6     7     8     9    10
seq3 item 2     -10    -9    -8    -7    -6
seq1 item 3       P     P     P     P     P
seq2 item 3      11    12    13    14    15
seq3 item 3     -15   -14   -13   -12   -11

The complete example in Torch is

th>acts = torch.Tensor({{0,0,0,0,0},{1,2,3,4,5},{-5,-4,-3,-2,-1},
                        {0,0,0,0,0},{6,7,8,9,10},{-10,-9,-8,-7,-6},
                        {0,0,0,0,0},{11,12,13,14,15},{-15,-14,-13,-12,-11}}):cuda()
th>labels = {{1}, {3,3}, {2,3}}
th>sizes = {1,3,3}
th>gpu_ctc(acts, grads, labels, sizes)

{
  1 : 1.6094379425049
  2 : 7.355742931366
  3 : 4.938850402832
}
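Writing the interleaved matrix out by hand quickly becomes tedious for larger minibatches. Below is a minimal sketch of a helper that builds the layout from a lua table of per-sequence activation tensors; the interleave function here is hypothetical (it is not part of warp-ctc) and simply pads short sequences with zero rows, exactly as in the tables above. Paste it into the session or put it in a file.

-- interleave a lua table of 2D activation tensors (one tensor per sequence,
-- each of size sequence_length x 5) into the row-interleaved minibatch layout
function interleave(seqs)
   local T = 0
   for _, s in ipairs(seqs) do T = math.max(T, s:size(1)) end  -- longest sequence
   local out = torch.zeros(T * #seqs, seqs[1]:size(2))         -- zero rows act as padding
   for t = 1, T do
      for i, s in ipairs(seqs) do
         if t <= s:size(1) then
            out[(t - 1) * #seqs + i] = s[t]                    -- row for item t of sequence i
         end
      end
   end
   return out
end

Using it to rebuild the activations for the three sequences:

th>seq1 = torch.Tensor({{0,0,0,0,0}})
th>seq2 = torch.Tensor({{1,2,3,4,5},{6,7,8,9,10},{11,12,13,14,15}})
th>seq3 = torch.Tensor({{-5,-4,-3,-2,-1},{-10,-9,-8,-7,-6},{-15,-14,-13,-12,-11}})
th>acts = interleave({seq1, seq2, seq3}):cuda()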

In order to obtain the gradients with respect to the incoming activations, simply pass a tensor of the same size as the activations tensor instead of the empty one used above. Also see torch_binding/tests/test.lua for more examples.
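For example, continuing from the minibatch above (as before, substitute :float() and cpu_ctc if you built without GPU support):

th>grads = torch.Tensor(acts:size()):cuda()
th>gpu_ctc(acts, grads, labels, sizes)
th>grads  -- now filled with the gradients of the losses with respect to the activations, one row per activation row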