# LSTM Implementation in Caffe

Note that the master branch of Caffe now supports LSTM (Jeff Donahue's implementation has been merged).
This repository is no longer maintained.

## Speed comparison (Titan X, 3-layer LSTM with 2048 units)

Jeff's code is more modularized, whereas this code is optimized for LSTM: it computes the gradient w.r.t. the recurrent weights in a single matrix computation.
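As a rough illustration of that trick, here is a minimal NumPy sketch (the sizes and variable names are illustrative assumptions, not the repo's actual code): since the recurrent-weight gradient is a sum of per-step products, the time axis can be folded into the batch axis and the whole sum computed as one large matrix product instead of one product per time step.

```python
import numpy as np

# Illustrative sizes (assumptions for this sketch, not the benchmark config):
# T = sequence length, B = batch size, N = number of hidden units.
T, B, N = 20, 4, 32
rng = np.random.default_rng(0)

# h_prev[t]: hidden state fed into step t; dgates[t]: gradient at the four
# concatenated LSTM gate pre-activations (input, forget, output, candidate).
h_prev = rng.standard_normal((T, B, N))
dgates = rng.standard_normal((T, B, 4 * N))

# Per-step accumulation: one small matrix product per time step.
dW_loop = np.zeros((N, 4 * N))
for t in range(T):
    dW_loop += h_prev[t].T @ dgates[t]

# Single matrix computation: fold time into the batch dimension and issue
# one (N x T*B) @ (T*B x 4N) product.
dW_single = h_prev.reshape(T * B, N).T @ dgates.reshape(T * B, 4 * N)

assert np.allclose(dW_loop, dW_single)
```

One big GEMM generally keeps the GPU busier than many small ones, which is consistent with the backward-pass gap in the numbers below.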

- Batch size = 20, Length = 100

  | Code        | Forward (ms) | Backward (ms) | Total (ms) |
  | ----------- | ------------ | ------------- | ---------- |
  | This code   | 248          | 291           | 539        |
  | Jeff's code | 264          | 462           | 726        |

- Batch size = 4, Length = 100

  | Code        | Forward (ms) | Backward (ms) | Total (ms) |
  | ----------- | ------------ | ------------- | ---------- |
  | This code   | 131          | 118           | 249        |
  | Jeff's code | 140          | 290           | 430        |

- Batch size = 20, Length = 20

  | Code        | Forward (ms) | Backward (ms) | Total (ms) |
  | ----------- | ------------ | ------------- | ---------- |
  | This code   | 49           | 59            | 108        |
  | Jeff's code | 52           | 92            | 144        |

- Batch size = 4, Length = 20

  | Code        | Forward (ms) | Backward (ms) | Total (ms) |
  | ----------- | ------------ | ------------- | ---------- |
  | This code   | 29           | 26            | 55         |
  | Jeff's code | 30           | 61            | 91         |

## Example

Example code is provided in `/examples/lstm_sequence/`.
In this example, an LSTM network is trained to generate a predefined sequence without any input.
This experiment was introduced in the Clockwork RNN paper.
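As a minimal sketch of that setup (in PyTorch rather than Caffe; the target waveform, sizes, and training loop here are illustrative assumptions, not the repo's configuration): the network receives all-zero inputs, so the target sequence must be stored entirely in the recurrent dynamics.

```python
import math
import torch
import torch.nn as nn

seq_len, hidden = 320, 15
# A stand-in target sequence to memorize (the repo's scripts define their own).
target = torch.sin(torch.linspace(0, 8 * math.pi, seq_len)).view(seq_len, 1, 1)
zeros = torch.zeros(seq_len, 1, 1)  # no informative input at any time step

lstm = nn.LSTM(input_size=1, hidden_size=hidden)
head = nn.Linear(hidden, 1)
opt = torch.optim.Adam(list(lstm.parameters()) + list(head.parameters()), lr=1e-2)

for step in range(2000):
    out, _ = lstm(zeros)                             # (seq_len, batch=1, hidden)
    loss = nn.functional.mse_loss(head(out), target)
    opt.zero_grad()
    loss.backward()
    opt.step()
```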
Four different LSTM networks and shell scripts (`.sh`) for training them are provided.
Each script generates a log file containing the predicted sequence and the true sequence.
You can use `plot_result.m` to visualize the results.
The results of the four LSTM networks are as follows:

- 1-layer LSTM with 15 hidden units for the short sequence
- 1-layer LSTM with 50 hidden units for the long sequence
- 3-layer deep LSTM with 7 hidden units for the short sequence
- 3-layer deep LSTM with 23 hidden units for the long sequence