This ConvLSTM code is based on Jeff Donahue's LSTM implementation for Caffe.
It requires a recent version of Caffe (or, alternatively, the "recurrent" branch of Jeff Donahue's Caffe GitHub repository).
Clone with git clone -b recurrent <address>.
Then simply copy the files in include/ and src/ to their corresponding directories in the Caffe tree, for example as sketched below.
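A minimal sketch of these steps (the caffe/ destination path is an assumption; adjust it to the location of your Caffe checkout):

```sh
# Clone the "recurrent" branch (replace <address> with the repository URL):
git clone -b recurrent <address> caffe

# Copy the ConvLSTM sources into the Caffe tree:
cp -r include/* caffe/include/
cp -r src/*     caffe/src/
```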
You need to merge the protobuf definition in patch.proto with src/caffe/proto/caffe.proto. To make this job easier, I have written a small patcher in Python; see patch_proto.py.
- Note: I take no responsibility for files broken by the patcher! If in doubt, merge the files manually.
- The patcher does create a backup file.
- Note: The patcher is more of a quick hack; applying the patch more than once will destroy caffe.proto.
If you do not want to use the patcher, you will have to merge the two files manually: extend the "LayerParameter" message accordingly and append the other message blocks to the end of the file, roughly as sketched below.
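As a rough sketch of the manual merge (the field name, label, and ID below are placeholders; the real definitions come from patch.proto):

```protobuf
// In src/caffe/proto/caffe.proto, extend the existing LayerParameter message:
message LayerParameter {
  // ... existing fields ...
  // Placeholder name/ID; take the actual line from patch.proto.
  repeated LSTMConvolutionParameter lstm_convolution_param = 1000;
}

// Then append the remaining message blocks from patch.proto to the end of the file:
message LSTMConvolutionParameter {
  // ... fields as defined in patch.proto ...
}
```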
We provide a working build configuration with this repository; see Makefile.config. It was tested with g++-5 and CUDA 7.5.
Once everything is prepared, run make clean && make to recompile Caffe.
The code requires C++11 to compile; set CUSTOM_CXX := g++ -std=c++11 in Makefile.config.
We have observed some bugs when compiling with g++-5 (which is not officially supported by CUDA 7.5).
To avoid these problems, add:
-D__STRICT_ANSI__ -D_MWAITXINTRIN_H_INCLUDED -D_FORCE_INLINES
to the compiler line, for example as shown below.
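One possible placement is to append the defines directly to the CUSTOM_CXX line in Makefile.config:

```make
# C++11 plus the workaround defines for g++-5 with CUDA 7.5:
CUSTOM_CXX := g++ -std=c++11 -D__STRICT_ANSI__ -D_MWAITXINTRIN_H_INCLUDED -D_FORCE_INLINES
```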
Furthermore, a bug seems to appear in crop_layer.cu when using C++11 with CUDA. You can find a simple fix in fixes/.
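Assuming the file in fixes/ mirrors Caffe's source layout (check the directory first), applying it amounts to something like:

```sh
# Overwrite the stock layer with the patched version (paths are assumptions):
cp fixes/crop_layer.cu caffe/src/caffe/layers/crop_layer.cu
```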
Note that Jeff's implementation expects data of shape T x N x ..., where T is the number of timesteps and N is the number of independent streams (e.g., videos).
This means the data must be interleaved: <video1_t1>, <video2_t1>, <video1_t2>, <video2_t2>, and so on.
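A small NumPy sketch of this layout (purely illustrative; shapes and sizes are made up):

```python
import numpy as np

T, N, C, H, W = 4, 2, 3, 32, 32       # timesteps, streams, channels, height, width
video1 = np.random.rand(T, C, H, W)   # stream 1: frames t1..tT
video2 = np.random.rand(T, C, H, W)   # stream 2: frames t1..tT

# Stacking along axis 1 gives shape (T, N, C, H, W). Flattened in memory, this
# is exactly the interleaved order <video1_t1>, <video2_t1>, <video1_t2>, ...
batch = np.stack([video1, video2], axis=1)
assert batch.shape == (T, N, C, H, W)
```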
Use "lstm_convolution_param" to specify the details of the convolutional layer inside a ConvLSTM layer. It can have the following parameters:
- "type": Whether this is a input-to-hidden or hidden-to-hidden operations, or both ("input", "hidden", "all").
- This means you can specify up to two lstm_convolution_param per layer (one "hidden" plus one "input", or simply one "all")!
- Default Conv.-Params, such as "kernel_size", "num_outputs", "pad", etc.
- Certain features may not be available
For an example, please refer to the models/ directory!
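As a minimal illustration of the two-block variant described above (the layer type string and field values are assumptions; the files in models/ and patch.proto are authoritative):

```protobuf
layer {
  name: "convlstm1"
  type: "ConvLSTM"          # assumed type string; see models/ for the real one
  bottom: "data"
  top: "convlstm1"
  lstm_convolution_param {
    type: "input"           # input-to-hidden convolution
    num_outputs: 64
    kernel_size: 3
    pad: 1
  }
  lstm_convolution_param {
    type: "hidden"          # hidden-to-hidden convolution
    num_outputs: 64
    kernel_size: 3
    pad: 1
  }
}
```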
To allow for correct gradient backpropagation, we currently use a workaround that forces Caffe to propagate gradients all the way to the latent states h_0 / c_0. This is realized by the DummyForward layer, which simply forwards data and backpropagates gradients. The layer owns a 'dummy' parameter of shape (1), which you may notice if you inspect ConvLSTM's weights; the parameter has no further meaning. This workaround may be removed in the future, in which case old models may become incompatible, as the parameter count will have changed.
If you find any bugs or have other feedback, please let me know! Thanks :)
Contact: s [dot] agethen [at] gmail [dot] com