Bachelor thesis by Ralph Meier & Dano Roost. Check out the following branches:
- baseline-version: CNN-only scene understanding
- cnn-plus-rnn-version: CNN + RNN (best performing); uses an LSTM to aggregate information over multiple timesteps
- convlstm-version-new: ConvLSTM version that also aggregates data over timesteps
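
To illustrate what "aggregating information over multiple timesteps" means for the CNN + RNN variant, here is a minimal NumPy sketch. It is not the code from any of the branches. The function name `lstm_aggregate`, the weight layout, and the random per-frame feature vectors (standing in for CNN outputs) are all assumptions made for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_aggregate(features, W, U, b):
    """Run a single-layer LSTM over per-frame features; return the final hidden state.

    features: (T, D) array, one feature vector per timestep (stand-in for CNN output).
    W: (4H, D), U: (4H, H), b: (4H,), with gates stacked as input, forget, cell, output.
    This layout is an assumption of the sketch, not taken from the thesis code.
    """
    hidden_size = U.shape[1]
    h = np.zeros(hidden_size)  # hidden state: the running summary of the sequence
    c = np.zeros(hidden_size)  # cell state: the LSTM's longer-term memory
    for x_t in features:
        z = W @ x_t + U @ h + b
        i, f, g, o = np.split(z, 4)
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        g = np.tanh(g)
        c = f * c + i * g      # forget part of the old memory, add new information
        h = o * np.tanh(c)     # expose a gated view of the memory
    return h


# Hypothetical sizes: 5 timesteps, 8 features per frame, 4 hidden units.
rng = np.random.default_rng(0)
T, D, H = 5, 8, 4
features = rng.normal(size=(T, D))
W = rng.normal(scale=0.1, size=(4 * H, D))
U = rng.normal(scale=0.1, size=(4 * H, H))
b = np.zeros(4 * H)

summary = lstm_aggregate(features, W, U, b)
print(summary.shape)  # (4,)
```

The final hidden state `summary` is a single fixed-size vector that depends on every frame in the sequence, which is the kind of input a downstream prediction head could consume. A ConvLSTM follows the same recurrence but replaces the matrix products with convolutions, so the state keeps its spatial layout.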