
BUG: Setting initial state on ConvLSTM2D with input variables #9761

Closed
mharradon opened this issue Mar 27, 2018 · 11 comments

@mharradon

mharradon commented Mar 27, 2018

Should this work, or is this an unsupported use case?

import keras
from keras import layers as L

x = L.Input((4,6,6,3))
init_state = L.Input((6,6,3))
y = L.ConvLSTM2D(filters=3,kernel_size=(3,3),padding='same',return_sequences=True)(x,initial_state=[init_state,init_state])
Using TensorFlow backend.
Traceback (most recent call last):
  File "ConvLSTM2DTest.py", line 10, in <module>
    y = L.ConvLSTM2D(filters=3,kernel_size=(3,3),padding='same',return_sequences=True)(x,initial_state=[init_state,init_state])
  File ".../keras/layers/convolutional_recurrent.py", line 319, in __call__
    output = super(ConvRNN2D, self).__call__(full_input, **kwargs)
  File ".../keras/layers/recurrent.py", line 496, in __call__
    inputs, initial_state, constants)
  File ".../keras/layers/recurrent.py", line 655, in _standardize_args
    assert initial_state is None and constants is None
AssertionError

Thanks!

@mizima

mizima commented Apr 13, 2018

Hello,
I think the issue is related to #7612. The discussion there resulted in an API change. Now (2.1.5), the initial_state is taken from the inputs according to
initial_state = inputs[1:] # part of the _standardize_args implementation
and the original initial_state argument is checked to be None (the AssertionError you see). Therefore, your code should have been written as:

y = L.ConvLSTM2D(filters=3, kernel_size=(3,3), padding='same', return_sequences=True)([x, init_state, init_state])

BUT, it seems that _standardize_args is called twice for ConvLSTM2D (once by the ConvRNN2D implementation, and then again via super(ConvRNN2D, self).__call__, which propagates to the RNN implementation). Therefore, the assertion fires even if you fix the code on the client side. This issue does not occur for the (dense) LSTM discussed in #7612.

I think the solution is to remove the (redundant?)

inputs, initial_state, constants = self._standardize_args(
            inputs, initial_state, constants)

from ConvRNN2D.__call__. It fixes the AssertionError for me.

Can anybody confirm?
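
For reference, here is a minimal end-to-end sketch of the list-style call described above, using the same shapes as the original report. It assumes the redundant _standardize_args call has been removed from ConvRNN2D.__call__ so that the second standardization no longer fires (untested sketch, not a confirmed fix):

from keras import layers as L
from keras.models import Model

x = L.Input((4, 6, 6, 3))        # (timesteps, rows, cols, channels)
init_state = L.Input((6, 6, 3))  # one tensor per LSTM state (h and c)

# Fold the initial states into the input list instead of passing the
# initial_state keyword argument:
y = L.ConvLSTM2D(filters=3, kernel_size=(3, 3), padding='same',
                 return_sequences=True)([x, init_state, init_state])

model = Model(inputs=[x, init_state], outputs=y)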

@ChengeLi

ChengeLi commented Jun 6, 2018

I had the same problem as mizima; it would be really helpful if someone could confirm this hacky solution.

@ribhupathria

We are hitting the same issue with a Seq2Seq LSTM model based on https://arxiv.org/pdf/1409.3215.pdf. The issue is seen when converting the model to a TF Estimator. Is there a fix planned anytime soon?

@Gerryflap

I am still running into this bug at the moment. I've tried removing those (seemingly redundant) standardize calls, but haven't had much luck with that approach. If it cannot be made to work, is there any other way of implementing convolutional seq2seq models in Keras?

@fangzuliang

(Quoting @mizima's suggestion above: remove the redundant _standardize_args call from ConvRNN2D.__call__.)

I had hit the same AssertionError; thanks for the suggestion, it solved the problem for me.

@dzhv

dzhv commented Jun 5, 2019

This issue is still not addressed in the latest release.
What can be done for it to receive attention?

Also, although the proposed solution (quoted below) fixes the model for single-GPU training, I still face issues due to the initial_state when training on multiple GPUs using keras.utils.multi_gpu_model.

I think the solution is to remove the (redundant?)

inputs, initial_state, constants = self._standardize_args(
            inputs, initial_state, constants)

from ConvRNN2D.__call__. It fixes the AssertionError for me.

@aharchaoumehdi

Do we have any update on this issue?

In my seq2seq model based on ConvLSTM2DCell and ConvRNN2D, I have to pass the encoder_states as the initial_state to the decoder (see code below). For now, this

decoder_outputs, _, _ = decoder(decoder_inputs, initial_state=encoder_states) # a tensor

leads to an AssertionError.

Here is the function that tries to implement a seq2seq ConvLSTM2D model for sequences of video frames as input.

from keras.layers import Input
from keras.layers.convolutional_recurrent import ConvLSTM2DCell, ConvRNN2D
from keras.models import Model
from keras.optimizers import Adam
from keras.utils import multi_gpu_model

def s2s_convlstm(nfilters, kernel_size, nrows, ncols, nchannels, learning_rate, num_layers, num_gpus):
    optimiser = Adam(lr=learning_rate)
    loss = "mse"
    # encoder
    encoder_inputs = Input(shape=(None, nrows, ncols, nchannels))
    encoder_cells = ConvLSTM2DCell(filters=nfilters, kernel_size=kernel_size, padding='same')
    encoder = ConvRNN2D(encoder_cells, return_state=True)
    encoder_output, hidden_state, cell_state = encoder(encoder_inputs)
    encoder_states = [hidden_state, cell_state]
    print(encoder_states)
    # decoder
    decoder_inputs = Input(shape=(None, nrows, ncols, nchannels))
    decoder_cells = ConvLSTM2DCell(filters=nfilters, kernel_size=kernel_size, padding='same')
    decoder = ConvRNN2D(decoder_cells, return_sequences=True, return_state=True)
    print(decoder_inputs)
    decoder_outputs, _, _ = decoder(decoder_inputs, initial_state=encoder_states)  # raises AssertionError
    # model
    model = Model(inputs=[encoder_inputs, decoder_inputs], outputs=decoder_outputs)
    # multi_gpu_model replicates the model across num_gpus GPUs; Keras rebuilds
    # the graph from inputs to outputs behind the scenes, which works because
    # the outputs were obtained by repeatedly transforming the inputs.
    parallel_model = multi_gpu_model(model, gpus=num_gpus)
    parallel_model.compile(optimizer=optimiser, loss=loss, metrics=['mae'])
    parallel_model.summary()
    return parallel_model
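
If the keyword form keeps raising, a hedged variant of the decoder call is to fold the states into the input list, per @mizima's workaround above (untested sketch; assumes the ConvRNN2D patch is applied):

    decoder_outputs, _, _ = decoder([decoder_inputs] + encoder_states)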

@kisckorea

Hello, I have the same problem too. Does anybody have a solution?
Thanks!

import keras
from keras.layers import ConvLSTM2D

num_input_features = (201, 201, 1)
num_output_features = (201, 201, 1)

encoder_inputs = keras.layers.Input(shape=(None, num_input_features[0], num_input_features[1], num_input_features[2]))

encoder = ConvLSTM2D(filters=8, kernel_size=(3, 3), input_shape=(None, 201, 201, 1),
                     data_format='channels_last', padding='same', activation='tanh',
                     return_state=True)
encoder_outputs_and_states = encoder(encoder_inputs)
encoder_states = encoder_outputs_and_states[1:]

decoder_inputs = keras.layers.Input(shape=(None, num_output_features[0], num_output_features[1], num_output_features[2]))
decoder = ConvLSTM2D(filters=8, kernel_size=(3, 3), input_shape=(None, 201, 201, 1),
                     data_format='channels_last', padding='same', activation='tanh',
                     return_sequences=True, return_state=True)
decoder_outputs_and_states = decoder(decoder_inputs, initial_state=encoder_states)
decoder_outputs = decoder_outputs_and_states[0]

model = keras.models.Model(inputs=[encoder_inputs, decoder_inputs], outputs=decoder_outputs)


Traceback (most recent call last):
  File "//model/Conv_LSTM_seq2seq_draft4.py", line 73, in <module>
    decoder_outputs_and_states = decoder(decoder_inputs, initial_state=encoder_states)
  File "//anaconda3/envs/dev_tf1.13/lib/python3.6/site-packages/keras/layers/convolutional_recurrent.py", line 321, in __call__
    output = super(ConvRNN2D, self).__call__(full_input, **kwargs)
  File "//anaconda3/envs/dev_tf1.13/lib/python3.6/site-packages/keras/layers/recurrent.py", line 529, in __call__
    inputs, initial_state, constants, self._num_constants)
  File "//anaconda3/envs/dev_tf1.13/lib/python3.6/site-packages/keras/layers/recurrent.py", line 2336, in _standardize_args
    assert initial_state is None and constants is None
AssertionError


Removing the following from ConvRNN2D.__call__, as suggested above, does not work for me:

inputs, initial_state, constants = self._standardize_args(
            inputs, initial_state, constants)

@gmrhub

gmrhub commented Dec 20, 2019

A better fix would be to follow the tf.keras issue. But that approach has an issue of its own:

but after loading the saved model weights in a completely new Python session, the results on validation data don't match at all
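
For reference, the tf.keras route mentioned above would presumably look like the following; whether initial_state is accepted here depends on the TensorFlow version, so treat this as an untested sketch:

import tensorflow as tf
from tensorflow.keras import layers as L

x = L.Input((4, 6, 6, 3))
init_state = L.Input((6, 6, 3))
y = L.ConvLSTM2D(filters=3, kernel_size=(3, 3), padding='same',
                 return_sequences=True)(x, initial_state=[init_state, init_state])
model = tf.keras.Model(inputs=[x, init_state], outputs=y)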

@flyinskybtx

I think the problem comes from lines 307-337 in convolutional_recurrent.py.
After the initial state is concatenated to the inputs, the code also passes the initial state via kwargs, which is forwarded to the superclass __call__ and triggers _standardize_args() again. That is where the problem arises.

I think a possible fix is not to update kwargs with "initial_state". So here I remove line 308:
kwargs['initial_state'] = initial_state

and things work for me.
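
To make the double standardization concrete, here is a simplified, self-contained model of the control flow described above (not the actual Keras source; names and bodies are heavily reduced):

def _standardize_args(inputs, initial_state, constants):
    # If `inputs` is a list, the states were already folded into it,
    # so the keyword arguments must be empty by now.
    if isinstance(inputs, list):
        assert initial_state is None and constants is None  # the failing assert
        inputs, initial_state = inputs[0], inputs[1:]
    return inputs, initial_state, constants

def rnn_call(inputs, initial_state=None, constants=None):
    # Stands in for RNN.__call__: it standardizes its arguments again.
    return _standardize_args(inputs, initial_state, constants)

def conv_rnn2d_call(inputs, initial_state=None, constants=None):
    # First standardization (ConvRNN2D.__call__, around line 307):
    inputs, initial_state, constants = _standardize_args(
        inputs, initial_state, constants)
    full_input = [inputs] + list(initial_state)  # states folded into inputs
    # Line 308: initial_state also travels on via the keyword argument,
    # so the parent's second standardization sees both and asserts.
    return rnn_call(full_input, initial_state=initial_state)

conv_rnn2d_call('x', initial_state=['h', 'c'])  # raises AssertionError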

@gyla1993

gyla1993 commented Feb 5, 2020

I am hitting the same problem with keras==2.3.1.
