
[WIP] save previous output and state of RNNs #235

Closed
wants to merge 5 commits into streaming_mode from save_rnn_prev_out_and_state

Conversation

superbock
Collaborator

This change enables recurrent neural networks to be initialised with an initial output/state (#230), to be streaming-compatible (#185), and also fixes some GRU-related parameter glitches (#234).

Unfortunately, this change breaks the API, since parameters are reordered and renamed.
Any thoughts?

@superbock superbock added this to the v0.15 milestone Dec 4, 2016
@superbock superbock force-pushed the save_rnn_prev_out_and_state branch 2 times, most recently from 718f934 to 4ebd9ce Compare December 4, 2016 20:03
@fdlm
Contributor

fdlm commented Dec 5, 2016

I see two issues here:

  • Now, I don't like the name of the parameter (prev_output) :), because from a user's perspective I don't understand why I should give an RNN layer a "previous output" when creating it. I might want to set the "first value" or the "initial value" of the layer, but the name prev_output seems counter-intuitive for this (even though it is conceptually the same under the hood!). I would rather go with init (and set self.prev_output = init in the __init__ method); see the sketch after this list.
  • I didn't check this, but in non-streaming mode, calling the RNN twice for the same input would return different results, because the initial value of the units is overwritten (it would be the last output of the first call). I think this will be a general problem for all "stateful" processors (HMM, CRF, ...), and I don't have a solution except having a different process method for streaming and non-streaming mode.
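
To make the naming suggestion concrete, here is a minimal sketch of what the constructor could look like; the RNNLayer signature and weight parameters below are purely illustrative, not the actual implementation:

```python
import numpy as np

class RNNLayer(object):
    """Illustrative sketch only; the weight parameters are made up."""

    def __init__(self, weights, bias, recurrent_weights,
                 activation_fn=np.tanh, init=None):
        self.weights = weights
        self.bias = bias
        self.recurrent_weights = recurrent_weights
        self.activation_fn = activation_fn
        # user-facing name is `init`, internally it is kept as the
        # previous output of the layer
        self.prev_output = init
```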

@@ -309,38 +316,46 @@ def activate(self, data):
             Activations for this data.

         """
+        # init previous output
+        if not hasattr(self, 'prev_output') or self.prev_output is None:
Contributor

Is this necessary to ensure backwards compatibility with older models?

Collaborator Author


Yes, otherwise we cannot load the old models any more, since they don't have these attributes.

@@ -287,12 +291,15 @@ class LSTMLayer(Layer):
     """

     def __init__(self, input_gate, forget_gate, cell, output_gate,
-                 activation_fn=tanh):
+                 activation_fn=tanh, prev_output=None, prev_state=None):
Contributor


Shouldn't these initialisations go to the Gate class, as discussed in #233?

Collaborator Author


I thought about this as well, but they are technically not connected to the gates. prev_output is the previous output of the unit; prev_state is the state, which is the cell weighted by the input gate plus the previous state weighted by the forget gate. Thus I thought it would be easiest to add them to the layer directly.
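
To illustrate how the two quantities enter the computation, here is a simplified sketch of a single LSTM step; the gate activations are assumed to be pre-computed and the names are illustrative, not the actual implementation:

```python
import numpy as np

def lstm_step(ig, fg, og, cell_candidate, prev_state, activation_fn=np.tanh):
    """One simplified LSTM step, given pre-computed gate activations.

    The gate activations (ig, fg, og) and the cell candidate would themselves
    be computed from the current input and the previous output of the unit.
    """
    # new state: cell (candidate) weighted by the input gate plus the
    # previous state weighted by the forget gate
    state = ig * cell_candidate + fg * prev_state
    # new output of the unit
    output = og * activation_fn(state)
    return output, state
```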

@superbock superbock changed the title Save previous output and state of RNNs [WIP] save previous output and state of RNNs Dec 5, 2016
@superbock
Collaborator Author

Regarding the issues you mentioned:

  • As indicated elsewhere, I also voted for init instead of prev_output, at least in the __init__() method. I just used these names for clarity for now, since this PR is WIP; I should have indicated this somewhere.
  • Good point! I thought about adding a reset() method to BufferProcessor (or to all classes that buffer something) to clear the buffer. Although this would work, it would still be a problem, because people will forget to reset the buffers (just like I forgot to think about this in the first place).

I tried (and still want) to avoid having two different process methods, simply because that would introduce a lot of duplicated code.

How about the following:

We add a second buffer() method to buffer the data and then just use the old, unmodified process() method to do the actual work. This would require additional buffer() calls in processors.process_online(), but would not change the behaviour of existing code nor require code changes (hopefully). Maybe we need two methods, one called before process and one after, but you get the idea.

Processor will simply raise a NotImplementedError if these methods are called. This way we can easily control which Processors are streaming-capable, simply by implementing these methods.
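
A rough sketch of what this could look like; the class and method bodies are only illustrative, not actual library code:

```python
class Processor(object):
    """Base class: by default a processor is not streaming-capable."""

    def process(self, data):
        raise NotImplementedError('must be implemented by subclasses')

    def buffer(self, data):
        # only streaming-capable processors implement this hook
        raise NotImplementedError('this processor is not streaming-capable')


class RecurrentLayer(Processor):
    """A streaming-capable processor implements the buffer() hook."""

    def __init__(self, init=None):
        self.prev_output = init

    def buffer(self, data):
        # remember the last output so the next call can continue from it
        self.prev_output = data[-1]

    def process(self, data):
        # the old, unmodified processing code works on `data` as before,
        # starting from self.prev_output (actual computation omitted)
        return data
```

In streaming mode, processors.process_online() would then call buffer() around the unchanged process(); in non-streaming mode nothing changes.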

Any thoughts?

@superbock
Collaborator Author

The current state allows setting the initial state of recurrent layers (and cell for LSTMLayer), but does not deal with proper resetting yet.

As outlined here, it involves a lot of if/elif/else statements to get this right. Waiting for some feedback before proceeding further.
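
For illustration only, a reset helper could look roughly like this; the attribute names and branching are assumptions based on the layers discussed above, not actual code:

```python
def reset_layer(layer, init=None, init_state=None):
    """Illustrative sketch: reset the hidden state of a recurrent layer."""
    if hasattr(layer, 'prev_state'):
        # LSTM-like layer: both a previous output and an internal cell state
        layer.prev_output = init
        layer.prev_state = init_state
    elif hasattr(layer, 'prev_output'):
        # plain RNN / GRU layer: only a previous output to reset
        layer.prev_output = init
    else:
        # stateless (e.g. feed-forward) layer: nothing to do
        pass
```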

@superbock superbock changed the base branch from master to streaming_mode December 6, 2016 14:17
Sebastian Böck added 2 commits December 8, 2016 15:41
added initialisation of hidden states to layers; fixes #230
renamed GRU parameters, to be consistent with all other layers
@superbock superbock mentioned this pull request Jan 18, 2017
@superbock
Collaborator Author

Closing, since #243 is merged.

@superbock superbock closed this Jan 20, 2017
@superbock superbock deleted the save_rnn_prev_out_and_state branch February 3, 2017 08:36