[WIP] save previous output and state of RNNs #235
Conversation
force-pushed from dafabcb to 718f934
force-pushed from 718f934 to 4ebd9ce
I see two issues here:
```
@@ -309,38 +316,46 @@ def activate(self, data):
            Activations for this data.

        """
        # init previous output
        if not hasattr(self, 'prev_output') or self.prev_output is None:
```
Is this necessary to ensure backwards compatibility with older models?
Yes, otherwise we cannot load the old models anymore, since they don't have these attributes.
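For context, here is a minimal, hypothetical sketch of why the guard matters (the class and all names below are illustrative assumptions, not madmom's actual code): a model pickled before this change never stored a `prev_output` attribute, so the attribute has to be created lazily on first use.

```python
# Minimal sketch: lazy init of `prev_output` keeps old pickled models usable.
import numpy as np


class RecurrentLayerSketch(object):
    """Toy recurrent layer; names and structure are illustrative only."""

    def __init__(self, weights, recurrent_weights, bias):
        self.weights = weights
        self.recurrent_weights = recurrent_weights
        self.bias = bias
        self.prev_output = None

    def activate(self, data):
        # init previous output; instances unpickled from old model files
        # lack this attribute entirely, hence the hasattr check
        if not hasattr(self, 'prev_output') or self.prev_output is None:
            self.prev_output = np.zeros(len(self.bias))
        out = np.zeros((len(data), len(self.bias)))
        for i, frame in enumerate(data):
            self.prev_output = np.tanh(np.dot(frame, self.weights) +
                                       np.dot(self.prev_output,
                                              self.recurrent_weights) +
                                       self.bias)
            out[i] = self.prev_output
        return out


# emulate a layer loaded from an "old" pickle, i.e. without `prev_output`
layer = RecurrentLayerSketch(np.random.randn(4, 3), np.random.randn(3, 3),
                             np.zeros(3))
del layer.prev_output
layer.activate(np.random.randn(5, 4))  # still works thanks to the guard
```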
```
@@ -287,12 +291,15 @@ class LSTMLayer(Layer):
    """

    def __init__(self, input_gate, forget_gate, cell, output_gate,
-                 activation_fn=tanh):
+                 activation_fn=tanh, prev_output=None, prev_state=None):
```
Shouldn't these initialisations go to the `Gate` class, as discussed in #233?
I thought about this as well, but they are technically not connected to the gates. The `prev_output` is the previous output of the unit, and `prev_state` is the state, which is the cell weighted by the input gate plus the previous state weighted by the forget gate. Thus I thought it would be easiest to add them to the layer directly.
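To make the distinction concrete, here is a rough, hypothetical sketch of a single LSTM step (function and parameter names are assumptions for illustration, not madmom's `LSTMLayer` code, and peephole connections are omitted); it only shows where `prev_output` and `prev_state` enter the computation relative to the gates.

```python
# Sketch of one simplified LSTM time step, illustrating prev_output/prev_state.
import numpy as np


def sigmoid(x):
    return 1. / (1. + np.exp(-x))


def lstm_step(x, prev_output, prev_state, params, activation_fn=np.tanh):
    """One LSTM step; `params` maps gate names to (W, R, b) tuples."""
    def gate(name, act):
        W, R, b = params[name]
        return act(np.dot(x, W) + np.dot(prev_output, R) + b)

    i = gate('input_gate', sigmoid)
    f = gate('forget_gate', sigmoid)
    c = gate('cell', activation_fn)
    o = gate('output_gate', sigmoid)
    # new state: cell weighted by the input gate plus the previous state
    # weighted by the forget gate
    state = i * c + f * prev_state
    # new output: activated state weighted by the output gate
    output = o * activation_fn(state)
    return output, state


# example: 4 inputs, 3 units, all gates share the same random shapes
rng = np.random.RandomState(0)
params = {name: (rng.randn(4, 3), rng.randn(3, 3), np.zeros(3))
          for name in ('input_gate', 'forget_gate', 'cell', 'output_gate')}
out, state = lstm_step(rng.randn(4), np.zeros(3), np.zeros(3), params)
```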
Regarding the issues you mentioned:
I tried (and still want) to avoid having two different process methods, simply because this will introduce a lot of duplicated code. How about the following: We add a second
Any thoughts?
force-pushed from 4ebd9ce to e30d6fe
The current state allows setting the initial state of recurrent layers (and the cell state for LSTMLayer), but does not deal with proper resetting yet. As outlined here, it involves a lot of if/elif/else statements to get this right. Waiting for some feedback before proceeding further.
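As a rough illustration of the resetting question (all names here are hypothetical and this is not the solution eventually adopted in the PR), a `reset` method could restore the hidden output/state to whatever was passed at construction time, falling back to zeros:

```python
# Sketch of one possible reset strategy for a stateful recurrent layer.
import numpy as np


class StatefulLayerSketch(object):

    def __init__(self, num_units, init_output=None, init_state=None):
        self.num_units = num_units
        self.init_output = init_output
        self.init_state = init_state
        self.reset()

    def reset(self):
        # restore the hidden output/state to their initial values (or zeros)
        if self.init_output is not None:
            self.prev_output = self.init_output.copy()
        else:
            self.prev_output = np.zeros(self.num_units)
        if self.init_state is not None:
            self.prev_state = self.init_state.copy()
        else:
            self.prev_state = np.zeros(self.num_units)


layer = StatefulLayerSketch(num_units=3, init_state=np.ones(3))
layer.reset()  # prev_state is back to ones, prev_output back to zeros
```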
force-pushed from e30d6fe to 1e8cd39
force-pushed from 1e8cd39 to 421fa8f
added initialisation of hidden states to layers; fixes #230; renamed GRU parameters to be consistent with all other layers
force-pushed from e64fc55 to 5a2e813
Closing, since #243 is merged.
This change enables recurrent neural networks to be initialised with an initial output/state (#230), makes them streaming compatible (#185), and also fixes some GRU-related parameter glitches (#234).
Unfortunately, this change breaks the API, since parameters are reordered and renamed.
Any thoughts?
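As a hedged illustration of what streaming compatibility (#185) means here (the toy function and all names are assumptions, not the merged madmom API): carrying the previous output from one call to the next makes block-wise processing equivalent to processing the whole signal at once.

```python
# Sketch: block-wise processing with carried-over state matches offline mode.
import numpy as np


def activate(data, weights, recurrent_weights, bias, prev_output=None):
    """Run a simple recurrent layer over `data`; return (out, last_output)."""
    if prev_output is None:
        prev_output = np.zeros(len(bias))
    out = np.zeros((len(data), len(bias)))
    for i, frame in enumerate(data):
        prev_output = np.tanh(np.dot(frame, weights) +
                              np.dot(prev_output, recurrent_weights) + bias)
        out[i] = prev_output
    return out, prev_output


np.random.seed(0)
data = np.random.randn(20, 4)
W, R, b = np.random.randn(4, 3), np.random.randn(3, 3), np.zeros(3)

# offline: process everything in one call
offline, _ = activate(data, W, R, b)

# streaming: same data in two blocks, handing the state from call to call
first, state = activate(data[:10], W, R, b)
second, state = activate(data[10:], W, R, b, prev_output=state)

assert np.allclose(offline, np.vstack([first, second]))
```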