
Initial implementation of GRU layers #48

Merged · 12 commits · Jan 14, 2022

Conversation

@ptigwe (Contributor) commented Nov 27, 2021

Currently this shows a working implementation of a GRU layer which follows the Keras API quite closely, using flax.linen.GRUCell as its backbone.
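For context, a minimal sketch of the pattern in plain Flax (assuming the current flax.linen API, where GRUCell takes features=...; the actual treex code differs, and since treex modules are pytrees the PR can use jax.lax.scan directly rather than the lifted nn.scan used here):

```python
import flax.linen as nn
import jax.numpy as jnp


class GRU(nn.Module):
    """Sketch: a GRU layer that scans a GRUCell over the time axis."""

    hidden_units: int

    @nn.compact
    def __call__(self, x):  # x: [batch, time, channels]
        # nn.scan lifts the cell so its parameters are shared across steps.
        ScanGRU = nn.scan(
            nn.GRUCell,
            variable_broadcast="params",
            split_rngs={"params": False},
            in_axes=1,   # consume the time axis of x
            out_axes=1,  # stack the per-step outputs back on axis 1
        )
        # The GRU carry is just the hidden state; zeros matches the default init.
        carry = jnp.zeros((x.shape[0], self.hidden_units))
        carry, outputs = ScanGRU(features=self.hidden_units)(carry, x)
        return outputs  # [batch, time, hidden_units]
```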

Tackling the implementation of GRU (#40)

Following the implementation of BatchNorm, use the underlying Flax
implementation for the logic behind the forward pass. This also adds the
use of `jax.lax.scan` for performing the sequential calls on the hidden
state, making this an implementation of the GRU layer rather than a
reimplementation of the `GRUCell`. It also adds three Keras-style flags:
- `return_state`: whether the final state should be returned
- `return_sequences`: whether all the intermediate states should be
  returned
- `go_backwards`: whether the input should be run in reverse order
Swaps the order of the state and input variables, as well as the return
order of the final state and the sequence of states, to be more in line
with the Keras API.
This allows for optionally passing an initial state and for specifying a
function which initializes the initial state.
Adds the `stateful` flag which allows for the last state of the GRU to
be used as the start state for the next batch.
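Taken together, these give the layer a Keras-flavoured surface. A hypothetical usage sketch of the options described above (the flag names come from the commits; the exact treex signature may differ, so check the merged treex/nn/recurrent.py):

```python
import treex as tx

# Hypothetical constructor mirroring the flags above, not a confirmed API.
gru = tx.GRU(
    units=4,
    return_state=True,      # also return the final hidden state
    return_sequences=True,  # return all the intermediate hidden states
    go_backwards=False,     # set True to run the input in reverse order
    stateful=False,         # set True to reuse the last state on the next batch
)
# With both return flags set, the call yields (sequences, final_state),
# matching the Keras return order described above.
```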
@cgarciae (Owner) commented Dec 1, 2021

Thanks @ptigwe for this! Sorry it took so long to respond, I'd seen it but hadn't had the time to review it. Overall it looks very good, I'll just leave a couple of comments.

(3 resolved review threads on treex/nn/recurrent.py)
@cgarciae (Owner) commented Dec 2, 2021

As discussed offline, we feel we can lift Keras's restrictions on the shape of the input by removing the time_major argument in favor of something like a time_axis, which lets you select the dimension you are going to scan over. This change requires a couple of changes to the logic, including figuring out the proper batch_dims to pass to initialize_carry.
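A sketch of the scanning logic under that design (a hypothetical helper, not the PR's code: jax.lax.scan always consumes axis 0, so an arbitrary time_axis can be handled by moving that axis to the front and back):

```python
import jax
import jax.numpy as jnp


def scan_over_axis(step, carry, xs, time_axis):
    # Move the chosen time axis to the front, scan, then restore it
    # on the stacked outputs.
    xs = jnp.moveaxis(xs, time_axis, 0)
    carry, ys = jax.lax.scan(step, carry, xs)
    return carry, jnp.moveaxis(ys, 0, time_axis)


x = jnp.ones((8, 10, 3))      # [batch, time, channels]
h0 = jnp.zeros((8, 4))        # carry: [batch, hidden]; the batch_dims for
                              # initialize_carry are every axis of x except
                              # the time and feature axes
step = lambda h, x_t: (h, h)  # toy step; x_t: [batch, channels]
h, ys = scan_over_axis(step, h0, x, time_axis=-2)
assert ys.shape == (8, 10, 4)  # [batch, time, hidden]
```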

In preparation for the updates to `flax.jax_utils.scan_in_dim`, this
changes `time_major` to `time_axis`. The type hint currently allows only
a single time dimension to be specified, although this is not enforced
at runtime. Underneath, it is stored as a tuple, which would allow its
use in `scan_in_dim`.
@cgarciae (Owner)

Hey @ptigwe!
There are a few minor comments / changes left if you'd like to finish them; otherwise I'd be very happy to continue with the PR.

@ptigwe (Contributor, Author) commented Jan 11, 2022

@cgarciae, I believe I have addressed most of the comments you mentioned, including the time_axis change as discussed. The only thing I didn't update was the switch from jax.lax.scan to jax_utils.scan_in_dim, as I was waiting for the change in flax to be merged and the upstream version updated.

Please let me know if there is anything else I missed and I will make the changes ASAP.

@cgarciae (Owner)

Thanks @ptigwe! There is a small comment about changing the default time_axis from 0 to -2.

@ptigwe (Contributor, Author) commented Jan 12, 2022

@cgarciae, I guess I must have missed that one; I've added it to the PR now. Quick clarification as I'm fixing this: with time_axis = -2, the default expected input shape would be [..., T, C], as opposed to the current default of 0 / -3 with [..., T, B, C].
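Concretely, a shape-only illustration of the two defaults (sizes made up):

```python
import jax.numpy as jnp

x_new = jnp.ones((8, 10, 3))  # time_axis = -2: [..., T, C], here [B, T, C]
assert x_new.shape[-2] == 10  # T sits second-to-last

x_old = jnp.ones((10, 8, 3))  # time_axis = 0 / -3: [T, B, C] (time-major)
assert x_old.shape[0] == 10   # T sits in front
```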

Changes the default `time_axis` to -2, i.e. by default the expected
shape of the input should be of the form [..., time, :].
Swaps `jax.lax.scan` for `flax.jax_utils.scan_in_dim`, which allows one
to have multiple time dimensions by specifying `time_axis` as a tuple
instead of a single int value.
@cgarciae (Owner)

Keras uses [B, T, C]; we are going for [..., B, T, C], which is even better. I think this is nicer than [..., T, B, C] because 1D convolutions and transformers also use [..., B, T, C], so the ops would chain better.
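A quick sketch of that chaining, using Flax's nn.Conv (which treats the middle axis of a [B, T, C] input as the 1D spatial axis):

```python
import flax.linen as nn
import jax
import jax.numpy as jnp

x = jnp.ones((8, 10, 3))                       # [B, T, C]
conv = nn.Conv(features=16, kernel_size=(5,))  # 1D convolution over T
variables = conv.init(jax.random.PRNGKey(0), x)
y = conv.apply(variables, x)                   # [B, T, 16]
# y can feed the GRU with time_axis=-2 without any transpose in between.
```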

@ptigwe (Contributor, Author) commented Jan 12, 2022

OK cool, it was already set in the previous commit. I also decided not to include the change to jax_utils.scan_in_dim, as the version currently pointed to (0.3.6) does not yet have the unrolled addition.

@cgarciae (Owner)

If you want to update flax to the latest version to add this feature now, you can run:

```
poetry add flax@latest
```

@ptigwe (Contributor, Author) commented Jan 14, 2022

Seems the latest is indeed 0.3.6, which does not have the updated scan_in_dim. Whenever it gets updated, we can always update the code to make use of it. I've also included some comments on things that might need changing in that case.

@cgarciae (Owner)

@ptigwe Sounds good. I'll merge this for now; we can create a new PR with scan_in_dim later.

Thanks a lot for pushing this through!

cgarciae merged commit fce1175 into cgarciae:master on Jan 14, 2022