This repository has been archived by the owner on Aug 31, 2021. It is now read-only.

NaN in gradient on A matrix #16

Open
deepaksuresh opened this issue Mar 13, 2019 · 1 comment

Comments

@deepaksuresh

For a model with adjacent weight tying, as in section 2.2.1, the gradient becomes NaN after a while.
The model is designed to work on bAbI (1k dataset). I tried lowering the learning rate from 1e-2 to 1e-5, but that didn't help.
The parameters are initialized according to section 4.2 of the paper. The weights A, C, T_A (temporal encoding), and T_C are initialized from a Gaussian with mean=0 and std=0.1. The number of hops is set to 3. The maximum gradient norm is set to 40. The batch size is 32, and the embedding dimension is 40.
During training, the gradients of A and T_A become NaN after about 10 epochs. This doesn't happen for C and T_C. The learning rate is annealed by a factor of 0.5 every 15 epochs.
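
For reference, the schedule and clipping described above could be sketched roughly as follows. This is only an illustration in plain Python, not code from this repository: the function names are hypothetical, epochs are assumed to be counted from 0, and the clipping is assumed to use one joint L2 norm over all parameters (the description does not say whether the norm is per parameter or global).

```python
import math


def annealed_lr(base_lr, epoch, factor=0.5, every=15):
    """Learning rate after multiplying by `factor` once per `every` epochs.

    Assumption: `epoch` starts at 0, so epochs 0-14 use `base_lr`.
    """
    return base_lr * factor ** (epoch // every)


def clip_by_global_norm(grads, max_norm=40.0):
    """Rescale gradients so that their joint L2 norm is at most `max_norm`.

    `grads` maps a parameter name (e.g. "A", "T_A") to a flat list of floats.
    Returns the (possibly rescaled) gradients and the norm before clipping.
    """
    total = math.sqrt(sum(g * g for values in grads.values() for g in values))
    if total <= max_norm:
        return grads, total
    scale = max_norm / total
    clipped = {name: [g * scale for g in values] for name, values in grads.items()}
    return clipped, total


if __name__ == "__main__":
    for epoch in (0, 14, 15, 30):
        print(epoch, annealed_lr(1e-2, epoch))
    clipped, norm = clip_by_global_norm({"A": [30.0, 40.0], "C": [0.0]})
    print(norm, clipped)
```

Logging the pre-clipping norm returned by `clip_by_global_norm` each step is one way to see whether it grows steadily before the NaN appears or jumps in a single step.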

  1. What can I try to address the NaN in gradients of A and T_A? These weights are used only during the first hop.

  2. > On some tasks, we observed a large variance in the performance of our model (i.e. sometimes failing badly, other times not, depending on the initialization). To remedy this, we repeated each training 10 times with different random initializations, and picked the one with the lowest training error.

     What were the other initializations that worked for you?
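
The restart procedure in the quoted passage could look something like the sketch below. It is a minimal illustration, not the authors' code: `init_embedding`, `best_of_restarts`, and the `train_fn(seed) -> (training_error, model)` callback are hypothetical names, and skipping runs whose training error is NaN or infinite is an added assumption that the paper does not mention.

```python
import math
import random


def init_embedding(rows, cols, std=0.1, seed=None):
    """Draw a rows x cols matrix from a Gaussian with mean 0 and the given std."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, std) for _ in range(cols)] for _ in range(rows)]


def best_of_restarts(train_fn, n_restarts=10, base_seed=0):
    """Train `n_restarts` times with different seeds and keep the best run.

    `train_fn(seed)` is a hypothetical callback returning
    (training_error, model). Runs with a non-finite training error are
    skipped (an assumption of this sketch). Returns (error, seed, model),
    or None if every run failed.
    """
    best = None
    for k in range(n_restarts):
        seed = base_seed + k
        err, model = train_fn(seed)
        if not math.isfinite(err):
            continue
        if best is None or err < best[0]:
            best = (err, seed, model)
    return best
```

Here the seed only controls the draw from the same Gaussian, which matches the "different random initializations" wording; it does not change the distribution itself.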

@tesatory
Contributor

Is this happening in your implementation? If so, I can't really help. Getting NaN is pretty common, and usually there is a bug somewhere. Have you tried clipping your gradients? That usually helps. About question 2: we are talking about the variance in performance, not a numerical instability like NaN. I don't think we got NaN in our implementation.
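
When hunting for that kind of bug, a small check run right after the backward pass can show which parameter goes bad first. The sketch below is a hypothetical helper in plain Python and assumes gradients are available as flat lists of floats keyed by parameter name. Note that rescaling during clipping cannot repair a value that is already NaN, so the check belongs before the clipping step.

```python
import math


def first_non_finite(grads):
    """Return (name, index) of the first NaN or infinite gradient entry.

    `grads` maps a parameter name (e.g. "A", "T_A") to a flat list of floats.
    Returns None when every entry is finite.
    """
    for name, values in grads.items():
        for i, g in enumerate(values):
            if not math.isfinite(g):
                return name, i
    return None


if __name__ == "__main__":
    grads = {"C": [0.2, -0.1], "A": [0.1, float("nan")]}
    bad = first_non_finite(grads)
    if bad is not None:
        name, index = bad
        print("non-finite gradient in %s at index %d" % (name, index))
```

Stopping at the first step where this returns something other than None, and then inspecting the inputs and intermediate activations for that batch, usually narrows the problem down faster than changing the learning rate.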
