I'm trying to use BiSequencerLM to train a network on sequences of different lengths, but I'm running into an issue with the gradInput of the BiSequencerLM module. When a sequence is shorter than the previous one, self.gradInput in the function below keeps the number of elements of the previous sequence instead of the current one.
function BiSequencerLM:updateGradInput(input, gradOutput)
   local nStep = #input

   self._mergeGradInput = self._merge:updateGradInput(self._mergeInput, gradOutput)
   self._fwdGradInput = self._fwd:updateGradInput(_.first(input, nStep - 1), _.last(self._mergeGradInput[1], nStep - 1))
   self._bwdGradInput = self._bwd:updateGradInput(_.last(input, nStep - 1), _.first(self._mergeGradInput[2], nStep - 1))

   -- add fwd rnn gradInputs to bwd rnn gradInputs
   for i=1,nStep do
      if i == 1 then
         self.gradInput[1] = self._fwdGradInput[1]
      elseif i == nStep then
         self.gradInput[nStep] = self._bwdGradInput[nStep-1]
      else
         self.gradInput[i] = nn.rnn.recursiveCopy(self.gradInput[i], self._fwdGradInput[i])
         nn.rnn.recursiveAdd(self.gradInput[i], self._bwdGradInput[i-1])
      end
   end

   return self.gradInput
end
I believe this is caused by self.gradInput not being re-created for the new sequence, so it keeps the length of the previous sequence. This causes an error when there are further modules to backpropagate through, because their gradOutput ends up with the wrong size (different from their input). The issue can be fixed by setting gradInput to an empty table before the call to updateGradInput. I can do this by accessing the module from my own code, but maybe it would be better to just add this line

self.gradInput = {}

before the for loop (something equivalent would have to be done if working with Tensors instead of tables).
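For reference, this is roughly how I'm working around it from my own code at the moment, by clearing the module's gradInput before every backward call. It's only a minimal sketch: inputSize, hiddenSize, criterion, dataset and learningRate are placeholders from my setup, not anything from the library.

-- minimal sketch of the user-side workaround (not a library patch):
-- reset gradInput before each backward pass so its length always
-- matches the current sequence
require 'rnn'

local brnn = nn.BiSequencerLM(nn.LSTM(inputSize, hiddenSize),
                              nn.LSTM(inputSize, hiddenSize))

for _, sample in ipairs(dataset) do
   local inputSeq, targetSeq = sample.input, sample.target  -- tables of tensors

   brnn:zeroGradParameters()
   local output = brnn:forward(inputSeq)
   local loss = criterion:forward(output, targetSeq)
   local gradOutput = criterion:backward(output, targetSeq)

   -- workaround: drop the stale table so updateGradInput rebuilds it
   -- at the length of the current sequence
   brnn.gradInput = {}

   brnn:backward(inputSeq, gradOutput)
   brnn:updateParameters(learningRate)
end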
Or maybe I'm wrong and this is expected behavior and I'm doing something wrong, in which case any advice is appreciated. Thanks!
Hi,
I'm facing the same issue when training a language model (sort of). Please suggest how I can get this resolved. Additionally, I'm using the optim package for optimization.
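Just to make sure I understand the suggestion: is this roughly the right place to reset gradInput inside my optim closure? The sketch below is simplified, and brnn, criterion, inputSeq and targetSeq are placeholders standing in for my actual training code.

-- hedged sketch of the same workaround inside an optim-style closure
require 'optim'

local params, gradParams = brnn:getParameters()
local optimState = {learningRate = 0.01}

local function feval(x)
   if x ~= params then params:copy(x) end
   gradParams:zero()

   local output = brnn:forward(inputSeq)
   local loss = criterion:forward(output, targetSeq)
   local gradOutput = criterion:backward(output, targetSeq)

   -- reset so gradInput is rebuilt at the current sequence length,
   -- as suggested in the original post
   brnn.gradInput = {}
   brnn:backward(inputSeq, gradOutput)

   return loss, gradParams
end

optim.sgd(feval, params, optimState)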