fix misbehavior of KLDivLoss (apache#18423)
* fix misbehavior of KLDivLoss

In the current version, the value returned by `KLDivLoss` does not match the value computed by `SoftmaxCrossEntropyLoss`, and this discrepancy is not documented. It comes from an incorrect reduction: the loss is averaged (mean) over the class axis rather than summed.
This PR fixes that reduction, so the return values of `KLDivLoss` and `SoftmaxCrossEntropyLoss` are almost identical when `from_logits=False` and `sparse_label=False` are passed to the respective losses.
With this change, the behavior of `KLDivLoss` is exactly what its documentation describes. (A NumPy sketch after the snippet below shows why the reduction matters.)
```
import mxnet as mx

a = mx.nd.array([[-1, 1], [1, -1]])        # logits
b = mx.nd.array([1, 0]).one_hot(2)         # one-hot labels
TrueLoss = mx.gluon.loss.SoftmaxCrossEntropyLoss(sparse_label=False)
FalseLoss = mx.gluon.loss.KLDivLoss(from_logits=False)
c = TrueLoss(a, b)
d = FalseLoss(a, b) * a.shape[-1]          # rescale by the number of classes
assert (c - d).abs().sum() == 0 and a.shape[-1] == 2
```
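
A minimal NumPy sketch (illustration only, not part of this commit) of why the reduction matters: with a one-hot label the `label * log(label)` term vanishes, so summing the KL terms over the class axis reproduces the softmax cross-entropy, while taking the mean divides it by the number of classes.

```
# Minimal NumPy sketch (illustration only, not part of the commit):
# with one-hot labels, summing the KL terms over the class axis
# equals the softmax cross-entropy, because label * log(label) is zero.
import numpy as np

pred = np.array([[-1.0, 1.0], [1.0, -1.0]])   # logits
label = np.eye(2)[[1, 0]]                     # one-hot labels for classes 1 and 0

log_softmax = pred - np.log(np.exp(pred).sum(axis=1, keepdims=True))

kl_per_sample = (label * (np.log(label + 1e-12) - log_softmax)).sum(axis=1)
ce_per_sample = -(label * log_softmax).sum(axis=1)

assert np.allclose(kl_per_sample, ce_per_sample)   # sum reduction matches cross-entropy
```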

* update sdml loss

The current version of SDMLLoss has a comment saying to `multiply for the number of labels`, but the code actually multiplies by `batch_size`. After this PR, there is no need to multiply by `batch_size` or by the number of labels at all.
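
For reference, a hypothetical usage sketch (not part of this commit, assuming an MXNet build that includes SDMLLoss together with the KLDivLoss fix above): the per-sample losses returned by SDMLLoss can now be used as-is, with no manual rescaling.

```
# Hypothetical usage sketch (assumes MXNet with SDMLLoss and the fix above).
import mxnet as mx

sdml = mx.gluon.loss.SDMLLoss(smoothing_parameter=0.3)
x1 = mx.nd.random.normal(shape=(4, 8))   # batch of 4 embeddings, dimension 8
x2 = mx.nd.random.normal(shape=(4, 8))   # the paired batch
loss = sdml(x1, x2)                      # no extra "* batch_size" factor is needed any more
```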

* remove outdated comment
Neutron3529 authored and bgawrych committed Jun 23, 2020
1 parent 36941e4 commit e1e4e2b
Showing 1 changed file with 5 additions and 4 deletions.
python/mxnet/gluon/loss.py: 9 changes (5 additions & 4 deletions)
@@ -476,7 +476,7 @@ def hybrid_forward(self, F, pred, label, sample_weight=None):
         pred = F.log_softmax(pred, self._axis)
         loss = label * (F.log(label + 1e-12) - pred)
         loss = _apply_weighting(F, loss, self._weight, sample_weight)
-        return F.mean(loss, axis=self._batch_axis, exclude=True)
+        return F.sum(loss, axis=self._batch_axis, exclude=True)


class CTCLoss(Loss):
@@ -1010,8 +1010,7 @@ def _compute_labels(self, F, batch_size):
         confident output distributions." arXiv preprint arXiv:1701.06548 (2017).
         """

-        # TODO: replace with mx.nd.eye(batch_size) with mxnet 1.2
-        gold = F.one_hot(F.arange(batch_size), batch_size)
+        gold = F.eye(batch_size)
         labels = gold * (1 - self.smoothing_parameter) + (1 - gold) * self.smoothing_parameter / (batch_size - 1)
         return labels

@@ -1039,7 +1038,9 @@ def _loss(self, F, x1, x2):
         distances = self._compute_distances(x1, x2)
         log_probabilities = F.log_softmax(-distances, axis=1)
         # multiply for the number of labels to obtain the correct loss (gluon kl_loss averages instead of sum)
-        return self.kl_loss(log_probabilities, labels.as_in_context(distances.context)) * batch_size
+        # PR#18423: "multiply for the number of labels" should multiply by x1.shape[1] rather than x1.shape[0]
+        # After PR#18423, there is no need to multiply at all.
+        return self.kl_loss(log_probabilities, labels.as_in_context(distances.context))


def hybrid_forward(self, F, x1, x2):
