Classification metrics overhaul: stat scores (3/n) #4839

Merged: 175 commits, Dec 30, 2020
Changes from 166 commits
Commits (175)
6959ea0
Add stuff
tadejsv Nov 24, 2020
0679015
Change metrics documentation layout
tadejsv Nov 24, 2020
35627b5
Add stuff
tadejsv Nov 24, 2020
0282f3c
Add stat scores
tadejsv Nov 24, 2020
55fdaaf
Change testing utils
tadejsv Nov 24, 2020
35f8320
Merge branch 'cls_metrics_input_formatting' into cls_metrics_accuracy
tadejsv Nov 24, 2020
dd05912
Merge branch 'cls_metrics_input_formatting' into cls_metrics_stat_scores
tadejsv Nov 24, 2020
5cbf56a
Replace len(*.shape) with *.ndim
tadejsv Nov 24, 2020
9c33d0b
More descriptive error message for input formatting
tadejsv Nov 24, 2020
6562205
Replace movedim with permute
tadejsv Nov 24, 2020
b97aef2
Merge branch 'cls_metrics_input_formatting' into cls_metrics_accuracy
tadejsv Nov 24, 2020
74261f7
Merge branch 'cls_metrics_input_formatting' into cls_metrics_stat_scores
tadejsv Nov 24, 2020
cbbc769
PEP 8 compliance
tadejsv Nov 24, 2020
33166c5
WIP
tadejsv Nov 24, 2020
801abe8
Add reduce_scores function
tadejsv Nov 24, 2020
fb181ed
Merge branch 'cls_metrics_accuracy' into cls_metrics_stat_scores
tadejsv Nov 24, 2020
fbebd34
Temporarily add back legacy class_reduce
tadejsv Nov 24, 2020
b3d1b8b
Merge branch 'cls_metrics_stat_scores' into cls_metrics_precision_recall
tadejsv Nov 24, 2020
f45fc81
Division with float
tadejsv Nov 24, 2020
3fdef40
Merge branch 'cls_metrics_accuracy' into cls_metrics_stat_scores
tadejsv Nov 24, 2020
452df32
Merge branch 'cls_metrics_stat_scores' into cls_metrics_precision_recall
tadejsv Nov 24, 2020
9d44a26
PEP 8 compliance
tadejsv Nov 24, 2020
82c3460
Merge branch 'cls_metrics_stat_scores' into cls_metrics_precision_recall
tadejsv Nov 24, 2020
5ce7cd9
Remove precision recall
tadejsv Nov 24, 2020
3b70270
Replace movedim with permute
tadejsv Nov 24, 2020
f1ae7b2
Add back tests
tadejsv Nov 24, 2020
04a5066
Add empty newlines
tadejsv Nov 25, 2020
9dc7bea
Add empty line
tadejsv Nov 25, 2020
a9640f6
Fix permute
tadejsv Nov 25, 2020
692392c
Fix some issues with old versions of PyTorch
tadejsv Nov 25, 2020
a04a71e
Style changes in error messages
tadejsv Nov 25, 2020
eaac5d7
More error message style improvements
tadejsv Nov 25, 2020
c1108f0
Fix typo in docs
tadejsv Nov 25, 2020
277769b
Add more descriptive variable names in utils
tadejsv Nov 25, 2020
4849298
Change internal var names
tadejsv Nov 25, 2020
22906a4
Merge remote-tracking branch 'upstream/master' into cls_metrics_input…
tadejsv Nov 25, 2020
1034a71
Merge branch 'cls_metrics_input_formatting' into cls_metrics_accuracy
tadejsv Nov 25, 2020
ebcdbeb
Merge branch 'cls_metrics_accuracy' into cls_metrics_stat_scores
tadejsv Nov 25, 2020
02bd636
Break down error checking for inputs into separate functions
tadejsv Nov 25, 2020
f97145b
Remove the (N, ..., C) option in MD-MC
tadejsv Nov 25, 2020
536feaf
Simplify select_topk
tadejsv Nov 25, 2020
4241d7c
Remove detach for inputs
tadejsv Nov 25, 2020
99d3c81
Merge branch 'cls_metrics_input_formatting' into cls_metrics_accuracy
tadejsv Nov 25, 2020
86d6c4d
Fix typos
tadejsv Nov 25, 2020
54c98a0
Merge branch 'cls_metrics_input_formatting' into cls_metrics_accuracy
tadejsv Nov 25, 2020
bb11677
Merge branch 'master' into cls_metrics_input_formatting
teddykoker Nov 25, 2020
bdc4111
Merge branch 'cls_metrics_input_formatting' into cls_metrics_accuracy
tadejsv Nov 25, 2020
cde3997
Update pytorch_lightning/metrics/classification/utils.py
tadejsv Nov 26, 2020
05a54da
Update docs/source/metrics.rst
tadejsv Nov 26, 2020
9a43a5e
Minor error message changes
tadejsv Nov 26, 2020
3f4ad3c
Update pytorch_lightning/metrics/utils.py
tadejsv Nov 26, 2020
a654e6a
Reuse case from validation in formatting
tadejsv Nov 26, 2020
7b2ef2b
Merge branch 'cls_metrics_input_formatting' of github.com:tadejsv/pyt…
tadejsv Nov 26, 2020
16ab8f7
Refactor code in _input_format_classification
tadejsv Nov 27, 2020
558276f
Merge branch 'master' into cls_metrics_input_formatting
tchaton Nov 27, 2020
ecffe18
Small improvements
tadejsv Nov 27, 2020
a907ade
Merge branch 'cls_metrics_input_formatting' into cls_metrics_accuracy
tadejsv Nov 27, 2020
725c7dd
PEP 8
tadejsv Nov 27, 2020
41ad0b7
Update pytorch_lightning/metrics/classification/utils.py
tadejsv Nov 27, 2020
ca13e76
Update pytorch_lightning/metrics/classification/utils.py
tadejsv Nov 27, 2020
ede2c7f
Update docs/source/metrics.rst
tadejsv Nov 27, 2020
c6e4de4
Update pytorch_lightning/metrics/classification/utils.py
tadejsv Nov 27, 2020
201d0de
Apply suggestions from code review
tadejsv Nov 27, 2020
f08edbc
Alphabetical reordering of regression metrics
tadejsv Nov 27, 2020
523bae3
Merge branch 'cls_metrics_input_formatting' of github.com:tadejsv/pyt…
tadejsv Nov 27, 2020
db24fae
Merge branch 'master' into cls_metrics_input_formatting
Borda Nov 27, 2020
35e3eff
Change default value of top_k and add error checking
tadejsv Nov 28, 2020
dd6f8ea
Merge branch 'cls_metrics_input_formatting' of github.com:tadejsv/pyt…
tadejsv Nov 28, 2020
c28aadf
Extract basic validation into separate function
tadejsv Nov 28, 2020
4bfc688
Merge branch 'cls_metrics_input_formatting' into cls_metrics_accuracy
tadejsv Nov 28, 2020
323285e
Update to new top_k default
tadejsv Nov 28, 2020
0cb0eac
Update desciption of parameters in input formatting
tadejsv Nov 29, 2020
28acf4c
Merge branch 'master' into cls_metrics_input_formatting
tchaton Nov 30, 2020
8e7a85a
Apply suggestions from code review
tadejsv Nov 30, 2020
829155e
Check that probabilities in preds sum to 1 (for MC)
tadejsv Nov 30, 2020
768879d
Fix coverage
tadejsv Nov 30, 2020
e4d88e2
Merge branch 'cls_metrics_input_formatting' into cls_metrics_accuracy
tadejsv Dec 1, 2020
eeded45
Split accuracy and hamming loss
tadejsv Dec 1, 2020
b49cfdc
Remove old redundant accuracy
tadejsv Dec 1, 2020
15ef14d
Merge branch 'master' into cls_metrics_input_formatting
teddykoker Dec 2, 2020
479114f
Merge branch 'master' into cls_metrics_stat_scores
tchaton Dec 3, 2020
3d8f584
Merge branch 'master' into cls_metrics_accuracy
tchaton Dec 3, 2020
1568970
Merge branch 'master' into cls_metrics_input_formatting
tchaton Dec 3, 2020
a9fa730
Merge with master and resolve conflicts
tadejsv Dec 6, 2020
44ad276
Merge branch 'master' into cls_metrics_input_formatting
Borda Dec 6, 2020
96d40c8
Minor changes
tadejsv Dec 6, 2020
cca430a
Merge branch 'cls_metrics_input_formatting' of github.com:tadejsv/pyt…
tadejsv Dec 6, 2020
b0bde16
Merge branch 'cls_metrics_input_formatting' into cls_metrics_accuracy
tadejsv Dec 6, 2020
627d99a
Fix imports
tadejsv Dec 6, 2020
de3defb
Improve docstring descriptions
tadejsv Dec 6, 2020
218ff56
Merge branch 'cls_metrics_accuracy' into cls_metrics_stat_scores
tadejsv Dec 6, 2020
c24d47b
Fix imports
tadejsv Dec 6, 2020
f3c47f9
Fix edge case and simplify testing
tadejsv Dec 6, 2020
a7e91a9
Merge branch 'cls_metrics_input_formatting' into cls_metrics_accuracy
tadejsv Dec 6, 2020
94c1af6
Merge branch 'cls_metrics_accuracy' into cls_metrics_stat_scores
tadejsv Dec 6, 2020
b7ced6e
Fix docs
tadejsv Dec 6, 2020
e91e564
PEP8
tadejsv Dec 6, 2020
798ec03
Reorder imports
tadejsv Dec 6, 2020
ccdc421
Merge branch 'cls_metrics_accuracy' into cls_metrics_stat_scores
tadejsv Dec 6, 2020
658bfb1
Add top_k parameter
tadejsv Dec 6, 2020
7217924
Merge remote-tracking branch 'upstream/master' into cls_metrics_accuracy
tadejsv Dec 7, 2020
a7c143e
Update changelog
tadejsv Dec 7, 2020
531ae33
Update docstring
tadejsv Dec 7, 2020
2eba226
Merge branch 'master' into cls_metrics_accuracy
tadejsv Dec 7, 2020
a66cf31
Update docstring
tadejsv Dec 7, 2020
e93f83e
Merge branch 'cls_metrics_accuracy' of github.com:tadejsv/pytorch-lig…
tadejsv Dec 7, 2020
89b09f8
Reverse formatting changes for tests
tadejsv Dec 7, 2020
e715437
Change parameter order
tadejsv Dec 7, 2020
d5daec8
Remove formatting changes 2/2
tadejsv Dec 7, 2020
c820060
Remove formatting 3/3
tadejsv Dec 7, 2020
b576de0
.
tadejsv Dec 7, 2020
dae341b
Improve description of top_k parameter
tadejsv Dec 7, 2020
43136b2
Merge branch 'cls_metrics_accuracy' into cls_metrics_stat_scores
tadejsv Dec 7, 2020
b2d2b71
Apply suggestions from code review
Borda Dec 7, 2020
9b2a399
Apply suggestions from code review
tadejsv Dec 7, 2020
0952df2
Remove unneeded assert
tadejsv Dec 7, 2020
c7fe698
Update pytorch_lightning/metrics/functional/accuracy.py
tadejsv Dec 7, 2020
e2bc0ab
Remove unneeded assert
tadejsv Dec 7, 2020
acbd1ca
Merge branch 'cls_metrics_accuracy' of github.com:tadejsv/pytorch-lig…
tadejsv Dec 7, 2020
8801f8a
Explicit checking of parameter values
tadejsv Dec 7, 2020
c32b36e
Apply suggestions from code review
Borda Dec 7, 2020
0314c7d
Apply suggestions from code review
tadejsv Dec 7, 2020
152cadf
Fix top_k checking
tadejsv Dec 7, 2020
022d6a6
PEP8
tadejsv Dec 7, 2020
9efc963
Don't check dist_sync in test
tadejsv Dec 8, 2020
d992f7d
add back check_dist_sync_on_step
tadejsv Dec 8, 2020
a726060
Make sure half-precision inputs are transformed (#5013)
tadejsv Dec 8, 2020
93c5d02
Fix typo
tadejsv Dec 8, 2020
0813055
Rename hamming loss to hamming distance
tadejsv Dec 8, 2020
6bf714b
Fix tests for half precision
tadejsv Dec 8, 2020
d12f1d6
Fix docs underline length
tadejsv Dec 8, 2020
a55cb46
Fix doc undeline length
tadejsv Dec 8, 2020
d75eec3
Merge branch 'master' into cls_metrics_accuracy
justusschock Dec 8, 2020
6b3b057
Replace mdmc_accuracy parameter with subset_accuracy
tadejsv Dec 8, 2020
6f218d4
Merge branch 'cls_metrics_accuracy' of github.com:tadejsv/pytorch-lig…
tadejsv Dec 8, 2020
98cb5f4
Update changelog
tadejsv Dec 8, 2020
778aeae
Merge branch 'cls_metrics_accuracy' into cls_metrics_stat_scores
tadejsv Dec 8, 2020
d129ccb
Merge remote-tracking branch 'upstream/release/1.2-dev' into cls_metr…
tadejsv Dec 21, 2020
7eb1457
Fix unwanted accuracy change
tadejsv Dec 21, 2020
207a762
Enable top_k for ML prob inputs
tadejsv Dec 21, 2020
3b79348
Test that default threshold is 0.5
tadejsv Dec 21, 2020
b609b35
Fix typo
tadejsv Dec 21, 2020
633e3ff
Update top_k description in helpers
tadejsv Dec 23, 2020
82879d0
Merge remote-tracking branch 'upstream/release/1.2-dev' into cls_metr…
tadejsv Dec 23, 2020
103dfc6
updates
tadejsv Dec 23, 2020
9be50aa
Update styling and add back tests
tadejsv Dec 23, 2020
d3f851c
Remove excess spaces
tadejsv Dec 23, 2020
1612139
fix torch.where for old versions
tadejsv Dec 23, 2020
ca03c4a
fix linting
tadejsv Dec 23, 2020
aea4c66
Update docstring
tadejsv Dec 23, 2020
7b4dcc1
Fix docstring
tadejsv Dec 23, 2020
9cd07a8
Apply suggestions from code review (mostly docs)
tadejsv Dec 24, 2020
a713fc7
Default threshold to None, accept only (0,1)
tadejsv Dec 24, 2020
075ed53
Change wrong threshold message
tadejsv Dec 24, 2020
c289f0c
Improve documentation and add tests
tadejsv Dec 25, 2020
aae5141
Merge branch 'tests_mprc' into cls_metrics_stat_scores
tadejsv Dec 25, 2020
e665f89
Add back ddp tests
tadejsv Dec 27, 2020
16d29bf
Change stat reduce method and default
tadejsv Dec 27, 2020
7e8fb8e
Remove DDP tests and fix doctests
tadejsv Dec 28, 2020
d1a4eff
Fix doctest
tadejsv Dec 28, 2020
01e8e63
Update changelog
tadejsv Dec 28, 2020
3e58244
Refactoring
tadejsv Dec 28, 2020
475c706
Fix typo
tadejsv Dec 28, 2020
d387eb1
Refactor
tadejsv Dec 28, 2020
d2a92e8
Increase coverage
tadejsv Dec 28, 2020
c178cb6
Fix linting
tadejsv Dec 28, 2020
8bf6cf1
Consistent use of backticks
tadejsv Dec 29, 2020
b2fcd55
Merge remote-tracking branch 'upstream/release/1.2-dev' into cls_metr…
tadejsv Dec 29, 2020
169fc7c
Fix too long line in docs
tadejsv Dec 29, 2020
21551f1
Apply suggestions from code review
tadejsv Dec 29, 2020
e52fa9c
Fix deprecation test
tadejsv Dec 29, 2020
85d6e3a
Fix deprecation test
tadejsv Dec 29, 2020
3461159
Default threshold back to 0.5
tadejsv Dec 29, 2020
fe48912
Minor documentation fixes
tadejsv Dec 30, 2020
c2c45f1
Add types to tests
tadejsv Dec 30, 2020
4 changes: 4 additions & 0 deletions CHANGELOG.md

@@ -15,11 +15,15 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).

- `HammingDistance` metric to compute the hamming distance (loss) ([#4838](https://github.com/PyTorchLightning/pytorch-lightning/pull/4838))

- `StatScores` metric to compute the number of true positives, false positives, true negatives and false negatives ([#4839](https://github.com/PyTorchLightning/pytorch-lightning/pull/4839))

### Changed

- `stat_scores` metric now calculates stat scores over all classes and gains new parameters, in line with the new `StatScores` metric ([#4839](https://github.com/PyTorchLightning/pytorch-lightning/pull/4839))

### Deprecated

- `stat_scores_multiple_classes` is deprecated in favor of `stat_scores` ([#4839](https://github.com/PyTorchLightning/pytorch-lightning/pull/4839))

### Removed

68 changes: 62 additions & 6 deletions docs/source/metrics.rst

@@ -251,13 +251,62 @@ the possible class labels are 0, 1, 2, 3, etc. Below are some examples of differ
ml_preds = torch.tensor([[0.2, 0.8, 0.9], [0.5, 0.6, 0.1], [0.3, 0.1, 0.1]])
ml_target = torch.tensor([[0, 1, 1], [1, 0, 0], [0, 0, 0]])

In some rare cases, you might have inputs which appear to be (multi-dimensional) multi-class
but are actually binary/multi-label. For example, if both predictions and targets are 1d
binary tensors. Or it could be the other way around, you want to treat binary/multi-label
inputs as 2-class (multi-dimensional) multi-class inputs.

Using the ``is_multiclass`` parameter
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In some cases, you might have inputs which appear to be (multi-dimensional) multi-class
but are actually binary/multi-label - for example, if both predictions and targets are
integer (binary) tensors. Or it could be the other way around, you want to treat
binary/multi-label inputs as 2-class (multi-dimensional) multi-class inputs.

For these cases, the metrics where this distinction would make a difference expose the
``is_multiclass`` argument.
``is_multiclass`` argument. Let's see how it is used, taking the
:class:`~pytorch_lightning.metrics.classification.StatScores` metric as an example.

First, let's consider the case with label predictions with 2 classes, which we want to
treat as binary.

.. testcode::

from pytorch_lightning.metrics.functional import stat_scores

# These inputs are supposed to be binary, but appear as multi-class
preds = torch.tensor([0, 1, 0])
target = torch.tensor([1, 1, 0])

As you can see below, by default the inputs are treated
as multi-class. We can set ``is_multiclass=False`` to treat the inputs as binary -
which is the same as converting the predictions to float beforehand.

.. doctest::

>>> stat_scores(preds, target, reduce='macro', num_classes=2)
tensor([[1, 1, 1, 0, 1],
[1, 0, 1, 1, 2]])
>>> stat_scores(preds, target, reduce='macro', num_classes=1, is_multiclass=False)
tensor([[1, 0, 1, 1, 2]])
>>> stat_scores(preds.float(), target, reduce='macro', num_classes=1)
tensor([[1, 0, 1, 1, 2]])

Next, consider the opposite example: inputs are binary (as predictions are probabilities),
but we would like to treat them as 2-class multi-class, to obtain the metric for both classes.

.. testcode::

preds = torch.tensor([0.2, 0.7, 0.3])
target = torch.tensor([1, 1, 0])

In this case we can set ``is_multiclass=True``, to treat the inputs as multi-class.

.. doctest::

>>> stat_scores(preds, target, reduce='macro', num_classes=1)
tensor([[1, 0, 1, 1, 2]])
>>> stat_scores(preds, target, reduce='macro', num_classes=2, is_multiclass=True)
tensor([[1, 1, 1, 0, 1],
[1, 0, 1, 1, 2]])
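
The same options are available on the class-based interface. As a minimal sketch (assuming
the :class:`~pytorch_lightning.metrics.classification.StatScores` class accepts the same
``reduce``, ``num_classes`` and ``is_multiclass`` arguments as the functional version above):

.. testcode::

    from pytorch_lightning.metrics import StatScores

    # Accumulates rows of [tp, fp, tn, fn, support], here one row per class
    stat_scores_metric = StatScores(reduce='macro', num_classes=2, is_multiclass=True)
    result = stat_scores_metric(preds, target)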


Class Metrics (Classification)
------------------------------
@@ -323,6 +372,13 @@ ROC
:noindex:


StatScores
~~~~~~~~~~

.. autoclass:: pytorch_lightning.metrics.classification.StatScores
:noindex:


Functional Metrics (Classification)
-----------------------------------

@@ -444,7 +500,7 @@ select_topk [func]
stat_scores [func]
~~~~~~~~~~~~~~~~~~

.. autofunction:: pytorch_lightning.metrics.functional.classification.stat_scores
.. autofunction:: pytorch_lightning.metrics.functional.stat_scores
:noindex:


1 change: 1 addition & 0 deletions pytorch_lightning/metrics/__init__.py

@@ -24,6 +24,7 @@
ROC,
FBeta,
F1,
StatScores
)

from pytorch_lightning.metrics.regression import ( # noqa: F401
1 change: 1 addition & 0 deletions pytorch_lightning/metrics/classification/__init__.py

@@ -19,3 +19,4 @@
from pytorch_lightning.metrics.classification.precision_recall import Precision, Recall # noqa: F401
from pytorch_lightning.metrics.classification.precision_recall_curve import PrecisionRecallCurve # noqa: F401
from pytorch_lightning.metrics.classification.roc import ROC # noqa: F401
from pytorch_lightning.metrics.classification.stat_scores import StatScores # noqa: F401
38 changes: 20 additions & 18 deletions pytorch_lightning/metrics/classification/accuracy.py

@@ -21,7 +21,7 @@

class Accuracy(Metric):
r"""
Computes `Accuracy <https://en.wikipedia.org/wiki/Accuracy_and_precision>`_:
Computes `Accuracy <https://en.wikipedia.org/wiki/Accuracy_and_precision>`__:

.. math::
\text{Accuracy} = \frac{1}{N}\sum_i^N 1(y_i = \hat{y}_i)
@@ -43,7 +43,8 @@ class Accuracy(Metric):
Args:
threshold:
Threshold probability value for transforming probability predictions to binary
`(0,1)` predictions, in the case of binary or multi-label inputs.
(0,1) predictions, in the case of binary or multi-label inputs. If not set, it
defaults to 0.5.
top_k:
Number of highest probability predictions considered to find the correct label, relevant
only for (multi-dimensional) multi-class inputs with probability predictions. The
@@ -54,17 +55,18 @@ class Accuracy(Metric):
Whether to compute subset accuracy for multi-label and multi-dimensional
multi-class inputs (has no effect for other input types).

For multi-label inputs, if the parameter is set to `True`, then all labels for
each sample must be correctly predicted for the sample to count as correct. If it
is set to `False`, then all labels are counted separately - this is equivalent to
flattening inputs beforehand (i.e. ``preds = preds.flatten()`` and same for ``target``).

For multi-dimensional multi-class inputs, if the parameter is set to `True`, then all
sub-samples (on the extra axis) must be correct for the sample to be counted as correct.
If it is set to `False`, then all sub-samples are counted separately - this is equivalent,
in the case of label predictions, to flattening the inputs beforehand (i.e.
``preds = preds.flatten()`` and same for ``target``). Note that the ``top_k`` parameter
still applies in both cases, if set.
- For multi-label inputs, if the parameter is set to ``True``, then all labels for
each sample must be correctly predicted for the sample to count as correct. If it
is set to ``False``, then all labels are counted separately - this is equivalent to
flattening inputs beforehand (i.e. ``preds = preds.flatten()`` and same for ``target``).

- For multi-dimensional multi-class inputs, if the parameter is set to ``True``, then all
sub-samples (on the extra axis) must be correct for the sample to be counted as correct.
If it is set to ``False``, then all sub-samples are counted separately - this is equivalent,
in the case of label predictions, to flattening the inputs beforehand (i.e.
``preds = preds.flatten()`` and same for ``target``). Note that the ``top_k`` parameter
still applies in both cases, if set.

compute_on_step:
Forward only calls ``update()`` and return None if this is set to False.
dist_sync_on_step:
@@ -95,7 +97,7 @@ class Accuracy(Metric):

def __init__(
self,
threshold: float = 0.5,
threshold: Optional[float] = None,
top_k: Optional[int] = None,
subset_accuracy: bool = False,
compute_on_step: bool = True,
Expand All @@ -113,11 +115,11 @@ def __init__(
self.add_state("correct", default=torch.tensor(0), dist_reduce_fx="sum")
self.add_state("total", default=torch.tensor(0), dist_reduce_fx="sum")

if not 0 <= threshold <= 1:
raise ValueError("The `threshold` should lie in the [0,1] interval.")
if threshold is not None and not 0 < threshold < 1:
raise ValueError(f"The `threshold` should be a float in the (0,1) interval, got {threshold}")

if top_k is not None and top_k <= 0:
raise ValueError("The `top_k` should be an integer larger than 1.")
if top_k is not None and (not isinstance(top_k, int) or top_k <= 0):
raise ValueError(f"The `top_k` should be an integer larger than 0, got {top_k}")

self.threshold = threshold
self.top_k = top_k
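
As a side illustration (not from the diff), a minimal sketch of what the ``subset_accuracy``
flag documented above computes for multi-label label predictions, done by hand with plain
``torch``:

    import torch

    preds = torch.tensor([[1, 1], [0, 1], [0, 0]])   # multi-label predictions (N, num_labels)
    target = torch.tensor([[1, 1], [1, 1], [0, 0]])

    # subset_accuracy=True: a sample counts only if all of its labels are correct
    print((preds == target).all(dim=1).float().mean())   # tensor(0.6667)

    # subset_accuracy=False: each label is counted separately (flattened inputs)
    print((preds == target).float().mean())              # tensor(0.8333)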
56 changes: 34 additions & 22 deletions pytorch_lightning/metrics/classification/helpers.py

@@ -39,8 +39,8 @@ def _basic_input_validation(preds: torch.Tensor, target: torch.Tensor, threshold
if preds_float and (preds.min() < 0 or preds.max() > 1):
raise ValueError("The `preds` should be probabilities, but values were detected outside of [0,1] range.")

if threshold > 1 or threshold < 0:
raise ValueError("The `threshold` should be a probability in [0,1].")
if not 0 < threshold < 1:
raise ValueError(f"The `threshold` should be a float in the (0,1) interval, got {threshold}")

if is_multiclass is False and target.max() > 1:
raise ValueError("If you set `is_multiclass=False`, then `target` should not exceed 1.")
@@ -181,13 +181,19 @@ def _check_num_classes_ml(num_classes: int, is_multiclass: bool, implied_classes


def _check_top_k(top_k: int, case: str, implied_classes: int, is_multiclass: Optional[bool], preds_float: bool):
if "multi-class" not in case or not preds_float:
raise ValueError(
"You have set `top_k` above 1, but your data is not (multi-dimensional) multi-class"
" with probability predictions."
)
if case == "binary":
raise ValueError("You can not use `top_k` parameter with binary data.")
if not isinstance(top_k, int) or top_k <= 0:
raise ValueError("The `top_k` has to be an integer larger than 0.")
if not preds_float:
raise ValueError("You have set `top_k`, but you do not have probability predictions.")
if is_multiclass is False:
raise ValueError("If you set `is_multiclass=False`, you can not set `top_k`.")
if case == "multi-label" and is_multiclass:
raise ValueError(
"If you want to transform multi-label data to 2 class multi-dimensional"
"multi-class data using `is_multiclass=True`, you can not use `top_k`."
)
if top_k >= implied_classes:
raise ValueError("The `top_k` has to be strictly smaller than the `C` dimension of `preds`.")

@@ -216,9 +222,9 @@ def _check_classification_inputs(
When ``num_classes`` is not specified in these cases, consistency of the highest target
value against ``C`` dimension is checked for (multi-dimensional) multi-class cases.

If ``top_k`` is set (not None) for inputs which are not (multi-dimensional) multi class
with probabilities, then an error is raised. Similarly if ``top_k`` is set to a number
that is higher than or equal to the ``C`` dimension of ``preds``.
If ``top_k`` is set (not None) for inputs which do not have probability predictions (and
are not binary), then an error is raised. Similarly if ``top_k`` is set to a number that
is higher than or equal to the ``C`` dimension of ``preds``.

Preds and target tensors are expected to be squeezed already - all dimensions should be
greater than 1, except perhaps the first one (N).
Expand All @@ -228,17 +234,18 @@ def _check_classification_inputs(
target: Tensor with ground truth labels, always integers (labels)
threshold:
Threshold probability value for transforming probability predictions to binary
(0,1) predictions, in the case of binary or multi-label inputs. Default: 0.5
(0,1) predictions, in the case of binary or multi-label inputs.
num_classes:
Number of classes. If not explicitly set, the number of classes will be inferred
either from the shape of inputs, or the maximum label in the ``target`` and ``preds``
tensor, where applicable.
top_k:
Number of highest probability entries for each sample to convert to 1s - relevant
only for (multi-dimensional) multi-class inputs with probability predictions. The
default value (``None``) will be interpreted as 1 for these inputs.
only for inputs with probability predictions. The default value (``None``) will be
interpreted as 1 for these inputs. If this parameter is set for multi-label inputs,
it will take precedence over ``threshold``.

Should be left unset (``None``) for all other types of inputs.
Should be left unset (``None``) for inputs with label predictions.
is_multiclass:
Used only in certain special cases, where you want to treat inputs as a different type
than what they appear to be (see :ref:`metrics: Input types` documentation section for
@@ -294,7 +301,7 @@ def _check_classification_inputs(
_check_num_classes_ml(num_classes, is_multiclass, implied_classes)

# Check that top_k is consistent
if top_k:
if top_k is not None:
_check_top_k(top_k, case, implied_classes, is_multiclass, preds.is_floating_point())

return case
@@ -364,7 +371,8 @@ def _input_format_classification(
target: Tensor with ground truth labels, always integers (labels)
threshold:
Threshold probability value for transforming probability predictions to binary
(0,1) predictions, in the case of binary or multi-label inputs. Default: 0.5
(0,1) predictions, in the case of binary or multi-label inputs. If not set, it
defaults to 0.5.
num_classes:
Number of classes. If not explicitly set, the number of classes will be inferred
either from the shape of inputs, or the maximum label in the ``target`` and ``preds``
@@ -410,6 +418,9 @@ def _input_format_classification(
if preds.dtype == torch.float16:
preds = preds.float()

# Let threshold default to 0.5 if not set
threshold = 0.5 if threshold is None else threshold

case = _check_classification_inputs(
preds,
target,
@@ -419,21 +430,22 @@
top_k=top_k,
)

top_k = top_k if top_k else 1

if case in ["binary", "multi-label"]:
if case in ["binary", "multi-label"] and not top_k:
preds = (preds >= threshold).int()
num_classes = num_classes if not is_multiclass else 2

if case == "multi-label" and top_k:
preds = select_topk(preds, top_k)

if "multi-class" in case or is_multiclass:
if preds.is_floating_point():
num_classes = preds.shape[1]
preds = select_topk(preds, top_k)
preds = select_topk(preds, top_k or 1)
else:
num_classes = num_classes if num_classes else max(preds.max(), target.max()) + 1
preds = to_onehot(preds, max(2,num_classes))
preds = to_onehot(preds, max(2, num_classes))

target = to_onehot(target, max(2,num_classes))
target = to_onehot(target, max(2, num_classes))

if is_multiclass is False:
preds, target = preds[:, 1, ...], target[:, 1, ...]
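
To make the control flow above concrete, a minimal, self-contained sketch of the two
transformation paths (thresholding for binary/multi-label inputs, top-k one-hot selection
for multi-class probabilities), with a plain ``torch`` stand-in for ``select_topk``:

    import torch

    def select_topk_sketch(prob_tensor: torch.Tensor, topk: int = 1) -> torch.Tensor:
        # Set the top-k entries of each row to 1, all others to 0
        zeros = torch.zeros_like(prob_tensor, dtype=torch.int)
        return zeros.scatter(1, prob_tensor.topk(k=topk, dim=1).indices, 1)

    # Binary / multi-label path: probabilities are thresholded to 0/1 labels
    preds = torch.tensor([0.2, 0.7, 0.3])
    print((preds >= 0.5).int())                  # tensor([0, 1, 0], dtype=torch.int32)

    # Multi-class path: (N, C) probabilities become one-hot int tensors
    mc_preds = torch.tensor([[0.1, 0.8, 0.1], [0.6, 0.3, 0.1]])
    print(select_topk_sketch(mc_preds, topk=1))  # tensor([[0, 1, 0], [1, 0, 0]], ...)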