
The f-measure is ill-defined when there are no true positives or no positive predictions #72

Open
timokau opened this issue Nov 14, 2019 · 3 comments
Labels: enhancement (New feature or request), Priority: Medium

Comments

timokau (Collaborator) commented Nov 14, 2019

sklearn issues a warning during the tests:

sklearn.exceptions.UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in samples with no predicted labels.

This is because

  • some of the test samples generated in csrank/tests/test_choice_functions.py:trivial_choice_problem have no true positives
  • some of the learners predict no positives for some of the generated problems

In both of those cases the f-measure is not properly defined. sklearn assigns 0 and 1 respectively.

How should we deal with this? A metric should be defined for these possibilities. 0 and 1 in those cases seem somewhat reasonable, so maybe we should just silence the warning?
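For context, a minimal sketch (not taken from csrank's test suite) of how the warning can be reproduced with scikit-learn's multilabel F1 in the average="samples" setting; the exact fallback values depend on the installed version:

    # Minimal reproduction sketch (not csrank's actual test code). With a
    # multilabel target and average="samples", scikit-learn warns and falls
    # back to 0.0 whenever a sample has no predicted or no true positive labels.
    import numpy as np
    from sklearn.metrics import f1_score

    y_true = np.array([[1, 1, 0]])  # sample with true positives
    y_pred = np.array([[0, 0, 0]])  # learner predicts no positives
    print(f1_score(y_true, y_pred, average="samples"))  # UndefinedMetricWarning, 0.0

    y_true = np.array([[0, 0, 0]])  # sample with no true positives
    y_pred = np.array([[1, 0, 0]])
    print(f1_score(y_true, y_pred, average="samples"))  # UndefinedMetricWarning, 0.0

Newer scikit-learn releases (0.22 and later) also accept a zero_division argument to f1_score that controls the fallback value and silences the warning.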

kiudee (Owner) commented Nov 18, 2019

We should avoid the first problem by generating test samples that cannot consist of only negatives. In general, assigning a 1 in these cases would be sensible.

Regarding the second case: Assigning 0 here is sensible, since the learner achieved no true positives.

Note: My version of sklearn (0.20.2) returns 0.0 for both cases.

timokau (Collaborator, Author) commented Nov 18, 2019

You're right, sklearn returns 0.0 for both cases. The more I think about this, the less sure I am that defining values for these cases is a good idea. The implementation is also not straightforward, since we would have to do some of the work that we currently outsource to scipy.

Here are the tests I came up with:

    (1) There are no true positives but some predicted positives, i.e. "infinite recall":
    >>> f1_measure([[False, False]], [[True, True]])
    0.0

    (2) There are no predicted positives but some true positives, i.e. 0 recall, 0 precision:
    >>> f1_measure([[True, True]], [[False, False]])
    0.0

    (3) There are neither true nor predicted positives, i.e. all predictions are correct:
    >>> f1_measure([[False, False]], [[False, False]])
    1.0

Cases (2) and (3) seem pretty clear-cut, but (1) should really depend on how many labels were predicted positive. Should we sidestep the issue by just defining cases (2) and (3) and continuing to throw a warning for (1)?

kiudee (Owner) commented Nov 22, 2019

Of those three cases, (2) is an obvious 0.0.
For (3) the value 1.0 is sensible, but I would still throw a warning, since having no positives in an instance might hint at a problem in the dataset.
Similarly, I would return 0.0 for (1) and raise a warning.
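A possible sketch of that behaviour (f1_measure here is an illustrative stand-in, not csrank's actual implementation):

    # Hypothetical sketch of the behaviour proposed above; f1_measure is an
    # illustrative helper, not csrank's actual metric implementation.
    import warnings
    import numpy as np

    def f1_measure(y_true, y_pred):
        """Mean per-sample F1 with explicit handling of the ill-defined cases."""
        y_true = np.asarray(y_true, dtype=bool)
        y_pred = np.asarray(y_pred, dtype=bool)
        scores = []
        for true_row, pred_row in zip(y_true, y_pred):
            n_true = true_row.sum()
            n_pred = pred_row.sum()
            if n_true == 0 and n_pred == 0:
                # Case (3): nothing to predict and nothing predicted -> 1.0, with a warning.
                warnings.warn("Sample has no true positives; F1 set to 1.0.")
                scores.append(1.0)
            elif n_true == 0:
                # Case (1): no true positives, but some predicted -> 0.0, with a warning.
                warnings.warn("Sample has no true positives; F1 set to 0.0.")
                scores.append(0.0)
            elif n_pred == 0:
                # Case (2): true positives exist, but nothing was predicted -> 0.0.
                scores.append(0.0)
            else:
                tp = np.logical_and(true_row, pred_row).sum()
                precision = tp / n_pred
                recall = tp / n_true
                scores.append(0.0 if tp == 0 else 2 * precision * recall / (precision + recall))
        return float(np.mean(scores))

With that definition, the three doctests above evaluate to 0.0, 0.0 and 1.0, and cases (1) and (3) emit a warning.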

kiudee added the labels enhancement (New feature or request) and Priority: Medium on Jun 11, 2020