
Classification metrics overhaul: stat scores (3/n) #4839

Merged: 175 commits, merged into release/1.2-dev on Dec 30, 2020
Conversation

@tadejsv (Contributor) commented Nov 24, 2020

This PR is a spin-off from #4835, based on the new input formatting from #4837.

This will provide a basis for future PRs for recall, precision, fbeta and iou metrics.

What does this PR do?

top_k parameter for input formatting now also works with multi-label inputs

This was done so that StatScores can later serve as a basis for Recall@K and Precision@K, since these two metrics always take multi-label inputs and count the K highest-probability predictions as positive. For multi-class inputs this parameter works as before.

This addition was made in the input formatting function, which means that multi-label inputs can now be binarized in two ways: through the threshold parameter, or through the top_k parameter. If both are set, top_k takes precedence.

Top-K accuracy does not make sense for multi-label inputs (or at least I have not seen it used anywhere), so the Accuracy metric now raises an error if top_k is set for multi-label inputs.
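
To make the binarization concrete, here is a minimal sketch of what the top_k option amounts to for multi-label inputs, written in plain torch rather than with the actual input formatting helpers (the tensor values are made up for the illustration):

```python
import torch

# Sketch of top_k binarization for multi-label inputs: mark the k
# highest-scoring labels per sample as positive, ignoring the threshold.
preds = torch.tensor([[0.9, 0.4, 0.6],
                      [0.2, 0.7, 0.3]])  # (n_samples, n_labels) probabilities
top_k = 2

binarized = torch.zeros_like(preds, dtype=torch.int)
binarized.scatter_(1, preds.topk(top_k, dim=1).indices, 1)
# binarized == tensor([[1, 0, 1],
#                      [0, 1, 1]])
```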

New StatScores metric (and updated functional counterpart)

Computes the stat scores, i.e. true positives, false positives, true negatives and false negatives. It is used as a base for many other metrics (recall, precision, fbeta, iou). It works with all input types and is very configurable. There are two main parameters here:

  • reduce: This determines how the statistics should be counted: globally (summing across all labels), by class, or by sample. The possible values (micro, macro, samples) correspond to the averaging names used for metrics such as precision. This is "inspired" by sklearn's average argument for such metrics.

  • mdmc_reduce: For multi-dimensional multi-class (mdmc) inputs, how should the statistics be reduced? This applies on top of the reduce argument. The possible values are global (the extra dimensions are treated as actual sample dimensions) and samplewise (statistics are computed for each sample, with the extra dimensions treated as a sample-within-sample dimension).

    Why? The reason for offering both options (right now PL metrics implements the global option by default) is that for some "downstream" metrics, such as iou, it is, in my opinion, much more natural to compute the metric per sample and then average across samples, rather than join everything into one "blob" and compute the averages for this blob. For example, if you are doing image segmentation, it makes more sense to compute the metrics per image, as the model is trained on images, not blobs :) Also, aggregating everything may disguise unwanted behavior (such as an inability to predict a minority class), which would be evident if averaging were done per sample (samplewise). See the usage sketch after this list.
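
As a rough usage sketch of these two parameters (the import path and the exact output layout, including the trailing support count, are my assumptions about this PR's API):

```python
import torch
from pytorch_lightning.metrics.classification import StatScores

# Multi-dimensional multi-class inputs: 2 samples with 4 "pixels" each
preds  = torch.tensor([[0, 1, 1, 2],
                       [2, 0, 1, 1]])
target = torch.tensor([[0, 1, 2, 2],
                       [2, 0, 1, 0]])

# Count statistics per class (reduce='macro'), computed within each
# sample and stacked along a sample dimension (mdmc_reduce='samplewise')
metric = StatScores(reduce='macro', mdmc_reduce='samplewise', num_classes=3)
scores = metric(preds, target)
# Expected: scores of shape (2, 3, 5) -- sample x class x statistic,
# with the last dimension holding [tp, fp, tn, fn, support]
```

With mdmc_reduce='global', the same call should instead pool all 8 predictions as ordinary samples and return a single (3, 5) tensor.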

Also, this class metric (and its functional equivalent) now returns the stat scores concatenated into a single tensor, instead of returning a tuple. I did this because the standard metrics testing framework in PL does not support non-tensor returns; the change should be minor for users.

I have deprecated the stat_scores_multiple_classes metric, as stat_scores is now perfectly capable of handling multiple classes itself.
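
For migrating off stat_scores_multiple_classes, something like the following sketch should be equivalent with the new functional interface (again, the import path and return layout are my assumptions):

```python
import torch
from pytorch_lightning.metrics.functional import stat_scores

preds  = torch.tensor([1, 0, 2, 1])
target = torch.tensor([1, 1, 2, 0])

# reduce='macro' yields one row of [tp, fp, tn, fn, support] per class,
# covering what stat_scores_multiple_classes used to compute
scores = stat_scores(preds, target, reduce='macro', num_classes=3)
tp, fp, tn, fn, support = scores.unbind(-1)  # each of shape (num_classes,)
```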

Documentation

The second part of the "Input types" section is added, with examples of using the is_multiclass parameter with StatScores.

@pep8speaks commented Nov 24, 2020

Hello @tadejsv! Thanks for updating this PR.

Line 23:63: E203 whitespace before ':'

Comment last updated at 2020-12-30 18:58:06 UTC

@tadejsv changed the title from "Classification metrics overhaul: stat scores (2b/n)" to "Classification metrics overhaul: stat scores (3/n)" on Nov 24, 2020
@Borda (Member) commented Dec 29, 2020

@tadejsv @justusschock @SkafteNicki how is it going here? :]

@tadejsv (Contributor, Author) commented Dec 29, 2020

@Borda @SkafteNicki @justusschock @teddykoker @rohitgr7 This is ready for (re)review :)

@justusschock (Member) left a comment:

I really like it!

@Borda added the "ready" (PRs ready to be merged) label on Dec 29, 2020
@Borda previously approved these changes on Dec 29, 2020

Review comments (since resolved) on:
pytorch_lightning/metrics/classification/accuracy.py
pytorch_lightning/metrics/classification/helpers.py
pytorch_lightning/metrics/classification/stat_scores.py
pytorch_lightning/metrics/functional/accuracy.py
@rohitgr7 (Contributor) left a comment:

still reading...

Review comments (since resolved) on:
pytorch_lightning/metrics/classification/helpers.py
pytorch_lightning/metrics/functional/classification.py
tests/deprecated_api/test_remove_1-3.py
@rohitgr7 left a second comment:

LGTM... Great work!!!!
I'd recommend waiting for other reviewers before merging.

Review comment on tests/deprecated_api/test_remove_1-4.py
@Borda dismissed their stale review on Dec 29, 2020 at 20:02: "some defaults"

@SkafteNicki (Member) left a comment:

Great job as always :]

Review comment (since resolved) on tests/metrics/classification/test_stat_scores.py
@SkafteNicki merged commit 7f71ee9 into Lightning-AI:release/1.2-dev on Dec 30, 2020
Labels: feature (Is an improvement or enhancement), ready (PRs ready to be merged)
9 participants