Correctly using IoU and ConfusionMatrix #5803
Replies: 6 comments
-
Hi @remisphere, basically it would be something like this:

```python
def validation_step(self, batch, batch_idx):
    pred = self(batch[0])
    return {'pred': pred, 'target': batch[1]}

def validation_epoch_end(self, outputs):
    preds = torch.cat([tmp['pred'] for tmp in outputs])
    targets = torch.cat([tmp['target'] for tmp in outputs])
    metric_val = my_metric(preds, targets)
```

We are currently working on a V2 for metrics, which will include some aggregation mechanisms, but I'm afraid it will still take some time for us to finish that.
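For context, the two hooks above live on a LightningModule, and the computed value would usually be logged rather than discarded. A minimal sketch assuming PL 1.x; `SegModel` and `my_metric` are placeholder names and the rest of the module is omitted:

```python
import torch
import pytorch_lightning as pl

class SegModel(pl.LightningModule):
    # forward(), training_step(), configure_optimizers(), etc. omitted

    def validation_step(self, batch, batch_idx):
        # keep raw predictions and targets so the metric is computed once per epoch
        return {'pred': self(batch[0]), 'target': batch[1]}

    def validation_epoch_end(self, outputs):
        preds = torch.cat([tmp['pred'] for tmp in outputs])
        targets = torch.cat([tmp['target'] for tmp in outputs])
        # my_metric is a placeholder for whatever callable computes the metric
        self.log('val_metric', my_metric(preds, targets))
```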
-
Thank you @justusschock, that makes sense. I have indeed seen some examples where the full prediction is returned in the validation step, but I'm concerned it will eat up my memory if the validation dataset is a bit large, not to mention tracking the metric for the training dataset (unless it is stored to disk until the epoch's end?). I'm looking forward to the V2 then, happy to see PL flourishing!
P.S.:
-
@remisphere I'm aware that this may be a memory issue, but we don't store it to disk by default. You could do so manually and just pass the file path to epoch end to restore it. I'm familiar with the way Ignite handles it, but IMO it's not that intuitive, and it also does not change much, since you still need all the data to compute the confusion matrix.
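A minimal sketch of that manual disk-caching idea (not a Lightning API; `cache_dir`, the file layout, and `my_metric` are illustrative assumptions): each batch's outputs are written to disk in `validation_step` and only the paths are kept, then everything is reloaded once in `validation_epoch_end`.

```python
import os
import torch

def validation_step(self, batch, batch_idx):
    pred = self(batch[0])
    # self.cache_dir is an assumed, already-existing directory on the module
    path = os.path.join(self.cache_dir, f'val_{batch_idx}.pt')
    torch.save({'pred': pred.cpu(), 'target': batch[1].cpu()}, path)
    return {'path': path}

def validation_epoch_end(self, outputs):
    preds, targets = [], []
    for out in outputs:
        cached = torch.load(out['path'])  # reload each batch from disk
        preds.append(cached['pred'])
        targets.append(cached['target'])
    metric_val = my_metric(torch.cat(preds), torch.cat(targets))  # my_metric as in the first reply
```

As noted in the reply above, this only moves the pressure off memory during the epoch; all the data is still needed at once when the metric is finally computed.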
-
Ok, thanks for the advice!
-
@remisphere #3097 fixes some of the issues you mentioned in the issue description above - you can now specify …

Regarding the aggregation discussed above, it looks like a big metric aggregation PR (#3321) just got merged, though I haven't fully checked it out to know whether it would make IoU feasible over large datasets / datasets with large images. Otherwise, some of the recommendations from @justusschock above might still be applicable, and might be things that I'll look into trying as well.
-
I'm not sure what kind of aggregation @remisphere was looking for, but #3321 won't aggregate confusion-matrix-based metrics in the same way one would expect e.g. Keras to. Frankly, I'm not too clear on how one would interpret summing or averaging confusion matrices from different batches. That said, the solution above still works and can be cleaned up a bit now that Train/EvalResult exist.
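For reference, one concrete interpretation that does work for IoU: if each batch's confusion matrix is kept as raw counts with a fixed `num_classes`, summing the per-batch matrices is exactly the confusion matrix of the whole validation set, and per-class IoU can be read off the aggregate. A hand-rolled sketch (not the Lightning metric API; the helper names and toy data are illustrative):

```python
import torch

def batch_confusion_matrix(pred, target, num_classes):
    """Unnormalized (count) confusion matrix with a fixed num_classes x num_classes
    shape, regardless of which classes actually appear in the batch."""
    # pred/target: integer class indices of any shape; flatten to 1D
    idx = target.flatten() * num_classes + pred.flatten()
    return torch.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)

def iou_from_confusion_matrix(cm, eps=1e-9):
    """Per-class IoU from an aggregated count matrix: TP / (TP + FP + FN)."""
    tp = cm.diag().float()
    fp = cm.sum(dim=0).float() - tp  # predicted as class c but wrong
    fn = cm.sum(dim=1).float() - tp  # true class c but missed
    return tp / (tp + fp + fn + eps)

# Summing count matrices over batches equals one confusion matrix over the whole set.
cm = torch.zeros(3, 3, dtype=torch.long)
for pred, target in [(torch.tensor([0, 1, 1]), torch.tensor([0, 1, 2])),
                     (torch.tensor([2, 2, 0]), torch.tensor([2, 1, 0]))]:
    cm += batch_confusion_matrix(pred, target, num_classes=3)
per_class_iou = iou_from_confusion_matrix(cm)
```

This also sidesteps the memory concern discussed earlier, since only a `num_classes` × `num_classes` count matrix needs to be carried across batches instead of all predictions.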
-
❓ Questions and Help
What is your question?
Hello,
I've been trying to use the IoU and ConfusionMatrix metrics for semantic segmentation, but I can't wrap my head around their implementation in PL and their intended usage.
They seem to assume that every class is present in at least the prediction or the target [1, 2, 3] (in practice they look at the max class index), which is a rather strange expectation to me.
With this assumption, they have variable return sizes, depending on which classes are missing in the batch (this was noticed in #2724).
IoU has a `num_classes` argument, but it is only used to throw warnings if the above expectation is not met. The docs give very basic examples that are not in the context of a training loop and are thus outside the scope of computing the metrics over several batches.
How then do I get the IoUs (or confusion matrix) on my dataset, since it's not possible to average them as they don't have the same shape?
What have you tried?
For IoU, using the default `reduction='elementwise_mean'` prevents crashing, but then I get the mean IoU over the classes, and that is not what I want.