
Correctly using IoU and ConfusionMatrix #2753

Closed
remisphere opened this issue Jul 29, 2020 · 6 comments

@remisphere

remisphere commented Jul 29, 2020

❓ Questions and Help


What is your question?

Hello,
I've been trying to use the IoU and ConfusionMatrix metrics for semantic segmentation, but I can't wrap my head around their implementation in PL and their intended usage.
They seem to assume that every class is present in at least the prediction or the target [1, 2, 3] (they actually look for the max class index), which seems like a rather strange expectation to me.
With this assumption, their return size varies depending on which classes are missing from the batch (as noticed in #2724).
IoU has a num_classes argument, but it is only used to throw warnings when the above expectation is not met.
The docs give very basic examples that are not in the context of a training loop, and thus don't cover computing the metrics over several batches.

How then do I get the IoUs (or confusion matrix) over my whole dataset, since it's not possible to average them when they don't have the same shape?

What have you tried?

For IoU, using the default reduction='elementwise_mean' prevents crashing, but I then get the mean IoU over the classes, which is not what I want.
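
Roughly what I'm calling (a minimal sketch, assuming the functional iou from pytorch_lightning.metrics.functional; the exact signature may differ between versions):

import torch
from pytorch_lightning.metrics.functional import iou

pred = torch.tensor([[0, 1, 1]])    # predicted class indices; class 2 never appears
target = torch.tensor([[0, 1, 0]])  # ground-truth class indices

# mean IoU over the classes present -> a single scalar, not the per-class values I want
mean_iou = iou(pred, target, num_classes=3, reduction='elementwise_mean')

# per-class IoU -> the length follows the highest class index actually present
# (2 here, not num_classes=3), so shapes differ from batch to batch
per_class_iou = iou(pred, target, num_classes=3, reduction='none')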

What's your environment?

  • OS: Linux
  • Packaging: pip
  • Version: 0.9.0rc3
remisphere added the question (Further information is requested) label on Jul 29, 2020
@justusschock
Member

Hi @remisphere,
The idea here is to compute them on your whole dataset (e.g. in validation_epoch_end). Therefore you currently have to collect your results by returning them in validation_step as part of the returned dict.

basically it would be something like this:

def validation_step(self, batch, batch_idx):
    # collect predictions and targets so they can be aggregated later
    pred = self(batch[0])
    return {'pred': pred, 'target': batch[1]}

def validation_epoch_end(self, outputs):
    # concatenate everything returned by validation_step ...
    preds = torch.cat([tmp['pred'] for tmp in outputs])
    targets = torch.cat([tmp['target'] for tmp in outputs])
    # ... and compute the metric once, over the full validation set
    metric_val = my_metric(preds, targets)

We are currently working on a V2 for metrics, which will include some aggregation mechanisms, but I'm afraid it will still take some time for us to finish that.

@remisphere
Author

Thank you @justusschock, that makes sense.

I have indeed seen some examples where the full prediction is returned in the validation step, but I'm concerned it will eat up my memory if the validation dataset is a bit large, not to mention tracking the metric for the training dataset (unless it is stored to disk until the end of the epoch?).

I'm looking forward to the V2 then; happy to see PL flourishing!

P.S.:
I come from the PyTorch Ignite framework, where IoU is computed from the confusion matrix. The confusion matrix has a required num_classes argument that fixes its size, so each class keeps the same index in the confusion matrix / IoU vector regardless of which classes are present in the batch.
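
The gist of what I mean, sketched in plain PyTorch rather than Ignite's actual API (the class count is fixed up front, so the shapes never change between batches):

import torch

num_classes = 3  # fixed up front, independent of what appears in any batch
cm = torch.zeros(num_classes, num_classes, dtype=torch.long)

def update(cm, pred, target):
    # pred and target hold class indices and have the same shape
    idx = target.flatten() * num_classes + pred.flatten()
    cm += torch.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)

# call update(cm, pred, target) for every batch, then read the IoU off the matrix:
intersection = cm.diag().float()
union = cm.sum(0).float() + cm.sum(1).float() - intersection
per_class_iou = intersection / union  # always shape (num_classes,); NaN where a class never occurred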

@justusschock
Member

@remisphere I'm aware that this may be a memory issue, but we don't store it to disk by default. You could do so manually and just pass the file paths to the epoch end to restore the data there. I'm familiar with the way Ignite handles it, but IMO it's not that intuitive, and it also doesn't change much, since you still need all the data to compute the confusion matrix.
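
Something along these lines, if memory becomes the bottleneck (just a sketch; the dump directory and my_metric are placeholders):

import os
import torch

def validation_step(self, batch, batch_idx):
    # write each batch to disk instead of keeping it all in memory
    pred = self(batch[0])
    os.makedirs('val_outputs', exist_ok=True)  # hypothetical dump directory
    path = os.path.join('val_outputs', f'batch_{batch_idx}.pt')
    torch.save({'pred': pred.cpu(), 'target': batch[1].cpu()}, path)
    return {'path': path}

def validation_epoch_end(self, outputs):
    # restore everything once, at the end of the epoch, then compute the metric
    batches = [torch.load(tmp['path']) for tmp in outputs]
    preds = torch.cat([b['pred'] for b in batches])
    targets = torch.cat([b['target'] for b in batches])
    metric_val = my_metric(preds, targets)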

@remisphere
Author

remisphere commented Jul 30, 2020

Ok, thanks for the advice!
To me it looks like Ignite doesn't store all the data (see here), but aggregates statistics instead, just like what you are planning to do.

@abrahambotros
Contributor

@remisphere #3097 fixes some of the issues you mentioned in the issue description above - you can now specify num_classes for the class version of IoU, similar to how you could already specify it for the functional version. If you're still computing IoU on just a single example (and not over the whole dataset, as recommended above, if you can), you can also specify the absent_score to use for any class not present in either the target or the prediction.
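
Rough usage (a sketch only; the import path for the class-based IoU and its defaults vary between releases, so check the docs for your version):

import torch
# exact import path for the class-based IoU depends on the release
from pytorch_lightning.metrics import IoU

pred = torch.tensor([[0, 1, 1]])
target = torch.tensor([[0, 1, 0]])

# class 2 never appears in pred or target; with num_classes given it is still
# accounted for, scored with absent_score instead of being silently dropped
iou_metric = IoU(num_classes=3, absent_score=0.0)
score = iou_metric(pred, target)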

Regarding the aggregation discussed above, it looks like a big metric-aggregation PR (#3321) just got merged, though I haven't fully checked it out to know whether it would make IoU feasible over large datasets / datasets with large images. Otherwise, some of the recommendations from @justusschock above might still be applicable, and are things I might look into trying as well.

@ToucheSir

I'm not sure what kind of aggregation @remisphere was looking for, but #3321 won't aggregate confusion matrix based metrics in the same way one would expect e.g. Keras to. Frankly, I'm not too clear on how one would interpret summing or averaging confusion matrices from different batches.

That said, the solution above still works and can be cleaned up a bit now that Train/EvalResult exist.

Lightning-AI locked and limited conversation to collaborators on Feb 4, 2021

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →
