On Adaptive Prediction Sets #9
Hi Anastasios, first of all, thank you for this great repository! I had a question about the intuition behind the scoring of APS and a potential error mode for it. As mentioned in "A Gentle Introduction to Conformal Prediction and Distribution-Free Uncertainty Quantification", the score is the total probability mass you have to accumulate, taking classes from most to least likely, until the true label is included.
The trouble I'm having is that I've come up with a couple of examples where the score for one prediction is the same as for another, even though one of the predictions is clearly better. This seems problematic to me. For example, consider the following two predictions for a 3-class classification problem:
yhat1 = [0.5, 0.3, 0.2]    # label_idx = 1
yhat2 = [0.8, 0.15, 0.05]  # label_idx = 0
We're getting the same score (0.8) for both, but clearly the second prediction is better. Thanks in advance!
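For concreteness, here is a minimal sketch of how I understand the score to be computed (my own illustration, not code from the repository), assuming the model outputs a normalized probability vector:

```python
def aps_score(probs, label_idx):
    """APS conformal score: the total probability mass accumulated,
    taking classes from most to least likely, until the true label
    is included."""
    # Sort class indices by descending probability.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    total = 0.0
    for i in order:
        total += probs[i]
        if i == label_idx:  # stop once the true label is covered
            return total

yhat1 = [0.5, 0.3, 0.2]    # label_idx = 1
yhat2 = [0.8, 0.15, 0.05]  # label_idx = 0
print(aps_score(yhat1, 1))  # ~0.8
print(aps_score(yhat2, 0))  # ~0.8
```

Both predictions get a score of 0.8, which is exactly the collision I'm asking about.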
Replies: 1 comment 1 reply
You're not missing anything!
The one thing I'd say is that it's not totally clear that the second prediction is "better". The model might be better, but they may be equally calibrated from the perspective of the score function. In both yhat1 and yhat2, you need to take 80% of the probability mass before you contain the true label. In that sense, they're the same.
As another example, consider
yhat1 = [0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1] # label_idx = 8
yhat2 = [0.4, 0.3, 0.1, 0.2/7, 0.2/7, 0.2/7, 0.2/7, 0.2/7, 0.2/7, 0.2/7] # label_idx = 3
In this one, it's less clear which prediction is "better".
In both cases, you need to take qhat = 0.8 of the probability mass, i.e., they require you to take …