On Adaptive Prediction Sets #9
Hi Anastasios, first of all, thank you for this great repository! I had a question about the intuition behind the scoring of APS and a potential error mode for it. As mentioned in "A Gentle Introduction to Conformal Prediction and Distribution-Free Uncertainty Quantification", the score is the total probability mass you have to accumulate, taking classes from most to least likely, until the true label is included.
The trouble I'm having is that I've come up with a couple of examples where the score for one prediction is the same as for another, even though one of the predictions is clearly better. This seems problematic to me. For example, consider the following two predictions for a 3-class classification problem:
yhat1 = [0.5, 0.3, 0.2]    # label_idx = 1
yhat2 = [0.8, 0.15, 0.05]  # label_idx = 0
We're getting the same score (0.8) for both, but clearly the second prediction is better. Thanks in advance!
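For concreteness, here is a minimal sketch of how I understand the score to be computed (my own illustration, not code from the repository), assuming the model outputs a normalized probability vector:

```python
def aps_score(probs, label_idx):
    """APS conformal score: the total probability mass accumulated,
    taking classes from most to least likely, until the true label
    is included."""
    # Sort class indices by descending probability.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    total = 0.0
    for i in order:
        total += probs[i]
        if i == label_idx:  # stop once the true label is covered
            return total

yhat1 = [0.5, 0.3, 0.2]    # label_idx = 1
yhat2 = [0.8, 0.15, 0.05]  # label_idx = 0
print(aps_score(yhat1, 1))  # ~0.8
print(aps_score(yhat2, 0))  # ~0.8
```

Both predictions get a score of 0.8, which is exactly the collision I'm asking about.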
Replies: 1 comment 1 reply
You're not missing anything!
The one thing I'd say is that it's not totally clear that the second prediction is "better". The model might be better, but they may be equally calibrated from the perspective of the score function. In both yhat1 and yhat2, you need to take 80% of the probability mass before you contain the true label. In that sense, they're the same.
As another example, consider
yhat1 = [0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1] # label_idx = 8
yhat2 = [0.4, 0.3, 0.1, 0.2/7, 0.2/7, 0.2/7, 0.2/7, 0.2/7, 0.2/7, 0.2/7] # label_idx = 3
In this one, it's less clear which prediction is "better".
In both cases, you need to take qhat = 0.8 of the probability mass, i.e., they require you to take …