In computer vision, we often use the Dice score to quantify how closely a generated segmentation matches its ground truth:

$$\text{Dice}(Y, \hat{Y}) = \frac{2\,|Y \cap \hat{Y}|}{|Y| + |\hat{Y}|}$$

Where:

- $Y$ is the ground truth,
- $\hat{Y}$ is the estimate or prediction.
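As a quick illustration (a minimal NumPy sketch of my own, not part of the original definition; the name `dice_score` is arbitrary), the score can be computed directly from two binary masks:

```python
import numpy as np

def dice_score(y_true: np.ndarray, y_pred: np.ndarray, eps: float = 1e-7) -> float:
    """Dice score between two binary masks (1 = object, 0 = background)."""
    y_true = y_true.astype(bool)
    y_pred = y_pred.astype(bool)
    intersection = np.logical_and(y_true, y_pred).sum()
    # 2|Y ∩ Ŷ| / (|Y| + |Ŷ|); eps avoids a division by zero on two empty masks
    return 2.0 * intersection / (y_true.sum() + y_pred.sum() + eps)
```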
Let's name $m$ the number of pixels of the object missed by the prediction (pixels in $Y$ but not in $\hat{Y}$), so that $|Y \cap \hat{Y}| = |Y| - m$. Same thing for the number of pixels added by the prediction (pixels in $\hat{Y}$ but not in $Y$), named $a$, so that $|\hat{Y}| = |Y| - m + a$. We thus have:

$$\text{Dice}(Y, \hat{Y}) = \frac{2\,(|Y| - m)}{2\,|Y| - m + a}$$
The penalty factor, i.e. the Dice loss $1 - \text{Dice}$, follows directly:

$$1 - \text{Dice}(Y, \hat{Y}) = \frac{m + a}{2\,|Y| - m + a}$$

We can see that the penalty vanishes when the prediction is perfect ($m = a = 0$). Also, it grows with the total error $m + a$, as expected. However, the interesting thing is that the penalty also depends on the object size $|Y|$: for the same absolute error $m + a$, a small object is penalised far more heavily than a big one.
In other terms, Dice over-penalises errors on small objects and is too tolerant of errors on big objects. Using a Dice loss or a Dice metric on a dataset with objects of many different sizes will therefore bias your model.
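To make this concrete, here is a small numeric check (my own illustration; `dice_penalty` simply implements the penalty formula above): the same absolute error of 10 missed pixels is severe on a 20-pixel object and negligible on a 10,000-pixel one.

```python
def dice_penalty(size: int, missed: int, added: int) -> float:
    """Penalty 1 - Dice for an object of `size` pixels,
    with `missed` false negatives and `added` false positives."""
    return (missed + added) / (2 * size - missed + added)

# Same absolute error (10 missed pixels) on objects of very different sizes:
print(dice_penalty(size=20, missed=10, added=0))      # 0.333... -> severe penalty
print(dice_penalty(size=10_000, missed=10, added=0))  # ~0.0005  -> barely noticeable
```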
A solution is to use the Generalized Dice Loss (Sudre et al., 2017), which weights each label's contribution by the inverse of its squared volume, compensating for this size bias.
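Here is a minimal NumPy sketch of that idea (my own, assuming Sudre et al.'s weighting $w_l = 1 / (\sum_n r_{ln})^2$ and one-hot masks with the class axis first; a real implementation would operate on batched soft predictions):

```python
def generalized_dice_loss(y_true: np.ndarray, y_pred: np.ndarray, eps: float = 1e-7) -> float:
    """Generalized Dice loss for one-hot masks of shape (n_classes, ...)."""
    axes = tuple(range(1, y_true.ndim))  # sum over all spatial dimensions
    # Each class is weighted by the inverse of its squared volume,
    # so small objects count as much as big ones.
    w = 1.0 / (y_true.sum(axis=axes) ** 2 + eps)
    intersection = (w * (y_true * y_pred).sum(axis=axes)).sum()
    total = (w * (y_true + y_pred).sum(axis=axes)).sum()
    return 1.0 - 2.0 * intersection / (total + eps)
```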