I believe there may be an error in the reference implementation, in the __init__() function of the XView3Dataset class in dataloader.py, that causes a significant number of samples to be incorrectly labeled as class 1/FISHING.
The results hold for both the tiny and full datasets. If we look at the chipping annotations csv generated for the tiny validation set:
In the first row, is_vessel is True and is_fishing is NaN. This should result in a label of 2/NONFISHING; however, the label ends up as 1/FISHING. If we group the rows (after filling in NaN values, which pandas would otherwise drop when they appear in the groupby keys), we get:
So we can see that cases where is_vessel is True and is_fishing is NaN are always labeled as 1/FISHING. This occurs for both LOW and MEDIUM confidence labels in the tiny set. Additionally, cases where both is_vessel and is_fishing are NaN are also labeled as 1/FISHING.
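The grouping described above can be sketched roughly as follows. This is only an illustration: the rows are made-up stand-ins rather than the actual chipping annotations, and the `label` column name is an assumption. Missing values are replaced with a string placeholder before grouping so that those rows stay visible in the counts.

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration only; NOT the real chipping annotations.
# The "label" column name is an assumption.
df = pd.DataFrame({
    "is_vessel":  [True, True,   np.nan, True,  True],
    "is_fishing": [True, np.nan, np.nan, False, np.nan],
    "label":      [1,    1,      1,      2,     1],
})

keys = ["is_vessel", "is_fishing"]

# A plain groupby silently drops every row whose key contains NaN.
naive_counts = df.groupby(keys + ["label"]).size()

# Replace missing values with a visible placeholder so those rows are counted.
filled = df.copy()
for key in keys:
    filled[key] = filled[key].map(lambda v: "NaN" if pd.isna(v) else str(v))

counts = filled.groupby(keys + ["label"]).size()
print(counts)
```

On recent pandas versions, passing `dropna=False` to `groupby` is another way to keep the NaN keys.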
If we do the same analysis for the chipping annotations csv generated for the tiny training set:
We can see that cases where is_vessel is True and is_fishing is NaN don't occur in this set. However, cases where both columns are NaN do, and they are labeled as 1/FISHING.
The issue seems to be in the loop in lines 271 - 277:
The first conditional statement is meant to test whether both is_vessel and is_fishing are True. However, the pandas NaN also evaluates to True in a boolean context. Here is an example data point:
So the first conditional statement will classify is_vessel/is_fishing combinations of True/True, NaN/NaN, and True/NaN all as class 1/FISHING.
For a dataset with a random subset of 300 of the training scenes chipped, the label distribution looks like:
So (4751 + 3315 + 119) = 8185 of the 19224 detections (roughly 43%) end up labeled as 1, seemingly incorrectly.
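The NaN truthiness behaviour described above is easy to check in isolation. The sketch below is not the reference implementation: `truthy_label` only mimics the assumed shape of a plain truthiness test, and `strict_label` together with its helper `is_true` are hypothetical names showing one possible way to treat NaN as "not True". Mapping a vessel with unknown is_fishing to 2/NONFISHING follows the expectation stated in this issue; returning None for everything else is a placeholder, not a proposal for those cases.

```python
import numpy as np
import pandas as pd

FISHING, NONFISHING = 1, 2  # class ids as named in this issue

# NaN is a non-zero float, so it is truthy.
print(bool(float("nan")))  # True


def truthy_label(is_vessel, is_fishing):
    """Assumed shape of a plain truthiness test (NOT the reference code)."""
    if is_vessel and is_fishing:
        return FISHING
    return None  # remaining branches omitted in this sketch


def is_true(value):
    """Hypothetical helper: True only for a non-missing, truthy value."""
    return (not pd.isna(value)) and bool(value)


def strict_label(is_vessel, is_fishing):
    """Hypothetical alternative that never treats NaN as True."""
    if is_true(is_vessel) and is_true(is_fishing):
        return FISHING
    if is_true(is_vessel):
        # Vessel with is_fishing False or NaN, as this issue expects.
        return NONFISHING
    return None  # non-vessel / unknown cases are out of scope here


for pair in [(True, True), (True, np.nan), (np.nan, np.nan)]:
    print(pair, truthy_label(*pair), strict_label(*pair))
```

With the plain truthiness test all three combinations come out as 1, whereas the explicit check separates them.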