Potential bug in calc_square_dist() #2178

chyohoo · 2022-08-05T05:27:01Z

Line 12 in f527e43

def calc_square_dist(point_feat_a: Tensor,

This function can return NaN because the computed squared distance can come out negative in some cases.

For example:

a = torch.tensor([[[0.0000, 0.0000, 0.2188, 0.0000, 0.0000]]])
b = torch.tensor([[[0.0000, 0.0000, 0.2189, 0.0000, 0.0000]]])
calc_square_dist(a,b)


zhouzaida · 2022-08-05T07:39:38Z

Please @ZCMax have a look.

ZCMax · 2022-08-05T08:05:02Z

I cannot reproduce the NaN result with the example you provided. It calculates the squared distance correctly.

chyohoo · 2022-08-05T08:25:44Z

> I cannot reproduce the NaN result with the example you provided. It calculates the squared distance correctly.

This is very strange. I cannot reproduce it with that example either, but I did encounter NaN while using the function.

zhouzaida · 2022-08-09T03:44:01Z

Hi, what is your mmcv version? I cannot reproduce the NaN result either.

chyohoo · 2022-08-10T06:26:57Z

nan_tensor.zip
I saved two tensors that cause NaN in the zip file.
My mmcv version is:

MMCV: 1.5.0
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 11.0

Tai-Wang · 2022-08-17T03:36:54Z

This seems to be caused by a subtle numerical problem. The result of dist = a_square + b_square - 2 * corr_matrix can contain small negative values near zero, so computing sqrt(dist) can produce NaN. A simple workaround is to add a small epsilon to dist before taking its square root. @ZCMax, please check whether this modification has any other side effects or affects the current related models.
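The failure mode described above can be sketched in isolation. The snippet below is a minimal illustration, not the mmcv implementation: `expanded_dist` is a hypothetical helper that follows the `a_square + b_square - 2 * corr_matrix` expansion quoted in the comment, and the `clamp` flag is an assumed variation on the epsilon idea (a fixed epsilon only helps when it is larger than the magnitude of the negative rounding error, whereas clamping at zero removes negatives of any size). Whether the unclamped call actually yields NaN depends on hardware, PyTorch version, and inputs, which is consistent with the example being hard to reproduce.

```python
import torch


def expanded_dist(a: torch.Tensor, b: torch.Tensor, clamp: bool = False) -> torch.Tensor:
    """Pairwise Euclidean distance via |a|^2 + |b|^2 - 2 * a.b (hypothetical helper).

    a: (B, N, C), b: (B, M, C) -> result: (B, N, M)
    """
    a_square = a.pow(2).sum(dim=-1, keepdim=True)      # (B, N, 1)
    b_square = b.pow(2).sum(dim=-1).unsqueeze(1)       # (B, 1, M)
    corr_matrix = torch.matmul(a, b.transpose(1, 2))   # (B, N, M)
    # Mathematically >= 0, but rounding can leave tiny negative entries
    # when two points are (nearly) identical.
    dist = a_square + b_square - 2 * corr_matrix
    if clamp:
        # Assumed workaround: force negatives to zero before the square root.
        dist = dist.clamp(min=0)
    return torch.sqrt(dist)


# The square root of even a tiny negative number is NaN.
print(torch.sqrt(torch.tensor(-1e-8)))

# Comparing a point set with itself puts exact zeros on the diagonal,
# which is where rounding errors are most likely to dip below zero.
torch.manual_seed(0)
x = torch.rand(1, 64, 5)
raw = expanded_dist(x, x)
safe = expanded_dist(x, x, clamp=True)
print("NaN count without clamp:", torch.isnan(raw).sum().item())
print("NaN count with clamp:   ", torch.isnan(safe).sum().item())
```

With clamping, the result cannot contain NaN for finite inputs, at the cost of silently mapping small negative errors to a distance of exactly zero.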

chyohoo · 2022-08-17T05:40:26Z

> This seems to be caused by a subtle numerical problem. The result of dist = a_square + b_square - 2 * corr_matrix can contain small negative values near zero, so computing sqrt(dist) can produce NaN. A simple workaround is to add a small epsilon to dist before taking its square root. @ZCMax, please check whether this modification has any other side effects or affects the current related models.

Yep. Since switching to torch.cdist I have not seen NaN so far.

ZCMax · 2022-08-17T07:59:45Z

Great, torch.cdist would be a better solution for this situation. Contributions (PRs) are welcome if you have time, after checking how this modification affects performance.
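A minimal sketch of the `torch.cdist` approach discussed above, applied to the two tensors from the original report. The helper name `pairwise_dist_cdist` and its `squared` and `compute_mode` parameters are assumptions made for illustration and are not taken from mmcv or from the merged PR. `torch.cdist` computes the p-norm distance between every pair of rows (p=2 by default), and its `compute_mode` argument controls whether the matrix-multiplication expansion is used at all.

```python
import torch


def pairwise_dist_cdist(
    a: torch.Tensor,
    b: torch.Tensor,
    squared: bool = False,
    compute_mode: str = "use_mm_for_euclid_dist_if_necessary",
) -> torch.Tensor:
    """Pairwise Euclidean distance via torch.cdist (hypothetical helper).

    a: (B, N, C), b: (B, M, C) -> result: (B, N, M)
    """
    # p=2.0 is the default; passing it explicitly documents the intent.
    dist = torch.cdist(a, b, p=2.0, compute_mode=compute_mode)
    # Squaring a non-negative distance cannot go negative, unlike the
    # a_square + b_square - 2 * corr_matrix expansion.
    return dist.square() if squared else dist


# The tensors from the original report.
a = torch.tensor([[[0.0000, 0.0000, 0.2188, 0.0000, 0.0000]]])
b = torch.tensor([[[0.0000, 0.0000, 0.2189, 0.0000, 0.0000]]])

d = pairwise_dist_cdist(a, b)
print(d)
print("contains NaN:", torch.isnan(d).any().item())
```

Passing `compute_mode="donot_use_mm_for_euclid_dist"` forces the direct computation for every input size, which sidesteps the cancellation entirely but may be slower on large point sets. That is the kind of performance trade-off the comment above asks to check.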

mm-assistant bot assigned ice-tong Aug 5, 2022

zhouzaida assigned Tai-Wang and unassigned ice-tong Aug 16, 2022

ZCMax mentioned this issue Oct 25, 2022

[Fix] Fix the potential NaN bug in calc_square_dist() #2356

Merged


zhouzaida linked a pull request Oct 25, 2022 that will close this issue

[Fix] Fix the potential NaN bug in calc_square_dist() #2356

Merged


zhouzaida closed this as completed in #2356 Oct 26, 2022
