The evaluation results on 3DMatch are inconsistent with the paper #19

Open
liuchen-2020 opened this issue Apr 24, 2022 · 8 comments

@liuchen-2020

Hi, I noticed that test.py does not compute the registration recall, so I computed it myself.
I have two questions:

  1. How is the registration recall calculated?
    After registering each pair with RANSAC, I computed the point-wise RMSE between the registered TGT and the ground-truth TGT and counted the pair as successful when the RMSE was below 0.2 m. The result I get is better than the one in the paper.
  2. When 5000 keypoints are used for registration, the registration recall does not attenuate as it does in the paper.
    Here are my evaluation results:

250 key points:
sun3d-hotel_umd-maryland_hotel3: Feature Recall=94.44%, inlier ratio=44.52%, inlier num=31.26, registration_recall=90.74%
sun3d-mit_lab_hj-lab_hj_tea_nov_2_2012_scan1_erika: Feature Recall=93.51%, inlier ratio=42.84%, inlier num=35.34, registration_recall=80.52%
sun3d-hotel_umd-maryland_hotel1: Feature Recall=88.46%, inlier ratio=35.26%, inlier num=26.69, registration_recall=79.81%
sun3d-home_at-home_at_scan1_2013_jan_1: Feature Recall=93.59%, inlier ratio=43.54%, inlier num=31.84, registration_recall=83.33%
sun3d-home_md-home_md_scan9_2012_sep_30: Feature Recall=88.94%, inlier ratio=38.87%, inlier num=28.19, registration_recall=74.04%
sun3d-hotel_uc-scan3: Feature Recall=97.35%, inlier ratio=35.69%, inlier num=25.60, registration_recall=87.61%
sun3d-mit_76_studyroom-76-1studyroom2: Feature Recall=92.12%, inlier ratio=40.76%, inlier num=30.23, registration_recall=83.22%
7-scenes-redkitchen: Feature Recall=95.06%, inlier ratio=32.07%, inlier num=23.36, registration_recall=83.79%

[94.44444444444444, 93.50649350649351, 88.46153846153847, 93.58974358974359, 88.9423076923077, 97.34513274336283, 92.12328767123287, 95.0592885375494]
All 8 scene, average recall: 92.93%
All 8 scene, average num inliers: 29.06
All 8 scene, average num inliers ratio: 39.19%
All 8 scene, average registration_recall: 82.88%

500 key points:
sun3d-hotel_umd-maryland_hotel3: Feature Recall=94.44%, inlier ratio=46.94%, inlier num=59.85, registration_recall=92.59%
sun3d-mit_lab_hj-lab_hj_tea_nov_2_2012_scan1_erika: Feature Recall=94.81%, inlier ratio=44.73%, inlier num=67.60, registration_recall=77.92%
sun3d-hotel_umd-maryland_hotel1: Feature Recall=90.38%, inlier ratio=37.21%, inlier num=50.22, registration_recall=80.77%
sun3d-home_at-home_at_scan1_2013_jan_1: Feature Recall=96.15%, inlier ratio=46.76%, inlier num=62.71, registration_recall=89.74%
sun3d-home_md-home_md_scan9_2012_sep_30: Feature Recall=91.83%, inlier ratio=42.02%, inlier num=55.20, registration_recall=81.73%
sun3d-hotel_uc-scan3: Feature Recall=98.23%, inlier ratio=39.66%, inlier num=51.39, registration_recall=93.81%
sun3d-mit_76_studyroom-76-1studyroom2: Feature Recall=93.15%, inlier ratio=42.13%, inlier num=55.96, registration_recall=84.93%
7-scenes-redkitchen: Feature Recall=96.05%, inlier ratio=34.66%, inlier num=46.09, registration_recall=89.53%

[94.44444444444444, 94.8051948051948, 90.38461538461539, 96.15384615384616, 91.82692307692308, 98.23008849557522, 93.15068493150685, 96.04743083003953]
All 8 scene, average recall: 94.38%
All 8 scene, average num inliers: 56.13
All 8 scene, average num inliers ratio: 41.76%
All 8 scene, average registration_recall: 86.38%

I'm sorry that I can't find the result for 5000 keypoints at the moment, but it is indeed better than the result in the paper. If I remember correctly, the registration recall is about 0.88, and there is no attenuation in the paper.
I will add the result for 5000 keypoints later. Thank you for your reply.
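
For reference, the RMSE check described in question 1 could look roughly like the sketch below. It assumes the RMSE is taken over all points of the cloud being moved and that both poses are 4x4 rigid transforms. The function names are illustrative and are not taken from test.py.

```python
import numpy as np

def apply_transform(points, T):
    """Apply a 4x4 rigid transform to an (N, 3) array of points."""
    return points @ T[:3, :3].T + T[:3, 3]

def rmse_success(points, T_est, T_gt, threshold=0.2):
    """Return (rmse, success) for one pair of fragments.

    The same points are moved once by the estimated pose and once by the
    ground-truth pose. The pair counts as registered when the RMSE between
    the two results is below `threshold` (0.2 m in the post above).
    """
    diff = apply_transform(points, T_est) - apply_transform(points, T_gt)
    rmse = float(np.sqrt(np.mean(np.sum(diff ** 2, axis=1))))
    return rmse, rmse < threshold

def registration_recall(successes):
    """Percentage of pairs flagged as successful."""
    return 100.0 * sum(successes) / len(successes)
```

Whether every pair or only the non-consecutive pairs enter `registration_recall` changes the number considerably, so the pair selection has to match the benchmark protocol.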

@liuchen-2020
Author

Sorry, I expressed that badly. What I meant is that with 5000 keypoints the registration recall does not show the attenuation reported in the paper.

@qsisi

qsisi commented Apr 30, 2022

  1. The registration recall is defined the way you describe, but in practice it is not computed directly from the RMSE. Instead, it is measured by checking whether an approximation of the RMSE is below a certain threshold, as done here:
    https://github.com/prs-eth/OverlapPredator/blob/770c3063399f08b3836935212ab4c84d355b4704/lib/benchmark.py#L256
    where the pose is the predicted transformation, and the info for each scene can be found here:
    https://github.com/prs-eth/OverlapPredator/blob/main/configs/benchmarks/3DMatch/7-scenes-redkitchen/gt.info
  2. Have you excluded the consecutive pairs during the evaluation? For example, 3DMatch has 1623 test pairs; excluding consecutive pairs reduces this to 1279 test pairs. Not doing so results in a 'higher' registration recall, because consecutive pairs are more likely to be aligned.

More information can be found here:
http://redwood-data.org/indoor/registration.html
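
As a rough illustration of the two points above, the sketch below evaluates a pair through a quadratic form on the pose error, weighted by the 6x6 info matrix, and skips consecutive pairs. It is written in the spirit of the linked benchmark.py, not copied from it. The quaternion sign convention, the function names, and the layout of `pairs` are assumptions.

```python
import numpy as np

def rotation_to_quaternion(R):
    """Convert a 3x3 rotation matrix to a unit quaternion (w, x, y, z) with w >= 0."""
    trace = R[0, 0] + R[1, 1] + R[2, 2]
    if trace > 0:
        s = 2.0 * np.sqrt(trace + 1.0)
        q = [0.25 * s, (R[2, 1] - R[1, 2]) / s,
             (R[0, 2] - R[2, 0]) / s, (R[1, 0] - R[0, 1]) / s]
    elif R[0, 0] > R[1, 1] and R[0, 0] > R[2, 2]:
        s = 2.0 * np.sqrt(1.0 + R[0, 0] - R[1, 1] - R[2, 2])
        q = [(R[2, 1] - R[1, 2]) / s, 0.25 * s,
             (R[0, 1] + R[1, 0]) / s, (R[0, 2] + R[2, 0]) / s]
    elif R[1, 1] > R[2, 2]:
        s = 2.0 * np.sqrt(1.0 + R[1, 1] - R[0, 0] - R[2, 2])
        q = [(R[0, 2] - R[2, 0]) / s, (R[0, 1] + R[1, 0]) / s,
             0.25 * s, (R[1, 2] + R[2, 1]) / s]
    else:
        s = 2.0 * np.sqrt(1.0 + R[2, 2] - R[0, 0] - R[1, 1])
        q = [(R[1, 0] - R[0, 1]) / s, (R[0, 2] + R[2, 0]) / s,
             (R[1, 2] + R[2, 1]) / s, 0.25 * s]
    q = np.array(q, dtype=float)
    return q if q[0] >= 0 else -q

def transformation_error(T_err, info):
    """Approximate squared RMSE of the error transform under the info matrix."""
    t = T_err[:3, 3]
    q = rotation_to_quaternion(T_err[:3, :3])
    er = np.concatenate([t, q[1:]])  # translation + vector part of quaternion
    return float(er @ info @ er / info[0, 0])

def is_registered(T_gt, T_est, info, rmse_threshold=0.2):
    """A pair succeeds when the approximate squared RMSE is within threshold**2."""
    T_err = np.linalg.inv(T_gt) @ T_est
    return transformation_error(T_err, info) <= rmse_threshold ** 2

def registration_recall(pairs, rmse_threshold=0.2):
    """pairs: iterable of (i, j, T_gt, T_est, info), with fragment indices i < j.

    Consecutive fragments (j - i == 1) are excluded before counting.
    Returns (recall in percent, number of evaluated pairs).
    """
    kept = [(T_gt, T_est, info) for i, j, T_gt, T_est, info in pairs if j - i > 1]
    good = sum(is_registered(T_gt, T_est, info, rmse_threshold)
               for T_gt, T_est, info in kept)
    return 100.0 * good / len(kept), len(kept)
```

The `j - i > 1` filter is the step that changes the denominator: in the 3DMatch example above it is what takes the test set from 1623 pairs to 1279.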

@liuchen-2020
Author

liuchen-2020 commented May 1, 2022

Hi. I realized that I had not excluded the consecutive pairs. After switching to the 3DMatch evaluation method, I got results similar to the paper. Thank you very much for your reply.

In addition, I found that changing the RANSAC distance threshold from 0.05 to 0.1 gives better registration results. I had made some modifications that improved the inlier ratio but reduced the registration recall. After some analysis, I think the cause may be that the inlier ratio is computed with a distance threshold of 0.1, while the RANSAC threshold for corresponding pairs is 0.05. So I changed 0.05 to 0.1 and got better results.
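
One way to see the threshold mismatch described above is to count inliers at both distances. The sketch below assumes that the inlier ratio is the fraction of putative correspondences whose residual under the ground-truth pose is below the threshold. The function name and the toy data are illustrative only.

```python
import numpy as np

def inlier_ratio(src_pts, tgt_pts, T_gt, distance_threshold=0.1):
    """Fraction of correspondences src_pts[k] <-> tgt_pts[k] whose residual
    under the ground-truth transform is below `distance_threshold`."""
    src_aligned = src_pts @ T_gt[:3, :3].T + T_gt[:3, 3]
    residuals = np.linalg.norm(src_aligned - tgt_pts, axis=1)
    return float(np.mean(residuals < distance_threshold))

# Toy correspondences with residuals of 0.02 m, 0.07 m and 0.30 m.
src = np.zeros((3, 3))
tgt = np.array([[0.02, 0.0, 0.0], [0.07, 0.0, 0.0], [0.30, 0.0, 0.0]])
T_gt = np.eye(4)

ratio_at_010 = inlier_ratio(src, tgt, T_gt, distance_threshold=0.1)
ratio_at_005 = inlier_ratio(src, tgt, T_gt, distance_threshold=0.05)
```

In this toy case the correspondence with a 0.07 m residual counts as an inlier at 0.1 but not at 0.05, so a change that raises the 0.1 inlier ratio does not necessarily give RANSAC more usable matches at 0.05.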

I watched your previous live streams and am very interested in your work. Recently I have been working on low-overlap point cloud registration, and I have noticed that performance on the 3DLoMatch dataset has improved considerably over the past two years, especially with the CVPR 2022 papers. It has become mainstream practice to integrate global features with a Transformer so that the two point clouds can interact and perceive their overlapping regions. Do you have any suggestions on current research directions? Thank you very much for your advice.

@FresherQ

@liuchen-2020, @qsisi
Hi, I tested the PyTorch pretrained model provided by the authors using test.py, and the results are quite different from those reported in the paper. I have some questions:

  1. Can the PyTorch pretrained model reproduce the results reported in the paper, perhaps within a few points?
  2. @liuchen-2020, were the results in your comments produced with the PyTorch version of D3Feat and the pretrained model provided by the authors? If so, I will list my steps and results and hope you can give me some guidance.
  3. @qsisi, I also tried excluding the consecutive pairs but still could not get consistent results. Can you give me some guidance?

I tested the pretrained model using test.py with the following steps:

  1. Change 'root: author-3dmatch-path' to 'root: my-3dmatch-path' in the config.json of the provided checkpoint.
  2. Set bias=False in UnaryBlock and LastUnaryBlock, because the latest version sets bias=True, which is inconsistent with the provided pretrained model.
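
A quick way to confirm the kind of mismatch mentioned in step 2 is to compare parameter names before loading. The sketch below uses plain lists so it runs on its own. With PyTorch, the two inputs would typically be the keys of `model.state_dict()` and of the state dict stored in the checkpoint. The parameter names shown are hypothetical.

```python
def diff_parameter_names(model_keys, checkpoint_keys):
    """Return (missing, unexpected) parameter names.

    missing:    expected by the model but absent from the checkpoint
    unexpected: present in the checkpoint but unknown to the model
    """
    model_keys, checkpoint_keys = set(model_keys), set(checkpoint_keys)
    missing = sorted(model_keys - checkpoint_keys)
    unexpected = sorted(checkpoint_keys - model_keys)
    return missing, unexpected

# Hypothetical names: a model built with bias=True versus a checkpoint
# that was saved from a model built with bias=False.
model_keys = ["block.mlp.weight", "block.mlp.bias"]
checkpoint_keys = ["block.mlp.weight"]

missing, unexpected = diff_parameter_names(model_keys, checkpoint_keys)
print(missing)     # ['block.mlp.bias']
print(unexpected)  # []
```

If `missing` contains only bias entries, building the blocks with bias=False, as in step 2, is the change that makes the model match the checkpoint.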

250 keypoints results with consecutive pairs:
sun3d-hotel_umd-maryland_hotel1: gt_match:104, Recall=88.46%, inlier ratio=26.16%, inlier num=16.15
sun3d-hotel_umd-maryland_hotel3: gt_match:54, Recall=96.30%, inlier ratio=35.62%, inlier num=20.96
sun3d-mit_lab_hj-lab_hj_tea_nov_2_2012_scan1_erika: gt_match:77, Recall=89.61%, inlier ratio=32.01%, inlier num=20.92
sun3d-home_at-home_at_scan1_2013_jan_1: gt_match:156, Recall=92.31%, inlier ratio=35.64%, inlier num=23.19
7-scenes-redkitchen: gt_match:506, Recall=92.49%, inlier ratio=25.22%, inlier num=15.85
sun3d-home_md-home_md_scan9_2012_sep_30: gt_match:208, Recall=90.38%, inlier ratio=32.42%, inlier num=19.30
sun3d-hotel_uc-scan3: gt_match:226, Recall=96.90%, inlier ratio=30.34%, inlier num=18.18
sun3d-mit_76_studyroom-76-1studyroom2: gt_match:292, Recall=92.81%, inlier ratio=32.25%, inlier num=20.17


[88.46153846153847, 96.29629629629629, 89.6103896103896, 92.3076923076923, 92.4901185770751, 90.38461538461539, 96.90265486725664, 92.8082191780822]
All 8 scene, average recall: 92.41%
All 8 scene, average num inliers: 19.34
All 8 scene, average num inliers ratio: 31.21%

250 keypoints results without consecutive pairs:
sun3d-hotel_umd-maryland_hotel3: gt_match:26, Recall=92.31%, inlier ratio=32.70%, inlier num=18.96
sun3d-mit_lab_hj-lab_hj_tea_nov_2_2012_scan1_erika: gt_match:45, Recall=86.67%, inlier ratio=26.69%, inlier num=16.84
sun3d-home_at-home_at_scan1_2013_jan_1: gt_match:106, Recall=89.62%, inlier ratio=31.65%, inlier num=20.49
sun3d-hotel_umd-maryland_hotel1: gt_match:78, Recall=85.90%, inlier ratio=22.76%, inlier num=14.18
sun3d-home_md-home_md_scan9_2012_sep_30: gt_match:159, Recall=88.05%, inlier ratio=27.20%, inlier num=15.47
sun3d-hotel_uc-scan3: gt_match:182, Recall=96.70%, inlier ratio=27.80%, inlier num=16.20
sun3d-mit_76_studyroom-76-1studyroom2: gt_match:234, Recall=91.45%, inlier ratio=29.48%, inlier num=17.72
7-scenes-redkitchen: gt_match:449, Recall=91.54%, inlier ratio=21.96%, inlier num=13.20


[92.3076923076923, 86.66666666666667, 89.62264150943396, 85.8974358974359, 88.0503144654088, 96.7032967032967, 91.45299145299145, 91.53674832962137]
All 8 scene, average recall: 90.28%
All 8 scene, average num inliers: 16.63
All 8 scene, average num inliers ratio: 27.53%
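
The pair counts in the two logs above can be cross-checked against the totals quoted earlier in the thread (1623 pairs with consecutive pairs, 1279 without). The sketch below copies the gt_match counts and recall values from the logs. It also shows that the reported average is a plain mean over scenes, which is not the same as a mean weighted by the number of pairs.

```python
# gt_match counts copied from the two 250-keypoint logs above.
gt_match_with = [104, 54, 77, 156, 506, 208, 226, 292]
gt_match_without = [26, 45, 106, 78, 159, 182, 234, 449]

# Per-scene recall (%) from the log without consecutive pairs, same order.
recall_without = [92.3076923076923, 86.66666666666667, 89.62264150943396,
                  85.8974358974359, 88.0503144654088, 96.7032967032967,
                  91.45299145299145, 91.53674832962137]

total_with = sum(gt_match_with)
total_without = sum(gt_match_without)
print(total_with, total_without)  # 1623 1279

# Plain mean over the 8 scenes, as printed by the log.
scene_mean = sum(recall_without) / len(recall_without)
print(f"{scene_mean:.2f}")  # 90.28

# Mean weighted by the number of pairs in each scene.
pair_weighted = sum(r * n for r, n in zip(recall_without, gt_match_without)) / total_without
print(f"{pair_weighted:.2f}")
```

Since the totals agree with 1623 and 1279, the consecutive-pair filtering in these two runs appears to select the expected pairs, which suggests the remaining gap to the paper comes from somewhere else.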

@liuchen-2020
Author

I get exactly the same results as in the paper using the PyTorch pretrained model provided by the authors. As far as I know, the only caveat is that you need to exclude consecutive pairs; otherwise your registration recall will be higher than the results in the paper.

@houyongkuo
Contributor

Hello @liuchen-2020, how did you get such a high inlier number and inlier ratio? When I use the provided pretrained model with 250 keypoints, I only get an inlier number of 19.86 and an inlier ratio of 31.31%, which is much lower than in the paper.

@houyongkuo
Contributor

Hello @FresherQ, I have the same problem. Have you found the reason? How can I get a better inlier number and inlier ratio? For example, with the provided pretrained model and 250 keypoints I only get an inlier number of 19.86 and an inlier ratio of 31.31%, which is much lower than in the paper. Thanks.

@FresherQ

Hi @houyongkuo, I haven't found the reason for the inconsistency yet.
