Test 85 failing under Windows in Debug configuration #2570

claudiofantacci · 2018-10-19T13:54:25Z

Here is the error:

85: Test command: C:\Users\cfantacci\GitHub\pcl\buildd\bin\test_registration.exe "C:/Users/cfantacci/GitHub/pcl/test/bun0.pcd" "C:/Users/cfantacci/GitHub/pcl/test/bun4.pcd" "C:/Users/cfantacci/GitHub/pcl/test/milk_color.pcd"
85: Test timeout computed to be: 9.99988e+06
85: [==========] Running 12 tests from 1 test case.
85: [----------] Global test environment set-up.
85: [----------] 12 tests from PCL
85: [ RUN      ] PCL.findFeatureCorrespondences
85: [       OK ] PCL.findFeatureCorrespondences (6 ms)
85: [ RUN      ] PCL.IterativeClosestPoint
85: [       OK ] PCL.IterativeClosestPoint (196 ms)
85: [ RUN      ] PCL.IterativeClosestPointWithRejectors
85: [       OK ] PCL.IterativeClosestPointWithRejectors (3641 ms)
85: [ RUN      ] PCL.JointIterativeClosestPoint
85: [       OK ] PCL.JointIterativeClosestPoint (24283 ms)
85: [ RUN      ] PCL.IterativeClosestPointNonLinear
85: [       OK ] PCL.IterativeClosestPointNonLinear (9110 ms)
85: [ RUN      ] PCL.IterativeClosestPoint_PointToPlane
85: [pcl::IterativeClosestPoint::computeTransformation] Not enough correspondences found. Relax your threshold parameters.
85: [pcl::IterativeClosestPoint::computeTransformation] Not enough correspondences found. Relax your threshold parameters.
85: [pcl::IterativeClosestPoint::computeTransformation] Not enough correspondences found. Relax your threshold parameters.
85: [pcl::IterativeClosestPoint::computeTransformation] Not enough correspondences found. Relax your threshold parameters.
85: [pcl::IterativeClosestPoint::computeTransformation] Not enough correspondences found. Relax your threshold parameters.
85: [       OK ] PCL.IterativeClosestPoint_PointToPlane (361 ms)
85: [ RUN      ] PCL.GeneralizedIterativeClosestPoint
85: [       OK ] PCL.GeneralizedIterativeClosestPoint (14878 ms)
85: [ RUN      ] PCL.GeneralizedIterativeClosestPoint6D
85: [       OK ] PCL.GeneralizedIterativeClosestPoint6D (9052 ms)
85: [ RUN      ] PCL.NormalDistributionsTransform
85: [       OK ] PCL.NormalDistributionsTransform (101659 ms)
85: [ RUN      ] PCL.SampleConsensusInitialAlignment
85: [       OK ] PCL.SampleConsensusInitialAlignment (47349 ms)
85: [ RUN      ] PCL.SampleConsensusPrerejective
85: [       OK ] PCL.SampleConsensusPrerejective (39103 ms)
85: [ RUN      ] PCL.PyramidFeatureHistogram
6/7 Test  #85: a_registration_test ..............***Failed  339.47 sec

This error is quite subtle and please let me know whether my understanding of the error, and its possible fix, is correct or not.

The main point of failure is this line here

pcl/registration/include/pcl/registration/impl/pyramid_feature_matching.hpp

Line 265 in cd65105

    
           access.push_back (static_cast<size_t> (floor ((feature[dim_i] - dimension_range_target_[dim_i].first) / hist_levels[level].bin_step[dim_i])));

that pushes back into access variable an "unknown" number when feature is nan. Feature is never checked whether it could be nan or not and this may happen because of this snippet code from PPFEstiamtion feature used for the test (and maybe in user code also)

pcl/features/include/pcl/features/impl/ppf.hpp

Lines 69 to 118 in cd65105

    
             for (size_t index_i = 0; index_i < indices_->size (); ++index_i) 
        
             { 
        
               size_t i = (*indices_)[index_i]; 
        
               for (size_t j = 0 ; j < input_->points.size (); ++j) 
        
               { 
        
                 PointOutT p; 
        
                 if (i != j) 
        
                 { 
        
                   if (//pcl::computePPFPairFeature 
        
                       pcl::computePairFeatures (input_->points[i].getVector4fMap (), 
        
                                                 normals_->points[i].getNormalVector4fMap (), 
        
                                                 input_->points[j].getVector4fMap (), 
        
                                                 normals_->points[j].getNormalVector4fMap (), 
        
                                                 p.f1, p.f2, p.f3, p.f4)) 
        
                   { 
        
                     // Calculate alpha_m angle 
        
                     Eigen::Vector3f model_reference_point = input_->points[i].getVector3fMap (), 
        
                                     model_reference_normal = normals_->points[i].getNormalVector3fMap (), 
        
                                     model_point = input_->points[j].getVector3fMap (); 
        
                     float rotation_angle = acosf (model_reference_normal.dot (Eigen::Vector3f::UnitX ())); 
        
                     bool parallel_to_x = (model_reference_normal.y() == 0.0f && model_reference_normal.z() == 0.0f); 
        
                     Eigen::Vector3f rotation_axis = (parallel_to_x)?(Eigen::Vector3f::UnitY ()):(model_reference_normal.cross (Eigen::Vector3f::UnitX ()). normalized()); 
        
                     Eigen::AngleAxisf rotation_mg (rotation_angle, rotation_axis); 
        
                     Eigen::Affine3f transform_mg (Eigen::Translation3f ( rotation_mg * ((-1) * model_reference_point)) * rotation_mg); 
        
                     Eigen::Vector3f model_point_transformed = transform_mg * model_point; 
        
                     float angle = atan2f ( -model_point_transformed(2), model_point_transformed(1)); 
        
                     if (sin (angle) * model_point_transformed(2) < 0.0f) 
        
                       angle *= (-1); 
        
                     p.alpha_m = -angle; 
        
                   } 
        
                   else 
        
                   { 
        
                     PCL_ERROR ("[pcl::%s::computeFeature] Computing pair feature vector between points %u and %u went wrong.\n", getClassName ().c_str (), i, j); 
        
                     p.f1 = p.f2 = p.f3 = p.f4 = p.alpha_m = std::numeric_limits<float>::quiet_NaN (); 
        
                     output.is_dense = false; 
        
                   } 
        
                 } 
        
                 // Do not calculate the feature for identity pairs (i, i) as they are not used 
        
                 // in the following computations 
        
                 else 
        
                 { 
        
                   p.f1 = p.f2 = p.f3 = p.f4 = p.alpha_m = std::numeric_limits<float>::quiet_NaN (); 
        
                   output.is_dense = false; 
        
                 } 
        
                 output.points[index_i*input_->points.size () + j] = p; 
        
               } 
        
             } 
        
           }

variable access is then used to evaluate an index of a vector and since it results in nan its use is undefined (or platform dependent since it will be converted to a size_t).
I made some tests on some platform and apparently (take this with care, I'm not 100% sure) accessing a vector with nan results in accessing element at 0 (that is, the cast to size_t results in 0).
It turns out that in Debug configuration under Windows, the index evaluated using access variable resulted in a very large number, not nan. As a consequence I got the out-of-bound access fualt from which I started the investigation.

Please note that if access is an arbitrary large number (as in Debug mode under Windows), the following line

pcl/registration/include/pcl/registration/impl/pyramid_feature_matching.hpp

Line 239 in cd65105

vector_position += access[i] * dim_accumulator;

that translates access to a an index sometimes go over the maximum allowed representation of the type and, depending on the system, may result in a 0, thus masking the unwanted behaviour.

The fix I propose, assuming that the expected behavior is to have nan features mapped to histogram bin 0, is to change the following lines

pcl/registration/include/pcl/registration/impl/pyramid_feature_matching.hpp

Lines 264 to 265 in cd65105

    
           for (size_t dim_i = 0; dim_i < nr_dimensions; ++dim_i) 
        
             access.push_back (static_cast<size_t> (floor ((feature[dim_i] - dimension_range_target_[dim_i].first) / hist_levels[level].bin_step[dim_i])));

in

  for (size_t dim_i = 0; dim_i < nr_dimensions; ++dim_i)
  {
    if (!pcl_isnan(feature[dim_i]))
      access.push_back (static_cast<size_t>(std::floor((feature[dim_i] - dimension_range_target_[dim_i].first) / hist_levels[level].bin_step[dim_i])));
    else
      access.push_back (static_cast<size_t>(0));
  }

to force the inex to be 0 for nans. By doing this the tests then run properly.

See further comments here #2399

The text was updated successfully, but these errors were encountered:

Detailed explanation of the issue in PointCloudLibrary#2570 Closes PointCloudLibrary#2570 See also PointCloudLibrary#2399

SergioRAgostinho · 2018-10-19T14:10:44Z

I'm having trouble connecting all the dots. I need to have a look at the stack trace to be able to guide myself.

Regarding your proposal, it feels to me like we're hiding the problem under the carpet. I'm not fully into the problem like you, but intuitively it feels the problem needs to be fixed somewhere else. Inside the at method I would simply throw if feature[dim_i] is nan.

claudiofantacci · 2018-10-19T14:24:01Z

I'm having trouble connecting all the dots. I need to have a look at the stack trace to be able to guide myself.

Regarding your proposal, it feels to me like we're hiding the problem under the carpet. I'm not fully into the problem like you, but intuitively it feels the problem needs to be fixed somewhere else. Inside the at method I would simply throw I feature[dim_i] is nan.

Sorry for the confusion, I updated my initial post and hopefully it is a bit clearer. Please go over it again 😄 To be fair, I'm not sure my fix is correct becuse I'm unuware of the expected behaviour.

I agree that you should also have a look at the stack trace.

My intuition is the following:

Assuming the tests are correctly implemented, if I map nan features in 0 the tests pass in both Release and Debug configuration. I tried skipping nan feature alltogether and the tests failed in both Release and Debug mode.
Assuming that nan features should be ingored/skipped, then we need to re-write the tests and/or check the data being used inside the tests, becuase they fail as the tolerance are not respected.

Detailed explanation of the issue in PointCloudLibrary#2570 Closes PointCloudLibrary#2570 See also PointCloudLibrary#2399

SergioRAgostinho · 2018-10-21T19:48:49Z

Just want to quickly report my partial findings.

The first occurrence of nans in features happens on line 868

pcl/test/registration/test_registration.cpp

Lines 863 to 868 in 50ca79c

    
           PyramidFeatureHistogram<PPFSignature>::Ptr pyramid_source (new PyramidFeatureHistogram<PPFSignature> ()), 
        
               pyramid_target (new PyramidFeatureHistogram<PPFSignature> ()); 
        
           pyramid_source->setInputCloud (ppf_signature_source); 
        
           pyramid_source->setInputDimensionRange (dim_range_input); 
        
           pyramid_source->setTargetDimensionRange (dim_range_target); 
        
           pyramid_source->compute ();

The first and last points of the input cloud ppf_signature_source and ppf_signature_target are set to NaN. The expressions below are being evaluated at line 298

pcl/registration/include/pcl/registration/impl/pyramid_feature_matching.hpp

Lines 294 to 299 in 50ca79c

    
           for (size_t feature_i = 0; feature_i < input_->points.size (); ++feature_i) 
        
           { 
        
             std::vector<float> feature_vector; 
        
             convertFeatureToVector (input_->points[feature_i], feature_vector); 
        
             addFeature (feature_vector); 
        
           }

(lldb) p input_->points.__begin_[0]
(pcl::PPFSignature) $13 = (f1 = NaN, f2 = NaN, f3 = NaN, f4 = NaN, alpha_m = NaN)
(lldb) p input_->points.__begin_[1]
(pcl::PPFSignature) $14 = (f1 = -0.0932385772, f2 = 0.00952175259, f3 = 0.0544939041, f4 = 0.00722551253, alpha_m = 2.07673001)
(lldb) p input_->points.__begin_[157607]
(pcl::PPFSignature) $15 = (f1 = 0.00231693685, f2 = 0.00789476931, f3 = -0.0350695662, f4 = 0.0119710751, alpha_m = 0.39232108)
(lldb) p input_->points.__begin_[157608]
(pcl::PPFSignature) $16 = (f1 = NaN, f2 = NaN, f3 = NaN, f4 = NaN, alpha_m = NaN)

I was able to verify the same with ppf_signature_target.

My next steps would be to understand if:

these points can be filtered out safely. ppf_signature_source and ppf_signature_target don't have the same number of points so it might work.
understand the reason why it is apparently always the first and last point. I haven't confirmed if there are other points also set to NaN, but the first and last are consistently there.
does it make sense to let pyramid signature output NaNs. Is it trying to ensure it has the same number of points as the input cloud?

claudiofantacci · 2018-10-22T12:27:29Z

@SergioRAgostinho we are synchronized on the stack trace 👍

I also noticed that the first and last element are always NaN, but I fear is something related to the test or on a common use case of the feature (evaluate the feature on a PointCloud without providing indices). If you look at how NaN are added

pcl/features/include/pcl/features/impl/ppf.hpp

Lines 69 to 118 in cd65105

    
             for (size_t index_i = 0; index_i < indices_->size (); ++index_i) 
        
             { 
        
               size_t i = (*indices_)[index_i]; 
        
               for (size_t j = 0 ; j < input_->points.size (); ++j) 
        
               { 
        
                 PointOutT p; 
        
                 if (i != j) 
        
                 { 
        
                   if (//pcl::computePPFPairFeature 
        
                       pcl::computePairFeatures (input_->points[i].getVector4fMap (), 
        
                                                 normals_->points[i].getNormalVector4fMap (), 
        
                                                 input_->points[j].getVector4fMap (), 
        
                                                 normals_->points[j].getNormalVector4fMap (), 
        
                                                 p.f1, p.f2, p.f3, p.f4)) 
        
                   { 
        
                     // Calculate alpha_m angle 
        
                     Eigen::Vector3f model_reference_point = input_->points[i].getVector3fMap (), 
        
                                     model_reference_normal = normals_->points[i].getNormalVector3fMap (), 
        
                                     model_point = input_->points[j].getVector3fMap (); 
        
                     float rotation_angle = acosf (model_reference_normal.dot (Eigen::Vector3f::UnitX ())); 
        
                     bool parallel_to_x = (model_reference_normal.y() == 0.0f && model_reference_normal.z() == 0.0f); 
        
                     Eigen::Vector3f rotation_axis = (parallel_to_x)?(Eigen::Vector3f::UnitY ()):(model_reference_normal.cross (Eigen::Vector3f::UnitX ()). normalized()); 
        
                     Eigen::AngleAxisf rotation_mg (rotation_angle, rotation_axis); 
        
                     Eigen::Affine3f transform_mg (Eigen::Translation3f ( rotation_mg * ((-1) * model_reference_point)) * rotation_mg); 
        
                     Eigen::Vector3f model_point_transformed = transform_mg * model_point; 
        
                     float angle = atan2f ( -model_point_transformed(2), model_point_transformed(1)); 
        
                     if (sin (angle) * model_point_transformed(2) < 0.0f) 
        
                       angle *= (-1); 
        
                     p.alpha_m = -angle; 
        
                   } 
        
                   else 
        
                   { 
        
                     PCL_ERROR ("[pcl::%s::computeFeature] Computing pair feature vector between points %u and %u went wrong.\n", getClassName ().c_str (), i, j); 
        
                     p.f1 = p.f2 = p.f3 = p.f4 = p.alpha_m = std::numeric_limits<float>::quiet_NaN (); 
        
                     output.is_dense = false; 
        
                   } 
        
                 } 
        
                 // Do not calculate the feature for identity pairs (i, i) as they are not used 
        
                 // in the following computations 
        
                 else 
        
                 { 
        
                   p.f1 = p.f2 = p.f3 = p.f4 = p.alpha_m = std::numeric_limits<float>::quiet_NaN (); 
        
                   output.is_dense = false; 
        
                 } 
        
                 output.points[index_i*input_->points.size () + j] = p; 
        
               } 
        
             } 
        
           }

it depends on the content of indices that cannot be known a-priori. As a result (by my understanding), there could be multiple NaN in the feature.

Let me reply in order:

these points can be filtered out safely. ppf_signature_source and ppf_signature_target don't have the same number of points so it might work.

I tried skipping NaN altogether in L298 of

pcl/registration/include/pcl/registration/impl/pyramid_feature_matching.hpp

Lines 294 to 299 in 50ca79c

    
           for (size_t feature_i = 0; feature_i < input_->points.size (); ++feature_i) 
        
           { 
        
             std::vector<float> feature_vector; 
        
             convertFeatureToVector (input_->points[feature_i], feature_vector); 
        
             addFeature (feature_vector); 
        
           }

but the tests failed because the tolerance on the recognition tests were not respected.

My fear is that dim_accumulator in the following lines

pcl/registration/include/pcl/registration/impl/pyramid_feature_matching.hpp

Lines 237 to 243 in cd65105

    
           for (int i = static_cast<int> (access.size ()) - 1; i >= 0; --i) 
        
           { 
        
             vector_position += access[i] * dim_accumulator; 
        
             dim_accumulator *= hist_levels[level].bins_per_dimension[i]; 
        
           } 
        
           return hist_levels[level].hist[vector_position];

is accomulated differently and something in the algorithm is altered for it to work properly.

That is the reason why I forced NaN to be pushed in bin 0, because it is actually what happens in the current implementation.

understand the reason why it is apparently always the first and last point. I haven't confirmed if there are other points also set to NaN, but the first and last are consistently there.

My initial reasoning suggests that there may be other NaN points.

does it make sense to let pyramid signature output NaNs. Is it trying to ensure it has the same number of points as the input cloud?

I should read the paper of the subject feature to understand what is going on to propose a fix and some more detailed reasoning about this 😭

Anyway, I'll try to skip again the NaN points to see whether I did something wrong during my first attempts and let you know. It is important however that we agree that there is an issue in the code.

Detailed explanation of the issue in PointCloudLibrary#2570 Closes PointCloudLibrary#2570

stale · 2020-06-09T01:06:34Z

Marking this as stale due to 30 days of inactivity. It will be closed in 7 days if no further activity occurs.

kunaltyagi · 2020-06-09T01:38:25Z

I made some tests on some platform and apparently (take this with care, I'm not 100% sure) accessing a vector with nan results in accessing element at 0 (that is, the cast to size_t results in 0).

Cast of floating NaN and Inf to integer as defined as UB. That means, we have to check the cast location and manually convert it to 0 in order to have consistent performance across platforms.

Though, @claudiofantacci mentioned trying to shoehorn NaNs into bin 0 already

mvieth · 2021-08-15T09:22:35Z

This problem was fixed by pull request #4711

SergioRAgostinho added module: test module: registration labels Oct 19, 2018

claudiofantacci added a commit to claudiofantacci/pcl that referenced this issue Oct 19, 2018

Fix use of NaN as container index in PyramidFeatureHistogram

8bc3c7a

Detailed explanation of the issue in PointCloudLibrary#2570 Closes PointCloudLibrary#2570 See also PointCloudLibrary#2399

claudiofantacci added a commit to claudiofantacci/pcl that referenced this issue Oct 19, 2018

Fix use of NaN as container index in PyramidFeatureHistogram

b73aa42

Detailed explanation of the issue in PointCloudLibrary#2570 Closes PointCloudLibrary#2570 See also PointCloudLibrary#2399

SergioRAgostinho added this to the pcl-1.9.0 milestone Oct 21, 2018

claudiofantacci added a commit to claudiofantacci/pcl that referenced this issue Oct 26, 2018

Fix use of NaN as container index in PyramidFeatureHistogram

e523b72

Detailed explanation of the issue in PointCloudLibrary#2570 Closes PointCloudLibrary#2570

claudiofantacci mentioned this issue Oct 26, 2018

1.9.0 Release #2399

Closed

16 tasks

SergioRAgostinho modified the milestones: pcl-1.9.0, pcl-1.10.0 Nov 6, 2018

kunaltyagi modified the milestones: pcl-1.10.0, pcl-1.11.0 Feb 21, 2020

kunaltyagi added the help wanted label May 5, 2020

kunaltyagi modified the milestones: pcl-1.11.0, pcl-1.11.1 May 9, 2020

stale bot added the status: stale label Jun 9, 2020

stale bot removed the status: stale label Jun 9, 2020

kunaltyagi added kind: todo Type of issue needs: testing Specify why not closed/merged yet labels Jun 16, 2020

kunaltyagi removed this from the pcl-1.11.1 milestone Jun 16, 2020

mvieth closed this as completed Aug 15, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Test 85 failing under Windows in Debug configuration #2570

Test 85 failing under Windows in Debug configuration #2570

claudiofantacci commented Oct 19, 2018 •

edited

Loading

SergioRAgostinho commented Oct 19, 2018 •

edited

Loading

claudiofantacci commented Oct 19, 2018 •

edited

Loading

SergioRAgostinho commented Oct 21, 2018 •

edited

Loading

claudiofantacci commented Oct 22, 2018

stale bot commented Jun 9, 2020

kunaltyagi commented Jun 9, 2020

mvieth commented Aug 15, 2021

Test 85 failing under Windows in Debug configuration #2570

Test 85 failing under Windows in Debug configuration #2570

Comments

claudiofantacci commented Oct 19, 2018 • edited Loading

SergioRAgostinho commented Oct 19, 2018 • edited Loading

claudiofantacci commented Oct 19, 2018 • edited Loading

SergioRAgostinho commented Oct 21, 2018 • edited Loading

claudiofantacci commented Oct 22, 2018

stale bot commented Jun 9, 2020

kunaltyagi commented Jun 9, 2020

mvieth commented Aug 15, 2021

claudiofantacci commented Oct 19, 2018 •

edited

Loading

SergioRAgostinho commented Oct 19, 2018 •

edited

Loading

claudiofantacci commented Oct 19, 2018 •

edited

Loading

SergioRAgostinho commented Oct 21, 2018 •

edited

Loading