[filters] The relationship between Convolution3D running speed and number of threads #6131

Ru1yi · 2024-09-11T10:26:20Z

Describe the bug
I was trying to use the Gaussian convolution filter from the filters module. While setting the number of threads, I found that the Gaussian convolution runs fastest with setNumberOfThreads(1), and the more threads I set, the slower it runs. Below is my source code:

```cpp
#include <chrono>
#include <pcl/filters/convolution_3d.h>
#include <pcl/search/kdtree.h>
#include <QDebug>

// PointT, PointCloudT, PointCloudPtr and cloudptr are defined elsewhere in the project.
auto st = std::chrono::high_resolution_clock::now();
pcl::filters::GaussianKernel<PointT, PointT> kernel;
// Set Gaussian kernel parameters
kernel.setSigma(4);
kernel.setThresholdRelativeToSigma(4);
kernel.setThreshold(0.05);
// Create kd-tree for the neighbor search
pcl::search::KdTree<PointT>::Ptr tree(new pcl::search::KdTree<PointT>);
tree->setInputCloud(cloudptr);
// Set convolution parameters
pcl::filters::Convolution3D<PointT, PointT, pcl::filters::GaussianKernel<PointT, PointT>> convolution;
convolution.setKernel(kernel);
convolution.setInputCloud(cloudptr);
convolution.setNumberOfThreads(1);
convolution.setSearchMethod(tree);
convolution.setRadiusSearch(0.01);
// Run the convolution and replace the input cloud with the result
PointCloudPtr g_filtered(new PointCloudT);
convolution.convolve(*g_filtered);
cloudptr = g_filtered;
auto et = std::chrono::high_resolution_clock::now();
std::chrono::duration<double> duration = et - st;
qDebug() << "[Filter] Gaussian filter time: " << duration.count() * 1000 << "ms";
```

Test Result

| setNumberOfThreads | Average run time (ms) |
|---|---|
| 1 | 1350 |
| 2 | 1700 |
| 4 | 2200 |
| 8 | 3100 |
| default | 7200 |

My environment:

- OS: Windows 11
- IDE: VS2019
- Compiler: MSVC2019
- PCL version: 1.12.1
- Original point cloud: 128 (width) × 1800 (height), 230400 points

Possible Solution

It looks like the time spent on thread management exceeds the time spent on the algorithm itself. Are my parameter settings wrong, or can Convolution3D not be used with organized point clouds?

mvieth · 2024-09-11T12:57:48Z

@Ru1yi I did a quick test with a different cloud using GCC on Linux and got 5141 ms, 4105 ms, 2448 ms, and 1426 ms with 1, 2, 4, and 8 threads, respectively.
How did you install PCL? Do you enable OpenMP while compiling your project? I don't have much experience with MSVC and OpenMP, but I do not think that thread management takes that long.

Ru1yi · 2024-09-12T03:27:20Z

I installed PCL with PCL-1.12.1-AllInOne-msvc2019-win64.exe from the official release. Do I need any specific configuration to use PCL with OpenMP? I enabled OpenMP support in the VS project configuration.
Here are the run times with OpenMP support turned off:

| setNumberOfThreads | Average run time (ms) |
|---|---|
| 1 | 1350 |
| 2 | 1330 |
| 4 | 1350 |
| 8 | 1380 |

It looks like OpenMP was turned off successfully, since the thread count no longer makes a difference. Could the problem be my data? Could you help me confirm how well MSVC2019 supports OpenMP? Thanks a lot.

mvieth · 2024-09-12T08:06:42Z

@Ru1yi Can you post the point cloud you are using as a zipped PCD or PLY file? I will also try to test Convolution3D with MSVC as soon as I can.

Ru1yi · 2024-09-13T00:30:51Z

@mvieth 3644.500049090.zip Here is the data I used for testing, which was collected with a Robosense 128-line mechanical lidar. Thank you for your support.

mvieth · 2024-09-13T16:24:07Z

I did some testing (VS2022, PCL 1.13.0, with 3644.100042090.pcd), but I did not see the run time increase with more threads: 2 threads were always faster than 1. I did notice that beyond some point, maybe around 4 threads, adding threads no longer made it faster (even though my computer has 6 physical cores with 2 hyperthreads each). I only had to enable OpenMP support in one place in the project configuration (set to Yes (/openmp)). Do you build in the Debug or Release configuration? Do you run with or without debugging?

Ru1yi · 2024-09-18T07:05:29Z

I develop with Qt5 and VS2019 together, so I use PCL 1.12.1. Using VS2022 with PCL 1.13.0 or later would mean developing with Qt6. In my earlier tests, PCL 1.12.1 did not support MSVC2022, and migrating the whole program to Qt6 would be a lot of work, so I would like to know whether the version affects this problem. I build this project in Debug and run it with debugging.

mvieth · 2024-09-18T07:32:13Z

I don't think PCL 1.12.1 vs. PCL 1.13.0 or VS2019 vs. VS2022 makes any difference to multithreading performance. I would suggest also testing with your project built in the Release configuration and run without debugging. I could imagine that a Debug build and running under the debugger both add thread-management overhead.

Ru1yi added the labels kind: bug and status: triage on Sep 11, 2024.
