[filters] The relationship between Convolution3D running speed and number of threads #6131

Open
Ru1yi opened this issue Sep 11, 2024 · 7 comments
Labels: kind: bug, status: triage


Ru1yi commented Sep 11, 2024

Describe the bug
I was trying to use the Gaussian convolution filter from the filters module. When I varied the number of threads, I found that the convolution runs fastest with setNumberOfThreads(1), and the more threads I set, the slower it runs. Below is my source code:

auto st = std::chrono::high_resolution_clock::now();
pcl::filters::GaussianKernel<PointT, PointT> kernel;
// Set gaussian kernel
kernel.setSigma(4);
kernel.setThresholdRelativeToSigma(4);
kernel.setThreshold(0.05);
// Create kdtree
pcl::search::KdTree<PointT>::Ptr tree(new pcl::search::KdTree<PointT>);
tree->setInputCloud(cloudptr);
// Set convolution params
pcl::filters::Convolution3D<PointT, PointT, pcl::filters::GaussianKernel<PointT, PointT>> convolution;
convolution.setKernel(kernel);
convolution.setInputCloud(cloudptr);
convolution.setNumberOfThreads(1);
convolution.setSearchMethod(tree);
convolution.setRadiusSearch(0.01);
PointCloudPtr g_filtered(new PointCloudT);
convolution.convolve(*g_filtered);
cloudptr = g_filtered;
auto et = std::chrono::high_resolution_clock::now();
std::chrono::duration<double> duration = et - st;
qDebug() << "[Filter] Gaussian filter time: " << duration.count() * 1000 << "ms";

Test Result

| setNumberOfThreads | Average run time (ms) |
|--------------------|-----------------------|
| 1                  | 1350                  |
| 2                  | 1700                  |
| 4                  | 2200                  |
| 8                  | 3100                  |
| default            | 7200                  |

My Environment:

  • OS: Windows 11
  • IDE: VS2019
  • Compiler: MSVC2019
  • PCL Version 1.12.1
  • Original point cloud data: 128 (width) × 1800 (height), 230400 points in total

Possible Solution

It looks like the time spent on thread management exceeds the time spent on the algorithm itself. Are my parameter settings wrong, or is Convolution3D not suitable for organized point clouds?
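One way to test the thread-overhead hypothesis is to time only the call under test, several times per thread count, and compare medians rather than single runs. The helper below is a minimal sketch using only the standard library; the name `median_runtime_ms` is hypothetical and not part of PCL.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of PCL): runs `work` `repeats` times and
// returns the median wall-clock time in milliseconds. For an even number of
// samples the upper of the two middle values is returned.
template <typename Work>
double median_runtime_ms(Work&& work, std::size_t repeats)
{
  std::vector<double> samples;
  samples.reserve(repeats);
  for (std::size_t i = 0; i < repeats; ++i) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    samples.push_back(
        std::chrono::duration<double, std::milli>(stop - start).count());
  }
  if (samples.empty())
    return 0.0;  // nothing was measured
  std::sort(samples.begin(), samples.end());
  return samples[samples.size() / 2];
}
```

In the setup above, the callable would wrap only convolution.setNumberOfThreads(n) followed by convolution.convolve(...), with the KdTree built once beforehand, so that tree construction does not add a constant offset to every measurement.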

Ru1yi added the kind: bug and status: triage labels on Sep 11, 2024
Member

mvieth commented Sep 11, 2024

@Ru1yi I did a quick test with a different cloud with GCC on Linux, and I got 5141ms - 4105ms - 2448ms - 1426ms (1 thread - 2 threads - 4 threads - 8 threads respectively).
How did you install PCL? Did you enable OpenMP when compiling your project? I don't have much experience with MSVC and OpenMP, but I do not think that thread management takes that long.
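Whether OpenMP is actually active in a given project can be checked from code, since a compiler with OpenMP enabled defines the `_OPENMP` macro. The following is a minimal sketch; the function names are hypothetical, and only `_OPENMP` and `omp_get_max_threads()` come from the OpenMP specification.

```cpp
#include <string>

#ifdef _OPENMP
#include <omp.h>  // only available when OpenMP is enabled
#endif

// Hypothetical helper: value of the _OPENMP macro (a yyyymm date identifying
// the supported OpenMP version), or 0 if OpenMP is off for this source file.
long openmp_spec_date()
{
#ifdef _OPENMP
  return _OPENMP;
#else
  return 0;
#endif
}

// Hypothetical helper: the number of threads OpenMP would use by default,
// or 1 if OpenMP is not enabled.
int openmp_max_threads()
{
#ifdef _OPENMP
  return omp_get_max_threads();
#else
  return 1;
#endif
}

// Hypothetical helper: a one-line summary suitable for logging.
std::string openmp_summary()
{
  if (openmp_spec_date() == 0)
    return "OpenMP disabled";
  return "OpenMP " + std::to_string(openmp_spec_date()) + ", max threads " +
         std::to_string(openmp_max_threads());
}
```

With MSVC's /openmp option the macro has the value 200203 (OpenMP 2.0). Logging the summary once at startup shows whether the project setting took effect in the file that calls convolve.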

Author

Ru1yi commented Sep 12, 2024

I installed PCL using PCL-1.12.1-AllInOne-msvc2019-win64.exe from the official release. If I want to use PCL with OpenMP, do I need to configure anything specifically? I enabled OpenMP support in the VS project configuration.
Here are the runtimes with OpenMP support turned off:

| setNumberOfThreads | Average run time (ms) |
|--------------------|-----------------------|
| 1                  | 1350                  |
| 2                  | 1330                  |
| 4                  | 1350                  |
| 8                  | 1380                  |

It looks like OpenMP was disabled successfully, since the run time no longer depends on the number of threads. Could it be a problem with my data? Can you help me confirm how well MSVC 2019 supports OpenMP? Thanks a lot.

Member

mvieth commented Sep 12, 2024

@Ru1yi Can you post the point cloud you are using as a zipped PCD or PLY file? I will also try to test Convolution3D with MSVC as soon as I can.

Author

Ru1yi commented Sep 13, 2024

@mvieth 3644.500049090.zip Here is the data I used for testing, which was collected using the Robosense 128-line mechanical lidar. Thank you for your support.

Member

mvieth commented Sep 13, 2024

I did some testing (VS2022, PCL 1.13.0, with 3644.100042090.pcd), but I didn't notice any increasing run time with more threads. 2 threads are always faster than 1 thread. I did notice that at some point, more threads did not make it any faster, maybe around 4 threads (even though my computer has 6 physical cores with 2 hyperthreads each). I only had to enable OpenMP support at one place in the project configuration (set to Yes (/openmp)). Do you build in Debug or Release configuration? Do you run with or without debugging?

Author

Ru1yi commented Sep 18, 2024

I develop with Qt5 and VS2019 together, which is why I use PCL 1.12.1. Using VS2022 with PCL 1.13.0 or above would require developing with Qt6. In my previous tests, PCL 1.12.1 did not support MSVC2022, and migrating the entire program to Qt6 would be a lot of work, so I would like to know whether the version affects this problem. I build this project in Debug and run with debugging.

Member

mvieth commented Sep 18, 2024

I don't think that PCL 1.12.1 vs PCL 1.13.0 or VS2019 vs VS2022 makes any difference for the multithreading performance. I would suggest also testing with your project built in Release configuration and run without debugging. I could imagine that a Debug build and an attached debugger both lead to more thread management overhead.
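To make sure a timing really comes from a Release build, the build configuration can be reported next to the measured run time. The sketch below relies on two conventions rather than guarantees: MSVC defines `_DEBUG` when a debug runtime library is selected (/MDd, /MTd), and Release configurations usually define `NDEBUG`. The function name is hypothetical.

```cpp
#include <string>

// Hypothetical helper: guesses the build configuration from the usual
// preprocessor conventions. Neither macro is guaranteed to be set, so an
// "unknown" result is possible with custom build settings.
std::string build_configuration()
{
#if defined(_DEBUG)
  return "debug (_DEBUG defined)";
#elif defined(NDEBUG)
  return "release-like (NDEBUG defined)";
#else
  return "unknown (neither _DEBUG nor NDEBUG defined)";
#endif
}
```

Printing this value in the same log line as the run time makes it easy to tell Debug and Release measurements apart. It does not detect whether a debugger is attached, so that still has to be checked separately.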
