Faster organized search #4496
Conversation
Using a vector (ordered by distance) instead of a priority queue makes the nearest k search faster:

- with a vector, it is possible to `reserve()` the needed space, but not with a priority queue
- adding a new element to the priority queue is done with `push()`. In contrast, inserting a new element into the ordered vector is done by first determining the correct place with `upper_bound()`, and then calling `emplace()`
- removing the element with the largest distance (to make sure that there are at most k entries) is done with `pop()` for the priority queue (logarithmic complexity), but with `pop_back()` for the vector (constant complexity)
- in the end, when `k_indices` and `k_sqr_distances` are filled with the info from `results`, it is faster to use a vector, since the logarithmic `pop()` of the priority queue can be avoided
- also: the size is cached instead of calling `size()` over and over

To compare the speed of this new implementation with the current one, I measured the time that `NormalEstimation` needs, because it spends most of its time searching the neighbors of each point. I used the datasets milk_cartoon_all_small_clorox.pcd and table_scene_mug_stereo_textured.pcd, each with k=50 and k=100. With this new organized search, `NormalEstimation` needs only about 80% of the time it took before, so I estimate the nearest k search takes only about three quarters of the time it took before.
Hey guys, sorry, but the change from a heap to a vector does not make much sense to me. When measuring the speed, it should be done with a large k as well as a small k, and also on complex clouds. A single measurement that uses a small k will not give us the right data to make a call. Also, there are other options to make this faster.
@suat-gedikli I've amended the PR message with the reasoning posted by @mvieth. I'm adding @koide3 because he's been throwing out multiple benchmarks quickly and will be able to help us here. One thing to note is that the notation you've used is for the asymptotic upper bound. The actual performance might be dictated by the constants and lower-order terms, particularly at smaller values of k. Here's a small demonstration of the difference in just lookup speed: http://rrowniak.com/performance/stdvector-vs-sorted-vector-vs-stdset/ Conjecture and disclaimer:
I did some benchmarks on
@suat-gedikli Thank you for your interest in this!
As I wrote in my explanation, the `pop()` of the priority queue is rather costly, while reading from a vector is (almost) free.
The complexity of
Sure, that is something we could try, but to me it seems rather complex, since that would be a totally new container/container adapter.
Also interesting, but that would work with both the priority queue and the sorted vector, right? (although it might interact better with the strengths and weaknesses of one or the other).
I am not generally opposed to reverting this, but would only really support it if there was clear evidence that the new version is slower. If you have specific benchmarks in mind, please run them and post the results. Thank you @koide3 for the benchmarks. Based on your results, we could even think about switching between the two implementations based on k. Here are a few more benchmarks I got for different OS while working on #4506:
Notes: it seems like on Windows x86 the CPUs differed (2394 MHz for the priority queue run, 2095 MHz for the sorted vector run), and likewise on Windows x64 (2295 MHz vs 2095 MHz). I am not totally sure why the sorted vector is that much better on Ubuntu 20.04; I think the load average was higher when the priority queue benchmark ran.
Another supplement: Boost has several implementations of priority queues and heaps, with much richer functionality than the STL priority queue. They (partly) support reserving memory, update/increase/decrease operations as a possible replacement for calling pop then push, and iterating through the heap without popping the elements. So that is something we could try out. Maybe that would be even faster than the sorted-vector implementation.