Unions on multiple bitmaps at a time #58
I just added some benchmarks and the results are not so great, probably because the bitmaps are not that big.
Even with more bitmaps the benchmark results are not good at all.
In my understanding the advantage we have with the EDIT: I read again what you said in your comment and yeah, I need to extract the stores from the containers and not do the operation directly on the containers (because it calls
I think a signature something like this would make sense:

```rust
impl RoaringBitmap {
    pub fn union_of(bitmaps: impl IntoIterator<Item = &RoaringBitmap>) -> Self
}
```

there's no reuse of existing storage so having a I think there's likely a way to make
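To make the constructor-style signature concrete, here is a minimal sketch of the naive baseline: fold every borrowed operand into a freshly allocated result. `ToyBitmap` is a hypothetical stand-in backed by a `BTreeSet<u32>`, not the crate's `RoaringBitmap`, and the explicit `'a` lifetime is just one way to spell the borrowed item type.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for a bitmap type; NOT the crate's `RoaringBitmap`.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct ToyBitmap(pub BTreeSet<u32>);

impl ToyBitmap {
    /// Constructor-style union: borrows every operand and builds a new value,
    /// so no existing storage is reused or mutated.
    pub fn union_of<'a>(bitmaps: impl IntoIterator<Item = &'a ToyBitmap>) -> Self {
        let mut out = ToyBitmap::default();
        for bitmap in bitmaps {
            // Successive pairwise union; this is the baseline a multi-way
            // algorithm would be expected to beat.
            out.0.extend(bitmap.0.iter().copied());
        }
        out
    }
}
```

Because the operands are only borrowed, a call such as `ToyBitmap::union_of(vec![&a, &b])` leaves `a` and `b` untouched, which is why this shape does not carry over to difference, where one operand has to act as the base.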
I updated the algorithm to do the operations directly on the Obviously performance is impacted by the
Ok, so I achieve better performance by using a custom
I recommend referring back to the Java approach which has been emulated in C and Go... The code is like so...

```java
RoaringBitmap answer = new RoaringBitmap();
for (int k = 0; k < bitmaps.length; ++k) {
  answer.naivelazyor(bitmaps[k]);
}
answer.repairAfterLazy();
return answer;
```

In turn naivelazyor calls What this does, roughly, is that...
Then "repairAfterLazy" goes through the bitmap containers and possibly converts them back to arrays. The intuition is as follows... the union between two bitmap containers, or between an array container and a bitmap container, or even between two array containers when the output is a bitmap container, can be done efficiently (CPU wise). And if you have many Roaring bitmaps to begin with, you are likely to end up with bitmap containers in the end, so let us jump straight ahead and create them early in the process. Empirically, this works well quite often.
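For readers who want the lazy-then-repair idea in Rust terms, here is a rough sketch for a single 16-bit key. `Store`, `lazy_or`, `repair` and `lazy_union` are hypothetical names for illustration, not the crate's API or the Java implementation, and the 4096 cutoff is assumed from the usual Roaring array-container limit.

```rust
/// Number of 64-bit words in a 2^16-bit container.
const WORDS: usize = 1024;
/// Assumed cutoff: at or below this cardinality a container is kept as an array.
const ARRAY_LIMIT: usize = 4096;

/// Hypothetical container store, loosely modelled on the array/bitmap split.
#[derive(Debug, Clone, PartialEq)]
pub enum Store {
    Array(Vec<u16>),
    Bitmap(Box<[u64; WORDS]>),
}

/// Lazy step: OR `src` into a bitmap accumulator without tracking cardinality.
fn lazy_or(acc: &mut [u64; WORDS], src: &Store) {
    match src {
        Store::Array(values) => {
            for &v in values {
                acc[(v >> 6) as usize] |= 1u64 << (v & 63);
            }
        }
        Store::Bitmap(words) => {
            for (a, w) in acc.iter_mut().zip(words.iter()) {
                *a |= *w;
            }
        }
    }
}

/// Repair step: count the bits once, then choose the final representation.
fn repair(acc: Box<[u64; WORDS]>) -> Store {
    let cardinality: usize = acc.iter().map(|w| w.count_ones() as usize).sum();
    if cardinality > ARRAY_LIMIT {
        return Store::Bitmap(acc);
    }
    let mut values = Vec::with_capacity(cardinality);
    for (i, &word) in acc.iter().enumerate() {
        let mut w = word;
        while w != 0 {
            let bit = w.trailing_zeros();
            values.push((i as u32 * 64 + bit) as u16);
            w &= w - 1; // clear the lowest set bit
        }
    }
    Store::Array(values)
}

/// Union of all stores that share one key: go straight to a bitmap, repair last.
pub fn lazy_union<'a>(stores: impl IntoIterator<Item = &'a Store>) -> Store {
    let mut acc = Box::new([0u64; WORDS]);
    for store in stores {
        lazy_or(&mut acc, store);
    }
    repair(acc)
}
```

The point of the split is that the per-operand loop does no cardinality bookkeeping and never converts between representations; that cost is paid once per key in `repair`.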
@@ -129,6 +129,31 @@ fn union_with(c: &mut Criterion) {
     });
 }

 fn union_of(c: &mut Criterion) {
It seems that it measures the time it takes to aggregate a few deterministic tiny sets... Tiny sets in a branchy setting lead to incorrect benchmarks because of branch prediction... see https://www.infoq.com/articles/making-code-faster-taming-branches/
I suggest using realistic data sets. See for example https://github.com/RoaringBitmap/RoaringBitmap/tree/master/real-roaring-dataset/src/main/resources/real-roaring-dataset
223: Implements multioperation for the bitmaps and tree maps r=Kerollmops a=irevoire

Fixes RoaringBitmap#57, closes RoaringBitmap#58, closes RoaringBitmap#109, closes RoaringBitmap#139, and closes RoaringBitmap#219.

There are a lot of performance improvements, but here is a before / after on the operations that were previously the fastest (when we can do assign between owned bitmaps).

## And

```
group                                                    after                           before
-----                                                    -----                           ------
Successive And/Multi And Owned/census-income             1.00     14.6±0.25µs ? ?/sec    15.42   224.9±0.76µs ? ?/sec
Successive And/Multi And Owned/census-income_srt         1.00     14.2±0.25µs ? ?/sec    3.98     56.4±8.22µs ? ?/sec
Successive And/Multi And Owned/census1881                1.00     20.7±0.33µs ? ?/sec    37.18   770.1±1.62µs ? ?/sec
Successive And/Multi And Owned/census1881_srt            1.00     25.8±1.29µs ? ?/sec    1.12     28.8±0.09µs ? ?/sec
Successive And/Multi And Owned/weather_sept_85           1.00     60.7±2.48µs ? ?/sec    2.15    130.2±2.96µs ? ?/sec
Successive And/Multi And Owned/weather_sept_85_srt       1.00     48.3±2.21µs ? ?/sec    2.32    112.2±1.07µs ? ?/sec
Successive And/Multi And Owned/wikileaks-noquotes        1.00     24.4±0.50µs ? ?/sec    2.73     66.6±0.27µs ? ?/sec
Successive And/Multi And Owned/wikileaks-noquotes_srt    1.00     20.3±0.58µs ? ?/sec    1.09     22.0±0.30µs ? ?/sec
```

## Or

```
group                                                    after                           before
-----                                                    -----                           ------
Successive Or/Multi Or Owned/census-income               1.00    629.3±4.46µs ? ?/sec    2.29   1441.4±41.36µs ? ?/sec
Successive Or/Multi Or Owned/census-income_srt           1.00    582.5±1.81µs ? ?/sec    1.61    937.8±4.03µs ? ?/sec
Successive Or/Multi Or Owned/census1881                  1.00   1143.4±4.55µs ? ?/sec    3.48      4.0±0.07ms ? ?/sec
Successive Or/Multi Or Owned/census1881_srt              1.00    743.4±4.40µs ? ?/sec    3.49      2.6±0.02ms ? ?/sec
Successive Or/Multi Or Owned/weather_sept_85             1.00      2.9±0.02ms ? ?/sec    1.06      3.1±0.01ms ? ?/sec
Successive Or/Multi Or Owned/weather_sept_85_srt         1.00   1344.5±7.80µs ? ?/sec    1.06   1426.5±38.08µs ? ?/sec
Successive Or/Multi Or Owned/wikileaks-noquotes          1.00    476.3±4.43µs ? ?/sec    5.27      2.5±0.01ms ? ?/sec
Successive Or/Multi Or Owned/wikileaks-noquotes_srt      1.00    259.4±3.90µs ? ?/sec    7.17   1860.0±3.30µs ? ?/sec
```

Co-authored-by: saik0 <github@saik0.net>
Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: Irevoire <tamo@meilisearch.com>
Co-authored-by: Tamo <irevoire@protonmail.ch>
I would like to introduce methods to do operations on multiple bitmaps at a time; this can greatly improve performance. This PR only introduces the union operation.

In this PR I introduce the `RoaringBitmap::union_of` method, which takes an `Iterator` of `Bitmaps` and returns a new bitmap. Internally it uses a `BinaryHeap` to always find the lowest containers of the operand bitmaps.

I am not sure of the function signature: the other operations are named `union_with` and take a mutable self ref. The function signature I use here would be invalid for difference and symmetric difference, as the operations must be done on a base `bitmap`.

I changed it to become an in-place operation.

And I changed it again to be kind of a constructor.

Related to #57.
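As an illustration of the `BinaryHeap` idea described above, here is a minimal sketch of a k-way merge that always pops the lowest container key next. `KeyedBitmap` and `heap_union_of` are hypothetical names, and each container is modelled as a plain `BTreeSet<u16>`; this is not the code in this PR.

```rust
use std::cmp::Reverse;
use std::collections::{BTreeSet, BinaryHeap};

/// Hypothetical toy bitmap: containers sorted by their 16-bit key.
pub type KeyedBitmap = Vec<(u16, BTreeSet<u16>)>;

/// Union several bitmaps by repeatedly popping the lowest container key.
pub fn heap_union_of(bitmaps: &[KeyedBitmap]) -> KeyedBitmap {
    // Heap entries are (key, bitmap index, position inside that bitmap).
    // `Reverse` turns the std max-heap into a min-heap on the key.
    let mut heap = BinaryHeap::new();
    for (idx, bitmap) in bitmaps.iter().enumerate() {
        if let Some((key, _)) = bitmap.first() {
            heap.push(Reverse((*key, idx, 0usize)));
        }
    }

    let mut out: KeyedBitmap = Vec::new();
    while let Some(Reverse((key, idx, pos))) = heap.pop() {
        let container = &bitmaps[idx][pos].1;
        // Equal keys come out of the heap back to back, so comparing with
        // the last output container is enough to decide merge vs. append.
        let same_key = match out.last() {
            Some(last) => last.0 == key,
            None => false,
        };
        if same_key {
            out.last_mut().unwrap().1.extend(container.iter().copied());
        } else {
            out.push((key, container.clone()));
        }
        // Advance the cursor of the bitmap we just consumed from.
        if let Some((next_key, _)) = bitmaps[idx].get(pos + 1) {
            heap.push(Reverse((*next_key, idx, pos + 1)));
        }
    }
    out
}
```

Each operand contributes at most one heap entry at a time, so the heap never holds more entries than there are bitmaps, and the output containers are produced already sorted by key.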