Introduce a test that highlight a serialization bug #61

Kerollmops · 2020-09-09T16:37:49Z

Hello,

I found a bug while using roaring in the next version of the MeiliSearch engine, it appears that the serialize function doesn't correctly represent the integers in the serialized format.

I haven't had time to dive into the serialise function yet.
Could you take a look please?

Note that the original integers array was 77MB as an array text, I found a way to reduce that to 4096 values but there surely is a smaller input than this.

Thank you for your help :)

lemire · 2020-09-09T16:40:18Z

Could you post what gets deserialized? It might be possible to identify the bug just by looking at it.

Kerollmops · 2020-09-09T16:41:50Z

I can give you the debug display if you want:

 left: `RoaringBitmap<4097 values between 6619162 and 108658947>`,
right: `RoaringBitmap<4097 values between 6619137 and 108658947>`', tests/serialization.rs:96:5

Do you want more?

lemire · 2020-09-09T16:44:28Z

That's helpful because it points at an issue in the deserialization of the bitmap containers.

Kerollmops · 2020-09-09T17:01:57Z

What makes you think that? Why can't it be a serialization bug?

lemire · 2020-09-09T17:23:32Z

What makes you think that? Why can't it be a serialization bug?

Well, whether it is serialization or deserialization is impossible to tell without further work but what the numbers you got suggest is that it has to do with bitmap containers.

lemire · 2020-09-09T17:58:44Z

Yep. It is deserialization bug... see my fix at...

#62

So it is a tiny thing. By definition of the Roaring Bitmap format, if you have 4096 elements or less, you have an array container. But roaring-rs, at the deserialization level, had a "<" instead of a "<=" so it is an "off by one" error. My patch fixes that.

(There is a formal specification where this is spelled out.)

lemire · 2020-09-09T18:00:42Z

So why did I expected a deserialization bug? Because I expected the data structure to be fine (since otherwise you'd get in trouble quickly) and because the serialization/deserialization test have lots of checks during the serialization (e.g., the code tests that the right number of bytes were written) but relatively fewer tests during the deserialization. Of course, it could still have been at the serialization...

lemire · 2020-09-09T18:04:21Z

(For clarification, I did not write a single line of code in roaring-rs. I just happen to know the Roaring format well)

Kerollmops · 2020-09-09T18:32:16Z

Wow!

Thank you very much for this quick fix.
I don't think I could have found it as fast as you.

lemire · 2020-09-09T18:33:36Z

@Kerollmops Quick! We want to know how well this library is serving you!

62: Bug fix for serialization bug r=Nemo157 a=lemire See #61 Co-authored-by: Kerollmops <clement@meilisearch.com> Co-authored-by: Daniel Lemire <lemire@gmail.com>

Kerollmops · 2020-09-10T11:16:27Z

Hey @lemire,

Wasn't sure if you asked me to give some feedback, so I ask friends!

So yeah, I am pretty happy with the library so far, it is fast and easy to use, I have just some little remarks about the library:

The serialized format is too heavy for little sets (less than 100 ints), I achived to gain 26% on memory when I serialized by hand the few integers I had (like a simple slice), the headers are really heavy. The payload was simply smaller than the headers.

I also needed a function to do an operation on multiple bitmaps at the same time, I know that there exist a multi-intersection on the original C library, but didn't check for a multi-is-disjoint function. I also tried to implement the multi-union but performances weren't there, not sure why.

Thank you both for this library :)

lemire · 2020-09-10T14:16:40Z

The serialized format is too heavy for little sets (less than 100 ints), I achived to gain 26% on memory when I serialized by hand the few integers I had (like a simple slice), the headers are really heavy. The payload was simply smaller than the headers.

Right. The Roaring Bitmap format tries to account for many different goals at once, and producing a concise output is only one of them. Furthermore, I think that roaring-rs does not support run containers, which is often a good way to compress the data.

I also needed a function to do an operation on multiple bitmaps at the same time, I know that there exist a multi-intersection on the original C library, but didn't check for a multi-is-disjoint function. I also tried to implement the multi-union but performances weren't there, not sure why.

Given that I only have a passing familiarity with roaring-rs, I can only comment in broad terms.

Assuming that it is not already in the library, writing a "is disjoint?" function (as opposed to computing the intersection and checking that it has a size of zero) is easy and can be beneficial. If it does not already exist, writing it ought to be very easy.
Doing a multi-bitmap intersection via a dedicated function is probably not so beneficial. Intersections over many bitmaps can be done efficiently using a naive two-by-two intersection, most of the time. If you have few bitmaps, start from the smallest one.
For the multi-union, the best approach in practice is again, often, a two-by-two union, but using the concept of a "lazy" union. The idea is to do the union two bitmaps by two bitmaps, but to produce an intermediate result which is not, strictly speaking, a valid roaring bitmap. Then when you are done with the whole computation, you can scan the result and make it a valid bitmap again. You can also use a priority queue, which can be better in some instances, but it is both more complicated, and more memory intensive.

There are lots of little performance tricks, and you will find them in the Go, Java and C versions of Roaring. I do not know how many of these tricks roaring-rs captured. There may be quite a few missing optimization opportunities left. Given that these techniques have been documented, implemented and tested out, it should be easy to add them to Rust.

(Note that I am not an author of roaring-rs.)

lemire · 2020-09-10T14:25:50Z

For a pretty cool "fast way" to do multi-intersection, please see this code...
https://github.com/RoaringBitmap/RoaringBitmap/blob/master/RoaringBitmap/src/main/java/org/roaringbitmap/FastAggregation.java#L293

Kerollmops · 2020-09-10T15:23:11Z

Furthermore, I think that roaring-rs does not support run containers, which is often a good way to compress the data.

What the difference between bitset and run containers? I can maybe find that in the search paper that I already read... Is it only a way to compress the data a little more or does it also give performances improvements?

Assuming that it is not already in the library, writing a "is disjoint?" function (as opposed to computing the intersection and checking that it has a size of zero) is easy and can be beneficial. If it does not already exist, writing it ought to be very easy.

The roaring-rs library already has the is_disjoint method but doesn't support any multi-bitmaps functions. I was pretty sure that using an is_disjoint for multiple bitmaps would be much faster than computing the intermediate intersections and checking if the final intersection is empty. Not sure if it is easy to implement or even possible.

Doing a multi-bitmap intersection via a dedicated function is probably not so beneficial. Intersections over many bitmaps can be done efficiently using a naive two-by-two intersection, most of the time. If you have few bitmaps, start from the smallest one.

Yeah was already doing that, sorting them by size, this is much faster indeed. I also find a way to stop retrieving bitmaps from the kv-store when the latest intermediate intersection produced an empty bitmap. But this can't be integrated into roaring.

For the multi-union, the best approach in practice is again, often, a two-by-two union, but using the concept of a "lazy" union. The idea is to do the union two bitmaps by two bitmaps, but to produce an intermediate result which is not, strictly speaking, a valid roaring bitmap. Then when you are done with the whole computation, you can scan the result and make it a valid bitmap again. You can also use a priority queue, which can be better in some instances, but it is both more complicated, and more memory intensive.

Yeah I read that in the C implementation if I remember well, only computing the final container type (bitset or array) at the end of the process, to avoid the type switch overhead. Unfortunatelly I wasn't able to make it fast enough in my PR, not sure why.

EDIT: Wow, didn't see your message on the PR.

Thank you for your help again.

lemire · 2020-09-10T15:26:56Z

What the difference between bitset and run containers? I can maybe find that in the search paper that I already read... Is it only a way to compress the data a little more or does it also give performances improvements?

It is quite different. There is already a PR in this repository for run containers, but it has not been merged.

lemire · 2020-09-10T15:28:09Z

I was pretty sure that using an is_disjoint for multiple bitmaps would be much faster than computing the intermediate intersections and checking if the final intersection is empty. Not sure if it is easy to implement or even possible.

This seems a bit speculative. It is best answered with a realistic benchmark and hard data.

Kerollmops · 2020-09-10T15:28:19Z

I think I understand what could be a run container: 0..23 followed by 563..4000.
Only storing the start and end of runs!

lemire · 2020-09-10T15:29:02Z

But this can't be integrated into roaring.

I don't think it needs to.

lemire · 2020-09-10T15:29:12Z

Only storing the start and end of runs!

Correct.

lemire · 2020-09-10T15:36:33Z

Yeah I read that in the C implementation if I remember well, only computing the final container type (bitset or array) at the end of the process, to avoid the type switch overhead.

I would not put it quite like that.

We may actually trigger conversions that would be unneeded, deliberately.

A bitmap container requires 1024 words. Each time you do a union, you need something like 1024 operations. Less if you have (auto)vectorization. If you can count on 128-bit SIMD registers, it is something like 512 operations. This sounds a lot but it is entirely free of branches. Now, each mispredicted branch is ~15 cycles lost. So if you are doing a lot arrays-to-arrays union, you can quickly end up spending a lot more than 1024 cycles doing a single union... because it is filled with "hard to predict branches".

Note that if your benchmark is poorly constructed (synthetic data, small data), you won't see these effects. Use large and unpredictable data for sanity.

So the actual trick is to eagerly represent your data using bitmap containers (bitset) because then you get a really simple computational model with very straight-forward performance.

Of course, it is not black and white. So one needs to benchmark and tweak the heuristics.

62: Bug fix for serialization bug r=Nemo157 a=lemire See RoaringBitmap#61 Co-authored-by: Kerollmops <clement@meilisearch.com> Co-authored-by: Daniel Lemire <lemire@gmail.com>

Introduce a test that highlight a serialization bug

6c1b644

lemire mentioned this pull request Sep 9, 2020

Bug fix for serialization bug #62

Merged

bors bot added a commit that referenced this pull request Sep 10, 2020

Merge #62

5bc2b7c

62: Bug fix for serialization bug r=Nemo157 a=lemire See #61 Co-authored-by: Kerollmops <clement@meilisearch.com> Co-authored-by: Daniel Lemire <lemire@gmail.com>

bors bot merged commit 6c1b644 into RoaringBitmap:master Sep 10, 2020

Kerollmops deleted the de-serialize-bug branch September 10, 2020 11:06

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Introduce a test that highlight a serialization bug #61

Introduce a test that highlight a serialization bug #61

Kerollmops commented Sep 9, 2020 •

edited

Loading

lemire commented Sep 9, 2020

Kerollmops commented Sep 9, 2020 •

edited

Loading

lemire commented Sep 9, 2020

Kerollmops commented Sep 9, 2020

lemire commented Sep 9, 2020

lemire commented Sep 9, 2020

lemire commented Sep 9, 2020

lemire commented Sep 9, 2020

Kerollmops commented Sep 9, 2020

lemire commented Sep 9, 2020

Kerollmops commented Sep 10, 2020

lemire commented Sep 10, 2020

lemire commented Sep 10, 2020

Kerollmops commented Sep 10, 2020 •

edited

Loading

lemire commented Sep 10, 2020

lemire commented Sep 10, 2020

Kerollmops commented Sep 10, 2020 •

edited

Loading

lemire commented Sep 10, 2020

lemire commented Sep 10, 2020

lemire commented Sep 10, 2020

Introduce a test that highlight a serialization bug #61

Introduce a test that highlight a serialization bug #61

Conversation

Kerollmops commented Sep 9, 2020 • edited Loading

lemire commented Sep 9, 2020

Kerollmops commented Sep 9, 2020 • edited Loading

lemire commented Sep 9, 2020

Kerollmops commented Sep 9, 2020

lemire commented Sep 9, 2020

lemire commented Sep 9, 2020

lemire commented Sep 9, 2020

lemire commented Sep 9, 2020

Kerollmops commented Sep 9, 2020

lemire commented Sep 9, 2020

Kerollmops commented Sep 10, 2020

lemire commented Sep 10, 2020

lemire commented Sep 10, 2020

Kerollmops commented Sep 10, 2020 • edited Loading

lemire commented Sep 10, 2020

lemire commented Sep 10, 2020

Kerollmops commented Sep 10, 2020 • edited Loading

lemire commented Sep 10, 2020

lemire commented Sep 10, 2020

lemire commented Sep 10, 2020

Kerollmops commented Sep 9, 2020 •

edited

Loading

Kerollmops commented Sep 9, 2020 •

edited

Loading

Kerollmops commented Sep 10, 2020 •

edited

Loading

Kerollmops commented Sep 10, 2020 •

edited

Loading