Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support 64 bits integers for BITMAP type #2772

Merged
merged 4 commits into from
Jan 17, 2020
Merged

Conversation

gaodayue
Copy link
Contributor

@gaodayue gaodayue commented Jan 15, 2020

Fixes #2771

Main changes in this CL

  • RoaringBitmap is renamed to BitmapValue and moved into bitmap_value.h
  • leveraging Roaring64Map to support unsigned BIGINT for BITMAP type
  • introduces two new format (SINGLE64 and BITMAP64) for BITMAP type

So far we have three storage format for BITMAP type

EMPTY := TypeCode(0x00)
SINGLE32 := TypeCode(0x01), UInt32LittleEndian
BITMAP32 := TypeCode(0x02), RoaringBitmap(defined by https://github.com/RoaringBitmap/RoaringFormatSpec/)

In order to support BIGINT element and keep backward compatibility, introduce two new format

SINGLE64 := TypeCode(0x03), UInt64LittleEndian
BITMAP64 := TypeCode(0x04), CustomRoaringBitmap64

Please note that SINGLE64/BITMAP64 doesn't replace SINGLE32/BITMAP32. Doris will choose the smaller (in terms of space) type automatically during serializing. For example, BITMAP32 is preferred over BITMAP64 when the maximum element is <= UINT32_MAX. This will also make BE rollback possible as long as user didn't write element larger than UINT32_MAX into bitmap column.

Another important design decision is that we fork and maintain our own version of Roaring64Map instead of using the one in "roaring/roaring64map.hh". The reasons are

  1. RoaringBitmap doesn't define a standard for the binary format of 64-bits bitmap. As a result, different implementations of Roaring64Map use different format. For example the C++ version is different from the Java version. Even for CRoaring, the format may change in future releases. However Doris require the serialized format to be stable across versions. Fork is a safe way to achieve this.
  2. We may want to make some code changes to Roaring64Map according to our needs. For example, in order to use the BITMAP32 format when the maximum element can be represented in 32 bits, we may want to access the private member of Roaring64Map. Another example is we want to further customize and optimize the format for BITMAP64 case, such as using vint64 instead of uint64 for map size.

@gaodayue gaodayue requested review from kangkaisen and imay January 15, 2020 11:51
@gaodayue gaodayue changed the title Support unsigned BIGINT element for BITMAP type Support 64 bits integers for BITMAP type Jan 16, 2020
@gaodayue
Copy link
Contributor Author

Hi @kangkaisen @imay , I have addressed previous review comments and passed the CI, please review again.

kangkaisen
kangkaisen previously approved these changes Jan 16, 2020
Copy link
Contributor

@kangkaisen kangkaisen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1. LGTM.

be/src/exprs/bitmap_function.cpp Outdated Show resolved Hide resolved
be/src/util/bitmap_value.h Show resolved Hide resolved
Copy link
Contributor

@imay imay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@imay imay merged commit 3b24287 into apache:master Jan 17, 2020
@chaoyli chaoyli mentioned this pull request Feb 10, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support 64-bits integers as elements for BITMAP type
3 participants