-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-32107: [C++] Create Filter Kernels for REE values #35001
base: main
Are you sure you want to change the base?
Conversation
|
83a4655
to
ed4aafe
Compare
|
ed4aafe
to
c12e274
Compare
…lable to other compute kernels (#35002) ### Rationale for this change Make it easier to implement correct kernels dealing with run-end encoded data. ### What changes are included in this PR? Extraction of code to a header so kernels like REE filter kernels (see #35001) can use them. ### Are these changes tested? Indirectly via the `run_end_encode` and `run_end_decode` tests. ### Are there any user-facing changes? No, these are internal compute headers. Authored-by: Felipe Oliveira Carvalho <felipekde@gmail.com> Signed-off-by: Matt Topol <zotthewizard@gmail.com>
c12e274
to
28aa053
Compare
…lable to other compute kernels (apache#35002) ### Rationale for this change Make it easier to implement correct kernels dealing with run-end encoded data. ### What changes are included in this PR? Extraction of code to a header so kernels like REE filter kernels (see apache#35001) can use them. ### Are these changes tested? Indirectly via the `run_end_encode` and `run_end_decode` tests. ### Are there any user-facing changes? No, these are internal compute headers. Authored-by: Felipe Oliveira Carvalho <felipekde@gmail.com> Signed-off-by: Matt Topol <zotthewizard@gmail.com>
|
@felipecrv do you have any anecdotal benchmark numbers? |
@alippai For PlainxREE, only the intuition that for a sufficiently run-end-compressed filter it will be faster than scanning a full bitmap. But I will be writing benchmarks for the 3 combinations when I complete the unit tests. |
…lable to other compute kernels (apache#35002) ### Rationale for this change Make it easier to implement correct kernels dealing with run-end encoded data. ### What changes are included in this PR? Extraction of code to a header so kernels like REE filter kernels (see apache#35001) can use them. ### Are these changes tested? Indirectly via the `run_end_encode` and `run_end_decode` tests. ### Are there any user-facing changes? No, these are internal compute headers. Authored-by: Felipe Oliveira Carvalho <felipekde@gmail.com> Signed-off-by: Matt Topol <zotthewizard@gmail.com>
…lable to other compute kernels (apache#35002) ### Rationale for this change Make it easier to implement correct kernels dealing with run-end encoded data. ### What changes are included in this PR? Extraction of code to a header so kernels like REE filter kernels (see apache#35001) can use them. ### Are these changes tested? Indirectly via the `run_end_encode` and `run_end_decode` tests. ### Are there any user-facing changes? No, these are internal compute headers. Authored-by: Felipe Oliveira Carvalho <felipekde@gmail.com> Signed-off-by: Matt Topol <zotthewizard@gmail.com>
bebe755
to
f53be80
Compare
…#35751) ### Rationale for this change When working on #35001 I had a hard time figuring where to place the code for all possible combinations of filters and REE data. `vector_selection.cc` is hard to follow with so many kernels implemented in a single file. This PR splits the two biggest ones: filter and take. Stuff that can be shared by both stays is in `vector_selection_internal.cc` and `vector_selection.cc` is concerned with the registering of the functions and a few smaller kernels. ### What changes are included in this PR? - [x] `vector_selection_(internal|take|filter).(cc|h)` source files were extracted from `vector_selection.cc` ### Are these changes tested? Yes, by existing tests. * Closes: #35765 Authored-by: Felipe Oliveira Carvalho <felipekde@gmail.com> Signed-off-by: Antoine Pitrou <antoine@python.org>
897b922
to
7b8a753
Compare
Lead-authored-by: Tobias Zagorni <tobias@zagorni.eu> Co-authored-by: Felipe Oliveira Carvalho <felipekde@gmail.com>
Size goes down by 1_147_272 bytes. $ ninja libarrow.so && strip release/libarrow.so.1300.0.0 && du -sb release/libarrow.so.1300.0.0 [3/3] Creating library symlink release/libarrow.so.1300 release/libarrow.so 104555424 release/libarrow.so.1300.0.0
Another reduction of 1_073_528 bytes.
This doesn't affect binary size at all, but documents better what is expected of this visitor function.
7b8a753
to
2337e8d
Compare
As this has been incorporated in a different way -- there is a runtime check on Plain x * filters that check if the filter array is run-end encoded and dispatches the boolean run visits accordingly.
Rationale for this change
Run-end encoded arrays can very compactly represent filters with repeated values and run-end encoded arrays can be filtered efficiently without inflating all the runs for filtering. This PR implements algorithms that make it possible to leverage these possibilities.
What changes are included in this PR?
run_end_encode
)Closes ARROW-16774: [C++] Create Filter Kernel on RLE data #13657
Are these changes tested?
By unit tests that handle all the REE/Plain combinations.
Are there any user-facing changes?
Users don't control which filter kernels get picked directly, so there isn't a user-facing API area yet.