-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-35765: [C++] Split vector_selection.cc into more compilation units #35751
Conversation
…eriving Selection
@pitrou @westonpace I tried to separate the code-moving commits from the ones that make changes to explain what's being done step-by-step. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR is not minor in code changes, even though most of it is moving code around. Can you open a GH issue?
const FunctionOptions* default_options, | ||
FunctionRegistry* registry); | ||
|
||
Status PreallocateData(KernelContext* ctx, int64_t length, int bit_width, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name PreallocateData
is probably too general, since it's selection-specific?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed to PreallocatePrimitiveArrayData
. It's here because it's used by both take
and filter
on primitive arrays.
@@ -58,2119 +60,22 @@ using internal::OptionalBitIndexer; | |||
namespace compute { | |||
namespace internal { | |||
|
|||
using FilterState = OptionsWrapper<FilterOptions>; | |||
using TakeState = OptionsWrapper<TakeOptions>; | |||
|
|||
int64_t GetFilterOutputSize(const ArraySpan& filter, | |||
FilterOptions::NullSelectionBehavior null_selection) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this function still useful, since it's a trivial wrapper around GetBitmapFilterOutputSize
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It becomes this in #35750.
int64_t GetFilterOutputSize(const ArraySpan& filter,
FilterOptions::NullSelectionBehavior null_selection) {
if (filter.type->id() == Type::BOOL) {
return GetBitmapFilterOutputSize(filter, null_selection);
}
return GetREEFilterOutputSize(filter, null_selection);
I will move it completely to vector_selection_filter.cc
now and remove it here, but I haven't decided yet if the REEx* filter kernels are going to be implemented in vector_selection_filter.cc
or in a separate .cc
, so I might undo this later.
For reference: A minor PR touches <=2 files with no changes to behaviour. (e.g. fixing typos and similar things) |
00a43f7
to
5e3e42e
Compare
|
||
#include "arrow/array/data.h" | ||
#include "arrow/compute/api_vector.h" | ||
#include "arrow/compute/kernels/vector_selection_internal.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should not include *internal.h
from a non *internal.h
headers because *internal.h
aren't installed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
vector_selection_filter.h
is also internal — included only by vector_selection.cc
. Should I rename it with an internal
suffix in the name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But I just realized it's not needed. 😄 This is why I made PopulateFilterKernels
take an output param at some point. That allows me to forward-declare SelectionKernelData
and not include the full header.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As it's been done with vector_selection_take.h
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC the internal.h
suffix is important as the cmake uses it to decide which headers get installed. So if it is actually a private/internal header it should be renamed imo. But @kou would know best!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC the
internal.h
suffix is important as the cmake uses it to decide which headers get installed. So if it is actually a private/internal header it should be renamed imo.
Correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. I'm also renaming vector_selection_(filter|take).(h|cc)
and pushing now.
@felipecrv Is this ready for review again? |
Yes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, thanks for doing this @felipecrv
Benchmark runs are scheduled for baseline = 431785f and contender = 44d1b61. 44d1b61 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
Rationale for this change
When working on #35001 I had a hard time figuring where to place the code for all possible combinations of filters and REE data.
vector_selection.cc
is hard to follow with so many kernels implemented in a single file. This PR splits the two biggest ones: filter and take. Stuff that can be shared by both stays is invector_selection_internal.cc
andvector_selection.cc
is concerned with the registering of the functions and a few smaller kernels.What changes are included in this PR?
vector_selection_(internal|take|filter).(cc|h)
source files were extracted fromvector_selection.cc
Are these changes tested?
Yes, by existing tests.