ARROW-9605: [C++] Speed up aggregate min/max compute kernels on integer types #7871
Conversation
frankdjx commented on Jul 31, 2020 (edited)
- Use BitBlockCounter to speed up performance for data with a typical 0.01% null probability.
- Enable compiler auto SIMD vectorization for the no-nulls path on integer types. Float/double use fmin/fmax to handle NaN, which the compiler can't auto-vectorize (a sketch of these paths follows the list below).
- Also add test cases covering different null probabilities.
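For illustration, here is a minimal sketch of the three paths just described, not the actual Arrow kernel code: an auto-vectorizable integer loop, a float/double loop that uses fmin/fmax for NaN handling, and a word-at-a-time validity scan in the spirit of BitBlockCounter. Function names and layout assumptions (little-endian host, LSB-first validity bitmap, length a multiple of 64) are mine.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// No-nulls integer path: a plain std::min/std::max loop that the compiler can
// auto-vectorize into SIMD min/max instructions.
void MinMaxNoNullsInt64(const int64_t* values, int64_t length,
                        int64_t* out_min, int64_t* out_max) {
  int64_t mn = std::numeric_limits<int64_t>::max();
  int64_t mx = std::numeric_limits<int64_t>::min();
  for (int64_t i = 0; i < length; ++i) {
    mn = std::min(mn, values[i]);
    mx = std::max(mx, values[i]);
  }
  *out_min = mn;
  *out_max = mx;
}

// Float/double path: std::fmin/std::fmax return the non-NaN operand, which
// gives the desired NaN handling but typically blocks auto-vectorization.
void MinMaxNoNullsDouble(const double* values, int64_t length,
                         double* out_min, double* out_max) {
  double mn = std::numeric_limits<double>::infinity();
  double mx = -std::numeric_limits<double>::infinity();
  for (int64_t i = 0; i < length; ++i) {
    mn = std::fmin(mn, values[i]);
    mx = std::fmax(mx, values[i]);
  }
  *out_min = mn;
  *out_max = mx;
}

// Nullable path, in the spirit of BitBlockCounter: scan the validity bitmap a
// 64-bit word at a time. For mostly-non-null data almost every word is all
// ones, so per-value null checks are amortized away.
void MinMaxWithNullsInt64(const int64_t* values, const uint8_t* validity,
                          int64_t length, int64_t* out_min, int64_t* out_max) {
  int64_t mn = std::numeric_limits<int64_t>::max();
  int64_t mx = std::numeric_limits<int64_t>::min();
  for (int64_t i = 0; i < length; i += 64) {
    uint64_t word;
    std::memcpy(&word, validity + i / 8, sizeof(word));
    if (word == UINT64_MAX) {
      for (int64_t j = i; j < i + 64; ++j) {  // all valid: tight loop
        mn = std::min(mn, values[j]);
        mx = std::max(mx, values[j]);
      }
    } else if (word != 0) {
      for (int64_t j = i; j < i + 64; ++j) {  // mixed: check each bit
        if (word & (uint64_t(1) << (j - i))) {
          mn = std::min(mn, values[j]);
          mx = std::max(mx, values[j]);
        }
      }
    }  // word == 0: the whole block is null, skip it entirely
  }
  *out_min = mn;
  *out_max = mx;
}
```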
I can trigger a benchmark action once #7870 gets merged. Below is the benchmark data for integer types on my setup:
Force-pushed from 9141619 to 553266e, then to 4995295, then to 1c56802.
@ursabot benchmark --suite-filter=arrow-compute-aggregate-benchmark --benchmark-filter=MinMax
Below are the results for null_percent 0.01% and 0% on https://ci.ursalabs.org/#/builders/73/builds/101
@jianxind Sorry for the delay. Could you please rebase this PR? It looks like there are some conflicts now.
Force-pushed from 1c56802 to c9e781d.
No problem at all. Rebased now. Thanks.
Thanks for the updates. Some comments still.
Passing
Below is the command I used; compiler vectorization happens only on integer types.
Ah, I also had
This reverts commit 8b5b1a6.
Signed-off-by: Frank Du <frank.du@intel.com>
Force-pushed from 9d84640 to ab6f2be.
Rebased.
+1
+1
Test failures are unrelated, will merge.
ARROW-9605: [C++] Speed up aggregate min/max compute kernels on integer types

1. Use BitBlockCounter to speed up performance for typical 0.01% null probability data.
2. Enable compiler auto SIMD vectorization for no-nulls on integer types. Float/double use fmin/fmax to handle NaN, which can't be vectorized by the compiler.
3. Also add test cases to cover different null probabilities.

Closes apache#7871 from jianxind/kernel_min_max

Lead-authored-by: Frank Du <frank.du@intel.com>
Co-authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
I came here after looking at the code and being confused. It sounds like there was never a need to instantiate templates with a `SimdLevel` parameter. I might be wrong; in that case, I would love a pointer to the specialized code.
@felipecrv I may be misunderstanding your question, but the
I understand that, but if no specialization exists for different SIMD levels, we don't need more than the
The source code is usually the same for all variations, but the generated code (which matters for ODR) varies thanks to different compiler options. (See arrow/cpp/src/arrow/CMakeLists.txt, lines 353 to 359 at 8b634ad.)
We do: see arrow/cpp/src/arrow/compute/function.cc, lines 133 to 150 at 8b634ad.
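To make the point of this exchange concrete, here is a hedged sketch (hypothetical file and kernel names, not the real Arrow sources) of how the same template body, compiled in different translation units under different compiler flags, yields different machine code while a `SimdLevel` tag keeps the symbols distinct:

```cpp
// simd_level.h -- the tag that distinguishes otherwise-identical instantiations.
enum class SimdLevel { NONE, AVX2 };

// kernel-inl.h -- shared body, included by both .cc files sketched below.
template <SimdLevel Level>
struct SumKernel {
  static int64_t Sum(const int64_t* values, int64_t length) {
    int64_t total = 0;
    for (int64_t i = 0; i < length; ++i) total += values[i];
    return total;  // same source; codegen depends on the TU's compiler flags
  }
};

// kernel_none.cc -- compiled with baseline flags:
//   template struct SumKernel<SimdLevel::NONE>;
// kernel_avx2.cc -- compiled with AVX2 flags (e.g. -mavx2 or /arch:AVX2):
//   template struct SumKernel<SimdLevel::AVX2>;
// Without the SimdLevel tag, the two object files would define the same symbol
// with different code (an ODR problem); with it, both variants can be linked in.

// Runtime selection, loosely standing in for the dispatch done in
// arrow/cpp/src/arrow/compute/function.cc (not the real API):
int64_t SumDispatch(const int64_t* values, int64_t length, bool cpu_has_avx2) {
  return cpu_has_avx2 ? SumKernel<SimdLevel::AVX2>::Sum(values, length)
                      : SumKernel<SimdLevel::NONE>::Sum(values, length);
}
```

The design choice this illustrates is exactly what the comment above describes: the per-file compiler options, not the template parameter, determine the generated code; the parameter only gives each variant a distinct symbol.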
OK. Now I get it. The compiler options are source-file specific and not global to the entire build.
Right :-) I agree it's a bit difficult to follow.
…e same code in different compilation units (#43720)

### Rationale for this change

More than once I've been confused about how the `SimdLevel` template parameters on these kernel classes affect dispatching of kernels based on SIMD support detection at runtime [1], given that nothing in the code changes based on the parameters. What matters is the compilation unit in which the templates are instantiated. Different compilation units get different compilation parameters. The SimdLevel parameters don't really affect the code that gets generated (!), they only serve as a way to avoid duplication of symbols in the compiled objects. This PR organizes the code to make this more explicit.

[1] #7871 (comment)

### What changes are included in this PR?

- Introduction of aggregate_basic-inl.h
- Moving of the impls in `aggregate_basic-inl.h` to an anonymous namespace
- Grouping of code based on the function they implement (`Sum`, `Mean`, and `MinMax`)

### Are these changes tested?

By the compilation process, existing tests, and benchmarks.

* GitHub Issue: #43719

Lead-authored-by: Felipe Oliveira Carvalho <felipekde@gmail.com>
Co-authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
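For readers skimming this thread, a rough sketch of the layout the follow-up PR describes; the contents are simplified and partly hypothetical, so consult the real aggregate_basic-inl.h and the per-SIMD-level .cc files in the Arrow tree for the actual implementations.

```cpp
// aggregate_basic-inl.h (shape only). The kernel bodies sit in an anonymous
// namespace, so every .cc file that includes this header gets its own
// internal-linkage copy -- no symbol collisions between translation units
// compiled with different SIMD flags.
namespace {

struct MinMaxImplSketch {
  // ... shared min/max body, compiled once per translation unit ...
};

}  // namespace

// aggregate_basic.cc        -> includes the -inl.h, built with baseline flags
// aggregate_basic_avx2.cc   -> includes the -inl.h, built with AVX2 flags
// aggregate_basic_avx512.cc -> includes the -inl.h, built with AVX-512 flags
// Each translation unit registers its kernels under the matching SimdLevel so
// the runtime dispatcher can choose the best variant for the host CPU.
```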