GH-15290: [C++][Compute] Optimize IfElse kernel AAS/ASA case when the scalar is null (#15291)

# Which issue does this PR close?

Closes #15290

# Rationale for this change

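In the AAS and ASA cases (the three letters give the kinds of `cond`, `left`, and `right`: array or scalar), when the scalar is null we only need to construct a new validity bitmap and copy the input array's data to the output. The per-element loop over the input array can be skipped.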
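
To illustrate the idea, here is a minimal, self-contained sketch of the ASA case with a null `left` scalar. It does not use Arrow's types: `NullableInts`, `NullableBools`, and `IfElseNullLeft` are hypothetical names, and validity is stored as one byte per slot rather than as a packed bitmap. The point is only that the values are copied in bulk while validity is the one thing computed per slot.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a nullable numeric column:
// one value per slot plus a byte-per-slot validity mask (1 = valid).
struct NullableInts {
  std::vector<int32_t> values;
  std::vector<uint8_t> valid;
};

// Hypothetical condition column: boolean values with their own validity mask.
struct NullableBools {
  std::vector<uint8_t> values;
  std::vector<uint8_t> valid;
};

// Sketch of if_else(cond, <null scalar>, right).
// Assumes cond and right have the same length.
NullableInts IfElseNullLeft(const NullableBools& cond, const NullableInts& right) {
  NullableInts out;

  // Values: one bulk copy, no per-element selection needed, because every
  // slot that would have taken the left value is null anyway.
  out.values.resize(right.values.size());
  if (!right.values.empty()) {
    std::memcpy(out.values.data(), right.values.data(),
                right.values.size() * sizeof(int32_t));
  }

  // Validity: a slot is valid only when cond is a valid `false`
  // (so right is selected) and the right value itself is valid.
  out.valid.resize(right.valid.size());
  for (std::size_t i = 0; i < right.valid.size(); ++i) {
    out.valid[i] = cond.valid[i] && !cond.values[i] && right.valid[i];
  }
  return out;
}
```

The AAS case with a null `right` scalar is symmetric: copy the left values and require `cond` to be a valid `true`.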

# What changes are included in this PR?

This PR implements the above optimization for numeric and binary arrays.
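
For binary arrays the values live in two buffers (offsets and value bytes), so the bulk copy takes two steps. The following is a rough sketch under simplifying assumptions, not the kernel's code: `BinaryColumn` and `CopyBinaryValues` are hypothetical names, the column is assumed to be unsliced (`offsets[0] == 0`), and validity is left out because it would be computed separately.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical variable-length binary column: element i occupies
// data[offsets[i] .. offsets[i + 1]). Assumes offsets[0] == 0 (unsliced).
struct BinaryColumn {
  std::vector<int32_t> offsets;  // number of elements + 1 entries
  std::vector<uint8_t> data;
};

// Copies offsets and value bytes with two bulk copies instead of
// appending element by element.
BinaryColumn CopyBinaryValues(const BinaryColumn& in) {
  BinaryColumn out;

  // Step 1: the offsets buffer is reused verbatim.
  out.offsets.resize(in.offsets.size());
  if (!in.offsets.empty()) {
    std::memcpy(out.offsets.data(), in.offsets.data(),
                in.offsets.size() * sizeof(int32_t));
  }

  // Step 2: the value bytes span from the first offset to the last one.
  const std::size_t data_length =
      in.offsets.empty()
          ? 0
          : static_cast<std::size_t>(in.offsets.back() - in.offsets.front());
  out.data.resize(data_length);
  if (data_length > 0) {
    std::memcpy(out.data.data(), in.data.data(), data_length);
  }
  return out;
}
```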

# Are these changes tested?

Covered by the existing IfElse tests.

# Are there any user-facing changes?

No.
* Closes: #15290

Authored-by: Jin Shang <shangjin1997@gmail.com>
Signed-off-by: David Li <li.davidm96@gmail.com>
js8544 authored Jan 12, 2023
1 parent 3d26a43 commit 37a7965
Showing 1 changed file with 38 additions and 3 deletions.
41 changes: 38 additions & 3 deletions cpp/src/arrow/compute/kernels/scalar_if_else.cc
@@ -15,13 +15,16 @@
 // specific language governing permissions and limitations
 // under the License.

+#include <cstring>
 #include "arrow/array/builder_nested.h"
 #include "arrow/array/builder_primitive.h"
 #include "arrow/array/builder_time.h"
 #include "arrow/array/builder_union.h"
 #include "arrow/compute/api.h"
 #include "arrow/compute/kernels/codegen_internal.h"
 #include "arrow/compute/kernels/copy_data_internal.h"
+#include "arrow/result.h"
+#include "arrow/status.h"
 #include "arrow/util/bit_block_counter.h"
 #include "arrow/util/bit_run_reader.h"
 #include "arrow/util/bitmap.h"
@@ -470,6 +473,10 @@ struct IfElseFunctor<Type,
     // copy right data to out_buff
     std::memcpy(out_values, right.GetValues<T>(1), right.length * sizeof(T));

+    if (!left.is_valid) { // left is null scalar, only need to copy right data to output
+      return Status::OK();
+    }
+
     // selectively copy values from left data
     T left_data = internal::UnboxScalar<Type>::Unbox(left);

@@ -490,6 +497,10 @@ struct IfElseFunctor<Type,
     const T* left_data = left.GetValues<T>(1);
     std::memcpy(out_values, left_data, left.length * sizeof(T));

+    if (!right.is_valid) { // right is null scalar, only need to copy left data to output
+      return Status::OK();
+    }
+
     T right_data = internal::UnboxScalar<Type>::Unbox(right);

     RunIfElseLoopInverted(cond, [&](int64_t data_offset, int64_t num_elems) {
@@ -723,12 +734,24 @@ struct IfElseFunctor<Type, enable_if_base_binary<Type>> {
   // ASA
   static Status Call(KernelContext* ctx, const ArraySpan& cond, const Scalar& left,
                      const ArraySpan& right, ExecResult* out) {
-    std::string_view left_data = internal::UnboxScalar<Type>::Unbox(left);
-    auto left_size = static_cast<OffsetType>(left_data.size());
-
     const auto* right_offsets = right.GetValues<OffsetType>(1);
     const uint8_t* right_data = right.buffers[2].data;

+    if (!left.is_valid) { // left is null scalar, only need to copy right data to output
+      auto* out_data = out->array_data().get();
+      auto offset_length = (cond.length + 1) * sizeof(OffsetType);
+      ARROW_ASSIGN_OR_RAISE(out_data->buffers[1], ctx->Allocate(offset_length));
+      std::memcpy(out_data->buffers[1]->mutable_data(), right_offsets, offset_length);
+
+      auto right_data_length = right_offsets[right.length] - right_offsets[0];
+      ARROW_ASSIGN_OR_RAISE(out_data->buffers[2], ctx->Allocate(right_data_length));
+      std::memcpy(out_data->buffers[2]->mutable_data(), right_data, right_data_length);
+      return Status::OK();
+    }
+
+    std::string_view left_data = internal::UnboxScalar<Type>::Unbox(left);
+    auto left_size = static_cast<OffsetType>(left_data.size());
+
     // allocate data buffer conservatively
     int64_t data_buff_alloc =
         left_size * cond.length + right_offsets[right.length] - right_offsets[0];
@@ -754,6 +777,18 @@ struct IfElseFunctor<Type, enable_if_base_binary<Type>> {
     const auto* left_offsets = left.GetValues<OffsetType>(1);
     const uint8_t* left_data = left.buffers[2].data;

+    if (!right.is_valid) { // right is null scalar, only need to copy left data to output
+      auto* out_data = out->array_data().get();
+      auto offset_length = (cond.length + 1) * sizeof(OffsetType);
+      ARROW_ASSIGN_OR_RAISE(out_data->buffers[1], ctx->Allocate(offset_length));
+      std::memcpy(out_data->buffers[1]->mutable_data(), left_offsets, offset_length);
+
+      auto left_data_length = left_offsets[left.length] - left_offsets[0];
+      ARROW_ASSIGN_OR_RAISE(out_data->buffers[2], ctx->Allocate(left_data_length));
+      std::memcpy(out_data->buffers[2]->mutable_data(), left_data, left_data_length);
+      return Status::OK();
+    }
+
     std::string_view right_data = internal::UnboxScalar<Type>::Unbox(right);
     auto right_size = static_cast<OffsetType>(right_data.size());
