GH-43693: [C++][Acero] Support AVX2 swiss join decoding #43832
Conversation
@github-actions crossbow submit -g cpp
@ursabot please benchmark
Benchmark runs are scheduled for commit e2af277. Watch https://buildkite.com/apache-arrow and https://conbench.ursa.dev for updates. A comment will be posted here when the runs are complete.
Revision: e2af277 Submitted crossbow builds: ursacomputing/crossbow @ actions-1db8e52faf
Thanks for your patience. Conbench analyzed the 2 benchmarking runs that have been run so far on PR commit e2af277. There were no benchmark performance regressions. 🎉 The full Conbench report has more details.
The merge/rebase has now been fixed.
Hi @pitrou, did you get some time to take a look at this? Thank you.
Some comments and questions below. Are we sure these are tested thoroughly enough?
uint64_t* dst = reinterpret_cast<uint64_t*>(
    output->mutable_data(1) + num_bytes * (output_start_row + i));
const uint64_t* src = reinterpret_cast<const uint64_t*>(ptr);
for (uint32_t word_id = 0;
     word_id < bit_util::CeilDiv(num_bytes, sizeof(uint64_t)); ++word_id) {
  arrow::util::SafeStore<uint64_t>(dst + word_id,
                                   arrow::util::SafeLoad(src + word_id));
}
So this is a crude hand-written memcpy that overshoots the copy length?
Yes, it is.
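For context on the overshooting: below is a minimal, self-contained sketch of the same pattern (not the Arrow implementation; the function name is made up). It assumes both source and destination buffers are padded so that writing and reading past num_bytes, up to the next 8-byte boundary, is safe.

```cpp
#include <cstdint>
#include <cstring>

// Word-wise copy: the length is rounded up to whole 64-bit words, so the copy
// may touch up to 7 bytes past `num_bytes`. This is only safe because the
// buffers are assumed to have enough trailing padding.
void CopyWordsOvershooting(uint8_t* dst, const uint8_t* src, uint32_t num_bytes) {
  const size_t num_words = (num_bytes + sizeof(uint64_t) - 1) / sizeof(uint64_t);
  for (size_t word_id = 0; word_id < num_words; ++word_id) {
    uint64_t word;
    // Per-word memcpy sidesteps unaligned-access and aliasing UB, which is the
    // role arrow::util::SafeLoad/SafeStore play in the snippet above.
    std::memcpy(&word, src + word_id * sizeof(uint64_t), sizeof(uint64_t));
    std::memcpy(dst + word_id * sizeof(uint64_t), &word, sizeof(uint64_t));
  }
}
```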
cpp/src/arrow/acero/swiss_join.cc
uint64_t* dst = reinterpret_cast<uint64_t*>(
    output->mutable_data(2) + reinterpret_cast<const uint32_t*>(
        output->mutable_data(1))[output_start_row + i]);
const uint64_t* src = reinterpret_cast<const uint64_t*>(ptr);
for (uint32_t word_id = 0;
     word_id < bit_util::CeilDiv(num_bytes, sizeof(uint64_t)); ++word_id) {
  arrow::util::SafeStore<uint64_t>(dst + word_id,
                                   arrow::util::SafeLoad(src + word_id));
}
Same question re memcpy.
It is, too.
const uint32_t* row_ids) const {
  RowArrayAccessor::VisitNulls(
      rows_, column_id, num_rows_to_append, row_ids, [&](int i, uint8_t value) {
        bit_util::SetBitTo(output->mutable_data(0), output_start_row + i, value == 0);
Er, so the convention is that the null bytes in the RowArray store a 1 for a null value, and 0 for a non-null value?
Right.
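To make the convention concrete, here is a tiny illustrative sketch (the helper names are hypothetical, not Arrow's API) of how a row-table null byte (1 = null) is inverted into an Arrow-style validity bit (1 = valid), which is exactly what the `value == 0` in the snippet above does.

```cpp
#include <cstdint>

// Set bit `i` of a validity bitmap to `bit` (1 = valid); similar in spirit to
// arrow::bit_util::SetBitTo.
inline void SetValidityBit(uint8_t* validity, int64_t i, bool bit) {
  const uint8_t mask = static_cast<uint8_t>(1u << (i % 8));
  if (bit) {
    validity[i / 8] |= mask;
  } else {
    validity[i / 8] &= static_cast<uint8_t>(~mask);
  }
}

// The row table stores 1 for null, so the validity bit is the inverse.
void DecodeNullByte(uint8_t* validity, int64_t row, uint8_t null_byte) {
  SetValidityBit(validity, row, null_byte == 0);
}
```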
@@ -194,17 +202,19 @@ int RowArrayAccessor::Visit_avx2(const RowTableImpl& rows, int column_id, int nu
//
const uint8_t* row_ptr_base = rows.data(2);
const RowTableImpl::offset_type* row_offsets = rows.offsets();
auto row_offsets_i64 =
    reinterpret_cast<const arrow::util::int64_for_gather_t*>(row_offsets);
for (int i = 0; i < num_rows / unroll; ++i) {
Can we take the opportunity to rename all these `unroll` constants to `kUnroll`?
Done.
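For readers following along, below is a minimal sketch of why the offsets pointer gets reinterpreted before the gather and what the unroll constant of 4 means. It assumes AVX2 is available and that offsets are stored as 64-bit integers; the function name and surrounding types are illustrative, not the actual Arrow code.

```cpp
#include <immintrin.h>  // requires compiling with AVX2 enabled (e.g. -mavx2)
#include <cstdint>

constexpr int kUnroll = 4;  // one AVX2 gather produces 4 x 64-bit offsets

// Gather row offsets for a list of 32-bit row ids. _mm256_i32gather_epi64
// takes a `long long const*` base pointer, which is why code like the hunk
// above reinterprets the offsets pointer (arrow::util::int64_for_gather_t
// serves that purpose in Arrow).
void GatherRowOffsets(const int64_t* row_offsets, const uint32_t* row_ids,
                      int64_t* out, int num_rows) {
  const auto* base = reinterpret_cast<const long long*>(row_offsets);
  int i = 0;
  for (; i + kUnroll <= num_rows; i += kUnroll) {
    // Load 4 x 32-bit row ids, then gather the 4 corresponding 64-bit offsets.
    __m128i ids = _mm_loadu_si128(reinterpret_cast<const __m128i*>(row_ids + i));
    __m256i offsets = _mm256_i32gather_epi64(base, ids, /*scale=*/8);
    _mm256_storeu_si256(reinterpret_cast<__m256i*>(out + i), offsets);
  }
  for (; i < num_rows; ++i) {
    out[i] = row_offsets[row_ids[i]];  // scalar tail
  }
}
```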
Just a quick answer to this specific question: yes, the changed code is in a very common path that almost every swiss join case will run into, and my experience developing/debugging this feature tells me the same. So I'm positive that the existing tests exercise it well. And thank you for the rest of the thorough comments; I will get to them later.
Co-authored-by: Antoine Pitrou <pitrou@free.fr>
Rationale for this change
You can find the background in #43693.
By looking at how the non-SIMD counterparts (`Visit`/`VisitNulls`) of `Visit_avx2`/`VisitNulls_avx2` are used, I found they are used solely for decoding rows from the build side of the join. So I added AVX2 versions of those decoding methods and wired up `Visit_avx2`/`VisitNulls_avx2`.
What changes are included in this PR?
- Add `Visit*_avx2` functions to decode the fixed-length/offsets/var-length/null parts of the row table.
- Wire up the `Visit*_avx2` functions.
Are these changes tested?
No new tests needed.
The benchmarking results are a bit complicated, so I put them in comment #43832 (comment).
Are there any user-facing changes?
No changes other than a positive performance improvement. Users can expect this improvement for hash-join-related workloads. Nevertheless, the degree of improvement depends heavily not only on the workload but also on the CPU model. For Intel CPUs from Skylake to Ice Lake/Tiger Lake, which suffer degraded AVX2 gather performance because of one of Intel's vulnerability mitigations (detailed in #43832 (comment)), the improvement is less significant: single-digit percentages. Other models, e.g. AMD and the most recent Intel CPUs, can see improvements of up to 30%.