Update workflow for changes in benchmark tooling #54

mosullivan93 · 2023-06-29T04:41:10Z

Google's Benchmark project has rearranged some of the files we need for the testing workflows. This commit points pip towards to correct requirements.txt for the latest versions of the framework. Alternatively, we could use -b <name> to ensure we always use the same version of the benchmark dependency.

mosullivan@melchior:~/source/repos/googlebenchmarks> git diff v1.8.0:requirements.txt HEAD:tools/requirements.txt
diff --git a/requirements.txt b/tools/requirements.txt
index 1c8a4bd..afbc596 100644
--- a/requirements.txt
+++ b/tools/requirements.txt
@@ -1,2 +1,2 @@
-numpy == 1.22
+numpy == 1.25
 scipy == 1.5.4

Google's Benchmark project has rearranged some of the files we need for the testing workflows. This commit points pip towards to correct requirements.txt for the latest versions of the framework.

mosullivan93 · 2023-06-30T07:41:18Z

First of all, unfortunately it seems we can't control whether the Github-hosted runners for tests will have AVX512 compatible CPUs. This shouldn't be a problem in general, but it is for reasons I detail below.

By compiling and running the benchmarks using sde -knl I found that illegal instructions can be encountered during the benchmarks (despite the runtime checks in utils/cpuinfo.cpp) when using -O3 (specifically -fvect-cost-model seems to be the culprit). See:

mosullivan@sprvm-1:~/x86-simd-sort$ git status
On branch workflow-update
Your branch is up to date with 'origin/workflow-update'.

nothing to commit, working tree clean
mosullivan@sprvm-1:~/x86-simd-sort$ make -s -B bench
mosullivan@sprvm-1:~/x86-simd-sort$ sde -knl -- ./benchexe --benchmark_filter='avx.*reverse.*uint16'
2023-06-30T07:19:23+00:00
Running ./benchexe
Run on (8 X 2700 MHz CPU s)
CPU Caches:
  L1 Data 48 KiB (x4)
  L1 Instruction 32 KiB (x4)
  L2 Unified 2048 KiB (x4)
  L3 Unified 107520 KiB (x1)
Load Average: 0.15, 0.07, 0.01
TID 0 SDE-ERROR: Executed instruction not valid for specified chip (KNL): 0x55e418465820: vmovdqu16 ymm1, ymmword ptr [rax]
Image: /home/mosullivan/x86-simd-sort/benchexe+0x53820 (in multi-region image, region# 1)
Function: _Z11avx512qsortItJiNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEEvRN9benchmark5StateEDpOT0_
Instruction bytes are: 62 f1 ff 28 6f 08

As far as I can tell, this instruction would only occur for the _mm256_maskz_loadu_epi16 found in replace_nan_with_inf(uint16_t *arr, int64_t arrsize), but this shouldn't be being reached... Turns out that if you comment out this line then it won't encounter the illegal instruction:

x86-simd-sort/benchmarks/bench-qsort.hpp

Line 70 in 348343e

std::reverse(arr.begin(), arr.end());

Obviously since want the "reverse" tests to actually have a reversed array, I tried a custom comparator in std::sort. It seems that the runtime check can work as expected with this change:

mosullivan@sprvm-1:~/x86-simd-sort$ git apply <<PATCH
diff --git a/benchmarks/bench-qsort.hpp b/benchmarks/bench-qsort.hpp
index ae02ac9..185ef19 100644
--- a/benchmarks/bench-qsort.hpp
+++ b/benchmarks/bench-qsort.hpp
@@ -66,8 +66,7 @@ static void avx512qsort(benchmark::State &state, Args &&...args)
     }
     else if (arrtype == "reverse") {
         arr = get_uniform_rand_array<T>(ARRSIZE);
-        std::sort(arr.begin(), arr.end());
-        std::reverse(arr.begin(), arr.end());
+        std::sort(arr.begin(), arr.end(), std::greater<T>());
     }
     arr_bkp = arr;

PATCH
mosullivan@sprvm-1:~/x86-simd-sort$ make -s -B bench
mosullivan@sprvm-1:~/x86-simd-sort$ sde -knl -- ./benchexe --benchmark_filter='avx.*reverse.*uint16'
2023-06-30T07:19:59+00:00
Running ./benchexe
Run on (8 X 2700 MHz CPU s)
CPU Caches:
  L1 Data 48 KiB (x4)
  L1 Instruction 32 KiB (x4)
  L2 Unified 2048 KiB (x4)
  L3 Unified 107520 KiB (x1)
Load Average: 0.22, 0.09, 0.02
---------------------------------------------------------------------------
Benchmark                                 Time             CPU   Iterations
---------------------------------------------------------------------------
avx512qsort/reverse_10k/uint16_t SKIPPED: 'Requires AVX512 BW ISA'

So, do you want me to make a new PR to fix this weird behaviour or put it in here?

Workaround:

diff --git a/benchmarks/bench-qsort.hpp b/benchmarks/bench-qsort.hpp
index ae02ac9..23c16ca 100644
--- a/benchmarks/bench-qsort.hpp
+++ b/benchmarks/bench-qsort.hpp
@@ -23,8 +23,7 @@ static void stdsort(benchmark::State &state, Args &&...args)
     }
     else if (arrtype == "reverse") {
         arr = get_uniform_rand_array<T>(ARRSIZE);
-        std::sort(arr.begin(), arr.end());
-        std::reverse(arr.begin(), arr.end());
+        std::sort(arr.begin(), arr.end(), std::greater<T>());
     }
     arr_bkp = arr;
 
@@ -66,8 +65,7 @@ static void avx512qsort(benchmark::State &state, Args &&...args)
     }
     else if (arrtype == "reverse") {
         arr = get_uniform_rand_array<T>(ARRSIZE);
-        std::sort(arr.begin(), arr.end());
-        std::reverse(arr.begin(), arr.end());
+        std::sort(arr.begin(), arr.end(), std::greater<T>());
     }
     arr_bkp = arr;

r-devulap

Thank you!

r-devulap · 2023-07-20T18:13:45Z

I think the benchmarks isn't compiled with the right attributes. We are compiling the entire file with-march=icelake-client and the compiler can chose to emit any instruction it wants for all of the non avx512 sorting code.

I wouldn't worry about it. This particular CI isn't a "Required" one to pass to enabling merging a PR. I am considering just getting rid of it all together.

mosullivan93 · 2023-07-21T02:44:30Z

I was thinking of zhoozhing this CI up a bit by having it run the benchmarks separately first and writing the results to JSON. This could then be split into qsort/qselect/partial_qsort comparisons using jq. Plus I could then have it exit when the CPU is not AVX512 compatible. Gets me worried for a moment when I see the little red x after a push, but I can let it go if it's not too important in the grand scheme of things 😄

Update workflow for changes in benchmark tooling

348343e

Google's Benchmark project has rearranged some of the files we need for the testing workflows. This commit points pip towards to correct requirements.txt for the latest versions of the framework.

mosullivan93 mentioned this pull request Jun 29, 2023

Reduce overhead and optimise partial sorting algorithm #53

Closed

r-devulap approved these changes Jul 20, 2023

View reviewed changes

r-devulap merged commit f220118 into intel:main Jul 20, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Update workflow for changes in benchmark tooling #54

Update workflow for changes in benchmark tooling #54

mosullivan93 commented Jun 29, 2023 •

edited

Loading

mosullivan93 commented Jun 30, 2023 •

edited

Loading

r-devulap left a comment

r-devulap commented Jul 20, 2023

mosullivan93 commented Jul 21, 2023

Update workflow for changes in benchmark tooling #54

Update workflow for changes in benchmark tooling #54

Conversation

mosullivan93 commented Jun 29, 2023 • edited Loading

mosullivan93 commented Jun 30, 2023 • edited Loading

r-devulap left a comment

Choose a reason for hiding this comment

r-devulap commented Jul 20, 2023

mosullivan93 commented Jul 21, 2023

mosullivan93 commented Jun 29, 2023 •

edited

Loading

mosullivan93 commented Jun 30, 2023 •

edited

Loading