Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update workflow for changes in benchmark tooling #54

Merged
merged 1 commit into from
Jul 20, 2023

Conversation

mosullivan93
Copy link
Contributor

@mosullivan93 mosullivan93 commented Jun 29, 2023

Google's Benchmark project has rearranged some of the files we need for the testing workflows. This commit points pip towards to correct requirements.txt for the latest versions of the framework. Alternatively, we could use -b <name> to ensure we always use the same version of the benchmark dependency.

mosullivan@melchior:~/source/repos/googlebenchmarks> git diff v1.8.0:requirements.txt HEAD:tools/requirements.txt
diff --git a/requirements.txt b/tools/requirements.txt
index 1c8a4bd..afbc596 100644
--- a/requirements.txt
+++ b/tools/requirements.txt
@@ -1,2 +1,2 @@
-numpy == 1.22
+numpy == 1.25
 scipy == 1.5.4

Google's Benchmark project has rearranged some of the files we need for
the testing workflows. This commit points pip towards to correct
requirements.txt for the latest versions of the framework.
@mosullivan93
Copy link
Contributor Author

mosullivan93 commented Jun 30, 2023

First of all, unfortunately it seems we can't control whether the Github-hosted runners for tests will have AVX512 compatible CPUs. This shouldn't be a problem in general, but it is for reasons I detail below.

By compiling and running the benchmarks using sde -knl I found that illegal instructions can be encountered during the benchmarks (despite the runtime checks in utils/cpuinfo.cpp) when using -O3 (specifically -fvect-cost-model seems to be the culprit). See:

mosullivan@sprvm-1:~/x86-simd-sort$ git status
On branch workflow-update
Your branch is up to date with 'origin/workflow-update'.

nothing to commit, working tree clean
mosullivan@sprvm-1:~/x86-simd-sort$ make -s -B bench
mosullivan@sprvm-1:~/x86-simd-sort$ sde -knl -- ./benchexe --benchmark_filter='avx.*reverse.*uint16'
2023-06-30T07:19:23+00:00
Running ./benchexe
Run on (8 X 2700 MHz CPU s)
CPU Caches:
  L1 Data 48 KiB (x4)
  L1 Instruction 32 KiB (x4)
  L2 Unified 2048 KiB (x4)
  L3 Unified 107520 KiB (x1)
Load Average: 0.15, 0.07, 0.01
TID 0 SDE-ERROR: Executed instruction not valid for specified chip (KNL): 0x55e418465820: vmovdqu16 ymm1, ymmword ptr [rax]
Image: /home/mosullivan/x86-simd-sort/benchexe+0x53820 (in multi-region image, region# 1)
Function: _Z11avx512qsortItJiNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEEvRN9benchmark5StateEDpOT0_
Instruction bytes are: 62 f1 ff 28 6f 08

As far as I can tell, this instruction would only occur for the _mm256_maskz_loadu_epi16 found in replace_nan_with_inf(uint16_t *arr, int64_t arrsize), but this shouldn't be being reached... Turns out that if you comment out this line then it won't encounter the illegal instruction:

std::reverse(arr.begin(), arr.end());

Obviously since want the "reverse" tests to actually have a reversed array, I tried a custom comparator in std::sort. It seems that the runtime check can work as expected with this change:

mosullivan@sprvm-1:~/x86-simd-sort$ git apply <<PATCH
diff --git a/benchmarks/bench-qsort.hpp b/benchmarks/bench-qsort.hpp
index ae02ac9..185ef19 100644
--- a/benchmarks/bench-qsort.hpp
+++ b/benchmarks/bench-qsort.hpp
@@ -66,8 +66,7 @@ static void avx512qsort(benchmark::State &state, Args &&...args)
     }
     else if (arrtype == "reverse") {
         arr = get_uniform_rand_array<T>(ARRSIZE);
-        std::sort(arr.begin(), arr.end());
-        std::reverse(arr.begin(), arr.end());
+        std::sort(arr.begin(), arr.end(), std::greater<T>());
     }
     arr_bkp = arr;

PATCH
mosullivan@sprvm-1:~/x86-simd-sort$ make -s -B bench
mosullivan@sprvm-1:~/x86-simd-sort$ sde -knl -- ./benchexe --benchmark_filter='avx.*reverse.*uint16'
2023-06-30T07:19:59+00:00
Running ./benchexe
Run on (8 X 2700 MHz CPU s)
CPU Caches:
  L1 Data 48 KiB (x4)
  L1 Instruction 32 KiB (x4)
  L2 Unified 2048 KiB (x4)
  L3 Unified 107520 KiB (x1)
Load Average: 0.22, 0.09, 0.02
---------------------------------------------------------------------------
Benchmark                                 Time             CPU   Iterations
---------------------------------------------------------------------------
avx512qsort/reverse_10k/uint16_t SKIPPED: 'Requires AVX512 BW ISA'

So, do you want me to make a new PR to fix this weird behaviour or put it in here?

Workaround:

diff --git a/benchmarks/bench-qsort.hpp b/benchmarks/bench-qsort.hpp
index ae02ac9..23c16ca 100644
--- a/benchmarks/bench-qsort.hpp
+++ b/benchmarks/bench-qsort.hpp
@@ -23,8 +23,7 @@ static void stdsort(benchmark::State &state, Args &&...args)
     }
     else if (arrtype == "reverse") {
         arr = get_uniform_rand_array<T>(ARRSIZE);
-        std::sort(arr.begin(), arr.end());
-        std::reverse(arr.begin(), arr.end());
+        std::sort(arr.begin(), arr.end(), std::greater<T>());
     }
     arr_bkp = arr;
 
@@ -66,8 +65,7 @@ static void avx512qsort(benchmark::State &state, Args &&...args)
     }
     else if (arrtype == "reverse") {
         arr = get_uniform_rand_array<T>(ARRSIZE);
-        std::sort(arr.begin(), arr.end());
-        std::reverse(arr.begin(), arr.end());
+        std::sort(arr.begin(), arr.end(), std::greater<T>());
     }
     arr_bkp = arr;
 

Copy link
Contributor

@r-devulap r-devulap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@r-devulap
Copy link
Contributor

I think the benchmarks isn't compiled with the right attributes. We are compiling the entire file with-march=icelake-client and the compiler can chose to emit any instruction it wants for all of the non avx512 sorting code.

I wouldn't worry about it. This particular CI isn't a "Required" one to pass to enabling merging a PR. I am considering just getting rid of it all together.

@r-devulap r-devulap merged commit f220118 into intel:main Jul 20, 2023
@mosullivan93
Copy link
Contributor Author

I was thinking of zhoozhing this CI up a bit by having it run the benchmarks separately first and writing the results to JSON. This could then be split into qsort/qselect/partial_qsort comparisons using jq. Plus I could then have it exit when the CPU is not AVX512 compatible. Gets me worried for a moment when I see the little red x after a push, but I can let it go if it's not too important in the grand scheme of things 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants