Add optimal optimization flags for Intel compilers on AMD CPUs #3793

Flamefire · 2021-08-02T16:25:54Z

We have the following code for Intel toolchains:

    # used when 'optarch' toolchain option is enabled (and --optarch is not specified)
    COMPILER_OPTIMAL_ARCHITECTURE_OPTION = {
        (systemtools.X86_64, systemtools.AMD): 'xHost',
        (systemtools.X86_64, systemtools.INTEL): 'xHost',
    }

However, this is a bad choice for AMD CPUs: with xHost the Intel compiler uses SSE only, even when AVX2 is available. As a result, software either fails to install entirely or is installed with worse optimizations.

Since we usually care about both the CPU vendor and the supported vector instructions, I'd add an optional third element to the key: the highest supported vector instruction set.
In EB, we would then need to define a list of supported types, e.g. "avx2, avx, sse2, sse", in that order. If a tuple (<arch>, <vendor>, <vector>) is found in the dict, that flag is used; otherwise the next lower vector extension is tried. If all of them were tried without a match, the last element of the tuple is dropped and the shorter tuple is looked up.
Making the vendor element optional in the same way would allow this:

    COMPILER_OPTIMAL_ARCHITECTURE_OPTION = {
        (systemtools.X86_64, ): 'xHost',
        (systemtools.X86_64, systemtools.AMD, systemtools.AVX2): 'mavx2',
    }
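
A minimal sketch of the lookup described above, to make the fallback order concrete. Everything here is an assumption for illustration: the function name `find_optarch_flag`, the `VECTOR_EXTENSIONS` list, and the plain strings standing in for the `systemtools` constants are hypothetical and not taken from EasyBuild.

```python
# Hypothetical sketch: plain strings stand in for the systemtools constants.
VECTOR_EXTENSIONS = ['avx2', 'avx', 'sse2', 'sse']  # ordered from highest to lowest

OPTIONS = {
    ('x86_64',): 'xHost',
    ('x86_64', 'AMD', 'avx2'): 'mavx2',
}


def find_optarch_flag(options, arch, vendor, max_vector):
    """Return the flag for the most specific matching key, or None if nothing matches."""
    # Only extensions at or below the CPU's highest supported one are candidates.
    if max_vector in VECTOR_EXTENSIONS:
        candidates = VECTOR_EXTENSIONS[VECTOR_EXTENSIONS.index(max_vector):]
    else:
        candidates = []

    # First try (<arch>, <vendor>, <vector>), going down the vector list.
    for vector in candidates:
        key = (arch, vendor, vector)
        if key in options:
            return options[key]

    # Then drop the last element of the key and try the shorter tuples.
    for key in [(arch, vendor), (arch,)]:
        if key in options:
            return options[key]

    return None


print(find_optarch_flag(OPTIONS, 'x86_64', 'AMD', 'avx2'))    # mavx2
print(find_optarch_flag(OPTIONS, 'x86_64', 'Intel', 'avx2'))  # xHost
```

With this ordering, an AMD CPU that only supports SSE2 would still end up with the generic `('x86_64',)` entry, so a site would likely want to add explicit lower entries for AMD as well.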

We should also allow setting this via environment variables, similar to EASYBUILD_OPTARCH. On our site we currently set EASYBUILD_OPTARCH="Intel:mavx2 -fma; GCC:march=native" conditionally, which requires checking for AMD in a shell script. Instead, we could do:

    EASYBUILD_OPTARCH="Intel,x86:xHost; Intel,x86,AMD,AVX2:mavx2 -fma; GCC:march=native"

This would also make the configuration somewhat future-proof.
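
To illustrate how such a value could be split up, here is a rough sketch of a parser for the proposed syntax. The function name `parse_optarch` and the resulting data layout (compiler name mapped to a dict of key tuples) are assumptions made for this example; this is not how EasyBuild currently parses `--optarch`.

```python
def parse_optarch(value):
    """Parse '<compiler>[,<arch>[,<vendor>[,<vector>]]]:<flags>; ...' into nested dicts.

    Hypothetical format, following the proposal in this issue.
    """
    result = {}
    for entry in value.split(';'):
        entry = entry.strip()
        if not entry:
            continue
        # Split on the first ':' only, the flags themselves are kept as-is.
        key, sep, flags = entry.partition(':')
        if not sep:
            raise ValueError("Missing ':' in optarch entry: %r" % entry)
        parts = tuple(part.strip() for part in key.split(','))
        compiler, cpu_key = parts[0], parts[1:]
        result.setdefault(compiler, {})[cpu_key] = flags.strip()
    return result


parsed = parse_optarch("Intel,x86:xHost; Intel,x86,AMD,AVX2:mavx2 -fma; GCC:march=native")
print(parsed['Intel'][('x86', 'AMD', 'AVX2')])  # mavx2 -fma
print(parsed['GCC'][()])                        # march=native
```

An entry without any CPU restriction, like `GCC:march=native`, ends up under the empty tuple, which would act as the catch-all for that compiler.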

To detect the supported vector extensions, we could use archspec, or just use the CPU features query we already have and search for avx2 etc.
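
The second option could look roughly like the sketch below. It assumes the CPU features are already available as a list of flag names (as found in the `flags` line of `/proc/cpuinfo` on Linux); the function name `max_vector_extension` and the `VECTOR_EXTENSIONS` list are hypothetical.

```python
# Hypothetical list of vector extensions to look for, ordered from highest to lowest.
VECTOR_EXTENSIONS = ['avx2', 'avx', 'sse2', 'sse']


def max_vector_extension(cpu_features):
    """Return the highest vector extension found in the given CPU feature flags, or None."""
    features = {feature.lower() for feature in cpu_features}
    for extension in VECTOR_EXTENSIONS:
        if extension in features:
            return extension
    return None


print(max_vector_extension(['fpu', 'sse', 'sse2', 'avx', 'fma', 'avx2']))  # avx2
print(max_vector_extension(['fpu', 'sse', 'sse2']))                        # sse2
```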


bartoldeman · 2021-08-04T13:32:19Z

@Flamefire note that we set

    export EASYBUILD_OPTARCH='NVHPC:tp=haswell;Intel:march=core-avx2 -axCore-AVX512;GCC:march=core-avx2'

for the "avx2" arch to be compatible with both Intel and AMD. Do you know if march=core-avx2 is any worse or better than mavx2 -fma?

Flamefire · 2021-08-04T15:06:11Z

No, I don't know. These are just the flags we use, and one of our long-term admins said they are good. So: Magic! ;)

Flamefire · 2021-08-10T12:52:19Z

@bartoldeman https://software.intel.com/content/www/us/en/develop/documentation/cpp-compiler-developer-guide-and-reference/top/compiler-reference/compiler-options/compiler-option-details/code-generation-options/m.html#m lists -march=core-avx2 as the suggested replacement for -mfma.
So I'd say they are the same.

bartoldeman · 2022-09-23T15:30:53Z

It's even worse than that: -xHost or -march=native with Intel compilers 19.1+ (2020+) produces even slower x87 code, not even SSE2:

    $ lscpu | grep Model\ name
    Model name:          AMD EPYC 7532 32-Core Processor
    $ icc -v
    icc version 19.1.1.217 (gcc version 9.3.0 compatibility)
    $ cat test-amd.c
    #include <stdio.h>
    int main(void)
    {
      double y;
      scanf("%lg\n", &y);
      printf("%g\n", y*y);
      return 0;
    }
    $ icc -c -xHost test-amd.c
    $ objdump -d test-amd.o

    test-amd.o:     file format elf64-x86-64


    Disassembly of section .text:

    0000000000000000 <main>:
       0:   55                      push   %rbp
       1:   48 89 e5                mov    %rsp,%rbp
       4:   48 83 e4 80             and    $0xffffffffffffff80,%rsp
       8:   48 81 ec 80 00 00 00    sub    $0x80,%rsp
       f:   bf 03 00 00 00          mov    $0x3,%edi
      14:   33 f6                   xor    %esi,%esi
      16:   e8 00 00 00 00          callq  1b <main+0x1b>
      1b:   bf 00 00 00 00          mov    $0x0,%edi
      20:   48 8d 34 24             lea    (%rsp),%rsi
      24:   33 c0                   xor    %eax,%eax
      26:   e8 00 00 00 00          callq  2b <main+0x2b>
      2b:   dd 04 24                fldl   (%rsp)
      2e:   bf 00 00 00 00          mov    $0x0,%edi
      33:   d8 c8                   fmul   %st(0),%st
      35:   b8 01 00 00 00          mov    $0x1,%eax
      3a:   dd 5c 24 08             fstpl  0x8(%rsp)
      3e:   f2 0f 10 44 24 08       movsd  0x8(%rsp),%xmm0
      44:   e8 00 00 00 00          callq  49 <main+0x49>
      49:   33 c0                   xor    %eax,%eax
      4b:   48 89 ec                mov    %rbp,%rsp
      4e:   5d                      pop    %rbp
      4f:   c3                      retq

See also https://community.intel.com/t5/Intel-Fortran-Compiler/SSE-error-in-compilation-with-xHost-option-on-AMD-Zen-3-CPU/m-p/1287143
(oneAPI compilers don't have this issue, and -march=core-avx2 works fine; it's just the CPU detection that's broken.)
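
For checking other object files for the same problem, a small script can pick the x87 instructions (like the `fldl`, `fmul` and `fstpl` in the multiplication above) out of `objdump -d` output. This is only a heuristic sketch: it assumes the usual `address: bytes mnemonic operands` line layout and treats every mnemonic of three or more characters starting with `f` as x87; the name `x87_mnemonics` is made up for this example.

```python
import re

# One 'objdump -d' instruction line: address, hex bytes, then the mnemonic.
INSN_RE = re.compile(r'^\s*[0-9a-f]+:\s+(?:[0-9a-f]{2}\s)+\s*(\S+)')


def x87_mnemonics(disassembly):
    """Return the mnemonics that look like x87 instructions, in order of appearance."""
    found = []
    for line in disassembly.splitlines():
        match = INSN_RE.match(line)
        if not match:
            continue
        mnemonic = match.group(1)
        # Heuristic: x87 mnemonics start with 'f' (fld, fmul, fstp, ...);
        # the length check skips stray two-character hex bytes.
        if mnemonic.startswith('f') and len(mnemonic) >= 3:
            found.append(mnemonic)
    return found


SAMPLE = """\
  2b:   dd 04 24                fldl   (%rsp)
  2e:   bf 00 00 00 00          mov    $0x0,%edi
  33:   d8 c8                   fmul   %st(0),%st
  3a:   dd 5c 24 08             fstpl  0x8(%rsp)
  3e:   f2 0f 10 44 24 08       movsd  0x8(%rsp),%xmm0
"""

print(x87_mnemonics(SAMPLE))  # ['fldl', 'fmul', 'fstpl']
```

An empty result for floating-point code would suggest the compiler used SSE or AVX instructions instead.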

Flamefire · 2022-09-24T18:43:18Z

So #3797 would really help, I'd say.

boegel added the enhancement label Aug 4, 2021

boegel added this to the next release (4.4.2?) milestone Aug 4, 2021

Flamefire linked a pull request Aug 10, 2021 that will close this issue

Allow optarch values to be partial maps including vector extensions #3797

Open

boegel modified the milestones: 4.4.2, release after 4.4.2 Sep 1, 2021

boegel modified the milestones: 4.5.0 (next release), release after 4.5.0 Oct 25, 2021

boegel modified the milestones: 4.5.1, release after 4.5.1 Dec 7, 2021

boegel modified the milestones: 4.5.2, release after 4.5.2 Jan 14, 2022

boegel modified the milestones: 4.5.3, release after 4.5.3 Feb 8, 2022

boegel modified the milestones: next release (4.5.4), release after 4.5.4 Mar 25, 2022

boegel modified the milestones: 4.5.5, release after 4.5.5 May 25, 2022

boegel modified the milestones: next release (4.6.0), release after 4.6.0 Jul 6, 2022

boegel modified the milestones: 4.6.1, release after 4.6.1 Sep 9, 2022

boegel modified the milestones: next release (4.6.2?), release after 4.6.2 Oct 18, 2022

boegel modified the milestones: next release (4.7.0), release after 4.7.0 Dec 20, 2022

boegel modified the milestones: next release (4.7.1), release after 4.7.1 Feb 25, 2023

boegel modified the milestones: next release (4.7.2?), release after 4.7.2 Apr 12, 2023

boegel modified the milestones: 4.7.3, release after 4.7.3 Jul 6, 2023

boegel modified the milestones: next release (4.8.1?), release after 4.8.1 Sep 3, 2023

boegel modified the milestones: next release (4.8.2), release after 4.8.2 Oct 27, 2023

boegel modified the milestones: next release (4.9.0), release after 4.9.0 Dec 26, 2023

boegel modified the milestones: 4.9.1, release after 4.9.1 Apr 3, 2024

jfgrimm modified the milestones: release after 4.9.1, 5.0 Apr 4, 2024

jfgrimm added the EasyBuild-5.0 EasyBuild 5.0 label Apr 4, 2024

boegel added this to EasyBuild v5.0 Aug 26, 2024

boegel moved this to Nice-to-have in EasyBuild v5.0 Aug 26, 2024
