New arch: HPC2020 nvhpc/23.7 #103

Open: wants to merge 1 commit into base: develop

MichaelSt98
Contributor

Add new env and toolchain file for NVHPC 23.7

@MichaelSt98 requested a review from reuterbal on December 6, 2024 10:59
@reuterbal force-pushed the nams-atos-nvhpc-23-7 branch from 9eefa04 to 8af29b6 on December 6, 2024 11:25
@MichaelSt98
Contributor Author

Thanks for rebasing!

Collaborator

@reuterbal reuterbal left a comment

Unfortunately, this doesn't seem to work out-of-the-box for me.

Building with

./cloudsc-bundle build --arch arch/ecmwf/hpc2020/nvhpc/23.7/ --with-gpu --with-loki --with-cuda

fails to discover OpenMP correctly:

-- Found OpenMP_Fortran: -mp -mp=gpu,bind,allcores,numa
-- Could NOT find OpenMP_C (missing: OpenMP_acchost_LIBRARY)
-- Could NOT find OpenMP (missing: OpenMP_C_FOUND C)
-- dwarf-p-cloudsc FAILED to find OPTIONAL package OpenMP
-- Could NOT find package OpenMP required for feature OMP --
CMake Error at ecbuild/cmake/ecbuild_log.cmake:190 (message):
  CRITICAL - Feature OMP cannot be enabled -- following required
  packages weren't found: OpenMP
Call Stack (most recent call first):
  ecbuild/cmake/ecbuild_add_option.cmake:260 (ecbuild_critical)
  cloudsc-dwarf/CMakeLists.txt:78 (ecbuild_add_option)


-- Configuring incomplete, errors occurred!

This is likely due to the explicit OpenMP library override in the toolchain.cmake:

# Note: OpenMP_C_FLAGS and OpenMP_C_LIB_NAMES have to be provided _both_ to
# keep FindOpenMP from overwriting the FLAGS variable (the cache entry alone
# doesn't have any effect here as the module uses FORCE to overwrite the
# existing value)
set( OpenMP_C_FLAGS "-mp -mp=bind,allcores,numa" CACHE STRING "" )
set( OpenMP_C_LIB_NAMES "acchost" CACHE STRING "")

With these two set() lines removed, the build succeeds, and it also succeeds for me with the 22.11 arch. IIRC, this may have been fixed in CMake, in which case the 3.25 version that we use in both arch files no longer needs the workaround. I would suggest simply removing these lines from the toolchain file.
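
For reference, a minimal sketch of how the detection result could be checked without the override. This is not part of the PR: where the snippet goes (for example a scratch CMakeLists.txt configured with the same toolchain file) is an assumption, and it assumes C and Fortran are already enabled. The variable names are the ones documented for CMake's FindOpenMP module.

```cmake
# Sketch only: print what FindOpenMP detects when no OpenMP_C_* override is set.
# Assumes C and Fortran have been enabled by the enclosing project() call.
find_package( OpenMP COMPONENTS C Fortran )

foreach( lang C Fortran )
  message( STATUS "OpenMP_${lang}_FOUND     = ${OpenMP_${lang}_FOUND}" )
  message( STATUS "OpenMP_${lang}_FLAGS     = ${OpenMP_${lang}_FLAGS}" )
  message( STATUS "OpenMP_${lang}_LIB_NAMES = ${OpenMP_${lang}_LIB_NAMES}" )
endforeach()
```

If OpenMP_C_FOUND comes back true and OpenMP_C_LIB_NAMES is populated here, FindOpenMP detects the C flags and libraries on its own for this compiler and CMake combination, so the override is not needed.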

However, even with the override removed, almost all variants fail with a SIGFPE:

The following tests FAILED:
          3 - dwarf-cloudsc-fortran-serial (NUMERICAL)
          4 - dwarf-cloudsc-fortran-omp (NUMERICAL)
         10 - dwarf-cloudsc-gpu-scc-serial (NUMERICAL)
         11 - dwarf-cloudsc-gpu-scc-stack-serial (NUMERICAL)
         12 - dwarf-cloudsc-gpu-scc-hoist-serial (NUMERICAL)
         13 - dwarf-cloudsc-gpu-scc-k-caching-serial (NUMERICAL)
         14 - dwarf-cloudsc-gpu-omp-scc-serial (NUMERICAL)
         15 - dwarf-cloudsc-gpu-omp-scc-stack-serial (NUMERICAL)
         16 - dwarf-cloudsc-gpu-omp-scc-hoist-serial (NUMERICAL)
         17 - dwarf-cloudsc-gpu-omp-scc-k-caching-serial (NUMERICAL)
         18 - dwarf-cloudsc-gpu-scc-cuf-serial (NUMERICAL)
         19 - dwarf-cloudsc-gpu-scc-cuf-k-caching-serial (NUMERICAL)
         20 - dwarf-cloudsc-loki-idem-serial (NUMERICAL)
         21 - dwarf-cloudsc-loki-idem-omp (NUMERICAL)
         22 - dwarf-cloudsc-loki-idem-stack-serial (NUMERICAL)
         23 - dwarf-cloudsc-loki-idem-stack-omp (NUMERICAL)
         24 - dwarf-cloudsc-loki-sca-serial (NUMERICAL)
         25 - dwarf-cloudsc-loki-scc-serial (NUMERICAL)
         26 - dwarf-cloudsc-loki-scc-stack-serial (NUMERICAL)
         27 - dwarf-cloudsc-loki-scc-hoist-serial (NUMERICAL)
         28 - dwarf-cloudsc-loki-scc-cuf-parametrise-serial (NUMERICAL)
         29 - dwarf-cloudsc-loki-scc-cuf-hoist-serial (NUMERICAL)
         30 - dwarf-cloudsc-loki-c-serial (NUMERICAL)
         31 - dwarf-cloudsc-loki-c-omp (NUMERICAL)

Could you have another look at that please?

@reuterbal changed the title from "atos nvhpc 23 7" to "New arch: HPC2020 nvhpc/23.7" on Dec 19, 2024