Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix the Kokkos testing compilation error on Frontier #4358

Closed
wants to merge 5 commits into from

Conversation

anagainaru
Copy link
Contributor

@anagainaru anagainaru commented Oct 7, 2024

Instead of #4357

If hipcc is used as the CXX compiler there is no error, but the CI is using g++ so we need to manually set the CXX compiler to whatever was used to build Kokkos.

The rocm version had to be updated to rocm/5.7.1 in the frontier CI due to an incompatibility between the g++ version in PrgEnv-gnu/8.5.0 and the hipcc version in rocm/5.4.3

The Kokkos version had to be updated to > 4.0.0 due to a warning with 3.7.01 and newer rocm compilers.

@anagainaru anagainaru removed the request for review from vicentebolea October 7, 2024 14:51
@anagainaru anagainaru force-pushed the kokkos-testing-fix branch 2 times, most recently from f92fc25 to 81aed52 Compare October 7, 2024 17:42
@anagainaru
Copy link
Contributor Author

@vicentebolea I don't think the errors on Frontier are about the Kokkos testing. Could you please advise?

@vicentebolea
Copy link
Collaborator

Probably not, let met come back to this later this week

@anagainaru
Copy link
Contributor Author

@vicentebolea thanks !

Just fyi, I used our build-frontier script from the script folder (but using g++ instead of hipcc and the modules used in the CI) and I do not see this errors.

  1) craype-x86-trento                      6) PrgEnv-gnu/8.5.0     11) hsi/default        16) ninja/1.10.2       21) darshan-runtime/3.4.4-mpi  26) rocm/5.4.3
  2) perftools-base/23.12.0                 7) gcc-native/12.3      12) lfs-wrapper/0.0.1  17) libffi/3.4.2       22) zstd/1.5.2
  3) xpmem/2.8.4-1.0_7.3__ga37cbd9.shasta   8) cray-libsci/23.12.5  13) DefApps            18) cray-mpich/8.1.28  23) cray-dsmml/0.2.2
  4) cray-pmi/6.1.13                        9) Core/24.00           14) cmake/3.23.2       19) hdf5/1.14.3-mpi    24) craype-accel-amd-gfx90a
  5) craype-network-ofi                    10) tmux/3.2a            15) git/2.36.1         20) craype/2.7.31.11   25) libfabric/1.20.1

@vicentebolea
Copy link
Collaborator

which kokkos are you testing, the ci tests 3.7.01, I recall kokkos 4.X not needing to explictly set hipcc.

@anagainaru
Copy link
Contributor Author

which kokkos are you testing, the ci tests 3.7.01, I recall kokkos 4.X not needing to explictly set hipcc.

I was able to reproduce this, will come back with hopefully a fix

@anagainaru
Copy link
Contributor Author

There seems to be an incompatibility between the g++ version in PrgEnv-gnu/8.5.0 and the hipcc version in rocm/5.4.3

When I use PrgEnv-gnu-amd/8.5.0 everything works great. The hipcc version is 5.7.1 instead of 5.4.3 so I've updated the rocm version to rocm/5.7.1 in the frontier CI (if you want a different fix please feel free to modify this PR).

@anagainaru anagainaru force-pushed the kokkos-testing-fix branch 3 times, most recently from 5b17b42 to 6ae93e2 Compare October 12, 2024 01:42
@anagainaru
Copy link
Contributor Author

anagainaru commented Oct 12, 2024

Kokkos 3.7 is becoming an old version, and is triggering the following warning with newer rocm compilers:

In file included from /ccs/home/againaru/kokkos/kokkos-3.7.01/core/src/HIP/Kokkos_HIP_Abort.hpp:55:
/opt/rocm-5.7.1/include/rocm_version.h:34:2: warning: "rocm_version.h has moved to /opt/rocm-5.7.1/include/rocm-core " [-W#warnings]
#warning "rocm_version.h has moved to /opt/rocm-5.7.1/include/rocm-core "

I updated the CI to use Kokkos 4.4.01 (the warning issue was fixed in 4.0) instead of 3.7. The other option is to ignore this warning, but I think it's better to test a with a newer version of Kokkos.

@vicentebolea
Copy link
Collaborator

@anagainaru sounds good, lets update the CI so that it uses kokkos > 4. I will follow up with changes in this PR.

@anagainaru
Copy link
Contributor Author

@anagainaru sounds good, lets update the CI so that it uses kokkos > 4. I will follow up with changes in this PR.

I thought already did, that's how the CI is passing now. Take a look at my changes and update what you think needs updating.

Copy link
Collaborator

@vicentebolea vicentebolea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good otherwise

Comment on lines 370 to 372
// Add lines back when using Kokkos 4.2
// Kokkos::printf(" Non-match at pos = (%d %d) : input = %f : output = %f\n", x,
// y, a(start_0 + x, start_1 + y), b(x, y));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Add lines back when using Kokkos 4.2
// Kokkos::printf(" Non-match at pos = (%d %d) : input = %f : output = %f\n", x,
// y, a(start_0 + x, start_1 + y), b(x, y));

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I usually dislike commenting code. I left it here just because the c++ version of this testing is printing the mismatches and I wanted to be consistent. But I am also fine making a note to reintroduce this printf when we decide to move the min Kokkos version to 4.2

@anagainaru
Copy link
Contributor Author

The frontier-Cray CI is still failing but the Kokkos-HIP one passes, so I'll merge this. We can open a new PR to fix the cray CI.

@anagainaru anagainaru closed this Oct 15, 2024
@anagainaru anagainaru deleted the kokkos-testing-fix branch October 15, 2024 21:45
@vicentebolea
Copy link
Collaborator

hmm the pr was closed and deleted rather than merged. Was this intended?

@anagainaru
Copy link
Contributor Author

I was so sure I merged it. I don't understand how this happened (nor how the branch was deleted). Thanks for catching this :|
I will create a new PR, I can't seem to reopen this one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants