Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disable Fused Multiply-Add instructions on WCOSS2 when -DFASTER=ON #1618

Closed

Conversation

SamuelTrahanNOAA
Copy link
Collaborator

@SamuelTrahanNOAA SamuelTrahanNOAA commented Feb 17, 2023

Description

Two HAFS-B tests crashed on WCOSS2 with the new -DFASTER=ON mode unless fused multiply-add instructions were disabled. This isn't too surprising since the Intel compiler documentation warns that the compiler isn't guaranteed to optimize correctly for non-Intel processors. WCOSS2 uses AMD processors.

To fix this problem, this PR adds -no-fma only on WCOSS2, and only when -DFASTER=ON is enabled. You can also send -no-fma on demand by setting -DDISABLE_FMA=ON. Developer @BijuThomas-NOAA has independently verified this fixes the HAFS-B crashes.

The reason the crash only happens with -DFASTER=ON is that the default Release optimization already disables FMA through the -fp-model consistent. While the vague documentation of -fp-model consistent is inconsistent from version to version of the Intel compiler, that part at least is consistent.

Top of commit queue on: TBD

Input data additions/changes

  • No changes are expected to input data.

Anticipated changes to regression tests:

  • Changes are expected to the following tests: hafs_regional_storm_following_1nest_atm_ocn_wav on WCOSS2 and Acorn only.

That is because it is the only test that uses the -DFASTER=ON option, and the change is only on WCOSS2 and Acorn.

Subcomponents involved:

  • AQM
  • CDEPS
  • CICE
  • CMEPS
  • CMakeModules
  • FV3
  • GOCART
  • HYCOM
  • MOM6
  • NOAHMP
  • WW3
  • stochastic_physics
  • none

Combined with PR's (If Applicable):

Commit Queue Checklist:

  • Link PR's from all sub-components involved
  • Confirm reviews completed in sub-component PR's
  • Add all appropriate labels to this PR.
  • Run full RT suite on either Hera/Cheyenne with both Intel/GNU compilers
  • Add list of any failed regression tests to "Anticipated changes to regression tests" section.

Linked PR's and Issues:

Testing Day Checklist:

  • This PR is up-to-date with the top of all sub-component repositories except for those sub-components which are the subject of this PR.
  • Move new/updated input data on RDHPCS Hera and propagate input data changes to all supported systems.

Testing Log (for CM's):

  • RDHPCS
    • Intel
      • Hera
      • Orion
      • Jet
      • Gaea
      • Cheyenne
    • GNU
      • Hera
      • Cheyenne
  • WCOSS2
    • Dogwood/Cactus
    • Acorn
  • CI
    • Completed
  • opnReqTest
    • N/A
    • Log attached to comment

@SamuelTrahanNOAA SamuelTrahanNOAA changed the title Disable Fused Multiply-Add instructions on WCOSS2 when -DFASTER=ON Disable Fused Multiply-Add instructions on WCOSS2 when -DFASTER=ON Feb 17, 2023
@junwang-noaa
Copy link
Collaborator

@SamuelTrahanNOAA Do we want to use -nofma on wcoss2 only for FASTER option or it might be good that we turn off fma on all the platforms?

@SamuelTrahanNOAA
Copy link
Collaborator Author

Do we want to use -nofma on wcoss2 only for FASTER option or it might be good that we turn off fma on all the platforms?

I don't think that is necessary; only turning it of on AMD platforms should be sufficient. Intel's compiler is written only to optimize for Intel processors, and they test it thoroughly on their own processors. Throughout their documentation, you will see warnings that it may not optimize correctly for non-Intel processors. We're just seeing one instance of that.

@junwang-noaa
Copy link
Collaborator

Thanks, @SamuelTrahanNOAA. Do you see any performance impact on wcoss2 when turning off fma?

@SamuelTrahanNOAA
Copy link
Collaborator Author

Thanks, @SamuelTrahanNOAA. Do you see any performance impact on wcoss2 when turning off fma?

I expect some performance impact, but not a terribly large one, since it's one of many optimizations and instruction sets enabled by FASTER=ON. The HAFS team has not told me how large the impact is, so I'll contact them again for an update on their testing.

@SamuelTrahanNOAA SamuelTrahanNOAA marked this pull request as draft February 24, 2023 16:42
@SamuelTrahanNOAA
Copy link
Collaborator Author

I'm converting this to a draft so it is not accidentally merged before I finish speed tests with Biju.

@SamuelTrahanNOAA SamuelTrahanNOAA marked this pull request as ready for review February 27, 2023 15:44
@SamuelTrahanNOAA
Copy link
Collaborator Author

According to @BijuThomas-NOAA's tests, the difference in runtime, if any, is much less than random variability. For HAFS, at least, there's no disadvantage to this change. This is from his email. Runtimes are in seconds, and the test was done on the production side of WCOSS2.

HAFSA
fma    5471.756311
nofma  5479.343784


HAFSB
fma    5430.821674
nofma  5427.427221

I apologize for the delay; the HAFS team has been busy with the implementation and have had to prioritize their time, and I don't have production access.

@BrianCurtis-NOAA
Copy link
Collaborator

I ran this on WCOSS2, and i can confirm only hafs_regional_storm_following_1nest_atm_ocn_wav failed.

@SamuelTrahanNOAA
Copy link
Collaborator Author

I ran this on WCOSS2, and i can confirm only hafs_regional_storm_following_1nest_atm_ocn_wav failed.

Good, good. That is as expected since the results of that test will change.

@zach1221
Copy link
Collaborator

zach1221 commented Mar 7, 2023

Hi, @SamuelTrahanNOAA We're going to start working this PR next, would you be able to create another PR to have it merged into #1640 ?

@SamuelTrahanNOAA
Copy link
Collaborator Author

Hi, @SamuelTrahanNOAA We're going to start working this PR next, would you be able to create another PR to have it merged into #1640 ?

I'm willing to have the two merged into a single PR. Could you clarify: are you asking me to open a new PR that combines both sets of changes?

@SamuelTrahanNOAA
Copy link
Collaborator Author

I have opened the combined PR here. I've never been asked to do it this way though, so please confirm this is what you wanted.

#1645

@DusanJovic-NOAA
Copy link
Collaborator

Opening yet another PR is completely unnecessary. You could simply merge the changes from #1640 in this PR. Or the #1640 owner could simply pull the changes from this PR.

@zach1221
Copy link
Collaborator

zach1221 commented Mar 8, 2023

@DusanJovic-NOAA Sorry for the miscommunication, since 1645 had been created already we'll move forward with that PR.

@SamuelTrahanNOAA
Copy link
Collaborator Author

Opening yet another PR is completely unnecessary. You could simply merge the changes from #1640 in this PR. Or the #1640 owner could simply pull the changes from this PR.

Indeed, the request confused me too, but since they asked me to open a new PR, I did.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Two HAFS forecasts crashed on WCOSS2 when Fused Multiply-Add instructions were enabled
5 participants