-
Notifications
You must be signed in to change notification settings - Fork 247
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Disable Fused Multiply-Add instructions on WCOSS2 when -DFASTER=ON
#1618
Disable Fused Multiply-Add instructions on WCOSS2 when -DFASTER=ON
#1618
Conversation
-DFASTER=ON
@SamuelTrahanNOAA Do we want to use -nofma on wcoss2 only for FASTER option or it might be good that we turn off fma on all the platforms? |
I don't think that is necessary; only turning it of on AMD platforms should be sufficient. Intel's compiler is written only to optimize for Intel processors, and they test it thoroughly on their own processors. Throughout their documentation, you will see warnings that it may not optimize correctly for non-Intel processors. We're just seeing one instance of that. |
Thanks, @SamuelTrahanNOAA. Do you see any performance impact on wcoss2 when turning off fma? |
I expect some performance impact, but not a terribly large one, since it's one of many optimizations and instruction sets enabled by FASTER=ON. The HAFS team has not told me how large the impact is, so I'll contact them again for an update on their testing. |
I'm converting this to a draft so it is not accidentally merged before I finish speed tests with Biju. |
According to @BijuThomas-NOAA's tests, the difference in runtime, if any, is much less than random variability. For HAFS, at least, there's no disadvantage to this change. This is from his email. Runtimes are in seconds, and the test was done on the production side of WCOSS2.
I apologize for the delay; the HAFS team has been busy with the implementation and have had to prioritize their time, and I don't have production access. |
I ran this on WCOSS2, and i can confirm only hafs_regional_storm_following_1nest_atm_ocn_wav failed. |
Good, good. That is as expected since the results of that test will change. |
Hi, @SamuelTrahanNOAA We're going to start working this PR next, would you be able to create another PR to have it merged into #1640 ? |
I'm willing to have the two merged into a single PR. Could you clarify: are you asking me to open a new PR that combines both sets of changes? |
I have opened the combined PR here. I've never been asked to do it this way though, so please confirm this is what you wanted. |
@DusanJovic-NOAA Sorry for the miscommunication, since 1645 had been created already we'll move forward with that PR. |
Description
Two HAFS-B tests crashed on WCOSS2 with the new
-DFASTER=ON
mode unless fused multiply-add instructions were disabled. This isn't too surprising since the Intel compiler documentation warns that the compiler isn't guaranteed to optimize correctly for non-Intel processors. WCOSS2 uses AMD processors.To fix this problem, this PR adds
-no-fma
only on WCOSS2, and only when-DFASTER=ON
is enabled. You can also send-no-fma
on demand by setting-DDISABLE_FMA=ON
. Developer @BijuThomas-NOAA has independently verified this fixes the HAFS-B crashes.The reason the crash only happens with
-DFASTER=ON
is that the default Release optimization already disables FMA through the-fp-model consistent
. While the vague documentation of-fp-model consistent
is inconsistent from version to version of the Intel compiler, that part at least is consistent.Top of commit queue on: TBD
Input data additions/changes
Anticipated changes to regression tests:
hafs_regional_storm_following_1nest_atm_ocn_wav
on WCOSS2 and Acorn only.That is because it is the only test that uses the
-DFASTER=ON
option, and the change is only on WCOSS2 and Acorn.Subcomponents involved:
Combined with PR's (If Applicable):
Commit Queue Checklist:
Linked PR's and Issues:
Testing Day Checklist:
Testing Log (for CM's):