-
Notifications
You must be signed in to change notification settings - Fork 131
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor global sums to improve performance #780
Comments
@phil-blain, I'm doing some timing tests with the global sums including with VP. I'm trying to understand this section of the vp, https://github.com/CICE-Consortium/CICE/blob/main/cicecore/cicedyn/dynamics/ice_dyn_vp.F90#L3348. The haloupdate is not done when bfbflag='off'. Why is this and is it related to how the computation of the global_sum is done when bfbflag='off' here, https://github.com/CICE-Consortium/CICE/blob/main/cicecore/cicedyn/dynamics/ice_dyn_vp.F90#L2556? The reason I ask is that the performance and results change in rather surprising ways if I comment out one or the other bfbflag='off' logic in VP. One goal is to improve the global sum performance in the infrastructure such that there is no performance degradatation in VP when using it. But I think I also need to understand if/when/why the haloupdate is executed at line 3348 under various settings of the bfbflag. Thanks. |
Some Dynamics timing data from various tests. I'm setting the VP global_sum to "local" or "lsum8", and I'm setting the haloUpdate at line 3348 to "off" or "on" manually via code mods. Here's the timing sum=local, halo=on, 24.3s No answers are bit-for-bit, as expected. The timing is pretty surprising though, turning the halo off slows down the VP code significantly. That seems more important than the global sum implementation. Is that halo update actually important for the preconditioning and efficiency of the solver? Maybe I'm doing something wrong in my testing? Again, turning on the haloUpdate at line 3348 is always better than having it off in overall time. |
Another question since I'm looking at the VP code, is there a bug at these lines? Should the RHS be workspace_y? https://github.com/CICE-Consortium/CICE/blob/main/cicecore/cicedyn/dynamics/ice_dyn_vp.F90#L3123 |
Yes, these look like copy-paste errors, should be |
So at line 3348: CICE/cicecore/cicedyn/dynamics/ice_dyn_vp.F90 Lines 3345 to 3348 in 9424497
we are in the I must say I did not think of testing what you tested above, i.e. that it might make the overall runtime faster to always perform these halo updates. My hypothesis for explaining this behaviour is that the preconditioner yields a better starting point with these halo updates than without, so the FGMRES solver can reach the solution faster (and maybe similarly for the nonlinear solver, it depends on the tolerances)... @JFLemieux73 do you think this makes sense ? We could verify this using the
No, it's not related. At 2556: CICE/cicecore/cicedyn/dynamics/ice_dyn_vp.F90 Lines 2555 to 2560 in 9424497
we are in |
Thanks @phil-blain. I'll fix the two typos workspace copies in line 3123 and 3531, no problem. Regarding performance, I did a couple runs with/without the haloupdate on for otherwise the same settings with the VP monitor namelists set to true. Again, I'm not sure what's going on, but looking at the monitor output, with haloupdate "off" on the precondition, the iter_fgmres is typically in the 20-30 range. For each fgm iteration, pgmres seems to do 5 iterations. When I run the same test with haloupdate "on" on the precondition, the iter_fgmres is about half, say 10-15 range, the pgmres stays the same at 5 iterations per fgmres iteration. So I think that does confirm that the haloupdate "on" reduces the number of iterations and seems to reduce the overall time even though we pay for the haloupdate. Are there some other tests I can/should do? The test I'm running is with "dynpicard" (kdyn=3,algo_nonlin='picard'). Haven't checked performance of other vp settings. Should we remove this check on bfbflag on the haloupdate on the preconditioner? |
The global sums have traditionally been used only for diagnostics in the log file. The VP solver now also uses global sums, see #774. The CICE generic global sums could be refactored to improve performance, shifting if statements associated with optional arguments relative to loops and similar.
The text was updated successfully, but these errors were encountered: