Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix maskhalo bug in dynamics #517

Merged
merged 1 commit into from
Sep 25, 2020

Conversation

apcraig
Copy link
Contributor

@apcraig apcraig commented Sep 25, 2020

PR checklist

  • Short (1 sentence) summary of your PR:
    Fix maskhalo bug in dynamics introduced in dynamics: add implicit VP solver #491
  • Developer(s):
    apcraig
  • Suggest PR reviewers from list in the column to the right.
  • Please copy the PR test results link or provide a summary of testing completed below.
    Full test suite passed on cheyenne, this fixes issue with "restart gx3 8x2x8x10x20 droundrobin maskhalo" on the current trunk. https://github.com/CICE-Consortium/Test-Results/wiki/cice_by_hash_forks#b1b1f9d49b3a228cff3fc9b07ac4a22d13f2da39
  • How much do the PR code changes differ from the unmodified code?
    • bit for bit
    • different at roundoff level
    • more substantial
  • Does this PR create or have dependencies on Icepack or any other models?
    • Yes
    • No
  • Does this PR add any new test cases?
    • Yes
    • No
  • Is the documentation being updated? ("Documentation" includes information on the wiki or in the .rst files from doc/source/, which are used to create the online technical docs at https://readthedocs.org/projects/cice-consortium-cice/. A test build of the technical docs will be performed as part of the PR testing.)
    • Yes
    • No, does the documentation need to be updated at a later time?
      • Yes
      • No
  • Please provide any additional information or relevant details below:

Fix maskhalo bug in dynamics, only appears when running padded decompositions. In #491, an unmasked halo update was changed to a masked halo update. This affects only padded decompositions with maskhalo_dyn=true and was picked up by an exact restart failure.

… decompositions. In CICE-Consortium#491, an unmasked halo update was changed to a masked halo update.  This affects only padded decompositions with maskhalo_dyn=true and was picked up by an exact restart failure
@apcraig
Copy link
Contributor Author

apcraig commented Sep 25, 2020

It would be nice to get this reviewed and approved today to get it into weekend testing. But this bug is unlikely to affect anyone in the short term, so it can also wait if needed.

@phil-blain
Copy link
Member

Thanks @apcraig. This looks like the right solution, but I'm not sure I understand why the change from an unmasked to a masked halo at the point in code would cause exact restart to fail... Can you explain a little more ? just so I understand the code better.

The original commit where I made this change was: e85bca8, and I remember thinking "oh, the masked haloupdate is not done there, with my change it will be done but I guess that's an optimization..." so something was clearly missing from my understanding.

@apcraig
Copy link
Contributor Author

apcraig commented Sep 25, 2020

@phil-blain, I was sort of asking myself the same question when I identified the difference. I can't remember the details about why that first halo update should not be masked. It's obviously a subtle issue that seems to only create problems is very special cases. The other thing is that haloupdate is not part of the subcycling (in evp and eap anyway), so the cost savings is relatively small. If we want to dig deeper into this, we could. It would probably require comparing various fields within the subroutine as the model advances to see where and how the differences are introduced. I'm not sure it's worth the effort right now, but am open to raising the priority and pursuing this question.

When I implemented the maskhalo feature, I tried to identify the proper mask and test for bit-for-bit focusing especially where a mask could be reused for multiple halo updates. Since this was a single halo update outside the dynamics subcycling, the cost benefit was relatively small. I suspect I recognized the masked halo was a problem in this haloupdate, but I may not have ever tried to understand why because the performance implication was small. It may be as simple as forcing zeros into the unused halo to ensure it behaves properly, but that is just speculation. Again, we could look into this more if there was interest.

@phil-blain
Copy link
Member

OK I understand. Let's not pursue this too far, we can always revisit if need be, "premature optimization is the root of all evil" after all.

@eclare108213
Copy link
Contributor

I don't completely understand this either, but here's a hypothesis:
If it's only happening for padded runs where there are extra grid cells outside of the physical domain, beyond the halos, and the code in question is only shifted from before the maskhalo update to afterward, then the halo masking must not treat the padded cells the same way that the regular halo update does. This could affect restarts when the entire grid is written to the restart file (as opposed to just the physical domain). I'm not sure if that's the case here, and this is just a hypothesis. But the fix looks fine to me, and so I'll approve it for weekend testing.

@apcraig apcraig merged commit e662c79 into CICE-Consortium:master Sep 25, 2020
@apcraig apcraig deleted the dyn_maskhalo_fix branch August 17, 2022 21:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants