Support multiple concurrent instances per MPI task #463

Merged

Conversation

@michalakes (Contributor) commented Feb 28, 2023

Short description

Support multiple concurrent instances per MPI task
closes #469

Changes to be committed:

modified:   scripts/mkstatic.py
modified:   src/ccpp_types.F90
modified:   src/ccpp_types.meta

Description

In order to have an ensemble of model instances in a JEDI executable, we
need the packages the model relies on to be “stateless”, meaning that
data for one instance are not shared with other instances. A more
comprehensive solution would be to implement the physics so that all such
data is either passed in via arguments to the package or allocated as
an instance of a class defining the physics.

Deferring the larger issue of revisiting the CCPP requirements to cover
support for concurrent instances, this PR takes a more modest approach. It
adds a new "instance" dimension to the types already defined for GFS
physics (GFS_control, GFS_data, GFS_interstitial), and it includes some
changes that were needed to address first-time initialization latches
inside certain packages.

One of these changes, in Thompson MP, declares the heap-allocated fields by
CCPP instance; the issue there was that only the first instance of Thompson
MP was initializing itself. The other guards against trying to allocate
data that has already been allocated (e.g., h2ointerp and ozinterp).
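
As a minimal sketch of the per-instance latch pattern (the module, routine, and variable names here are illustrative only, not the actual Thompson MP code):

```fortran
module example_scheme
   ! Illustrative sketch only: a first-time initialization latch that is
   ! tracked per CCPP instance instead of with a single scalar logical.
   implicit none
   private
   public :: example_scheme_init

   integer, parameter :: max_instances = 200
   logical, dimension(max_instances) :: initialized = .false.

contains

   subroutine example_scheme_init(instance, errmsg, errflg)
      integer,          intent(in)  :: instance   ! CCPP instance index (1-based)
      character(len=*), intent(out) :: errmsg
      integer,          intent(out) :: errflg

      errmsg = ''
      errflg = 0

      if (instance < 1 .or. instance > max_instances) then
         errmsg = 'example_scheme_init: instance index out of range'
         errflg = 1
         return
      end if

      ! Only the first call for a given instance performs the setup.
      if (initialized(instance)) return

      ! ... read lookup tables, allocate per-instance work arrays, etc. ...

      initialized(instance) = .true.
   end subroutine example_scheme_init

end module example_scheme
```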

The packages touched by the PR are limited to one NRL-used suite, which is
a subset of all the packages in CCPP.

User interface changes?: No

Fixes:

Allow multiple instances of CCPP physics in single executable for ensemble DA (NCAR/ccpp-physics#999)

Testing:

Has been successfully tested in the NRL NEPTUNE code, which includes a twoinsts-physics test in its CI test harness.

test removed:
unit tests:
system tests:
manual testing:

Changes to add an instance index for CCPP physics
@michalakes michalakes marked this pull request as ready for review April 29, 2023 12:08
@climbfuji (Collaborator) left a comment

I assume that for now a hardcoded maximum of 200 ensemble members is enough ... maybe that needs to become configurable in the future.

@gold2718 (Collaborator) left a comment

I am approving because it is not going to get in the way of our work.
However, like @climbfuji, I wonder about the hard coded limit. Can it be checked or eliminated?

@@ -896,7 +896,7 @@ class Group(object):
 private
 public :: {subroutines}

-logical, save :: initialized = .false.
+logical, dimension(200), save :: initialized = .false.
Collaborator:

The save is redundant, it is on by default at module scope.
How / where is the index initialized (there is nothing in this PR to indicate that as far as I can tell)?
Could the 200 limit be a public parameter that can be checked somewhere in case of an overflow?
Better yet, is it known when the write method is called? The actual size could be passed in.
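
For illustration, the kind of change being suggested might look like this (the module and routine names are hypothetical):

```fortran
module ccpp_instance_limits
   ! Hypothetical module: replace the literal 200 with a named, public
   ! parameter that callers can check against for overflow.
   implicit none
   private
   public :: ccpp_max_instances, check_instance

   integer, parameter :: ccpp_max_instances = 200

contains

   subroutine check_instance(instance, errmsg, errflg)
      integer,          intent(in)  :: instance
      character(len=*), intent(out) :: errmsg
      integer,          intent(out) :: errflg

      errmsg = ''
      errflg = 0
      if (instance < 1 .or. instance > ccpp_max_instances) then
         write(errmsg, '(a,i0,a,i0)') 'CCPP instance ', instance, &
              ' exceeds maximum of ', ccpp_max_instances
         errflg = 1
      end if
   end subroutine check_instance

end module ccpp_instance_limits
```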

Contributor Author:

I agree that having this as a public parameter is better than hard-coding it to 200 (though that's plenty high enough for NRL's purposes). I'm at the edge of my practical knowledge of CCPP's innards and would need to spend some time figuring out how to add the size parameter. Someone else reading this would probably know how to do it in 10 minutes. If so, I support that.

Collaborator:

@michalakes I have opened a PR onto your branch that makes this value a little less hard-coded (defined in common.py). Ideally it would have been defined in the host model's config file, but it looks like the order of operations in the ccpp_prebuild.py script doesn't allow for that without relatively invasive changes, and that doesn't seem worth the effort for the prebuild script.

@gold2718 @climbfuji Let me know if you have any comments on this, or if there's something I'm missing. It does seem like it should be relatively easy to define this variable in the host config file with capgen, since all the config variables are available in the run_env structure.

Collaborator:

@mkavulich,

The save is redundant, it is on by default at module scope. How / where is the index initialized (there is nothing in this PR to indicate that as far as I can tell)? Could the 200 limit be a public parameter that can be checked somewhere in case of an overflow? Better yet, is it known when the write method is called? The actual size could be passed in.

While the size can easily be entered and passed around with capgen, that will not help with either non-compliant parameterizations or with capgen itself.
capgen makes use of module variables to handle automatic memory management for variables that need to be passed between suite phases (e.g., created at init time and used at run time). To accommodate multiple instances would require a pretty extensive reworking of this infrastructure.

Contributor Author:

I'll just observe that instances is a one-dimensional array of logicals. One could hard-code its dimension to 2000 (well beyond any foreseeable number of ensemble members) and it would still probably use less storage than what's devoted to string constants in CCPP. The perfect being the enemy of the good, I would hard-code it, put in an error check somewhere, and move on.

Collaborator:

I respectfully request that it be approved and merged and that we worry about hypotheticals if and when they arise. Agile.

I understand what you are saying but it would also help if you had followed the published workflow for this project (https://github.com/NCAR/ccpp-framework/wiki/Development-Workflow). Also, the governance committee is supposed to approve changes to the functionality of the framework before they are implemented. I will not be a holdout, in the sense that if the other reviewers approve, I will as well. However, the issues of non-compliant physics schemes are not hypotheticals. capgen is also not hypothetical, and IMHO anything which makes unification harder should definitely be approved by the CCPP Framework governance committee.

Collaborator:

@areinecke,

My understanding of the purpose of the logical is to indicate if shared data that gets initialized/ingested only occurs once. In your example if each instance needed that array then it will be initialized.

I think these statements are correct. However, the module variables created by capgen are 2-D or 3-D variables and they are there because the suite modifies them. Every instance will need its own copy of each module variable which can be done but will require quite a bit of work because this is a completely new feature that was not anticipated. Since these are usually 3-D variables, hard coding a large number will indeed be a lot of memory. I do not see how the logical can help with this implementation (but please educate me if I'm still missing something).

@michalakes (Contributor Author) commented Jun 15, 2023

I respectfully request that it be approved and merged and that we worry about hypotheticals if and when they arise. Agile.

I understand what you are saying but it would also help if you had followed the published workflow for this project (https://github.com/NCAR/ccpp-framework/wiki/Development-Workflow). Also, the governance committee is supposed to approve changes to the functionality of the framework before they are implemented. I will not be a holdout, in the sense that if the other reviewers approve, I will as well. However, the issues of non-compliant physics schemes are not hypotheticals. capgen is also not hypothetical, and IMHO anything which makes unification harder should definitely be approved by the CCPP Framework governance committee.

You're quite right and I respect your process. Could you (plural) please expedite this through the process? The PR is solid and addresses needed functionality. If there are tweaks needed then please go ahead; we've already done the heavy lifting on this. As to whether the feature was anticipated or not, I'd only point out that it supports ensembles and, if it wasn't, it should have been.

Collaborator:

As to whether the feature was anticipated or not, I'd only point out that it supports ensembles and if it wasn't, it should have been.

Understood; however, until now I have never seen ensembles in the same MPI task, so I missed that possibility.
I have not been going to any meetings (where I understand this PR has been discussed). Can anyone fill me (or us) in on the current thinking on this PR?

Contributor Author:

Understood; however, until now I have never seen ensembles in the same MPI task, so I missed that possibility. I have not been going to any meetings (where I understand this PR has been discussed). Can anyone fill me (or us) in on the current thinking on this PR?

Not sure if this is what you are asking (others please chime in), but in addition to supporting ensembles, the feature is also needed to support our 4DVAR, in which the tangent-linear and nonlinear instances of the model call the same (but separate instances of the) physics. There are probably other use cases wherein multiple instances of the same physics would need to be invoked within the same process (superparameterizations, maybe?). Thanks for being open to considering this further.

@gold2718 (Collaborator) commented May 1, 2023

I have been thinking about this PR and have a few concerns (so I'm withdrawing my approval for now).
First, it would really help to have a completed PR description (the template text is still there at the moment).
Second, please reference the appropriate issue number in the PR description (i.e., why are you making this change). In this case, there is at least one issue (NCAR/ccpp-physics#999), are there others?

It seems that the real use case is that you want to run multiple instances of a model in the same set of tasks (multiple overlapping instances), is that correct?

Can someone point me to the section of the current CCPP documentation (https://ccpp-techdoc.readthedocs.io/en/v6.0.0/) that covers rules for the CCPP framework? I do not see anything and in particular am having trouble finding anything similar to the "driver" rules section of the historical rules document (D1 -- D35 in https://dtcenter.org/sites/default/files/community-code/ccpp-requirements-historical.pdf). Still, looking at both documents, I do not see any requirement that states or implies that multiple overlapping instances of a model needs to be supported by the CCPP.

I think this is important because supporting multiple overlapping instances puts requirements on the CCPP Framework as well as on CCPP Physics Parameterizations and a clear requirement will guide both framework and parameterization development. As pointed out in the issue, there are existing physics parameterizations that contain module variables (e.g., h2ointerp) that might contain model state. Please correct me if I am wrong but if a parameterization holds model state (e.g., an evolving cloud PDF) as a module variable, then multiple instances would all be trying to use the same data which seems bound to fail (even without threading). @michalakes sort of pointed this out back in March when he said:

It's a problem if instances need different ozone or h2o data

Can the CCPP governance committee please issue a statement on this requirement and get it into documentation somewhere? If it is decided that supporting multiple overlapping instances of a physics suite is a CCPP requirement, can someone please add a "capgen unification" issue for it?

@climbfuji (Collaborator):

@michalakes I remember discussing a second new member of the ccpp_types data structure to capture the maximum number of instances (default one). This could be set during the CCPP (framework) init phase and then used to tell the physics to allocate the heap data to the correct size?
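
A rough sketch of that idea (the member names below are hypothetical, not what is currently in ccpp_types.F90):

```fortran
module ccpp_types_sketch
   ! Hypothetical illustration of the suggestion: alongside the new
   ! instance index, a second member carries the maximum number of
   ! instances so physics init routines can size per-instance heap data.
   implicit none
   private
   public :: ccpp_t

   type :: ccpp_t
      integer :: ccpp_instance      = 1   ! index of the calling instance
      integer :: ccpp_num_instances = 1   ! set once during framework init
   end type ccpp_t
end module ccpp_types_sketch
```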

@michalakes (Contributor Author):

I have been thinking about this PR and have a few concerns (so I'm withdrawing my approval for now). First, it would really help to have a completed PR description (the template text is still there at the moment). Second, please reference the appropriate issue number in the PR description (i.e., why are you making this change). In this case, there is at least one issue (NCAR/ccpp-physics#999), are there others?

Done

It seems that the real use case is that you want to run multiple instances of a model in the same set of tasks (multiple overlapping instances), is that correct?

Yes, we want multiple concurrent instances of a model, and thus multiple concurrent instances of CCPP and other libraries it uses, to work correctly when running on the same process, that is, MPI task.
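
For concreteness, a toy sketch of that usage pattern within a single MPI task (the physics_run interface below is hypothetical, not an actual CCPP cap):

```fortran
program multi_instance_driver
   ! Toy illustration of the use case: several model instances share one
   ! MPI task, and each physics call is tagged with its instance index.
   implicit none

   integer, parameter :: n_members = 4
   integer :: member, errflg
   character(len=256) :: errmsg

   do member = 1, n_members
      ! Each member carries its own state; the instance index tells the
      ! physics which copy of any per-instance data to use.
      call physics_run(instance=member, errmsg=errmsg, errflg=errflg)
      if (errflg /= 0) then
         print *, 'member ', member, ': ', trim(errmsg)
         stop 1
      end if
   end do

contains

   subroutine physics_run(instance, errmsg, errflg)
      integer,          intent(in)  :: instance
      character(len=*), intent(out) :: errmsg
      integer,          intent(out) :: errflg
      errmsg = ''
      errflg = 0
      ! ... call the instance-aware physics suite here ...
      print *, 'running physics for instance ', instance
   end subroutine physics_run

end program multi_instance_driver
```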

@mkavulich mkavulich changed the title On branch jm-pr-multiple-instances-of-ccpp-physics Support multiple concurrent instances per MPI task May 15, 2023
@gold2718 (Collaborator):

@michalakes,

I respect your process

and

Thanks for being open to considering this further.

The reason the first step in our process is Open an Issue is so that discussions like this can happen well before the PR. I'm entering a discussion below since there is still no issue for this feature.

As for capgen, I would treat this by adding a new dimension to module-level variables. I could allocate enough space to hold the data for each instance and select the current instance by using the ccpp_instance variable (which would probably be passed in from the host like the thread number but that is flexible in capgen).
However, to do this, I would need a new CCPP variable, call it ccpp_number_of_instances (similar to number_of_openmp_threads) which would be available at initialization time to be able to allocate the correct number of instance slots.
In this case, we would just add a switch to capgen (e.g., --multi-instance) that would trigger this behavior. At that point, the number of instances is set by the host model at run time.
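
A rough sketch of what such generated code might look like (ccpp_number_of_instances and ccpp_instance follow the naming above; the module and array are purely illustrative):

```fortran
module suite_module_data_sketch
   ! Illustrative only: a capgen-style module variable gains a trailing
   ! instance dimension so each concurrent instance gets its own copy.
   implicit none
   private
   public :: suite_init, suite_run

   real, allocatable :: work(:,:,:)   ! (ncol, nlev, instance)

contains

   subroutine suite_init(ncol, nlev, ccpp_number_of_instances)
      integer, intent(in) :: ncol, nlev
      integer, intent(in) :: ccpp_number_of_instances
      ! Allocate one slice per instance at initialization time.
      if (.not. allocated(work)) then
         allocate(work(ncol, nlev, ccpp_number_of_instances))
         work = 0.0
      end if
   end subroutine suite_init

   subroutine suite_run(ccpp_instance)
      integer, intent(in) :: ccpp_instance
      ! Each call touches only its own instance slice.
      work(:,:,ccpp_instance) = work(:,:,ccpp_instance) + 1.0
   end subroutine suite_run

end module suite_module_data_sketch
```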

Thoughts on this approach? Is this an approach that could also be used in mkstatic to avoid the hard-coding issue (which seems to be causing at least some discomfort aside from the memory issue)?

And a side point, y'all must have a lot of memory to be able to run multiple instances in a single task (but maybe that's just the mental scar tissue left over from the Blue Gene era talking :)

@climbfuji (Collaborator):

Sorry for my late reply, I was on leave for the last 2+ weeks.

I think it's worth noting that @michalakes and @areinecke joined the CCPP developer meeting a few months ago to explain exactly what the problem is and how this PR addresses it. So I'd say they did what we expected them to do. Unfortunately @gold2718 wasn't on that call and therefore didn't get all the information. I get it that issues need to be created, but at the same time the developer meeting is the main system for discussing/proposing changes to the framework and if someone makes the effort to show up there that should be good enough.

Regarding the proposed solution for capgen above (#463 (comment)), this is what I would have done for ccpp_prebuild as well. But, given that we'll be working actively on the unification and that capgen will replace prebuild in the near future, I thought it would be ok to hardcode it for now in prebuild.

@gold2718 (Collaborator):

I get it that issues need to be created, but at the same time the developer meeting is the main system for discussing/proposing changes to the framework and if someone makes the effort to show up there that should be good enough.

Sure but there is still a community that will not find out what is going on this way (heck, even I didn't). I took a minute and closed that gap (mostly referencing the really good description at NCAR/ccpp-physics#999).

@michalakes (Contributor Author):

Thoughts on this approach? Is this an approach that could also be used in mkstatic to avoid the hard-coding issue (which seems to be causing at least some discomfort aside from the memory issue)?

I agree with the two reviewers who've approved the pull request and think the PR as it currently stands is good enough to go in as-is, with a more comprehensive approach to follow if needed. I'm a dilettante when it comes to the ins and outs of CCPP's infrastructure. I managed to learn just enough to incorporate and test the changes to get our project rolling again and to provide them back to you in the form of the current PR. So beyond what I've already said, I don't have any additional thoughts on your proposal other than to encourage you all to go forward expeditiously in whatever manner satisfies your requirements and processes.

And a side point, y'all must have a lot of memory to be able to run multiple instances in a single task

The NEPTUNE code scales well both in memory and computationally. Dividing the job over separate sets of tasks for each ensemble member consumes about the same total amount of memory as spreading the ensemble over all the tasks. There are good scaling-efficiency arguments for having each ensemble member on its own set of tasks, but there are also advantages, when computing the ensemble statistics, to having all the members available on each task. NEPTUNE supports either mode of operation, notwithstanding this issue with the physics.

@gold2718 (Collaborator):

I agree with the two reviewers who've approved the pull request and think the PR as it currently stands is good enough to go in as-is, with a more comprehensive approach to follow if needed.

So do we have a quorum? @mkavulich, @grantfirl, are there remaining objections to the short-term plan?

@gold2718 (Collaborator):

Dividing the job over separate sets of tasks for each ensemble member consumes about the same total amount of memory as spreading the ensemble over all the tasks.

CAM scales as well but CAM-physics probably uses a lot more memory per column. Still, on runs where we run a few dozen columns per task, I suppose we could run a small ensemble instead (and use more cores). Interesting approach!

Thanks for bearing with me on the discussion (where I started way behind).

@mkavulich (Collaborator):

If I recall correctly it was mainly @climbfuji and @gold2718 who advocated for a less hard-coded definition here, so if both of you approve of the current version then I also approve. I'll double-check with @grantfirl to confirm before merging.

@climbfuji (Collaborator):

If I recall correctly it was mainly @climbfuji and @gold2718 who advocated for a less hard-coded definition here, so if both of you approve of the current version then I also approve. I'll double-check with @grantfirl to confirm before merging.

I approved it on Apr 29 with a note that in the future we may want to make it configurable (#463 (review))

@mkavulich (Collaborator):

Sorry, I am having a bit of trouble following the conclusion here. We decided a couple meetings ago that I should try to pull out the number of instances into a more configurable location, and the PR I opened to John's branch (michalakes#1) is the best effort I could come up with without making invasive changes to the CCPP prebuild. Is the decision to go forward without those modifications?

@climbfuji (Collaborator):

Sorry, I am having a bit of trouble following the conclusion here. We decided a couple meetings ago that I should try to pull out the number of instances into a more configurable location, and the PR I opened to John's branch (michalakes#1) is the best effort I could come up with without making invasive changes to the CCPP prebuild. Is the decision to go forward without those modifications?

I looked at your PR and it's totally fine to use it. The back and forth here was about the nature of the change in general, i.e. creating multiple instances for members in one MPI task, and how to add this to capgen. I like your PR better than what is in this PR now, but in the end it doesn't matter so much, since capgen will use something different anyway.

@gold2718 (Collaborator):

Is the decision to go forward without those modifications?

Is there any reason why @michalakes can't just accept your PR and then you can merge to main?

@michalakes (Contributor Author):

Is the decision to go forward without those modifications?

Is there any reason why @michalakes can't just accept your PR and then you can merge to main?

Is there a PR I need to accept? Sorry, I don't see it.

@gold2718 (Collaborator):

Is there a PR I need to accept? Sorry, I don't see it.

If I go to your fork and look at Pull Requests, I see this:
michalakes#1

Note this is a PR to your branch so it shows up on your fork, not on the NCAR organization page.

@michalakes (Contributor Author) commented Jun 18, 2023

If I go to your fork and look at Pull Requests, I see this: michalakes#1

Note this is a PR to your branch so it shows up on your fork, not on the NCAR organization page.

Got it, thanks. All merged. (That is, I have merged Mike's PR into my fork and it's ready to be merged to main, all willing.)

@mkavulich (Collaborator):

Thanks everyone for the conversation and getting this PR wrapped up. I will go ahead and merge now.

@mkavulich mkavulich merged commit 57d268c into NCAR:main Jun 20, 2023
@michalakes (Contributor Author) commented Jun 20, 2023 via email
