Support multiple concurrent instances per MPI task #463

Merged

Conversation

@michalakes (Contributor) commented Feb 28, 2023

Short description

Support multiple concurrent instances per MPI task
closes #469

Changes to be committed:

modified:   scripts/mkstatic.py
modified:   src/ccpp_types.F90
modified:   src/ccpp_types.meta

Description

In order to have an ensemble of model instances in a JEDI executable, we
need the packages the model relies on to be “stateless”, meaning that
data for one instance are not shared with other instances. A more
comprehensive solution would be to implement the physics so that all such
data is either passed in via arguments to the package or allocated as
an instance of a class defining the physics.

Deferring the larger issue of revisiting the CCPP requirements to cover
support for concurrent instances, this PR takes a more modest approach. It
adds a new "instance" dimension to the types already defined for GFS
physics (GFS_control, GFS_data, GFS_interstitial), and it includes some
changes that were needed to address first-time initialization latches
inside certain packages.

One of these changes, in Thompson MP, declares the heap-allocated fields by
CCPP instance; the issue there was that only the first instance of Thompson
MP was initializing itself. The other guards against trying to allocate
data that has already been allocated (e.g., h2ointerp and ozinterp).
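
As a minimal sketch of the per-instance latch pattern (the module, routine, and variable names here are illustrative only, not the actual Thompson MP code):

```fortran
module example_scheme
   ! Illustrative sketch only: a first-time initialization latch that is
   ! tracked per CCPP instance instead of with a single scalar logical.
   implicit none
   private
   public :: example_scheme_init

   integer, parameter :: max_instances = 200
   logical, dimension(max_instances) :: initialized = .false.

contains

   subroutine example_scheme_init(instance, errmsg, errflg)
      integer,          intent(in)  :: instance   ! CCPP instance index (1-based)
      character(len=*), intent(out) :: errmsg
      integer,          intent(out) :: errflg

      errmsg = ''
      errflg = 0

      if (instance < 1 .or. instance > max_instances) then
         errmsg = 'example_scheme_init: instance index out of range'
         errflg = 1
         return
      end if

      ! Only the first call for a given instance performs the setup.
      if (initialized(instance)) return

      ! ... read lookup tables, allocate per-instance work arrays, etc. ...

      initialized(instance) = .true.
   end subroutine example_scheme_init

end module example_scheme
```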

The packages touched by the PR are limited to one NRL-used suite, which is
a subset of all the packages in CCPP.

User interface changes?: No

Fixes:

Allow multiple instances of CCPP physics in single executable for ensemble DA (NCAR/ccpp-physics#999)

Testing:

Has been successfully tested in the NRL NEPTUNE code, which includes a twoinsts-physics test in its CI test harness.

test removed:
unit tests:
system tests:
manual testing:

Changes to add an instance index for CCPP physics
@michalakes michalakes marked this pull request as ready for review April 29, 2023 12:08
@climbfuji (Collaborator) left a comment

I assume that for now a hardcoded maximum of 200 ensemble members is enough ... maybe that needs to become configurable in the future.

@gold2718 (Collaborator) left a comment

I am approving because it is not going to get in the way of our work.
However, like @climbfuji, I wonder about the hard coded limit. Can it be checked or eliminated?

@@ -896,7 +896,7 @@ class Group(object):
 private
 public :: {subroutines}

-logical, save :: initialized = .false.
+logical, dimension(200), save :: initialized = .false.
Collaborator:

The save is redundant, it is on by default at module scope.
How / where is the index initialized (there is nothing in this PR to indicate that as far as I can tell)?
Could the 200 limit be a public parameter that can be checked somewhere in case of an overflow?
Better yet, is it known when the write method is called? The actual size could be passed in.
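
For illustration, the kind of change being suggested might look like this (the module and routine names are hypothetical):

```fortran
module ccpp_instance_limits
   ! Hypothetical module: replace the literal 200 with a named, public
   ! parameter that callers can check against for overflow.
   implicit none
   private
   public :: ccpp_max_instances, check_instance

   integer, parameter :: ccpp_max_instances = 200

contains

   subroutine check_instance(instance, errmsg, errflg)
      integer,          intent(in)  :: instance
      character(len=*), intent(out) :: errmsg
      integer,          intent(out) :: errflg

      errmsg = ''
      errflg = 0
      if (instance < 1 .or. instance > ccpp_max_instances) then
         write(errmsg, '(a,i0,a,i0)') 'CCPP instance ', instance, &
              ' exceeds maximum of ', ccpp_max_instances
         errflg = 1
      end if
   end subroutine check_instance

end module ccpp_instance_limits
```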

Contributor Author:

I agree that having this as a public parameter is better than hard-coding it to 200 (though that's plenty high enough for NRL's purposes). I'm at the edge of my practical knowledge of CCPP's innards and would need to spend some time figuring out how to add the size parameter. Someone else reading this would probably know how to do it in 10 minutes. If so, I support that.

Collaborator:

@michalakes I have opened a PR onto your branch that makes this value a little less hard-coded (defined in common.py). Ideally it would have been defined in the host model's config file, but it looks like the order of operations in the ccpp_prebuild.py script doesn't allow for that without relatively invasive changes, and that doesn't seem worth the effort for the prebuild script.

@gold2718 @climbfuji Let me know if you have any comments on this, or if there's something I'm missing. It does seem like it should be relatively easy to define this variable in the host config file with capgen, since all the config variables are available in the run_env structure.

Collaborator:

@mkavulich,

The save is redundant, it is on by default at module scope. How / where is the index initialized (there is nothing in this PR to indicate that as far as I can tell)? Could the 200 limit be a public parameter that can be checked somewhere in case of an overflow? Better yet, is it known when the write method is called? The actual size could be passed in.

While the size can easily be entered and passed around with capgen, that will not help with either non-compliant parameterizations or with capgen itself.
capgen makes use of module variables to handle automatic memory management for variables that need to be passed between suite phases (e.g., created at init time and used at run time). To accommodate multiple instances would require a pretty extensive reworking of this infrastructure.

Contributor Author:

I'll just observe that instances is a one-dimensional array of logicals. One could hard-code its dimension to 2000 (well beyond any foreseeable number of ensemble members) and it would still probably use less storage than what's devoted to string constants in CCPP. The perfect being the enemy of the good, I would hard-code it, put in an error check somewhere, and move on.

Collaborator:

I respectfully request that it be approved and merged and that we worry about hypotheticals if and when they arise. Agile.

I understand what you are saying but it would also help if you had followed the published workflow for this project (https://github.com/NCAR/ccpp-framework/wiki/Development-Workflow). Also, the governance committee is supposed to approve changes to the functionality of the framework before they are implemented. I will not be a holdout, in the sense that if the other reviewers approve, I will as well. However, the issues of non-compliant physics schemes are not hypotheticals. capgen is also not hypothetical, and IMHO anything which makes unification harder should definitely be approved by the CCPP Framework governance committee.

Collaborator:

@areinecke,

My understanding of the purpose of the logical is to indicate if shared data that gets initialized/ingested only occurs once. In your example if each instance needed that array then it will be initialized.

I think these statements are correct. However, the module variables created by capgen are 2-D or 3-D variables and they are there because the suite modifies them. Every instance will need its own copy of each module variable which can be done but will require quite a bit of work because this is a completely new feature that was not anticipated. Since these are usually 3-D variables, hard coding a large number will indeed be a lot of memory. I do not see how the logical can help with this implementation (but please educate me if I'm still missing something).

@michalakes (Contributor Author) commented Jun 15, 2023

I respectfully request that it be approved and merged and that we worry about hypotheticals if and when they arise. Agile.

I understand what you are saying but it would also help if you had followed the published workflow for this project (https://github.com/NCAR/ccpp-framework/wiki/Development-Workflow). Also, the governance committee is supposed to approve changes to the functionality of the framework before they are implemented. I will not be a holdout, in the sense that if the other reviewers approve, I will as well. However, the issues of non-compliant physics schemes are not hypotheticals. capgen is also not hypothetical, and IMHO anything which makes unification harder should definitely be approved by the CCPP Framework governance committee.

You're quite right and I respect your process. Could you (plural) please expedite this through the process? The PR is solid and addresses needed functionality. If there are tweaks needed then please go ahead; we've already done the heavy lifting on this. As to whether the feature was anticipated or not, I'd only point out that it supports ensembles and, if it wasn't, it should have been.

Collaborator:

As to whether the feature was anticipated or not, I'd only point out that it supports ensembles and if it wasn't, it should have been.

Understood; however, until now I have never seen ensembles in the same MPI task, so I missed that possibility.
I have not been going to any meetings (where I understand this PR has been discussed). Can anyone fill me (or us) in on the current thinking on this PR?

Contributor Author:

Understood; however, until now I have never seen ensembles in the same MPI task, so I missed that possibility. I have not been going to any meetings (where I understand this PR has been discussed). Can anyone fill me (or us) in on the current thinking on this PR?

Not sure if this is what you are asking (others please chime in), but in addition to supporting ensembles, the feature is also needed to support our 4DVAR, in which the tangent-linear and nonlinear instances of the model call the same (but separate instances of the) physics. There are probably other use cases wherein multiple instances of the same physics would need to be invoked within the same process (superparameterizations, maybe?). Thanks for being open to considering this further.

@gold2718 (Collaborator) commented May 1, 2023

I have been thinking about this PR and have a few concerns (so I'm withdrawing my approval for now).
First, it would really help to have a completed PR description (the template text is still there at the moment).
Second, please reference the appropriate issue number in the PR description (i.e., why are you making this change). In this case, there is at least one issue (NCAR/ccpp-physics#999), are there others?

It seems that the real use case is that you want to run multiple instances of a model in the same set of tasks (multiple overlapping instances), is that correct?

Can someone point me to the section of the current CCPP documentation (https://ccpp-techdoc.readthedocs.io/en/v6.0.0/) that covers rules for the CCPP framework? I do not see anything and in particular am having trouble finding anything similar to the "driver" rules section of the historical rules document (D1 -- D35 in https://dtcenter.org/sites/default/files/community-code/ccpp-requirements-historical.pdf). Still, looking at both documents, I do not see any requirement that states or implies that multiple overlapping instances of a model needs to be supported by the CCPP.

I think this is important because supporting multiple overlapping instances puts requirements on the CCPP Framework as well as on CCPP Physics Parameterizations and a clear requirement will guide both framework and parameterization development. As pointed out in the issue, there are existing physics parameterizations that contain module variables (e.g., h2ointerp) that might contain model state. Please correct me if I am wrong but if a parameterization holds model state (e.g., an evolving cloud PDF) as a module variable, then multiple instances would all be trying to use the same data which seems bound to fail (even without threading). @michalakes sort of pointed this out back in March when he said:

It's a problem if instances need different ozone or h2o data

Can the CCPP governance committee please issue a statement on this requirement and get it into documentation somewhere? If it is decided that supporting multiple overlapping instances of a physics suite is a CCPP requirement, can someone please add a "capgen unification" issue for it?

@climbfuji (Collaborator):

@michalakes I remember discussing a second new member of the ccpp_types data structure to capture the maximum number of instances (default one). This could be set during the CCPP (framework) init phase and then used to tell the physics to allocate the heap data to the correct size?
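
A rough sketch of that idea (the member names below are hypothetical, not what is currently in ccpp_types.F90):

```fortran
module ccpp_types_sketch
   ! Hypothetical illustration of the suggestion: alongside the new
   ! instance index, a second member carries the maximum number of
   ! instances so physics init routines can size per-instance heap data.
   implicit none
   private
   public :: ccpp_t

   type :: ccpp_t
      integer :: ccpp_instance      = 1   ! index of the calling instance
      integer :: ccpp_num_instances = 1   ! set once during framework init
   end type ccpp_t
end module ccpp_types_sketch
```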

@michalakes (Contributor Author):

I have been thinking about this PR and have a few concerns (so I'm withdrawing my approval for now). First, it would really help to have a completed PR description (the template text is still there at the moment). Second, please reference the appropriate issue number in the PR description (i.e., why are you making this change). In this case, there is at least one issue (NCAR/ccpp-physics#999), are there others?

Done

It seems that the real use case is that you want to run multiple instances of a model in the same set of tasks (multiple overlapping instances), is that correct?

Yes, we want multiple concurrent instances of a model, and thus multiple concurrent instances of CCPP and other libraries it uses, to work correctly when running on the same process, that is, MPI task.
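
For concreteness, a toy sketch of that usage pattern within a single MPI task (the physics_run interface below is hypothetical, not an actual CCPP cap):

```fortran
program multi_instance_driver
   ! Toy illustration of the use case: several model instances share one
   ! MPI task, and each physics call is tagged with its instance index.
   implicit none

   integer, parameter :: n_members = 4
   integer :: member, errflg
   character(len=256) :: errmsg

   do member = 1, n_members
      ! Each member carries its own state; the instance index tells the
      ! physics which copy of any per-instance data to use.
      call physics_run(instance=member, errmsg=errmsg, errflg=errflg)
      if (errflg /= 0) then
         print *, 'member ', member, ': ', trim(errmsg)
         stop 1
      end if
   end do

contains

   subroutine physics_run(instance, errmsg, errflg)
      integer,          intent(in)  :: instance
      character(len=*), intent(out) :: errmsg
      integer,          intent(out) :: errflg
      errmsg = ''
      errflg = 0
      ! ... call the instance-aware physics suite here ...
      print *, 'running physics for instance ', instance
   end subroutine physics_run

end program multi_instance_driver
```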

@mkavulich mkavulich changed the title On branch jm-pr-multiple-instances-of-ccpp-physics Support multiple concurrent instances per MPI task May 15, 2023
@gold2718 (Collaborator):

@michalakes,

I respect your process

and

Thanks for being open to considering this further.

The reason the first step in our process is Open an Issue is so that discussions like this can happen well before the PR. I'm entering a discussion below since there is still no issue for this feature.

As for capgen, I would treat this by adding a new dimension to module-level variables. I could allocate enough space to hold the data for each instance and select the current instance by using the ccpp_instance variable (which would probably be passed in from the host like the thread number but that is flexible in capgen).
However, to do this, I would need a new CCPP variable, call it ccpp_number_of_instances (similar to number_of_openmp_threads) which would be available at initialization time to be able to allocate the correct number of instance slots.
In this case, we would just add a switch to capgen (e.g., --multi-instance) that would trigger this behavior. At that point, the number of instances is set by the host model at run time.
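
A rough sketch of what such generated code might look like (ccpp_number_of_instances and ccpp_instance follow the naming above; the module and array are purely illustrative):

```fortran
module suite_module_data_sketch
   ! Illustrative only: a capgen-style module variable gains a trailing
   ! instance dimension so each concurrent instance gets its own copy.
   implicit none
   private
   public :: suite_init, suite_run

   real, allocatable :: work(:,:,:)   ! (ncol, nlev, instance)

contains

   subroutine suite_init(ncol, nlev, ccpp_number_of_instances)
      integer, intent(in) :: ncol, nlev
      integer, intent(in) :: ccpp_number_of_instances
      ! Allocate one slice per instance at initialization time.
      if (.not. allocated(work)) then
         allocate(work(ncol, nlev, ccpp_number_of_instances))
         work = 0.0
      end if
   end subroutine suite_init

   subroutine suite_run(ccpp_instance)
      integer, intent(in) :: ccpp_instance
      ! Each call touches only its own instance slice.
      work(:,:,ccpp_instance) = work(:,:,ccpp_instance) + 1.0
   end subroutine suite_run

end module suite_module_data_sketch
```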

Thoughts on this approach? Is this an approach that could also be used in mkstatic to avoid the hard-coding issue (which seems to be causing at least some discomfort aside from the memory issue)?

And a side point, y'all must have a lot of memory to be able to run multiple instances in a single task (but maybe that's just the mental scar tissue left over from the Blue Gene era talking :)

@climbfuji (Collaborator):

Sorry for my late reply, I was on leave for the last 2+ weeks.

I think it's worth noting that @michalakes and @areinecke joined the CCPP developer meeting a few months ago to explain exactly what the problem is and how this PR addresses it. So I'd say they did what we expected them to do. Unfortunately @gold2718 wasn't on that call and therefore didn't get all the information. I get it that issues need to be created, but at the same time the developer meeting is the main system for discussing/proposing changes to the framework and if someone makes the effort to show up there that should be good enough.

Regarding the proposed solution for capgen above (#463 (comment)), this is what I would have done for ccpp_prebuild as well. But, given that we'll be working actively on the unification and that capgen will replace prebuild in the near future, I thought it would be ok to hardcode it for now in prebuild.

@gold2718 (Collaborator):

I get it that issues need to be created, but at the same time the developer meeting is the main system for discussing/proposing changes to the framework and if someone makes the effort to show up there that should be good enough.

Sure but there is still a community that will not find out what is going on this way (heck, even I didn't). I took a minute and closed that gap (mostly referencing the really good description at NCAR/ccpp-physics#999).

@michalakes (Contributor Author):

Thoughts on this approach? Is this an approach that could also be used in mkstatic to avoid the hard-coding issue (which seems to be causing at least some discomfort aside from the memory issue)?

I agree with the two reviewers who've approved the pull request and think the PR as it currently stands is good enough to go in as-is, with a more comprehensive approach to follow if needed. I'm a dilettante when it comes to the ins and outs of CCPP's infrastructure. I managed to learn just enough to incorporate and test the changes to get our project rolling again and to provide them back to you in the form of the current PR. So beyond what I've already said, I don't have any additional thoughts on your proposal other than to encourage you all to go forward expeditiously in whatever manner satisfies your requirements and processes.

And a side point, y'all must have a lot of memory to be able to run multiple instances in a single task

The NEPTUNE code scales well both in memory and computationally. Dividing the job over separate sets of tasks for each ensemble member consumes about the same total amount of memory as spreading the ensemble over all the tasks. There are good scaling-efficiency arguments for having each ensemble member on its own set of tasks, but there are also advantages, when computing the ensemble statistics, to having all the members available on each task. NEPTUNE supports either mode of operation, notwithstanding this issue with the physics.

@gold2718 (Collaborator):

I agree with the two reviewers who've approved the pull request and think the PR as it currently stands is good enough to go in as-is, with a more comprehensive approach to follow if needed.

So do we have a quorum? @mkavulich, @grantfirl, are there remaining objections to the short-term plan?

@gold2718 (Collaborator):

Dividing the job over separate sets of tasks for each ensemble member consumes about the same total amount of memory as spreading the ensemble over all the tasks.

CAM scales as well but CAM-physics probably uses a lot more memory per column. Still, on runs where we run a few dozen columns per task, I suppose we could run a small ensemble instead (and use more cores). Interesting approach!

Thanks for bearing with me on the discussion (where I started way behind).

@mkavulich (Collaborator):

If I recall correctly it was mainly @climbfuji and @gold2718 who advocated for a less hard-coded definition here, so if both of you approve of the current version then I also approve. I'll double-check with @grantfirl to confirm before merging.

@climbfuji (Collaborator):

If I recall correctly it was mainly @climbfuji and @gold2718 who advocated for a less hard-coded definition here, so if both of you approve of the current version then I also approve. I'll double-check with @grantfirl to confirm before merging.

I approved it on Apr 29 with a note that in the future we may want to make it configurable (#463 (review))

@mkavulich (Collaborator):

Sorry, I am having a bit of trouble following the conclusion here. We decided a couple meetings ago that I should try to pull out the number of instances into a more configurable location, and the PR I opened to John's branch (michalakes#1) is the best effort I could come up with without making invasive changes to the CCPP prebuild. Is the decision to go forward without those modifications?

@climbfuji (Collaborator):

Sorry, I am having a bit of trouble following the conclusion here. We decided a couple meetings ago that I should try to pull out the number of instances into a more configurable location, and the PR I opened to John's branch (michalakes#1) is the best effort I could come up with without making invasive changes to the CCPP prebuild. Is the decision to go forward without those modifications?

I looked at your PR and it's totally fine to use it. The back and forth here was about the nature of the change in general, i.e. creating multiple instances for members in one MPI task, and how to add this to capgen. I like your PR better than what is in this PR now, but in the end it doesn't matter so much, since capgen will use something different anyway.

@gold2718 (Collaborator):

Is the decision to go forward without those modifications?

Is there any reason why @michalakes can't just accept your PR and then you can merge to main?

@michalakes (Contributor Author):

Is the decision to go forward without those modifications?

Is there any reason why @michalakes can't just accept your PR and then you can merge to main?

Is there a PR I need to accept? Sorry, I don't see it.

@gold2718 (Collaborator):

Is there a PR I need to accept? Sorry, I don't see it.

If I go to your fork and look at Pull Requests, I see this:
michalakes#1

Note this is a PR to your branch so it shows up on your fork, not on the NCAR organization page.

@michalakes (Contributor Author) commented Jun 18, 2023

If I go to your fork and look at Pull Requests, I see this: michalakes#1

Note this is a PR to your branch so it shows up on your fork, not on the NCAR organization page.

Got it, thanks. All merged. (That is, I have merged Mike's PR into my fork and it's ready to be merged to main, all willing.)

@mkavulich (Collaborator):

Thanks everyone for the conversation and getting this PR wrapped up. I will go ahead and merge now.

@mkavulich mkavulich merged commit 57d268c into NCAR:main Jun 20, 2023
@michalakes (Contributor Author) commented Jun 20, 2023 via email
