Document download scripts for models #118
Comments
Hi @znichollscr, could you please remind me of the motivation for providing this ("collection of example downloads")? I would like to understand the reasons behind this effort before sending you what our model(s) use or getting into a long thread on why this ("example downloads"), despite being well-intended, may not be so useful :-).
That's a good point, the ask was too vague. The intent: for some data sets, we have quite a lot of data, not all of which modelling groups will need. My hope was to build up a resource that shows groups how they can pick just the data they need, to help them navigate the available data, stop them from just downloading it all and hopefully avoid some confusion. I started thinking about this because of the GHG concentration and emissions data. For the GHG concentrations, we're providing data on 5 different grids and for 43 different species. No group needs all that, so showing them how to filter for just what they're interested in seemed a good idea, and I figured that I may as well use a real use case rather than just making up a pretend group that only uses global means. For emissions, they're providing data on 2 grids and have this split between 'main' and 'supplementary'. My thinking was basically the same: it's better to use a real use case rather than inventing a group that uses all the main data on a 0.5 degree grid and then a few (but not all) of the files from the supplementary. Perhaps a better, much narrower set of questions to get us started then:
@vnaik60 I believe what @znichollscr is pointing out here is that my vague "use esgpull to get input4MIPs data" is a useless comment. Whereas, if you have a little recipe that allows a group (e.g. NOAA-GFDL) to get its target data from beginning to end, then this is a more tangible and usable example that gets modeling groups moving far more quickly than having to work all this out themselves with little guidance (or examples).
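To make the idea of a "recipe" a bit more concrete, here is a minimal Python sketch of the filtering step only: given a listing of available files, keep just the species, grid and frequency one model needs. Everything in it is an assumption for illustration: the `DatasetFile` record, the `select_files` helper, the facet values and the file names are hypothetical and are not taken from input4MIPs or from esgpull. A real recipe would get the listing from an ESGF search rather than a hard-coded list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetFile:
    """One entry in a (hypothetical) listing of available files."""

    variable: str     # species, e.g. "co2"
    grid_label: str   # hypothetical grid identifier
    frequency: str    # e.g. "mon" or "yr"
    filename: str


def select_files(catalogue, variables, grid_label, frequency):
    """Return only the files matching the requested species, grid and frequency."""
    wanted = set(variables)
    return [
        f
        for f in catalogue
        if f.variable in wanted
        and f.grid_label == grid_label
        and f.frequency == frequency
    ]


# Hypothetical listing: two species on two grids at two frequencies.
catalogue = [
    DatasetFile("co2", "global-mean", "mon", "co2_global-mean_mon.nc"),
    DatasetFile("co2", "global-mean", "yr", "co2_global-mean_yr.nc"),
    DatasetFile("co2", "half-degree", "mon", "co2_half-degree_mon.nc"),
    DatasetFile("ch4", "global-mean", "mon", "ch4_global-mean_mon.nc"),
    DatasetFile("ch4", "half-degree", "mon", "ch4_half-degree_mon.nc"),
    DatasetFile("n2o", "global-mean", "mon", "n2o_global-mean_mon.nc"),
]

# A pretend group that only wants monthly global means of CO2 and CH4.
selected = select_files(catalogue, ["co2", "ch4"], "global-mean", "mon")
for f in selected:
    print(f.filename)
```

The point of the sketch is only that each group's needs reduce to a small set of facet choices, which is what a documented example per model would record.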
Thanks both! We do not have a recipe at GFDL for downloading input4MIPs datasets; we download all that is available with the thought that someone in the lab may need the dataset at some point. Of course, I will acknowledge that we have a never-ending archive which facilitates this, so at GFDL we are privileged! I can see how this may be a useful endeavor, especially for newcomer modeling groups who are just spinning up on running CMIP simulations. However, I would not recommend doing this exercise and would rather focus on documenting each dataset with the data provider's recommendations on dos and don'ts related to their datasets. My reasons are as follows:
I think what is definitely needed is a forcing dataset guide or manual (in addition to the nice table @znichollscr and @durack1 have worked on) that describes in a little more detail what is available on ESGF and how it can or cannot be used (separate from a journal paper), just like here, and more specifically here, here, and here. Short answers to your specific Qs:
yes, monthly.
already mentioned above. For chemistry, we use latitudinally varying CH4 concentrations for lower boundary conditions. And there are other configurations of the model that have different dataset needs - for example, we also run with CH4 emissions in which case we do not specify CH4 concentrations.
CMIP-class simulations use 0.5deg but, as I mentioned above, 0.1deg is used by our variable-grid resolution model. Here are the species needed by our most comprehensive ESM:
Thanks @vnaik60, super helpful to understand and very well explained!
Got it, that's a good next step then!
When we get to writing these docs, we should try to blend them with the Airtable FAQs too. FAQs here: https://wcrp-cmip.org/cmip7-task-teams/forcings/#frequently_asked_questions_faqs
Can I suggest that for the documentation, instead of a Google Doc like the CMIP6 Forcings Dataset summary, we have a Zenodo versioned document?
Absolutely! I was hoping to pull together a demo that was built like our current docs (so, managed and updated as code as part of this repo, so things are managed in one place). We could then automatically publish updates on Zenodo as needed/desired.
@znichollscr a demo would be wonderful! And if our data providers can have access to enter the information in the doc, that would be a total dream come true! I was thinking about additional information (more than that contained in the metadata) from a modeler's perspective to ensure that the forcings datasets are implemented as intended by the data providers. Here is an initial draft (pasting the editable table messed up the format, hence the image). Additionally, there was a request for summary statistics: "Could summary statistics (e.g., timeseries of global emission totals) be provided together with the released datasets, so that modellers could confirm that they are ingesting these datasets correctly into their models?" We also discussed documentation here. So it looks like we know what we want (at least to begin with); it is just a matter of producing the right information in the right format.
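As a rough illustration of the kind of summary statistic requested, the sketch below computes a global emission total from a gridded flux on a regular latitude-longitude grid by weighting each cell by its area. It is a sketch under stated assumptions, not what any data provider actually does: the function names, the mean Earth radius of 6,371 km, the flux units (kg m-2 s-1) and the spherical-Earth cell-area formula are all assumptions made for this example.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # assumed mean Earth radius (spherical Earth)


def cell_areas(n_lat, n_lon):
    """Area in m^2 of one cell in each latitude row of a regular global grid.

    Rows run from south to north; all cells in a row have the same area.
    """
    dlat = math.pi / n_lat
    dlon = 2.0 * math.pi / n_lon
    areas = []
    for i in range(n_lat):
        south = -math.pi / 2.0 + i * dlat
        north = south + dlat
        # Exact area of a lat-lon box on a sphere.
        areas.append(EARTH_RADIUS_M**2 * dlon * (math.sin(north) - math.sin(south)))
    return areas


def global_total(flux, n_lat, n_lon):
    """Area-weighted sum of flux[i][j] (assumed kg m-2 s-1), giving kg s-1."""
    areas = cell_areas(n_lat, n_lon)
    return sum(flux[i][j] * areas[i] for i in range(n_lat) for j in range(n_lon))


# Sanity check on a coarse 5-degree grid: a uniform flux of 1 kg m-2 s-1
# should integrate to the surface area of the sphere.
n_lat, n_lon = 36, 72
uniform_flux = [[1.0] * n_lon for _ in range(n_lat)]
total = global_total(uniform_flux, n_lat, n_lon)
sphere_area = 4.0 * math.pi * EARTH_RADIUS_M**2
print(math.isclose(total, sphere_area, rel_tol=1e-9))
```

A modeller could compare a number computed this way after ingestion against a total published alongside the dataset, provided both sides agree on the area definition and units.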
It is totally a good idea to keep information about the datasets close to this registered info (input4MIPs_CVs). Markdown is very adaptable for this format as well, and could easily be lifted into Zenodo too.
This is still on my to-do list, but while that hasn't happened, another source of info to keep our eyes on: https://padlet.com/cmipipo/the-now-cmip7-deck-forcing-suite-zhtlaqh8qrmltktt/wish/lDK1ZRb7z7RMZJ9z
@vnaik60 a question for you!
It seems pretty clear that we're going to have more data on input4MIPs than any one modelling center needs (e.g. you don't need 5 different resolutions of greenhouse gas concentrations, you'll only need one). Hence, our advice to modelling centers will be more nuanced than, "Download all the data".
I was thinking that it would be very helpful if we started a collection of example downloads. I was hoping we could start with your model.
I think the requirements would be relatively simple. We would need to know, for your model:
Then, probably it's also helpful to document any post-processing steps which are likely to be used by multiple models. For example, processing Thomas' data onto the wavelengths of interest for your model.
My instinct is to put this documentation in this repository, so we can update it as soon as we have new data landing. It might make more sense to put it elsewhere of course, so open to suggestions!
cc @durack1