solar data tweak - targeting SOLARIS-HEPPA-CMIP-4-5 #139
So, in what way would you prefer that field to be populated? |
We just add a global attribute, "license_id", with value "CC BY 4.0", to all files. The current "license" attribute is fine as it is. |
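For anyone following along, below is a minimal sketch (not the official input4MIPs tooling) of one way such a global attribute could be added in place using the netCDF4 Python library; the glob pattern is a placeholder, not the actual SOLARIS-HEPPA file names.

```python
# Minimal sketch (not the official tooling): add a "license_id" global
# attribute to existing netCDF files in place with the netCDF4 library.
# The glob pattern below is a placeholder for the actual file names.
import glob

import netCDF4

for path in glob.glob("solarisheppa-cmip-*.nc"):
    with netCDF4.Dataset(path, mode="a") as ds:  # "a" opens for in-place editing
        ds.setncattr("license_id", "CC BY 4.0")
        print(f"added license_id to {path}")
```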
We can do that of course. |
We're trying, but it's a work in progress and there are lots of things to do. The tool that best captures it (in my opinion) is https://github.com/climate-resource/input4mips_validation. However, as you can see, there are still lots of things we're not capturing (specifically climate-resource/input4mips_validation#73 and climate-resource/input4mips_validation#76). Some more details are here: #15. As you can tell, the rules are fuzzy and hard to trace, so I would say that the tool linked above is really the most concrete reference (because it's written in code, not words). |
And sorry @st-bender, this is the list of licenses that we are recommending - pick and choose your flavour (of which we are only recommending 1, but we could conceivably deal with a CC0 if someone absolutely wanted it): input4MIPs_CVs/CVs/input4MIPs_license.json, lines 1 to 7 at a1e7be3
As a backstory, all modelling groups in CMIP6, aside from one, went with the CC BY 4.0 license, with a single group going CC0 (see here) |
This is a good suggestion, but it's not quite in place yet. The best reference is the CMIP6 guidance document, which can be viewed here, for example. |
IMHO, that feels a bit backwards. I believe the specifications should come first. That would enable you to focus on the important and necessary things and to find missing or unclear things that can be adjusted. Then you can write the validation tool and make sure that it does what it is supposed to do. It doesn't need to be much text; you could extract the list from your code and put it here, for example. Edit: Especially since everything is in flux, a simple (draft) list as a basis for discussion seems to be the way to go. |
With all due respect @st-bender, I think you are underestimating the problem a bit. For example, here is the dataset forcing specification from CMIP6 (https://docs.google.com/document/d/1pU9IiJvPJwRvIgVaSDdJ4O0Jeorv_2ekEtted34K9cA/edit?tab=t.0#heading=h.cn9f7982ycw6): it's 7 pages on its own and relies hugely on the CF conventions (https://cfconventions.org/), which are 260 pages long in their PDF form (https://cfconventions.org/Data/cf-conventions/cf-conventions-1.11/cf-conventions.pdf). The rules have been written down for a long time. The issue is that words are imprecise (so what is written down is often wrong or self-contradictory) and the rules are very hard to learn from their text form (with the result that very few people learn them). Having them in code is a) much more precise and b) much more re-usable (you don't need to learn all the rules, you just run the validator and it tells you whether anything is wrong). If you want to help out with this overall process or join the forcings task team, I don't think either Paul or I would complain. As I said though, I think you are underestimating the task we have. You only have 3 files and they are relatively straightforward, so it feels like the solutions should be simpler. However, we're also trying to support datasets with hundreds of files, where really simple solutions stop working (which is why the CF conventions are 260 pages long). |
Maybe I am. I just think that code alone is not a good reference. It might have bugs, and in my experience if it is not very carefully written and well documented, it can be hard to read and understand. And just using it as a black box is not very helpful either.
I think we disagree here. Inconsistencies in the specifications can be discussed and rectified if they are critical, and I am not assuming that this is easy. I just think that with code as the only reference, those are much harder to catch. Also, how does the code decide which version of the imprecise language to use? But anyway, this is getting off-topic and should be moved somewhere else. |
I agree. If you want to continue it, please do. A final thought on this while I'm here is below (which may be a useful starting point). I hear and understand all the points you're making. If you would like to help the task team, it would be great to have another person who cares about this on board, and Paul and I would gladly tell you all of our thoughts on what has and hasn't worked previously, why we think it has/hasn't worked and why we're going the route we're going. I think the specifications in the current code are going in a comprehensible direction, but feel free to take a look yourself (e.g. the current tracking_id validation and creation_date validation). If you're not interested, that's also fine; Paul and I will keep doing our best and appreciate the time you take to fix things as we catch more and more (for what it's worth, your dataset is super clean, others have been through much more painful re-writes, e.g. #123). |
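To make the "rules written as code" point concrete, here is a toy illustration only (not the actual input4mips_validation implementation): two checks of the kind mentioned above, with the expected formats inferred from the tracking_id and creation_date values in the file dump later in this thread, so the exact regex and date format are assumptions.

```python
# Toy illustration (not the real validator): metadata rules expressed as code
# rather than prose. Expected formats are assumptions based on the example file.
import re
from datetime import datetime

# tracking_id is assumed to look like "hdl:21.14100/<uuid>",
# e.g. "hdl:21.14100/f420da79-7a74-49b3-9693-6412888d1499".
TRACKING_ID_REGEX = re.compile(
    r"hdl:21\.14100/[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
)


def validate_tracking_id(tracking_id: str) -> None:
    if not TRACKING_ID_REGEX.fullmatch(tracking_id):
        raise ValueError(f"tracking_id has unexpected form: {tracking_id!r}")


def validate_creation_date(creation_date: str) -> None:
    # creation_date is assumed to be UTC ISO 8601, e.g. "2024-10-14T09:26:01Z".
    try:
        datetime.strptime(creation_date, "%Y-%m-%dT%H:%M:%SZ")
    except ValueError as exc:
        raise ValueError(f"creation_date has unexpected form: {creation_date!r}") from exc


# Example usage with the values from the file shown later in this thread:
validate_tracking_id("hdl:21.14100/f420da79-7a74-49b3-9693-6412888d1499")
validate_creation_date("2024-10-14T09:26:01Z")
```

The point of writing checks this way is that they are unambiguous and re-runnable: a data provider doesn't need to internalise the prose rules, they just run the checks and get a pass or a precise error.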
@st-bender thanks for engaging, it's useful to discuss this. @znichollscr is right, a challenge for the forcing datasets is that they veer very far away from the single-variable-per-file format of CMIPx. There are reasons that this makes sense, plus a lot of legacy (many modelling groups don't want formats to change, as they'd have more work to change their post-processing or model codes). This considerably complicates the edge cases that appear when interpreting and publishing the data. I agree that having a clearly defined specifications document that works through all the edge cases in an iterative process would be ideal. However, most of the forcing efforts are voluntary, and the ideal is very difficult to realize, in addition to the time pressures that the AR7 and CMIP7 Fast Track impose. If you have feedback on the CMIP6 specifications document (here), please make suggestions; it would be useful to update this to the latest reference alongside @znichollscr's input4MIPs-validator tool, which has been a great self-help tool for the data providers that have picked up and used it - it's far more objective and thorough than my or @znichollscr's eyes alone! Again, thanks for the engagement here, and I second the invitation to help us out with this process; the more hands, the better the project and the datasets will be! |
@znichollscr
It looks like there are duplicates in metadata.py, so I might have looked in the wrong place, but I hope you get the idea. |
@st-bender apologies, we should have pointed you to the CVs (controlled vocabularies) that the input4MIPs-validator is using to validate contributions - see PCMDI/input4MIPs_CVs/CVs |
Got it, thanks. Further discussion can roll into here if it's needed: climate-resource/input4mips_validation#77 |
Hi, |
@st-bender I just took a quick peek at your latest monthly file (below), and it looks pretty good to me. I note that we had discussed whether "fx" was the right frequency for the piControl climatology data, and I don't have a better suggestion to add immediately. Out of curiosity, what is the fix/issue that 4.5 will be solving over 4.4? We'll need that description to be captured alongside the deprecation of the 4.4 dataset and its replacement by 4.5.
// global attributes:
:title = "SOLARIS-HEPPA CMIP7 historic solar forcing (1850-2023)" ;
:institution_id = "SOLARIS-HEPPA" ;
:institution = "APARC SOLARIS-HEPPA" ;
:activity_id = "input4MIPs" ;
:comment = "The NASA NOAA LASP (NNL) solar variability models were formerly known as the Naval Research Laboratory (NRL) solar variability models. NNL V1 models will become the operational NOAA/NCEI Solar Irradiance Climate Data Record (CDR) V3 in August 2024. The SSI and F10.7 data are taken from V03. Sub-annual variability has been added for the period before 1874; TSI in this file is the integral over SSI from source data between 0 and 100,000nm" ;
:time_coverage_start = "1850-01-01" ;
:time_coverage_end = "2023-12-31" ;
:frequency = "mon" ;
:source = "SSI, TSI, and F10.7 from ssi_v03r00 (Odele Coddington et al., pers. comm.); Ap and Kp from ftp.ngdc.noaa.gov until 2014, afterwards from GFZ Potsdam (https://kp.gfz-potsdam.de)" ;
:source_id = "SOLARIS-HEPPA-CMIP-4-4" ;
:realm = "atmos" ;
:further_info_url = "http://solarisheppa.geomar.de/cmip7" ;
:metadata_url = "see http://solarisheppa.geomar.de/solarisheppa/sites/default/files/data/cmip7/CMIP7_metadata_description_4.4.pdf" ;
:contributor_name = "Bernd Funke, Timo Asikainen, Stefan Bender, Odele Coddington, Thierry Dudok de Wit, Illaria Ermolli, Margit Haberreiter, Doug Kinnison, Judith Lean, Sergey Koldoboskiy, Daniel R. Marsh, Hilde Nesse, Annika Seppaelae, Miriam Sinnhuber, Ilya Usoskin, Max van de Kamp, Pekka T. Verronen" ;
:references = "Funke et al., Geosci. Model Dev., 17, 1217--1227, https://doi.org/10.5194/gmd-17-1217-2024, 2024" ;
:contact = "bernd AT iaa.es" ;
:dataset_category = "solar" ;
:grid_label = "gn" ;
:mip_era = "CMIP6Plus" ;
:target_mip = "CMIP" ;
:variable_id = "multiple" ;
:license = "Solar forcing data produced by SOLARIS-HEPPA is licensed under a Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/). The data producers and data providers make no warranty, either expressed or implied, including but not limited to, warranties of merchantability and fitness for a particular purpose. All liabilities arising from the supply of the information (including any liability arising in negligence) are excluded to the fullest extent permitted by law." ;
:Conventions = "CF-1.8" ;
:creation_date = "2024-10-14T09:26:01Z" ;
:source_version = "4.4" ;
:nominal_resolution = "10000 km" ;
:product = "derived" ;
:region = "global" ;
:tracking_id = "hdl:21.14100/f420da79-7a74-49b3-9693-6412888d1499" ;
} |
I have exchanged the files behind the links provided earlier. Our website updates will soon be underway, stating the reason:
|
Perfect, thank you. @durack1 these look good to me to be put in the publishing queue. Links are:
Thank you |
No problem folks, these 3 new files are now live - see |
Issues to solve:
Files have a valid license identifier but are missing the "license_id" attribute that is being lifted to populate webpages (a quick check for this is sketched below), e.g.,
@znichollscr @st-bender @berndfunke ping - just a note for a very trivial clean up in the next version
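As a quick, hypothetical way to confirm which files need this clean-up (the glob pattern is a placeholder for the actual local copies), something like the following would report any file missing the attribute:

```python
# Quick check sketch (assumed local file paths): report which files are
# missing the "license_id" global attribute flagged above.
import glob

import netCDF4

for path in glob.glob("solarisheppa-cmip-*.nc"):  # placeholder glob
    with netCDF4.Dataset(path) as ds:
        status = "present" if "license_id" in ds.ncattrs() else "MISSING"
        print(f"{path}: license_id {status}")
```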