Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Need tools to convert to and parse new metadata headers #171

Closed
gold2718 opened this issue Jan 30, 2019 · 17 comments
Closed

Need tools to convert to and parse new metadata headers #171

gold2718 opened this issue Jan 30, 2019 · 17 comments

Comments

@gold2718
Copy link
Collaborator

In order to convert CCPP physics to the new metadata standard, we need a conversion tool plus code that parses files with the new metadata headers. Some issues regarding the conversion are:

  • standard names will be converted to lower case and have dashes and periods converted to underscores
  • questionable units (e.g., blank, 'none', 'count', 'flag', 'DDT' vs. 'various') are not being converted
  • The first three dimensions default to 'horizontal_dimension', 'vertical_dimension', and 'number_of_tracers'
  • Old metadata tables with non-standard usage of initial exclamation marks will not be converted
@climbfuji
Copy link
Collaborator

Thanks, I saw the pull request that contains a superset of what we need to get this job done. Let's see together what the minimum code is that I need to pull out from the PR to perform the conversion and read the basics of the new metadata (i.e. what we read today) into the old ccpp_prebuild.py system.

@climbfuji
Copy link
Collaborator

This topic was discussed and several changes were proposed or agreed on during the CCPP framework meeting on January 31, 2019. The meeting minutes are here: https://github.com/NCAR/ccpp-framework/wiki/CCPP-Framework-Meeting-Minutes-2019%E2%80%9001%E2%80%9031

@gold2718
Copy link
Collaborator Author

gold2718 commented Feb 1, 2019

One of the potential changes discussed during the meeting was conversion of metadata tables to be pure config file format (including removing the no-longer-needed Fortran comment characters).
To do this, we need to ensure there is a good place for the arg_table_name. One suggestion is to add a new section called [arg_table] which could have a name key and possibly other keys (e.g., type = DDT or type = scheme with scheme being the default).

Note that it will not be possible to read a .md file using the ConfigParser.read_file command as variables from different headers with the same local name (e.g., errmsg) will appear to be duplicate sections.
This means that there has to be some other file-reading function which means our arg_table name entry can be in any format we choose.

Thoughts?

@climbfuji
Copy link
Collaborator

Thanks for sharing these ideas. The option to read the files using a Python config parser came out of the blue and is by no means something we need now (or anytime soon if at all). It definitely shouldn't be in the way of implementing the best possible option for our purposes.

@gold2718
Copy link
Collaborator Author

gold2718 commented Feb 1, 2019

So are you a fan of removing the exclamation marks?
Suggested syntax for the header?
Idea 1:
arg_table_<scheme_name> (basically the current format).
idea 2:

[scheme_name]
  type = scheme

@climbfuji
Copy link
Collaborator

Can we think of any good arguments for keeping or removing the '!!'?

Pro removing:

  • they are not really required for the formatting, because we are outside a Fortran file
  • we won't be dumping the contents of xxx.md into the documentation directly (i.e. raw input in the Fortran source code using a doxygen include command)?
  • to someone new this will look strange - why need '!!' at the beginning of every line
  • makes it more readable (and some editors will automatically switch to config-file syntax highlighting once the '!!' are gone)

Pro keeping:

  • in line with what we had until now
  • less work for you
  • if we decided to dump the contents of xxx.md into the documentation directly through a Fortran source code, then that would work - but it would require a different prefix than '!!' if a .md file describing a C/C++ code was to be imported into the source code in raw format to generate documentation

If we were to go without the '!!', I like idea 1 better, because it is easier to see the hierarchy. Maybe @ligiabernardet and @grantfirl have some input, too.

@gold2718
Copy link
Collaborator Author

gold2718 commented Feb 1, 2019

I vote for pro-removing. I would also like to pitch a third idea for the beginning of a metadata header:

#######################################
[ccpp-arg-table]
  name = <scheme_name>
  type = scheme

where the hash marks are optional (any line beginning with a hash mark is a comment line in keeping with the config format even though the files will not meet strict format compliance). type will default to scheme so is really only needed for DDT or MODULE.

Another item which could be included in this section is source_file (a bit more robust and flexible than relying on a file name). Using a section allows for flexibility over time.

@grantfirl
Copy link
Collaborator

I'm definitely in the pro-removing camp for the reasons that Dom mentioned. Given the length of the .md file, I can't see the need to ever dump this back into the source, and even if the need arose, it would be easy to prepend the appropriate comment characters with scripting. If we're really divorcing the metadata from the source, I just don't see the point in going "half way" and keeping the formatting in there. For the header, I don't have a strong preference, although I second Dom's suggestion that the "hierarchy" needs to be clear. So, I would be on board with ideas 2 or 3 as long as there is white space or some other delineation between the header and the start of the variable metadata.

@gold2718
Copy link
Collaborator Author

gold2718 commented Feb 1, 2019

Another potential item for the [ccpp-arg-table] table header section is the ability to capture doxygen information. For instance, adding:

  document_new_section = True

(default False) would allow me to capture the current distinction between metadata headers that begin new doxygen sections and those that do not.

@grantfirl
Copy link
Collaborator

After looking at the doxygen rules for Fortran, it looks like '!>' is valid to start a section OR to continue a section, whereas '!!' is continue only. Given this, I would say that it isn't necessary for the
document_new_section = True line.

@grantfirl
Copy link
Collaborator

grantfirl commented Feb 2, 2019

It looks like Dom's proposed solution captured in the January 31, 2019 meeting notes (https://github.com/NCAR/ccpp-framework/wiki/CCPP-Framework-Meeting-Minutes-2019%E2%80%9001%E2%80%9031) will work just fine. That is, when the metadata is removed from the source files into *.md, the converter will need to leave the following lines in order for the Doxygen parser to pick up the table

!> \section arg_table_(subroutine_name) Argument Table
!! \htmlinclude (subroutine_name).html

As part of the make doc step, we'll have a script read in all *.md files and spit out *_subroutine(a,b,c).html files in a new directory within the physics/docs directory. The path to this directory will need to be saved in the 'EXAMPLE_PATH' of the doxygen configuration file and the extra style sheet can be used to pretty up the html tables.

I've tested this on Man's documentation branch with a test HTML file and everything works just dandy.

@climbfuji
Copy link
Collaborator

Great, thank you so much for looking into this @grantfirl !

@gold2718
Copy link
Collaborator Author

gold2718 commented Feb 2, 2019

So what should the top of a metadata section look like in the new system? I need a way to know for sure that I am starting a new header and also need to know the name and type (e.g., scheme, ddt, module). Thoughts? Votes? Ruling?

@climbfuji
Copy link
Collaborator

In order to proceed, shall we go with option 3?

#######################################
[ccpp-arg-table]
  name = <scheme_name>
  type = scheme

As Grant said, we won't need the document_new_section option.

I am not sure we need source_file either, as long as the logic is to give CCPP a list of Fortran source files (i.e., schemes) with the convention that the metadata table is contained in NAME_OF_THE_SOURCE_FILE_WITHOUT_SUFFIX.md.

@gold2718
Copy link
Collaborator Author

gold2718 commented Feb 5, 2019

@grantfirl, any objection to selection from @climbfuji?
If so, how about a counter-proposal?

@grantfirl
Copy link
Collaborator

No objection. I agree with Dom's choice.

@ligiabernardet
Copy link
Collaborator

ligiabernardet commented Feb 5, 2019 via email

gold2718 added a commit that referenced this issue Mar 23, 2019
Version 2 contains config-file-like format in a separate file.
Dimensions are taken from the Fortran code and converted to standard names while possible.
The conversion process generates warnings where it was unable to do a complete conversion.

Fixes #171
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants