
Feature 364 add tci calc #369

Merged
merged 37 commits into develop from feature_364_add_TCI_calc
Apr 12, 2024

Conversation

DanielAdriaansen

@DanielAdriaansen DanielAdriaansen commented Apr 5, 2024

This PR adds a new diagnostic function supporting the calculation of the Terrestrial Coupling Index (TCI). The calculation lives in metcalcpy/diagnostics/land_surface.py. The intent is to add a new .py file under diagnostics for each category that loosely corresponds to the model_applications use case categories in METplus Wrappers, with additional categories as needed, to keep the two aligned as closely as possible.
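For context, the TCI (commonly attributed to Dirmeyer, 2011) is the covariance over time between a land-surface state (e.g. soil moisture) and a surface flux, normalized by the standard deviation of the land-surface state. A minimal NumPy sketch, for illustration only; the real implementation is the function in land_surface.py and may differ in signature and detail:

```python
import numpy as np

# Illustration only: the PR's implementation lives in
# metcalcpy/diagnostics/land_surface.py and may differ in detail.
def terrestrial_coupling_index(soil, flux):
    """TCI: covariance of a soil state and a surface flux over time,
    normalized by the standard deviation of the soil state."""
    soil = np.asarray(soil, dtype=float)
    flux = np.asarray(flux, dtype=float)
    # Population covariance over the time dimension
    covar = np.mean((soil - soil.mean()) * (flux - flux.mean()))
    return covar / soil.std()

soil_ts = np.array([0.20, 0.25, 0.30, 0.35, 0.40])       # soil moisture
flux_ts = np.array([100.0, 110.0, 120.0, 130.0, 140.0])  # e.g. surface flux
print(terrestrial_coupling_index(soil_ts, flux_ts))  # 14.142135623730951
```

Because the two example series are perfectly linearly related, the TCI here reduces to the standard deviation of the flux, sqrt(200).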

To document the functions placed in this area, I implemented new "API" documentation through the Sphinx autosummary extension (https://www.sphinx-doc.org/en/master/usage/extensions/autosummary.html). Here is a list of the changes I made and things I discovered along the way:

  1. In docs/conf.py, I had to insert the top level of the METcalcpy repository into the path, rather than the docs directory only.
  2. In docs/conf.py, I added the autosummary extension to the list of Sphinx extensions to use. As far as I can tell, it ships with Sphinx and is not a separate Python package.
  3. In docs/conf.py, I added autosummary_generate = True so the documentation is generated from the autosummary directives.
  4. I added a new directory under docs, called diag_ref, to hold the autosummary documentation. Inside this directory, a new index.rst file contains the RST that controls the top-level HTML of the rendered autosummary documentation.
  5. I added an entry to docs/Users_Guide/index.rst pointing to the autosummary documentation in docs/diag_ref so it is available in the Users Guide.
  6. Controlled via the docs/diag_ref/index.rst file, a subdirectory under docs/diag_ref called generated stores the RST files autosummary writes when the docs are built.
  7. I had some trouble getting the function table to contain hyperlinks to each function's HTML page. After much gnashing of teeth and Googling, I got it to work by copying the module.rst file from the Sphinx installation area into docs/_templates/autosummary. The file I copied was {PYTHON_BASE}/lib/python3.10/site-packages/sphinx/ext/autosummary/templates/autosummary/module.rst, and it is now committed at docs/_templates/autosummary/module.rst. This carries some risk: if a new Sphinx release changes this RST file, we should re-copy and commit the template to our repository. I could not figure out why Sphinx autosummary was not using its own templates, or why it did not work without the file existing in docs/_templates; it's possible I am missing a setting in conf.py, but I am not sure. I pieced this together from a combination of StackOverflow answers, with https://stackoverflow.com/a/62613202 being the most helpful, plus this open Sphinx issue: By default, make Autosummary templates generate documentation pages for classes, functions, exceptions etc sphinx-doc/sphinx#7912.
  8. The appearance of the function signatures rendered in the HTML is controlled by the docstrings in metcalcpy/diagnostics/land_surface.py. Per @bikegeek, "Google" style docstrings are desired, so the sphinx.ext.napoleon extension was enabled so that autosummary renders the Google docstrings correctly.
  9. One important gotcha: autosummary imports all of your *.py files to generate the documentation. This means the Python environment where the documentation is built (e.g. RTD online and in GHA) now needs the same set of dependencies as the entire Python package. For example, the docs previously built without Pandas and Xarray, but because those are dependencies of diagnostics/land_surface.py, they are now required to build the docs; otherwise autosummary raises ImportError. So be sure any dependencies added to any *.py file you're writing are also added to the RTD and docs requirements txt and yaml files.
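Taken together, items 1-3 and 8 amount to docs/conf.py changes along these lines (a sketch reconstructed from the description above; the '..' path and the shape of the extension list are assumptions, not the exact committed file):

```python
# docs/conf.py -- sketch of changes 1-3 and 8; the '..' path and the
# extension list shown here are assumptions, not the exact file.
import os
import sys

# (1) Insert the top level of the METcalcpy repo, not just docs/, so
# autosummary can import metcalcpy.diagnostics.land_surface.
sys.path.insert(0, os.path.abspath('..'))

extensions = [
    'sphinx.ext.autosummary',  # (2) ships with Sphinx itself
    'sphinx.ext.napoleon',     # (8) renders "Google" style docstrings
]

# (3) Generate stub RST pages from the autosummary directives
autosummary_generate = True
```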

Pull Request Testing

  • Describe testing already performed for these changes:

    Added new test in test/test_calc_tci.py
    Added new test data in test/data/calc_tci*
Verified the function works with the METplus Wrappers use case in METplus#2388 (Enhancement: Enhance Terrestrial Coupling Index (TCI) Use Case).

  • Recommend testing for the reviewer(s) to perform, including the location of input datasets, and any additional instructions:

    Verify that I set up the testing properly.

  • Do these changes include sufficient documentation updates, ensuring that no errors or warnings exist in the build of the documentation? [Yes or No]
    Yes. These changes come with new documentation.

  • Do these changes include sufficient testing updates? [Yes or No]
    Yes.

  • Will this PR result in changes to the test suite? [Yes or No]

    If yes, describe the new output and/or changes to the existing output:

  • Please complete this pull request review by [Fill in date].

Pull Request Checklist

See the METplus Workflow for details.

  • Add any new Python packages to the METplus Components Python Requirements table.
  • Review the source issue metadata (required labels, projects, and milestone).
  • Complete the PR definition above.
  • Ensure the PR title matches the feature or bugfix branch name.
  • Define the PR metadata, as permissions allow.
    Select: Reviewer(s)
    Select: Organization level software support Project or Repository level development cycle Project
    Select: Milestone as the version that will include these changes
  • After submitting the PR, select the ⚙️ icon in the Development section of the right hand sidebar. Search for the issue that this PR will close and select it, if it is not already selected.
  • After the PR is approved, merge your changes. If permissions do not allow this, request that the reviewer do the merge.
  • Close the linked issue and delete your feature or bugfix branch from GitHub.

@bikegeek
Collaborator

bikegeek commented Apr 5, 2024

I ran the test and I see it passes, but there is a warning:

/Volumes/d1/minnawin/miniforge3/envs/mp_analysis_env/lib/python3.10/site-packages/numpy/lib/nanfunctions.py:1879: RuntimeWarning: Degrees of freedom <= 0 for slice.
var = nanvar(a, axis=axis, dtype=dtype, out=out, ddof=ddof,

I'm using numpy version 1.26.4. What version are you using?

@DanielAdriaansen
Author


Same, 1.26.4. I think this is coming from NaN getting into the calculation somehow. I can dig a bit further.

@DanielAdriaansen
Author

DanielAdriaansen commented Apr 10, 2024


So the warning originates from this line:

soil_std = soil_data.std(dim='time')

The reason is that, by definition, the Terrestrial Coupling Index is computed only over land, not over the ocean. The forecast data have NaN values at ocean grid cells, so NumPy's variance routine cannot compute a variance there.

What I don't understand is that Xarray's std() function has a skipna option, which I set to True, yet I still get the warning: https://xarray.pydata.org/en/v2024.02.0/generated/xarray.DataArray.std.html#xarray.DataArray.std. HOWEVER, if I set skipna=False, the warning disappears. This seems odd to me, almost like a bug (though I doubt it is): as if True keeps the NaN and False skips them, when I would assume the opposite. I am going to do some more testing.

After digging a bit more (numpy/numpy#15219), it seems that NumPy's nanvar() operates on "slices" (the 1-D sub-arrays along the reduction axis, as far as I can tell). When the points in a slice are NaN, there may not be enough remaining points to compute the variance, and the warning is raised.
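To make the "slice" behavior concrete (my own example, not from the PR): reducing a 2-D array along axis 0 treats each column as one slice, and a column that is entirely NaN is what triggers the warning.

```python
import warnings
import numpy as np

# Column 0 has valid data; column 1 is entirely NaN, so that slice has
# no degrees of freedom left and nanvar warns about it.
a = np.array([[1.0, np.nan],
              [2.0, np.nan],
              [3.0, np.nan]])

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    v = np.nanvar(a, axis=0)

print(v)  # first entry is var([1, 2, 3]) = 2/3; second entry is nan
print([str(w.message) for w in caught])  # "Degrees of freedom <= 0 for slice."
```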

@DanielAdriaansen
Author

DanielAdriaansen commented Apr 11, 2024


OK, I think I understand. Consider these examples:

few_nan = np.array([np.nan, 1, 2, np.nan, 3])
da_few = xr.DataArray(few_nan)
print(da_few.std(skipna=True))
print(da_few.std(skipna=False))
<xarray.DataArray ()> Size: 8B
array(0.81649658)
<xarray.DataArray ()> Size: 8B
array(nan)

all_nan = np.array([np.nan, np.nan, np.nan, np.nan, np.nan])
da_all = xr.DataArray(all_nan)
print(da_all.std(skipna=True))
print(da_all.std(skipna=False))
/home/dadriaan/.conda/envs/metplus_docs/lib/python3.10/site-packages/numpy/lib/nanfunctions.py:1879: RuntimeWarning: Degrees of freedom <= 0 for slice.
  var = nanvar(a, axis=axis, dtype=dtype, out=out, ddof=ddof,
<xarray.DataArray ()> Size: 8B
array(nan)
<xarray.DataArray ()> Size: 8B
array(nan)

The all_nan case is where our warning is coming from. Here I think the RuntimeWarning is justifiable, and we should leave skipna=True. Setting skipna=False would effectively discard valid values wherever they are mixed with NaN in a "slice", since the NaN would propagate into the result. I don't think that occurs for this case, but a user may have different data.

Another option is to add an argument to calc_tci() controlling how NaN are handled. I think this may be worthwhile. Do we need tests for cases with and without NaN, @bikegeek?
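A sketch of what that could look like. Hypothetical throughout: the argument name nan_policy and its values are made up here, this is not the current calc_tci() signature, and the covariance/std formulation is only illustrative.

```python
import numpy as np
import xarray as xr

def calc_tci(soil_data, sfc_flux_data, nan_policy="skip"):
    """Hypothetical NaN-handling argument: "skip" ignores NaN (and may
    warn on all-NaN slices); "propagate" lets NaN poison the result."""
    skipna = nan_policy == "skip"
    soil_mean = soil_data.mean(dim="time", skipna=skipna)
    flux_mean = sfc_flux_data.mean(dim="time", skipna=skipna)
    covar = ((soil_data - soil_mean) * (sfc_flux_data - flux_mean)).mean(
        dim="time", skipna=skipna)
    return covar / soil_data.std(dim="time", skipna=skipna)

soil = xr.DataArray([0.2, 0.3, np.nan, 0.4], dims="time")
flux = xr.DataArray([100.0, 120.0, 130.0, 140.0], dims="time")
print(float(calc_tci(soil, flux, nan_policy="skip")))       # finite value
print(float(calc_tci(soil, flux, nan_policy="propagate")))  # nan
```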

@DanielAdriaansen
Author

Remaining questions for @bikegeek:

  1. Docstring style for functions: should we switch to Google or NumPy docstrings?
  2. In METcalcpy/test, instead of test_calc_tci.py, should we make a single file called test_land_surface.py that tests all of the functions defined in metcalcpy/diagnostics/land_surface.py? If land_surface.py ends up with 100 functions, I worry we will end up with 100 test_*.py files; maybe they should be contained in one general test_land_surface.py file?
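For what it's worth, the single-module layout might look like this (all names hypothetical; only the TCI test exists today, and the bodies here are placeholders, not the real tests):

```python
# test/test_land_surface.py -- hypothetical consolidated layout, with one
# or more test functions per function in diagnostics/land_surface.py.
import numpy as np

def test_calc_tci_basic():
    # would import and call land_surface.calc_tci on small synthetic data
    soil = np.array([0.2, 0.3, 0.4])
    flux = np.array([100.0, 120.0, 140.0])
    assert soil.shape == flux.shape  # placeholder for a real result check

def test_calc_tci_all_nan():
    # would pin down the NaN behavior discussed above
    soil = np.full(3, np.nan)
    assert np.isnan(soil).all()  # placeholder
```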

@DanielAdriaansen DanielAdriaansen linked an issue Apr 11, 2024 that may be closed by this pull request
@bikegeek
Collaborator

bikegeek commented Apr 11, 2024 via email

@DanielAdriaansen DanielAdriaansen marked this pull request as ready for review April 12, 2024 17:01
@DanielAdriaansen
Author

This is ready for final review. Thanks all for the feedback so far!

Collaborator

@bikegeek bikegeek left a comment


Pydocs in the User's Guide are extremely helpful. Thanks for getting this in.

@bikegeek bikegeek merged commit c7709a4 into develop Apr 12, 2024
8 checks passed
@bikegeek bikegeek deleted the feature_364_add_TCI_calc branch April 12, 2024 19:41
Development

Successfully merging this pull request may close these issues.

Add calculation for Terrestrial Coupling Index (TCI)