
Feature 364 add tci calc #369

Merged
merged 37 commits into develop from feature_364_add_TCI_calc
Apr 12, 2024

Conversation

DanielAdriaansen

@DanielAdriaansen DanielAdriaansen commented Apr 5, 2024

This PR adds a new diagnostic function supporting the calculation of the Terrestrial Coupling Index (TCI). The calculation lives in metcalcpy/diagnostics/land_surface.py. The intent is to add a new .py file under diagnostics for each category that loosely corresponds to the model_applications use case categories in METplus Wrappers, with additional categories as needed, to keep the two aligned as closely as possible.
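For context, the TCI (commonly attributed to Dirmeyer, 2011) is the covariance over time between a land-surface state (e.g. soil moisture) and a surface flux, normalized by the standard deviation of the land-surface state. A minimal NumPy sketch, for illustration only; the real implementation is the function in land_surface.py and may differ in signature and detail:

```python
import numpy as np

# Illustration only: the PR's implementation lives in
# metcalcpy/diagnostics/land_surface.py and may differ in detail.
def terrestrial_coupling_index(soil, flux):
    """TCI: covariance of a soil state and a surface flux over time,
    normalized by the standard deviation of the soil state."""
    soil = np.asarray(soil, dtype=float)
    flux = np.asarray(flux, dtype=float)
    # Population covariance over the time dimension
    covar = np.mean((soil - soil.mean()) * (flux - flux.mean()))
    return covar / soil.std()

soil_ts = np.array([0.20, 0.25, 0.30, 0.35, 0.40])       # soil moisture
flux_ts = np.array([100.0, 110.0, 120.0, 130.0, 140.0])  # e.g. surface flux
print(terrestrial_coupling_index(soil_ts, flux_ts))  # 14.142135623730951
```

Because the two example series are perfectly linearly related, the TCI here reduces to the standard deviation of the flux, sqrt(200).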

To document the functions placed in this area, I implemented new "API" documentation through the Sphinx autosummary extension (https://www.sphinx-doc.org/en/master/usage/extensions/autosummary.html). Here is a list of the changes I made and things I discovered along the way:

  1. In docs/conf.py, I had to insert the top level of the METcalcpy repository into the path, rather than the docs directory only.
  2. In docs/conf.py, I added the autosummary extension to the list of Sphinx extensions to use. As far as I can tell, it ships with Sphinx and is not a separate Python package.
  3. In docs/conf.py, I added autosummary_generate = True so the documentation is generated from the autosummary directives.
  4. I added a new directory under docs, called diag_ref, to hold the autosummary documentation. Inside this directory, a new index.rst file contains the RST that controls the top-level HTML of the rendered autosummary documentation.
  5. I added an entry to docs/Users_Guide/index.rst pointing to the autosummary documentation in docs/diag_ref so it is available in the Users Guide.
  6. Controlled via the docs/diag_ref/index.rst file, a subdirectory under docs/diag_ref called generated stores the RST files autosummary writes when the docs are built.
  7. I had some trouble getting the function table to contain hyperlinks to each function's HTML page. After much gnashing of teeth and Googling, I got it to work by copying the module.rst file from the Sphinx installation area into docs/_templates/autosummary. The file I copied was {PYTHON_BASE}/lib/python3.10/site-packages/sphinx/ext/autosummary/templates/autosummary/module.rst, and it is now committed at docs/_templates/autosummary/module.rst. This carries some risk: if a new Sphinx release changes this RST file, we should re-copy and commit the template to our repository. I could not figure out why Sphinx autosummary was not using its own templates, or why it did not work without the file existing in docs/_templates; it's possible I am missing a setting in conf.py, but I am not sure. I pieced this together from a combination of StackOverflow answers, with https://stackoverflow.com/a/62613202 being the most helpful, plus this open Sphinx issue: By default, make Autosummary templates generate documentation pages for classes, functions, exceptions etc sphinx-doc/sphinx#7912.
  8. The appearance of the function signatures rendered in the HTML is controlled by the docstrings in metcalcpy/diagnostics/land_surface.py. Per @bikegeek, "Google" style docstrings are desired, so the sphinx.ext.napoleon extension was enabled so that autosummary renders the Google docstrings correctly.
  9. One important gotcha: autosummary imports all of your *.py files to generate the documentation. This means the Python environment where the documentation is built (e.g. RTD online and in GHA) now needs the same set of dependencies as the entire Python package. For example, the docs previously built without Pandas and Xarray, but because those are dependencies of diagnostics/land_surface.py, they are now required to build the docs; otherwise autosummary raises ImportError. So be sure any dependencies added to any *.py file you're writing are also added to the RTD and docs requirements txt and yaml files.
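Taken together, items 1-3 and 8 amount to docs/conf.py changes along these lines (a sketch reconstructed from the description above; the '..' path and the shape of the extension list are assumptions, not the exact committed file):

```python
# docs/conf.py -- sketch of changes 1-3 and 8; the '..' path and the
# extension list shown here are assumptions, not the exact file.
import os
import sys

# (1) Insert the top level of the METcalcpy repo, not just docs/, so
# autosummary can import metcalcpy.diagnostics.land_surface.
sys.path.insert(0, os.path.abspath('..'))

extensions = [
    'sphinx.ext.autosummary',  # (2) ships with Sphinx itself
    'sphinx.ext.napoleon',     # (8) renders "Google" style docstrings
]

# (3) Generate stub RST pages from the autosummary directives
autosummary_generate = True
```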

Pull Request Testing

  • Describe testing already performed for these changes:

    Added new test in test/test_calc_tci.py
    Added new test data in test/data/calc_tci*
Verified the function works with the METplus Wrappers use case in METplus#2388 (Enhancement: Enhance Terrestrial Coupling Index (TCI) Use Case).

  • Recommend testing for the reviewer(s) to perform, including the location of input datasets, and any additional instructions:

    Verify that I set up the testing properly.

  • Do these changes include sufficient documentation updates, ensuring that no errors or warnings exist in the build of the documentation? [Yes or No]
    Yes. These changes come with new documentation.

  • Do these changes include sufficient testing updates? [Yes or No]
    Yes.

  • Will this PR result in changes to the test suite? [Yes or No]

    If yes, describe the new output and/or changes to the existing output:

  • Please complete this pull request review by [Fill in date].

Pull Request Checklist

See the METplus Workflow for details.

  • Add any new Python packages to the METplus Components Python Requirements table.
  • Review the source issue metadata (required labels, projects, and milestone).
  • Complete the PR definition above.
  • Ensure the PR title matches the feature or bugfix branch name.
  • Define the PR metadata, as permissions allow.
    Select: Reviewer(s)
    Select: Organization level software support Project or Repository level development cycle Project
    Select: Milestone as the version that will include these changes
  • After submitting the PR, select the ⚙️ icon in the Development section of the right hand sidebar. Search for the issue that this PR will close and select it, if it is not already selected.
  • After the PR is approved, merge your changes. If permissions do not allow this, request that the reviewer do the merge.
  • Close the linked issue and delete your feature or bugfix branch from GitHub.

@bikegeek
Collaborator

bikegeek commented Apr 5, 2024

I ran the test and I see it passes, but there is a warning:

/Volumes/d1/minnawin/miniforge3/envs/mp_analysis_env/lib/python3.10/site-packages/numpy/lib/nanfunctions.py:1879: RuntimeWarning: Degrees of freedom <= 0 for slice.
var = nanvar(a, axis=axis, dtype=dtype, out=out, ddof=ddof,

I'm using numpy version 1.26.4. What version are you using?

@DanielAdriaansen
Author


Same, 1.26.4. I think this is coming from NaN getting into the calculation somehow. I can dig a bit further.

@DanielAdriaansen
Author

DanielAdriaansen commented Apr 10, 2024


So the warning originates from this line:

soil_std = soil_data.std(dim='time')

The reason is that, by definition, the Terrestrial Coupling Index is computed only over land, not over the ocean. The forecast data have NaN values at ocean grid cells, so NumPy's variance routine cannot compute a variance there.

What I don't understand is that Xarray's std() function has a skipna option, which I set to True, yet I still get the warning: https://xarray.pydata.org/en/v2024.02.0/generated/xarray.DataArray.std.html#xarray.DataArray.std. HOWEVER, if I set skipna=False, the warning disappears. This seems odd to me, almost like a bug (though I doubt it is): as if True keeps the NaN and False skips them, when I would assume the opposite. I am going to do some more testing.

After digging a bit more (numpy/numpy#15219), it seems that NumPy's nanvar() operates on "slices" (the 1-D sub-arrays along the reduction axis, as far as I can tell). When the points in a slice are NaN, there may not be enough remaining points to compute the variance, and the warning is raised.
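To make the "slice" behavior concrete (my own example, not from the PR): reducing a 2-D array along axis 0 treats each column as one slice, and a column that is entirely NaN is what triggers the warning.

```python
import warnings
import numpy as np

# Column 0 has valid data; column 1 is entirely NaN, so that slice has
# no degrees of freedom left and nanvar warns about it.
a = np.array([[1.0, np.nan],
              [2.0, np.nan],
              [3.0, np.nan]])

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    v = np.nanvar(a, axis=0)

print(v)  # first entry is var([1, 2, 3]) = 2/3; second entry is nan
print([str(w.message) for w in caught])  # "Degrees of freedom <= 0 for slice."
```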

@DanielAdriaansen
Author

DanielAdriaansen commented Apr 11, 2024


OK, I think I understand. Consider these examples:

few_nan = np.array([np.nan, 1, 2, np.nan, 3])
da_few = xr.DataArray(few_nan)
print(da_few.std(skipna=True))
print(da_few.std(skipna=False))
<xarray.DataArray ()> Size: 8B
array(0.81649658)
<xarray.DataArray ()> Size: 8B
array(nan)

all_nan = np.array([np.nan, np.nan, np.nan, np.nan, np.nan])
da_all = xr.DataArray(all_nan)
print(da_all.std(skipna=True))
print(da_all.std(skipna=False))
/home/dadriaan/.conda/envs/metplus_docs/lib/python3.10/site-packages/numpy/lib/nanfunctions.py:1879: RuntimeWarning: Degrees of freedom <= 0 for slice.
  var = nanvar(a, axis=axis, dtype=dtype, out=out, ddof=ddof,
<xarray.DataArray ()> Size: 8B
array(nan)
<xarray.DataArray ()> Size: 8B
array(nan)

The all_nan case is where our warning is coming from. Here I think the RuntimeWarning is justifiable, and we should leave skipna=True. Setting skipna=False would effectively discard valid values wherever they are mixed with NaN in a "slice", since the NaN would propagate into the result. I don't think that occurs for this case, but a user may have different data.

Another option is to add an argument to calc_tci() controlling how NaN are handled. I think this may be worthwhile. Do we need tests for cases with and without NaN, @bikegeek?
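A sketch of what that could look like. Hypothetical throughout: the argument name nan_policy and its values are made up here, this is not the current calc_tci() signature, and the covariance/std formulation is only illustrative.

```python
import numpy as np
import xarray as xr

def calc_tci(soil_data, sfc_flux_data, nan_policy="skip"):
    """Hypothetical NaN-handling argument: "skip" ignores NaN (and may
    warn on all-NaN slices); "propagate" lets NaN poison the result."""
    skipna = nan_policy == "skip"
    soil_mean = soil_data.mean(dim="time", skipna=skipna)
    flux_mean = sfc_flux_data.mean(dim="time", skipna=skipna)
    covar = ((soil_data - soil_mean) * (sfc_flux_data - flux_mean)).mean(
        dim="time", skipna=skipna)
    return covar / soil_data.std(dim="time", skipna=skipna)

soil = xr.DataArray([0.2, 0.3, np.nan, 0.4], dims="time")
flux = xr.DataArray([100.0, 120.0, 130.0, 140.0], dims="time")
print(float(calc_tci(soil, flux, nan_policy="skip")))       # finite value
print(float(calc_tci(soil, flux, nan_policy="propagate")))  # nan
```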

@DanielAdriaansen
Author

Remaining questions for @bikegeek:

  1. Docstring style for functions: should we switch to Google or NumPy docstrings?
  2. In METcalcpy/test, instead of test_calc_tci.py, should we make a single file called test_land_surface.py that tests all of the functions defined in metcalcpy/diagnostics/land_surface.py? If land_surface.py ends up with 100 functions, I worry we will end up with 100 test_*.py files; maybe they should be contained in one general test_land_surface.py file?
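For what it's worth, the single-module layout might look like this (all names hypothetical; only the TCI test exists today, and the bodies here are placeholders, not the real tests):

```python
# test/test_land_surface.py -- hypothetical consolidated layout, with one
# or more test functions per function in diagnostics/land_surface.py.
import numpy as np

def test_calc_tci_basic():
    # would import and call land_surface.calc_tci on small synthetic data
    soil = np.array([0.2, 0.3, 0.4])
    flux = np.array([100.0, 120.0, 140.0])
    assert soil.shape == flux.shape  # placeholder for a real result check

def test_calc_tci_all_nan():
    # would pin down the NaN behavior discussed above
    soil = np.full(3, np.nan)
    assert np.isnan(soil).all()  # placeholder
```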

@DanielAdriaansen DanielAdriaansen linked an issue Apr 11, 2024 that may be closed by this pull request
@bikegeek
Collaborator

bikegeek commented Apr 11, 2024 via email

@DanielAdriaansen DanielAdriaansen marked this pull request as ready for review April 12, 2024 17:01
@DanielAdriaansen
Author

This is ready for final review. Thanks all for the feedback so far!

Collaborator

@bikegeek bikegeek left a comment


Pydocs in the User's Guide are extremely helpful. Thanks for getting this in.

@bikegeek bikegeek merged commit c7709a4 into develop Apr 12, 2024
8 checks passed
@bikegeek bikegeek deleted the feature_364_add_TCI_calc branch April 12, 2024 19:41
Development

Successfully merging this pull request may close these issues.

Add calculation for Terrestrial Coupling Index (TCI)