Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature #2525 fill value at dataplane #2557

Merged
merged 18 commits into from
Jun 13, 2023

Conversation

hsoh-u
Copy link
Collaborator

@hsoh-u hsoh-u commented Jun 6, 2023

Expected Differences

The python embedding for the dataplane supports user-defined fill_value, min_value and max_value. MET does post processing for MET data from the user's python script. This is an internal processing by MET. The same behavior is expected without setting "fill_value", "min_value", or "max_value" attribute setting. One exception is the NaN or Inf values from user's python script. They will be converted to MET missing value (-9999.0).

  • Do these changes introduce new tools, command line arguments, or configuration file options? [No]

    If yes, please describe:

  • Do these changes modify the structure of existing or add new output data types (e.g. statistic line types or NetCDF variables)? [No]

    If yes, please describe:

Pull Request Testing

  • Describe testing already performed for these changes:

Run the following command with some changed at : /d1/personal/hsoh/git/features/feature_2525_fill_value_at_dataplane/MET/share/met/python/examples/read_ascii_numpy.py and /d1/personal/hsoh/git/features/feature_2525_fill_value_at_dataplane/MET/share/met/python/examples/read_ascii_xarraypy

export MET_PYTHON_EXE=$(which python3)

./plot_data_plane PYTHON_NUMPY python_numpy_tmp_nc.ps 'name="/d1/personal/hsoh/git/features/feature_2525_fill_value_at_dataplane/MET/share/met/python/examples/read_ascii_numpy.py letter.txt LETTER";' -plot_range 0.0 255.0 -title "Python enabled numpy plot_data_plane" -v 8

./plot_data_plane PYTHON_XARRAY python_xarray_tmp_nc.ps 'name="/d1/personal/hsoh/git/features/feature_2525_fill_value_at_dataplane/MET/share/met/python/examples/read_ascii_xarray.py /d1/projects/MET/MET_test_data/unit_test/python/letter.txt LETTER";' -plot_range 0.0 255.0 -title "Python enabled xarray plot_data_plane" -v 8

unset MET_PYTHON_EXE

./plot_data_plane PYTHON_NUMPY python_numpy.ps 'name="/d1/personal/hsoh/git/features/feature_2525_fill_value_at_dataplane/MET/share/met/python/examples/read_ascii_numpy.py letter.txt LETTER";' -plot_range 0.0 255.0 -title "Python enabled numpy plot_data_plane" -v 8

./plot_data_plane PYTHON_XARRAY python_xarray.ps 'name="/d1/personal/hsoh/git/features/feature_2525_fill_value_at_dataplane/MET/share/met/python/examples/read_ascii_xarray.py /d1/projects/MET/MET_test_data/unit_test/python/letter.txt LETTER";' -plot_range 0.0 255.0 -title "Python enabled xarray plot_data_plane" -v 8

Add following lines to scripts/python/examples/read_ascii_numpy.py or combination of three attributes

attrs['fill_value'] = 255  # for letter.txt
attrs['min_value'] = 212  # for letter.txt
attrs['max_value'] = 255  # for letter.txt

Add following lines to scripts/python/examples/read_ascii_xarray.py or combination of three attributes

ds.attrs['fill_value'] = 255
ds.attrs['min_value'] = 212
ds.attrs['max_value'] = 255
  • Recommend testing for the reviewer(s) to perform, including the location of input datasets, and any additional instructions:

Tes NaN or Inf cases if you have the sample data

  • Do these changes include sufficient documentation updates, ensuring that no errors or warnings exist in the build of the documentation? [No]

  • Do these changes include sufficient testing updates? [No]

  • Will this PR result in changes to the test suite? [No]

    If yes, describe the new output and/or changes to the existing output:

  • Please complete this pull request review by [Fill in date].

Pull Request Checklist

See the METplus Workflow for details.

  • Review the source issue metadata (required labels, projects, and milestone).
  • Complete the PR definition above.
  • Ensure the PR title matches the feature or bugfix branch name.
  • Define the PR metadata, as permissions allow.
    Select: Reviewer(s)
    Select: Organization level software support Project or Repository level development cycle Project
    Select: Milestone as the version that will include these changes
  • After submitting the PR, select Development issue with the original issue number.
  • After the PR is approved, merge your changes. If permissions do not allow this, request that the reviewer do the merge.
  • Close the linked issue and delete your feature or bugfix branch from GitHub.

@DanielAdriaansen
Copy link
Contributor

@hsoh-u is there a specific reason min_value and max_value support was added instead of only fill_value? I am just thinking of documentation and users and potential confusion.

I think fill_value is needed because there has to be some way for the user to tell MET what their fill_value is if it is not what MET expects. However, I feel that building in the ability to filter (exclude) data above max_value and below min_value is beyond the functionality we should provide for the user. If they want to threshold their data, I feel that they should do it on their own in their Python script, or via the MET config file for the tool they are using (or METplus wrappers).

Maybe there is a case I am not considering?

@hsoh-u
Copy link
Collaborator Author

hsoh-u commented Jun 8, 2023

The min_value and max_value was added for plot_data_plane which does not have min/max value configuration. I think it's OK without it. Let me know this should be excluded.

@hsoh-u is there a specific reason min_value and max_value support was added instead of only fill_value? I am just thinking of documentation and users and potential confusion.

I think fill_value is needed because there has to be some way for the user to tell MET what their fill_value is if it is not what MET expects. However, I feel that building in the ability to filter (exclude) data above max_value and below min_value is beyond the functionality we should provide for the user. If they want to threshold their data, I feel that they should do it on their own in their Python script, or via the MET config file for the tool they are using (or METplus wrappers).

Maybe there is a case I am not considering?

@JohnHalleyGotway JohnHalleyGotway changed the title Feature 2525 fill value at dataplane Feature #2525 fill value at dataplane Jun 9, 2023
@JohnHalleyGotway
Copy link
Collaborator

JohnHalleyGotway commented Jun 9, 2023

@hsoh-u sorry for the delay in responding on this. I agree with @DanielAdriaansen. I don't see the need for adding the min_value and max_value logic. Instead, I'd use the existing censor_thresh and censor_val config options already supported by MET to accomplish the same result, if needed.

I just tested to confirm that this does work with python embedding, and it does:

plot_data_plane \
PYTHON_NUMPY plot_censor.ps \
'name="read_CPC_binary.py gefs-legacy-00z_rfcst-cal_tmean_20170807_8-14day.bin 10"; censor_thresh=lt0.5; censor_val=0.0;'
Screen Shot 2023-06-09 at 3 24 40 PM

@hsoh-u, if I'm correct in assuming that censor_thresh/censor_val provides the same functionality as min_value/max_value, please remove the latter. And please LMK when you've done so so I can proceed with the MET-11.1.0-RC1 release... realistically, I probably can't cut it until Monday 6/12.

Thanks!

@DanielAdriaansen
Copy link
Contributor

DanielAdriaansen commented Jun 13, 2023

@hsoh-u can you rebuild MET on seneca here:
/d1/personal/hsoh/git/features/feature_2525_fill_value_at_dataplane/MET

I am getting lots of " hs DEBUG" in my output but I don't see that in the source and I want to make sure I am using the latest compilation from this branch for testing. It looks like the binaries have a date of 6 June on them. Thanks.

@hsoh-u
Copy link
Collaborator Author

hsoh-u commented Jun 13, 2023

@hsoh-u can you rebuild MET on seneca here: /d1/personal/hsoh/git/features/feature_2525_fill_value_at_dataplane/MET

I am getting lots of " hs DEBUG" in my output but I don't see that in the source and I want to make sure I am using the latest compilation from this branch for testing. It looks like the binaries have a date of 6 June on them. Thanks.

It was re-built.

@DanielAdriaansen
Copy link
Contributor

DanielAdriaansen commented Jun 13, 2023

@hsoh-u for the MET_PYTHON_EXE case, if the user does not have a fill_value, what should they set in their attrs dictionary?

When I do this:
attrs['fill_value'] = None

I get this error:

DEBUG 3: Calling /home/dadriaan/.conda/envs/icing/bin/python3 to run user's python script (practice_gridded_pyembed).
DEBUG 4: Writing temporary Python dataplane file:
DEBUG 4: 	/home/dadriaan/.conda/envs/icing/bin/python3 /d1/personal/hsoh/git/features/feature_2525_fill_value_at_dataplane/MET/share/met/python/pyembed/write_tmp_dataplane.py /tmp/tmp_met_nc_33916_0 practice_gridded_pyembed xarray
Python Script:	'/d1/personal/hsoh/git/features/feature_2525_fill_value_at_dataplane/MET/share/met/python/pyembed/write_tmp_dataplane.py'
User Command:	'practice_gridded_pyembed xarray'
Temporary File:	'/tmp/tmp_met_nc_33916_0'
/d1/projects/METplus/METplus_Data/model_applications/short_range/brightness_temperature/2019_05_21_141/remap_GOES-16.20190521.010000.nc
Traceback (most recent call last):
  File "/d1/personal/hsoh/git/features/feature_2525_fill_value_at_dataplane/MET/share/met/python/pyembed/write_tmp_dataplane.py", line 27, in <module>
    dataplane.write_dataplane(met_in, netcdf_filename)
  File "/d1/personal/hsoh/git/features/feature_2525_fill_value_at_dataplane/MET/share/met/python/met/dataplane.py", line 167, in write_dataplane
    setattr(ds, attr, attr_val)
  File "src/netCDF4/_netCDF4.pyx", line 2894, in netCDF4._netCDF4.Dataset.__setattr__
  File "src/netCDF4/_netCDF4.pyx", line 2824, in netCDF4._netCDF4.Dataset.setncattr
  File "src/netCDF4/_netCDF4.pyx", line 1651, in netCDF4._netCDF4._set_att
TypeError: illegal data type for attribute b'user_fill_value', must be one of dict_keys(['S1', 'i1', 'u1', 'i2', 'u2', 'i4', 'u4', 'i8', 'u8', 'f4', 'f8']), got O
ERROR  : 
ERROR  : tmp_nc_dataplane() -> command "/home/dadriaan/.conda/envs/icing/bin/python3 /d1/personal/hsoh/git/features/feature_2525_fill_value_at_dataplane/MET/share/met/python/pyembed/write_tmp_dataplane.py /tmp/tmp_met_nc_33916_0 practice_gridded_pyembed xarray" failed ... status = 256
ERROR  : 

Command on Seneca:
/d1/personal/hsoh/git/features/feature_2525_fill_value_at_dataplane/MET/bin/plot_data_plane PYTHON_XARRAY test.ps 'name="/home/dadriaan/projects/2525_missing_data_v2/practice_gridded_pyembed.py xarray";' -v 5

@hsoh-u
Copy link
Collaborator Author

hsoh-u commented Jun 13, 2023

@hsoh-u for the MET_PYTHON_EXE case, if the user does not have a fill_value, what should they set in their attrs dictionary?

When I do this: attrs['fill_value'] = None

I get this error:

DEBUG 3: Calling /home/dadriaan/.conda/envs/icing/bin/python3 to run user's python script (practice_gridded_pyembed).
DEBUG 4: Writing temporary Python dataplane file:
DEBUG 4: 	/home/dadriaan/.conda/envs/icing/bin/python3 /d1/personal/hsoh/git/features/feature_2525_fill_value_at_dataplane/MET/share/met/python/pyembed/write_tmp_dataplane.py /tmp/tmp_met_nc_33916_0 practice_gridded_pyembed xarray
Python Script:	'/d1/personal/hsoh/git/features/feature_2525_fill_value_at_dataplane/MET/share/met/python/pyembed/write_tmp_dataplane.py'
User Command:	'practice_gridded_pyembed xarray'
Temporary File:	'/tmp/tmp_met_nc_33916_0'
/d1/projects/METplus/METplus_Data/model_applications/short_range/brightness_temperature/2019_05_21_141/remap_GOES-16.20190521.010000.nc
Traceback (most recent call last):
  File "/d1/personal/hsoh/git/features/feature_2525_fill_value_at_dataplane/MET/share/met/python/pyembed/write_tmp_dataplane.py", line 27, in <module>
    dataplane.write_dataplane(met_in, netcdf_filename)
  File "/d1/personal/hsoh/git/features/feature_2525_fill_value_at_dataplane/MET/share/met/python/met/dataplane.py", line 167, in write_dataplane
    setattr(ds, attr, attr_val)
  File "src/netCDF4/_netCDF4.pyx", line 2894, in netCDF4._netCDF4.Dataset.__setattr__
  File "src/netCDF4/_netCDF4.pyx", line 2824, in netCDF4._netCDF4.Dataset.setncattr
  File "src/netCDF4/_netCDF4.pyx", line 1651, in netCDF4._netCDF4._set_att
TypeError: illegal data type for attribute b'user_fill_value', must be one of dict_keys(['S1', 'i1', 'u1', 'i2', 'u2', 'i4', 'u4', 'i8', 'u8', 'f4', 'f8']), got O
ERROR  : 
ERROR  : tmp_nc_dataplane() -> command "/home/dadriaan/.conda/envs/icing/bin/python3 /d1/personal/hsoh/git/features/feature_2525_fill_value_at_dataplane/MET/share/met/python/pyembed/write_tmp_dataplane.py /tmp/tmp_met_nc_33916_0 practice_gridded_pyembed xarray" failed ... status = 256
ERROR  : 

Command on Seneca: /d1/personal/hsoh/git/features/feature_2525_fill_value_at_dataplane/MET/bin/plot_data_plane PYTHON_XARRAY test.ps 'name="/home/dadriaan/projects/2525_missing_data_v2/practice_gridded_pyembed.py xarray";' -v 5

Python NetCDF API does not allow to set None to the global/variable attributes (both by using setattr or dictionatry). I will filter out attributes with None.
Here is a sample code to test

import netCDF4 as nc

ds = nc.Dataset('test_netcdf.nc', 'w')
setattr(ds, 'name_str', None)
#ds['name_str'] = None

@DanielAdriaansen
Copy link
Contributor

Thanks @hsoh-u.

For the purpose of documentation, can you please confirm that attrs['fill_value'] must be either:

  1. float
  2. int
  3. None

@hsoh-u
Copy link
Collaborator Author

hsoh-u commented Jun 13, 2023

Yes, but no need to set the attribute to None (which is ignored).

@DanielAdriaansen
Copy link
Contributor

So in the case of no user fill value, should we encourage the user to not set attrs['fill_value']? Or would it be better to have them set it to None?

@hsoh-u
Copy link
Collaborator Author

hsoh-u commented Jun 13, 2023

The fill_value is optional. My vote is not to set None.

@DanielAdriaansen
Copy link
Contributor

OK. The documentation currently says all attrs are required: https://met.readthedocs.io/en/feature_2525_fill_value_at_dataplane/Users_Guide/appendixF.html#required-attributes-for-2d-gridded-dataplanes.

I will add a new section for "Optional attributes for Dataplane" where fill_value will be.

@DanielAdriaansen
Copy link
Contributor

@hsoh-u can you please confirm values respected without setting attrs['fill_value']?

  1. NaN
  2. ??

@hsoh-u
Copy link
Collaborator Author

hsoh-u commented Jun 13, 2023

NaN and Inf values are converted to -9999 with/without attrs['fill_value'].

@DanielAdriaansen
Copy link
Contributor

OK thanks. It also seems like -9999 and -9999.0 also work if either of those happen to be the user fill value (i.e. user_fill_value = met_fill_value). So -9999, -9999., Inf, NaN need no attrs['fill_value'] but if the user fill value is not in that list the user must supply their fill value in attrs.

@hsoh-u
Copy link
Collaborator Author

hsoh-u commented Jun 13, 2023

That's correct

Copy link
Contributor

@DanielAdriaansen DanielAdriaansen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hsoh-u this looks good and I ran several tests and was able to verify the behavior.

I need to add documentation now, but I made a few comments and suggestions to the code in the meantime while I add documentation.

scripts/python/met/dataplane.py Outdated Show resolved Hide resolved
scripts/python/met/dataplane.py Outdated Show resolved Hide resolved
scripts/python/met/dataplane.py Outdated Show resolved Hide resolved
scripts/python/met/dataplane.py Outdated Show resolved Hide resolved
@DanielAdriaansen
Copy link
Contributor

DanielAdriaansen commented Jun 13, 2023

@hsoh-u in the documentation here:
https://met.readthedocs.io/en/feature_2525_fill_value_at_dataplane/Users_Guide/appendixF.html#python-embedding-for-2d-gridded-dataplanes

I do not mention masked arrays. Is it true that a user can also supply NumPy.ma objects in addition to N-D Array's and Xarray DataArray's?

Or is it better to just tell the user to use N-D array or DataArray to avoid confusion since maybe MaskedArray is only used under the hood?

@DanielAdriaansen
Copy link
Contributor

I approve. I tested with various combinations of MET_PYTHON_EXE, PYTHON_NUMPY, PYTHON_XARRAY, MET fill values combined with user defined fill values, NaN with user defined fill values, user defined fill values only, etc. Thanks for making these changes @hsoh-u!

@hsoh-u hsoh-u merged commit 76770fa into develop Jun 13, 2023
@hsoh-u hsoh-u deleted the feature_2525_fill_value_at_dataplane branch June 13, 2023 23:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Bugfix: Fix the fill value setting used in the write_tmp_dataplane internal Python embedding script
3 participants