Feature 1949 cf netcdf documentaton #1951

Merged: 5 commits, Nov 15, 2021

112 changes: 112 additions & 0 deletions met/docs/Users_Guide/data_io.rst
@@ -18,6 +18,118 @@ Input point observation files in PrepBUFR format are available through NCEP. The

Tropical cyclone forecasts and observations are typically provided in a specific ATCF (Automated Tropical Cyclone Forecasting) ASCII format, in A-deck, B-deck, and E-deck files.

Requirements for CF Compliant NetCDF
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The MET tools use the following attributes and variables for input CF Compliant NetCDF data.

1. The global attribute "Conventions".

2. The ""`standard_name <https://cfconventions.org/Data/cf-conventions/cf-conventions-1.9/cf-conventions.html#standard-name>`_" and "`units <https://cfconventions.org/Data/cf-conventions/cf-conventions-1.9/cf-conventions.html#units>`_" attributes for coordinate variables. The "`axis <https://cfconventions.org/Data/cf-conventions/cf-conventions-1.9/cf-conventions.html#time-axis-ex>`_" attribute ("T" or "time") must exist as the time variable if the "standard_name" attribute does not exist.


3. The "`coordinates <https://cfconventions.org/Data/cf-conventions/cf-conventions-1.9/cf-conventions.html#coordinate-types>`_" attribute for the data variables. It contains the coordinate variable names.

4. The "`grid_mapping <https://cfconventions.org/Data/cf-conventions/cf-conventions-1.9/cf-conventions.html#appendix-grid-mappings>`_" attribute for the data variables for projections and the matching grid mapping variable (optional for the latitude_longitude projection).

5. The gridded data should be evenly spaced horizontally and vertically.

6. (Optional) The "`forecast_reference_time <https://cfconventions.org/Data/cf-conventions/cf-conventions-1.9/cf-conventions.html#scalar-coordinate-variables>`_" variable for init_time.

MET processes CF-Compliant gridded NetCDF files with projection information. A CF-Compliant NetCDF file is identified by the global attribute "Conventions", whose value begins with "CF-" ("CF-<Version_number>"). The global attribute "Conventions" is mandatory. MET accepts variations of this attribute name ("conventions" and "CONVENTIONS"). The value should begin with "CF-" followed by the version number. MET also accepts attribute values that begin with "CF " ("CF" and a space instead of a hyphen) or "COARDS".
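
For example, the global attribute in the NetCDF header could look like the following (the version number shown here is illustrative):

.. code-block:: none

  // global attributes:
        :Conventions = "CF-1.9" ;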

The grid mapping variable contains the projection information. It is found by looking at the "grid_mapping" attribute of the data variables; the value of that attribute is the name of the grid mapping variable. The "standard_name" attribute is used to filter out coordinate variables like the time, latitude, and longitude variables. Four projections are supported with grid mapping variables: latitude_longitude, lambert_conformal_conic, polar_stereographic, and geostationary. In the case of the latitude_longitude projection, the latitude and longitude variable names should match the dimension names and the "units" attribute should be valid.

Here are examples of the grid mapping variable ("edr" is the data variable):

**Example 1: grid mapping for latitude_longitude projection**

.. code-block:: none

  float edr(time, z, lat, lon) ;
        edr:units = "m^(2/3) s^-1" ;
        edr:long_name = "Median eddy dissipation rate" ;
        edr:coordinates = "lat lon" ;
        edr:_FillValue = -9999.f ;
        edr:grid_mapping = "grid_mapping" ;
  int grid_mapping ;
        grid_mapping:grid_mapping_name = "latitude_longitude" ;
        grid_mapping:semi_major_axis = 6371000. ;
        grid_mapping:inverse_flattening = 0 ;


**Example 2: grid mapping for lambert_conformal_conic projection**

.. code-block:: none

  float edr(time, z, y, x) ;
        edr:units = "m^(2/3) s^-1" ;
        edr:long_name = "Eddy dissipation rate" ;
        edr:coordinates = "lat lon" ;
        edr:_FillValue = -9999.f ;
        edr:grid_mapping = "grid_mapping" ;
  int grid_mapping ;
        grid_mapping:grid_mapping_name = "lambert_conformal_conic" ;
        grid_mapping:standard_parallel = 25. ;
        grid_mapping:longitude_of_central_meridian = -95. ;
        grid_mapping:latitude_of_projection_origin = 25. ;
        grid_mapping:false_easting = 0 ;
        grid_mapping:false_northing = 0 ;
        grid_mapping:GRIB_earth_shape = "spherical" ;
        grid_mapping:GRIB_earth_shape_code = 0 ;
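
The polar_stereographic and geostationary projections follow the same pattern. The sketch below is illustrative rather than taken from a real dataset: the parameter values are assumptions, and the attribute names come from the grid mapping appendix of the CF conventions.

**Example 3: grid mapping for polar_stereographic projection (illustrative)**

.. code-block:: none

  float edr(time, z, y, x) ;
        edr:coordinates = "lat lon" ;
        edr:grid_mapping = "grid_mapping" ;
  int grid_mapping ;
        grid_mapping:grid_mapping_name = "polar_stereographic" ;
        grid_mapping:straight_vertical_longitude_from_pole = -135. ;
        grid_mapping:latitude_of_projection_origin = 90. ;
        grid_mapping:standard_parallel = 60. ;
        grid_mapping:false_easting = 0 ;
        grid_mapping:false_northing = 0 ;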

When the grid mapping variable is not available, MET detects the latitude_longitude projection in the following order:

1. The lat/lon projection from the dimensions

2. The lat/lon projection from the "coordinates" attribute of the data variable

3. The lat/lon projection from the latitude and longitude variables by the "standard_name" attribute

MET looks for variables with the same names as the dimensions and checks the "units" attribute to find the latitude and longitude variables. The valid "units" strings are listed in the table below. MET also accepts the variables "tlat" and "tlon" if the dimension names are "nlat" and "nlon".

If no latitude and longitude variables are found from the dimensions, MET gets the coordinate variable names from the "coordinates" attribute. The matching coordinate variables must have a proper "units" attribute.

As the last option, MET gets the time, latitude, and longitude variables by looking at the standard names "time", "latitude", and "longitude".
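
As an illustrative sketch of these fallbacks (the "tlat"/"tlon" and "nlat"/"nlon" names come from the paragraph above; everything else is assumed):

.. code-block:: none

  // found by the dimension names "nlat"/"nlon" plus a valid "units" attribute
  float tlat(nlat, nlon) ;
        tlat:units = "degrees_north" ;
  float tlon(nlat, nlon) ;
        tlon:units = "degrees_east" ;

  // found by the "standard_name" attribute as the last option
  float latitude(y, x) ;
        latitude:standard_name = "latitude" ;
  float longitude(y, x) ;
        longitude:standard_name = "longitude" ;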

MET gets the valid time from the time variable and the initialization time (init_time) from the "forecast_reference_time" variable, as sketched after the list below. If the time variable does not exist, the valid time can come from the file name. MET supports only two cases:

1. TRMM_3B42_3hourly_filename (3B42.<yyyymmdd>.<hh>.7.G3.nc)

2. TRMM_3B42_daily_filename (3B42_daily.<yyyy>.<mm>.<dd>.7.G3.nc)
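
A minimal sketch of the time handling described above (the units strings and values are illustrative):

.. code-block:: none

  double time(time) ;
        time:standard_name = "time" ;
        time:units = "hours since 2021-11-15 00:00:00" ;
  double forecast_reference_time ;
        forecast_reference_time:standard_name = "forecast_reference_time" ;
        forecast_reference_time:units = "hours since 2021-11-15 00:00:00" ;

Here the valid time is read from the "time" variable and the init_time from the "forecast_reference_time" variable.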

.. list-table:: Valid strings for the "units" attribute.
  :widths: auto
  :header-rows: 1

  * - time
    - latitude
    - longitude
  * - "seconds since YYYY-MM-DD HH:MM:SS",
      "minutes since YYYY-MM-DD HH:MM:SS",
      "hours since YYYY-MM-DD HH:MM:SS",
      "days since YYYY-MM-DD HH:MM:SS".
      Accepts "Y", "YY", "YYY", "M", "D", "HH", and "HH:MM".
      "HH:MM:SS" is optional.
    - "degrees_north",
      "degree_north",
      "degree_N",
      "degrees_N",
      "degreeN",
      "degreesN"
    - "degrees_east",
      "degree_east",
      "degree_E",
      "degrees_E",
      "degreeE",
      "degreesE"

Performance with NetCDF Input Data
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

There is no limitation on the NetCDF file size. The size of the data variables matters more than the file size. The NetCDF API loads the metadata first upon opening the file; accessing a data variable works the same way. There are two API calls: getting the metadata and getting the actual data. Memory is allocated and consumed at the second API call (getting the actual data).


The dimensions of the data variables matter. MET requests the NetCDF data as needed: 1) loading and processing a data plane, and 2) loading and processing the next data plane. This means an extra slicing step when the NetCDF input data has one more dimension. The performance is quite different if compression is enabled for high resolution data. NetCDF does compression per variable. The variables can have different compression levels (0 to 9, where 0 means no compression and 9 is the highest level). The number of decompressions is the same between NetCDF input files with one more and one less dimension (combined vs. separated). The difference is the amount of data to be decompressed, which requires more memory. For example, let's assume the time dimension is 30. NetCDF data with one less dimension (no time dimension) does decompression 30 times for an nx by ny dataset. NetCDF data with one more dimension does decompression 30 times for a 30 by nx by ny dataset, plus slicing for the target time offset. So if compression is enabled, it is better to have multiple NetCDF files with one less dimension than one big file with bigger variable data. If compression is not enabled, the file size will be much bigger, requiring more disk space.
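
As an illustrative sketch of the two layouts (the variable name and deflate level are assumptions; "ncdump -h -s" shows the per-variable compression level as the special "_DeflateLevel" attribute):

.. code-block:: none

  // one file with a time dimension: each read decompresses the
  // 30 by nx by ny variable and then slices out one time offset
  float edr(time, y, x) ;
        edr:_DeflateLevel = 4 ;

  // one of 30 files without a time dimension: each read decompresses
  // only a single nx by ny plane
  float edr(y, x) ;
        edr:_DeflateLevel = 4 ;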


.. _Intermediate data formats:

Intermediate data formats