Update develop-ref after #1688 and #1689 (#1690)
* Per #1429, enhance error message from DataLine::get_item(). (#1682)

* Feature 1429 tc_log second try (#1686)

* Per #1429, enhance error message from DataLine::get_item().

* Per #1429, I realize that the line number actually is readily available in the DataLine class... so include it in the error message.

* Feature 1588 ps_log (#1687)

* Per #1588, updated pair_data_point.h/.cc to add detailed Debug(4) log messages, as specified in the GitHub issue. Do still need to test each of these cases to confirm that the log messages look good.

* Per #1588, switch very detailed interpolation details from debug level 4 to 5.

* Per #1588, remove the Debug(4) log message about duplicate obs since it's been moved up to a higher level.

* Per #1588, add/update detailed log messages when processing point observations for bad data, off the grid, bad topo, big topo diffs, bad fcst value, and duplicate obs.

* #1454 Disabled plot_data_plane_CESM_SSMI_microwave and plot_data_plane_CESM_sea_ice_nc because the grids are not evenly spaced

* #1454 Moved NC attribute name to nc_utils.h

* #1454 Corrected sanity checking for lat/lon projection based on the percentage of the delta instead of fixed tolerance

* #1454 Corrected data.delta_lon

* #1454 Changed back to using the diff instead of the absolute value of the diff

* #1454 Deleted instead of commenting out

* Feature 1684 bss and 1685 single reference model (#1689)

* Per #1684, move an instance of the ClimoCDFInfo class into PairBase. Also define derive_climo_vals() and derive_climo_prob() utility functions.

* Add to VxPairDataPoint and VxPairDataEnsemble functions to set the ClimoCDFInfo class.

* Per #1684, update ensemble_stat and point_stat to set the ClimoCDFInfo object based on the contents of the config file.

* Per #1684, update the vx_statistics library and stat_analysis to make calls to the new derive_climo_vals() and derive_climo_prob() functions.

* Per #1684, since cdf_info is a member of PairBase class, need to handle it in the PairDataPoint and PairDataEnsemble assignment and subsetting logic.

* Per #1684, during development, I ran across and then updated this log message.

* Per #1684, working on log messages and figured that the regridding climo data should be moved from Debug(1) to at least Debug(2).

* Per #1684 and #1685, update the logic for the derive_climo_vals() utility function. If only a single climo bin is requested, just return the climo mean. Otherwise, sample the requested number of values.
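
The single-bin versus multi-bin sampling rule described in this bullet can be sketched as follows. This is a standalone illustration, not MET's actual `derive_climo_vals()`: `NumArray`, the real signature, and MET's normal-distribution routines all differ, and the bisection-based inverse CDF here is purely for demonstration.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Normal CDF via the C++11 error function.
static double normal_cdf(double x, double mu, double sd) {
   return 0.5 * (1.0 + std::erf((x - mu) / (sd * std::sqrt(2.0))));
}

// Invert the CDF by bisection (illustration only).
static double normal_cdf_inv(double p, double mu, double sd) {
   double lo = mu - 10.0 * sd, hi = mu + 10.0 * sd;
   for (int i = 0; i < 200; i++) {
      double mid = 0.5 * (lo + hi);
      if (normal_cdf(mid, mu, sd) < p) lo = mid;
      else                             hi = mid;
   }
   return 0.5 * (lo + hi);
}

// If only a single climo bin is requested, just return the climo mean.
// Otherwise, sample n equal-area values from the assumed normal
// climatological distribution.
std::vector<double> derive_climo_vals(double mean, double stdev, int n_bins) {
   std::vector<double> vals;
   if (n_bins == 1) { vals.push_back(mean); return vals; }
   for (int i = 0; i < n_bins; i++) {
      vals.push_back(normal_cdf_inv((i + 0.5) / n_bins, mean, stdev));
   }
   return vals;
}
```

With `n_bins = 1` the reference collapses to a single deterministic value, which is what makes the single-reference-model CRPSS described below possible.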

* Per #1684, just fixing the format of this log message.

* Per #1684, add a STATLine::get_offset() member function.

* Per #1684, update parse_orank_line() logic. Rather than calling NumArray::clear() call NumArray::erase() to preserve allocated memory. Also, instead of parsing ensemble member values by column name, parse them by offset number.
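
The memory-reuse rationale behind preferring `erase()` over `clear()` can be illustrated with `std::vector`, where `clear()` plays the storage-retaining role this bullet attributes to `NumArray::erase()` (`NumArray` itself is a MET internal, so this stand-in rests on the description above):

```cpp
#include <vector>

// Returns true if the vector's storage survives an empty-and-refill cycle,
// so repeated line parsing reuses one allocation instead of reallocating.
bool storage_reused() {
   std::vector<double> ens_vals;
   ens_vals.assign(1000, 0.0);              // first ORANK line: one allocation
   const std::size_t cap = ens_vals.capacity();

   ens_vals.clear();                        // logically empty ...
   bool ok = ens_vals.empty() && ens_vals.capacity() == cap;

   ens_vals.assign(1000, 1.0);              // next line: no reallocation needed
   return ok && ens_vals.capacity() == cap;
}
```

`std::vector::clear()` is specified to leave `capacity()` unchanged, which is the property being exploited.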

* Per #1684, call EnsemblePairData::extend() when parsing ORANK data to allocate one block of memory instead of many little ones.

* Per #1684 and #1685, add another call to Ensemble-Stat to test computing the CRPSCL_EMP from a single climo mean instead of using the full climo distribution.

* Per #1684 and #1685, update ensemble-stat docs about computing CRPSS_EMP relative to a single reference model.

* Per #1684, need to update Grid-Stat to store the climo cdf info in the PairDataPoint objects.

* Per #1684, remove debug print statements.

* Per #1684, need to set cdf_info when aggregating MPR lines in Stat-Analysis.

* Per #1684 and #1685, update PairDataEnsemble::compute_pair_vals() to print a log message indicating the climo data being used as reference:

For a climo distribution defined by mean and stdev:
DEBUG 3: Computing ensemble statistics relative to a 9-member climatological ensemble.

For a single deterministic reference:
DEBUG 3: Computing ensemble statistics relative to the climatological mean.

Co-authored-by: Howard Soh <hsoh@kiowa.rap.ucar.edu>
Co-authored-by: hsoh-u <hsoh@ucar.edu>
3 people committed Mar 2, 2021
1 parent 131a003 commit 66f7fb2
Showing 31 changed files with 723 additions and 281 deletions.
6 changes: 5 additions & 1 deletion met/docs/Users_Guide/ensemble-stat.rst
@@ -36,7 +36,11 @@ The ranked probability score (RPS) is included in the Ranked Probability Score (
Climatology data
~~~~~~~~~~~~~~~~

The Ensemble-Stat output includes at least three statistics computed relative to external climatology data. The climatology is defined by mean and standard deviation fields, and both are required in the computation of ensemble skill score statistics. MET assumes that the climatology follows a normal distribution, defined by the mean and standard deviation at each point. When computing the CRPS skill score for (:ref:`Gneiting et al., 2004 <Gneiting-2004>`) the reference CRPS statistic is computed using the climatological mean and standard deviation directly. When computing the CRPS skill score for (:ref:`Hersbach, 2000 <Hersbach-2000>`) the reference CRPS statistic is computed by selecting equal-area-spaced values from the assumed normal climatological distribution. The number of points selected is determined by the *cdf_bins* setting in the *climo_cdf* dictionary. The reference CRPS is computed empirically from this ensemble of climatology values. The climatological distribution is also used for the RPSS. The forecast RPS statistic is computed from a probabilistic contingency table in which the probabilities are derived from the ensemble member values. In a simliar fashion, the climatogical probability for each observed value is derived from the climatological distribution. The area of the distribution to the left of the observed value is interpreted as the climatological probability. These climatological probabilities are also evaluated using a probabilistic contingency table from which the reference RPS score is computed. The skill scores are derived by comparing the forecast statistic to the reference climatology statistic.
The Ensemble-Stat output includes at least three statistics computed relative to external climatology data. The climatology is defined by mean and standard deviation fields, and typically both are required in the computation of ensemble skill score statistics. MET assumes that the climatology follows a normal distribution, defined by the mean and standard deviation at each point.

When computing the CRPS skill score for (:ref:`Gneiting et al., 2004 <Gneiting-2004>`) the reference CRPS statistic is computed using the climatological mean and standard deviation directly. When computing the CRPS skill score for (:ref:`Hersbach, 2000 <Hersbach-2000>`) the reference CRPS statistic is computed by selecting equal-area-spaced values from the assumed normal climatological distribution. The number of points selected is determined by the *cdf_bins* setting in the *climo_cdf* dictionary. The reference CRPS is computed empirically from this ensemble of climatology values. If the number of bins is set to 1, the climatological CRPS is computed using only the climatological mean value. In this way, the empirical CRPSS may be computed relative to a single model rather than a climatological distribution.
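
A minimal sketch of the configuration this paragraph describes, assuming the usual entries of the *climo_cdf* dictionary (verify the names against the default config file shipped with your MET version):

```
climo_cdf = {
   cdf_bins    = 1;      // 1 => reference CRPS uses only the climo mean
   center_bins = FALSE;
   write_bins  = TRUE;
}
```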

The climatological distribution is also used for the RPSS. The forecast RPS statistic is computed from a probabilistic contingency table in which the probabilities are derived from the ensemble member values. In a similar fashion, the climatological probability for each observed value is derived from the climatological distribution. The area of the distribution to the left of the observed value is interpreted as the climatological probability. These climatological probabilities are also evaluated using a probabilistic contingency table from which the reference RPS score is computed. The skill scores are derived by comparing the forecast statistic to the reference climatology statistic.
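
The "area of the distribution to the left of the observed value" maps directly onto the normal CDF. A minimal sketch under that assumption (not MET's internal implementation):

```cpp
#include <cmath>

// Climatological probability of not exceeding the observed value: the area
// of the assumed normal climo distribution to the left of the observation.
double climo_prob_le(double obs, double climo_mean, double climo_stdev) {
   return 0.5 * (1.0 + std::erf((obs - climo_mean) /
                                (climo_stdev * std::sqrt(2.0))));
}
```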

Ensemble observation error
~~~~~~~~~~~~~~~~~~~~~~~~~~
16 changes: 12 additions & 4 deletions met/src/basic/vx_util/data_line.cc
@@ -253,7 +253,12 @@ const char * DataLine::get_item(int k) const

if ( (k < 0) || (k >= N_items) ) {

mlog << Error << "\nDataLine::get_item(int) -> range check error\n\n";
ConcatString cs = (File ? File->filename() : "");

mlog << Error << "\nDataLine::get_item(int) -> "
<< "range check error reading line number " << LineNumber
<< ", item number " << k+1 << " of " << N_items
<< " from file \"" << cs << "\"\n\n";

exit ( 1 );

@@ -640,7 +645,8 @@ LineDataFile::LineDataFile(const LineDataFile &)

{

mlog << Error << "\nLineDataFile::LineDataFile(const LineDataFile &) -> should never be called!\n\n";
mlog << Error << "\nLineDataFile::LineDataFile(const LineDataFile &) -> "
<< "should never be called!\n\n";

exit ( 1 );

@@ -654,7 +660,8 @@ LineDataFile & LineDataFile::operator=(const LineDataFile &)

{

mlog << Error << "\nLineDataFile::operator=(const LineDataFile &) -> should never be called!\n\n";
mlog << Error << "\nLineDataFile::operator=(const LineDataFile &) -> "
<< "should never be called!\n\n";

exit ( 1 );

@@ -698,7 +705,8 @@ in = new ifstream;

if ( !in ) {

mlog << Error << "\nLineDataFile::open(const char *) -> can't allocate input stream\n\n";
mlog << Error << "\nLineDataFile::open(const char *) -> "
<< "can't allocate input stream\n\n";

exit ( 1 );

2 changes: 1 addition & 1 deletion met/src/basic/vx_util/interp_util.cc
@@ -686,7 +686,7 @@ double interp_geog_match(const DataPlane &dp, const GridTemplate &gt,
}

if(!is_bad_data(interp_v)) {
mlog << Debug(4)
mlog << Debug(5)
<< "For observation value " << obs_v << " at grid (x, y) = ("
<< obs_x << ", " << obs_y << ") found forecast value "
<< interp_v << " at nearest matching geography point ("
16 changes: 14 additions & 2 deletions met/src/libcode/vx_analysis_util/stat_line.cc
@@ -327,6 +327,18 @@ bool STATLine::has(const char *col_str) const

{

return ( !is_bad_data(get_offset(col_str)) );

}


////////////////////////////////////////////////////////////////////////


int STATLine::get_offset(const char *col_str) const

{

int offset = bad_data_int;
int dim = bad_data_int;

@@ -353,10 +365,10 @@ if ( is_bad_data(offset) ) {
}

//
// Return whether a valid offset value was found
// Return the offset value
//

return ( !is_bad_data(offset) );
return ( offset );

}

9 changes: 5 additions & 4 deletions met/src/libcode/vx_analysis_util/stat_line.h
@@ -61,10 +61,11 @@ class STATLine : public DataLine {
// retrieve values of the header columns
//

bool has (const char *) const;
ConcatString get (const char *, bool check_na = true) const;
const char * get_item (const char *, bool check_na = true) const;
const char * get_item (int, bool check_na = true) const;
bool has (const char *) const;
int get_offset(const char *) const;
ConcatString get (const char *, bool check_na = true) const;
const char * get_item (const char *, bool check_na = true) const;
const char * get_item (int, bool check_na = true) const;

const char * version () const;
const char * model () const;
5 changes: 0 additions & 5 deletions met/src/libcode/vx_data2d_nc_met/met_file.cc
@@ -41,11 +41,6 @@ static const string valid_time_ut_att_name = "valid_time_ut";
static const string init_time_ut_att_name = "init_time_ut";
static const string accum_time_att_name = "accum_time_sec";

static const string name_att_name = "name";
static const string long_name_att_name = "long_name";
static const string level_att_name = "level";
static const string units_att_name = "units";

static const int max_met_args = 30;

////////////////////////////////////////////////////////////////////////
3 changes: 0 additions & 3 deletions met/src/libcode/vx_data2d_nc_pinterp/pinterp_file.cc
@@ -58,9 +58,6 @@ static const char hpa_units_str [] = "hPa";

static const string init_time_att_name = "START_DATE";

static const string description_att_name = "description";
static const string units_att_name = "units";

static const int max_pinterp_args = 30;

static const double pinterp_missing = 1.0e35;
102 changes: 54 additions & 48 deletions met/src/libcode/vx_data2d_nccf/nccf_file.cc
@@ -32,7 +32,8 @@ using namespace std;
////////////////////////////////////////////////////////////////////////


static const int max_met_args = 30;
static const int max_met_args = 30;
static const double DELTA_TOLERANCE_PERCENT = 0.05;

const double NcCfFile::DELTA_TOLERANCE = 15.0;

Expand Down Expand Up @@ -2870,40 +2871,40 @@ void NcCfFile::get_grid_from_lat_lon_vars(NcVar *lat_var, NcVar *lon_var,

// Figure out the dlat/dlon values from the dimension variables

double lat_values[lat_counts];
double lon_values[lon_counts];

bool two_dim_corrd = false;
long x_size = get_data_size(lon_var);
long y_size = get_data_size(lat_var);
long latlon_counts = lon_counts*lat_counts;
if( x_size == latlon_counts || y_size == latlon_counts ) two_dim_corrd = true;
else if( x_size != lon_counts || y_size != lat_counts)
bool two_dim_corrd = (x_size == latlon_counts) && (y_size == latlon_counts );
if( !two_dim_corrd && (x_size != lon_counts || y_size != lat_counts))
{
mlog << Error << "\n" << method_name << " -> "
<< "Coordinate variables don't match dimension sizes in netCDF file.\n\n";
exit(1);
}

double lat_values[lat_counts];
double lon_values[lon_counts];
bool lat_first = false;
if (two_dim_corrd) {
lat_first = (lat_counts == get_dim_size(lat_var, 0));
long cur[2], length[2];
for (int i=0; i<2; i++) {
cur[i] = 0;
length[i] = 1;
}
length[0] = lat_counts;
cur[0] = cur[1] = 0;
length[0] = length[1] = 1;
if (lat_first) length[0] = lat_counts;
else length[1] = lat_counts;
get_nc_data(lat_var,lat_values, length, cur);
length[1] = lon_counts;
length[0] = 1;

length[0] = length[1] = 1;
if (lat_first) length[1] = lon_counts;
else length[0] = lon_counts;
get_nc_data(lon_var,lon_values, length, cur);
}
else {
get_nc_data(lat_var,lat_values);
get_nc_data(lon_var,lon_values);
}

// Calculate dlat and dlon assuming they are constant. MET requires that
// dlat be equal to dlon
// Calculate dlat and dlon assuming they are constant.

double dlat = lat_values[1] - lat_values[0];
double dlon = rescale_lon(lon_values[1] - lon_values[0]);
@@ -2919,53 +2920,58 @@ void NcCfFile::get_grid_from_lat_lon_vars(NcVar *lat_var, NcVar *lon_var,
// entire grid. CF compliancy doesn't require this, but MET does.

if (!skip_sanity_check) {
double degree_tolerance;
float lat_missing_value = bad_data_double;
float lon_missing_value = bad_data_double;
NcVarAtt *missing_value_att = (NcVarAtt*) 0;
missing_value_att = get_nc_att(lat_var, (string)"_FillValue");
if (IS_VALID_NC_P(missing_value_att)) {
lat_missing_value = get_att_value_double(missing_value_att);
}
if( missing_value_att ) delete missing_value_att;
missing_value_att = get_nc_att(lon_var, (string)"_FillValue");
if (IS_VALID_NC_P(missing_value_att)) {
lon_missing_value = get_att_value_double(missing_value_att);
}
if( missing_value_att ) delete missing_value_att;
bool sanity_check_failed = false;
get_var_att_float(lat_var, fill_value_att_name, lat_missing_value);
get_var_att_float(lon_var, fill_value_att_name, lon_missing_value);

if (fabs(dlat - dlon) > DELTA_TOLERANCE)
{
mlog << Error << "\n" << method_name << " -> "
<< "MET can only process Latitude/Longitude files where the delta lat and delta lon are the same\n\n";
exit(1);
}

degree_tolerance = fabs(dlat * DELTA_TOLERANCE_PERCENT);
for (int i = 1; i < (int)lat_counts; ++i)
{
if ((fabs(lat_missing_value - lat_values[i]) < DELTA_TOLERANCE) ||
(fabs(lat_missing_value - lat_values[i-1]) < DELTA_TOLERANCE)) continue;
if ((fabs(lat_missing_value - lat_values[i]) < degree_tolerance) ||
(fabs(lat_missing_value - lat_values[i-1]) < degree_tolerance)) continue;
double curr_delta = lat_values[i] - lat_values[i-1];
if (fabs(curr_delta - dlat) > DELTA_TOLERANCE)
if (fabs(curr_delta - dlat) > degree_tolerance)
{
mlog << Debug(4) << method_name << " -> lat["
<< i-1 << "]=" << lat_values[i-1] << " lat["
<< i << "]=" << lat_values[i] << " "
<< fabs(curr_delta - dlat) << " > " << degree_tolerance << "\n";
mlog << Error << "\n" << method_name << " -> "
<< "MET can only process Latitude/Longitude files where the lat delta is constant (dlat="
<< dlat <<", dlon=" << dlon << ")\n\n";
exit(1);
<< "MET can only process Latitude/Longitude files where the latitudes are evenly spaced (dlat="
<< dlat <<", delta[" << i << "]=" << curr_delta << ")\n\n";
sanity_check_failed = true;
break;
}
}


degree_tolerance = fabs(dlon * DELTA_TOLERANCE_PERCENT);
for (int i = 1; i < (int)lon_counts; ++i)
{
if ((fabs(lon_missing_value - lon_values[i]) < DELTA_TOLERANCE) ||
(fabs(lon_missing_value - lon_values[i-1]) < DELTA_TOLERANCE)) continue;
if ((fabs(lon_missing_value - lon_values[i]) < degree_tolerance) ||
(fabs(lon_missing_value - lon_values[i-1]) < degree_tolerance)) continue;
double curr_delta = rescale_lon(lon_values[i] - lon_values[i-1]);
if (fabs(curr_delta - dlon) > DELTA_TOLERANCE)
if (fabs(curr_delta - dlon) > degree_tolerance)
{
mlog << Debug(4) << method_name << " -> lon["
<< i-1 << "]=" << lon_values[i-1] << " lon["
<< i << "]=" << lon_values[i] << " "
<< fabs(curr_delta - dlon) << " > " << degree_tolerance << "\n";
mlog << Error << "\n" << method_name << " -> "
<< "MET can only process Latitude/Longitude files where the lon delta is constant\n\n";
exit(1);
<< "MET can only process Latitude/Longitude files where the longitudes are evenly spaced (dlon="
<< dlon <<", delta[" << i << "]=" << curr_delta << ")\n\n";
sanity_check_failed = true;
break;
}
}

if (sanity_check_failed) {
mlog << Error << "\n" << method_name << " -> "
<< "Please check that the input data is on a lat/lon projection\n\n";
exit(1);
}
}

// Fill in the data structure. Remember to negate the longitude
@@ -2991,9 +2997,9 @@ void NcCfFile::get_grid_from_lat_lon_vars(NcVar *lat_var, NcVar *lon_var,
data.lat_ll = lat_values[lat_counts-1];
}

grid.set(data);
grid.set(data); // resets swap_to_north to false
if (dlat < 0) grid.set_swap_to_north(true);

}

////////////////////////////////////////////////////////////////////////
5 changes: 0 additions & 5 deletions met/src/libcode/vx_nc_util/nc_utils.cc
@@ -25,11 +25,6 @@ using namespace netCDF::exceptions;

////////////////////////////////////////////////////////////////////////

static const string level_att_name = "level";
static const string units_att_name = "units";
static const string missing_value_att_name = "missing_value";
static const string fill_value_att_name = "_FillValue";

////////////////////////////////////////////////////////////////////////

void patch_nc_name(string *var_name) {
9 changes: 9 additions & 0 deletions met/src/libcode/vx_nc_util/nc_utils.h
@@ -128,6 +128,15 @@ static const string nc_att_use_var_id = "use_var_id";
static const char nc_att_obs_version[] = "MET_Obs_version";
static const char nc_att_met_point_nccf[] = "MET_point_NCCF";

static const string description_att_name = "description";
static const string fill_value_att_name = "_FillValue";
static const string level_att_name = "level";
static const string long_name_att_name = "long_name";
static const string missing_value_att_name = "missing_value";
static const string name_att_name = "name";
static const string units_att_name = "units";


static const char nc_time_unit_exp[] = "^[a-z|A-Z]* since [0-9]\\{1,4\\}-[0-9]\\{1,2\\}-[0-9]\\{1,2\\}";

static const char MET_NC_Obs_ver_1_2[] = "1.02";
3 changes: 2 additions & 1 deletion met/src/libcode/vx_statistics/compute_stats.cc
@@ -743,7 +743,8 @@ void compute_pctinfo(const PairDataPoint &pd, bool pstd_flag,
// Use input climatological probabilities or derive them
if(cmn_flag) {
if(cprob_in) climo_prob = *cprob_in;
else climo_prob = derive_climo_prob(pd.cmn_na, pd.csd_na,
else climo_prob = derive_climo_prob(pd.cdf_info,
pd.cmn_na, pd.csd_na,
pct_info.othresh);
}

11 changes: 6 additions & 5 deletions met/src/libcode/vx_statistics/ens_stats.cc
@@ -484,10 +484,10 @@ void RPSInfo::set(const PairDataEnsemble &pd) {
// Check that thresholds are actually defined
if(fthresh.n() == 0) {
mlog << Error << "\nRPSInfo::set(const PairDataEnsemble &) -> "
<< "no thresholds provided to compute the RPS line type! "
<< "Specify thresholds using the \""
<< conf_key_prob_cat_thresh
<< "\" configuration file option.\n\n";
<< "no thresholds provided to compute the RPS line type!\n"
<< "Specify thresholds using the \"" << conf_key_prob_cat_thresh
<< "\" configuration file option or by providing climatological "
<< "mean and standard deviation data.\n\n";
exit(1);
}

@@ -522,7 +522,8 @@ void RPSInfo::set(const PairDataEnsemble &pd) {
climo_pct.zero_out();

// Derive climatological probabilities
if(cmn_flag) climo_prob = derive_climo_prob(pd.cmn_na, pd.csd_na,
if(cmn_flag) climo_prob = derive_climo_prob(pd.cdf_info,
pd.cmn_na, pd.csd_na,
fthresh[i]);

// Loop over the observations