From c6b4cab1d995cf228ddbd82db363a2120ae72d9e Mon Sep 17 00:00:00 2001 From: John Halley Gotway Date: Mon, 29 Jul 2024 16:58:28 -0600 Subject: [PATCH 01/41] Per #1371, add -input command line argument and add support for ALL for the CTC, MCTC, SL1L2, and PCT line types. --- docs/Users_Guide/series-analysis.rst | 15 ++- src/libcode/vx_stat_out/stat_columns.cc | 110 +++++++++++------- src/libcode/vx_stat_out/stat_columns.h | 3 + .../core/series_analysis/series_analysis.cc | 65 +++++++++++ .../core/series_analysis/series_analysis.h | 11 +- 5 files changed, 152 insertions(+), 52 deletions(-) diff --git a/docs/Users_Guide/series-analysis.rst b/docs/Users_Guide/series-analysis.rst index 0be681585f..b54bf81cc9 100644 --- a/docs/Users_Guide/series-analysis.rst +++ b/docs/Users_Guide/series-analysis.rst @@ -33,6 +33,7 @@ The usage statement for the Series-Analysis tool is shown below: -fcst file_1 ... file_n | fcst_file_list -obs file_1 ... file_n | obs_file_list [-both file_1 ... file_n | both_file_list] + [-input input_file] [-paired] -out file -config file @@ -58,13 +59,17 @@ Optional Arguments for series_analysis 5. To set both the forecast and observations to the same set of files, use the optional -both file_1 ... file_n | both_file_list option to the same set of files. This is useful when reading the NetCDF matched pair output of the Grid-Stat tool which contains both forecast and observation data. -6. The -paired option indicates that the -fcst and -obs file lists are already paired, meaning there is a one-to-one correspondence between the files in those lists. This option affects how missing data is handled. When -paired is not used, missing or incomplete files result in a runtime error with no output file being created. When -paired is used, missing or incomplete files result in a warning with output being created using the available data. +6. The -input option specifies the path to an existing Series-Analysis output file. 
When computing statistics for the data provided, Series-Analysis initializes the partial sums (SL1L2, SAL1L2 line types) and contingency table counts (CTC, MCTC, and PCT line types) data using the input_file data. -7. The -log file outputs log messages to the specified file. +.. note:: When the -input option is used, only statistics that are derivable from partial sums and contingency table counts can be requested. -8. The -v level overrides the default level of logging (2). +7. The -paired option indicates that the -fcst and -obs file lists are already paired, meaning there is a one-to-one correspondence between the files in those lists. This option affects how missing data is handled. When -paired is not used, missing or incomplete files result in a runtime error with no output file being created. When -paired is used, missing or incomplete files result in a warning with output being created using the available data. -9. The -compress level option indicates the desired level of compression (deflate level) for NetCDF variables. The valid level is between 0 and 9. The value of "level" will override the default setting of 0 from the configuration file or the environment variable MET_NC_COMPRESS. Setting the compression level to 0 will make no compression for the NetCDF output. Lower number is for fast compression and higher number is for better compression. +8. The -log file outputs log messages to the specified file. +9. The -v level overrides the default level of logging (2). +10. The -compress level option indicates the desired level of compression (deflate level) for NetCDF variables. The valid level is between 0 and 9. The value of "level" will override the default setting of 0 from the configuration file or the environment variable MET_NC_COMPRESS. Setting the compression level to 0 will make no compression for the NetCDF output. Lower number is for fast compression and higher number is for better compression. 
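The documentation above says the partial sums are initialized from the input file before new data are accumulated. Since the SL1L2 columns are TOTAL-weighted means, two runs combine by a weighted average. The sketch below illustrates that arithmetic; the `Sl1l2Sums` struct and `aggregate_sl1l2` function are hypothetical stand-ins, not MET's actual `SL1L2Info` API.

```cpp
// Hypothetical container for SL1L2 partial sums: the TOTAL count
// plus the mean-valued columns FBAR, OBAR, FOBAR, FFBAR, OOBAR.
struct Sl1l2Sums {
    int    total;
    double fbar, obar, fobar, ffbar, oobar;
};

// Combine two sets of partial sums. Each column is a mean weighted
// by its TOTAL, so the aggregate is a TOTAL-weighted average. This
// is why only the sums, not derived statistics, need to be stored.
Sl1l2Sums aggregate_sl1l2(const Sl1l2Sums &a, const Sl1l2Sums &b) {
    Sl1l2Sums s;
    s.total = a.total + b.total;
    double wa = static_cast<double>(a.total) / s.total;
    double wb = static_cast<double>(b.total) / s.total;
    s.fbar  = wa * a.fbar  + wb * b.fbar;
    s.obar  = wa * a.obar  + wb * b.obar;
    s.fobar = wa * a.fobar + wb * b.fobar;
    s.ffbar = wa * a.ffbar + wb * b.ffbar;
    s.oobar = wa * a.oobar + wb * b.oobar;
    return s;
}
```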
An example of the series_analysis calling sequence is shown below: @@ -179,3 +184,5 @@ The output_stats array controls the type of output that the Series-Analysis tool 11. PJC for Joint and Conditional factorization for Probabilistic forecasts (See :numref:`table_PS_format_info_PJC`) 12. PRC for Receiver Operating Characteristic for Probabilistic forecasts (See :numref:`table_PS_format_info_PRC`) + +.. note:: When the -input option is used, all partial sum and contingency table count columns are required to aggregate statistics across multiple runs. To facilitate this, the output_stats entries for the CTC, SL1L2, SAL1L2, and PCT line types can be set to "ALL" to indicate that all available columns for those line types should be written. diff --git a/src/libcode/vx_stat_out/stat_columns.cc b/src/libcode/vx_stat_out/stat_columns.cc index 0bd1c9393a..b60d265ef9 100644 --- a/src/libcode/vx_stat_out/stat_columns.cc +++ b/src/libcode/vx_stat_out/stat_columns.cc @@ -122,32 +122,18 @@ void write_header_row(const char * const * cols, int n_cols, int hdr_flag, void write_mctc_header_row(int hdr_flag, int n_cat, AsciiTable &at, int r, int c) { - int i, j, col; - ConcatString cs; // Write the header column names if requested if(hdr_flag) { - for(i=0; i= 1) { - tmp_str.format("%s%i", pct_columns[2], n_thresh); - at.set_entry(r, col, tmp_str); // Threshold - } + StringArray sa(get_pct_columns(n_thresh)); + for(int i=0; i= 1) { + cs.format("%s%i", pct_columns[2], n_thresh); + sa.add(cs); + } + + return sa; +} + + //////////////////////////////////////////////////////////////////////// void write_fho_row(StatHdrColumns &shc, const CTSInfo &cts_info, diff --git a/src/libcode/vx_stat_out/stat_columns.h b/src/libcode/vx_stat_out/stat_columns.h index 8af99ad730..7eb21c6fa0 100644 --- a/src/libcode/vx_stat_out/stat_columns.h +++ b/src/libcode/vx_stat_out/stat_columns.h @@ -49,6 +49,9 @@ extern void write_phist_header_row (int, int, AsciiTable &, int, int); extern void 
write_orank_header_row (int, int, AsciiTable &, int, int); extern void write_relp_header_row (int, int, AsciiTable &, int, int); +extern StringArray get_mctc_columns (int); +extern StringArray get_pct_columns (int); + extern void write_fho_row (StatHdrColumns &, const CTSInfo &, STATOutputType, AsciiTable &, int &, AsciiTable &, int &); extern void write_ctc_row (StatHdrColumns &, const CTSInfo &, STATOutputType, diff --git a/src/tools/core/series_analysis/series_analysis.cc b/src/tools/core/series_analysis/series_analysis.cc index ebbb43e27a..d4292e74fc 100644 --- a/src/tools/core/series_analysis/series_analysis.cc +++ b/src/tools/core/series_analysis/series_analysis.cc @@ -35,6 +35,7 @@ // 014 07/06/22 Howard Soh METplus-Internal #19 Rename main to met_main // 015 10/03/22 Presotpnik MET #2227 Remove namespace netCDF from header files // 016 01/29/24 Halley Gotway MET #2801 Configure time difference warnings +// 017 07/26/24 Halley Gotway MET #1371 Aggregate statistics through time // //////////////////////////////////////////////////////////////////////// @@ -103,6 +104,11 @@ static void store_stat_pstd (int, const ConcatString &, const PCTInfo &); static void store_stat_pjc (int, const ConcatString &, const PCTInfo &); static void store_stat_prc (int, const ConcatString &, const PCTInfo &); +static void store_stat_all_ctc (int, const CTSInfo &); +static void store_stat_all_mctc (int, const MCTSInfo &); +static void store_stat_all_sl1l2(int, const SL1L2Info &); +static void store_stat_all_pct (int, const PCTInfo &); + static void setup_nc_file(const VarInfo *, const VarInfo *); static void add_nc_var(const ConcatString &, const ConcatString &, const ConcatString &, const ConcatString &, @@ -118,6 +124,7 @@ static void usage(); static void set_fcst_files(const StringArray &); static void set_obs_files(const StringArray &); static void set_both_files(const StringArray &); +static void set_input(const StringArray &); static void set_paired(const StringArray &); 
static void set_out_file(const StringArray &); static void set_config_file(const StringArray &); @@ -167,6 +174,7 @@ void process_command_line(int argc, char **argv) { cline.add(set_fcst_files, "-fcst", -1); cline.add(set_obs_files, "-obs", -1); cline.add(set_both_files, "-both", -1); + cline.add(set_input, "-input", 1); cline.add(set_paired, "-paired", 0); cline.add(set_config_file, "-config", 1); cline.add(set_out_file, "-out", 1); @@ -1230,6 +1238,9 @@ void store_stat_ctc(int n, const ConcatString &col, // Set the column name to all upper case ConcatString c = to_upper(col); + // Handle ALL columns + if(c == all_columns) return store_stat_all_ctc(n, cts_info); + // Get the column value if(c == "TOTAL") { v = cts_info.cts.n(); } else if(c == "FY_OY") { v = cts_info.cts.fy_oy(); } @@ -1447,6 +1458,9 @@ void store_stat_mctc(int n, const ConcatString &col, ConcatString c = to_upper(col); ConcatString d = c; + // Handle ALL columns + if(c == all_columns) return store_stat_all_mctc(n, mcts_info); + // Get the column value if(c == "TOTAL") { v = (double) mcts_info.cts.total(); } else if(c == "N_CAT") { v = (double) mcts_info.cts.nrows(); } @@ -1748,6 +1762,9 @@ void store_stat_sl1l2(int n, const ConcatString &col, // Set the column name to all upper case ConcatString c = to_upper(col); + // Handle ALL columns + if(c == all_columns) return store_stat_all_sl1l2(n, s_info); + // Get the column value if(c == "TOTAL") { v = (double) s_info.scount; } else if(c == "FBAR") { v = s_info.fbar; } @@ -1811,6 +1828,9 @@ void store_stat_pct(int n, const ConcatString &col, ConcatString c = to_upper(col); ConcatString d = c; + // Handle ALL columns + if(c == all_columns) return store_stat_all_pct(n, pct_info); + // Get index value for variable column numbers if(check_reg_exp("_[0-9]", c.c_str())) { @@ -2081,6 +2101,40 @@ void store_stat_prc(int n, const ConcatString &col, //////////////////////////////////////////////////////////////////////// +void store_stat_all_ctc(int n, const 
CTSInfo &cts_info) { + for(int i=0; i Date: Tue, 30 Jul 2024 12:58:34 -0600 Subject: [PATCH 02/41] Per #1371, rename the -input command line option as -aggregate instead --- docs/Users_Guide/series-analysis.rst | 6 ++-- .../core/series_analysis/series_analysis.cc | 31 +++++++++++-------- .../core/series_analysis/series_analysis.h | 2 +- 3 files changed, 22 insertions(+), 17 deletions(-) diff --git a/docs/Users_Guide/series-analysis.rst b/docs/Users_Guide/series-analysis.rst index b54bf81cc9..ff9e6d389c 100644 --- a/docs/Users_Guide/series-analysis.rst +++ b/docs/Users_Guide/series-analysis.rst @@ -33,7 +33,7 @@ The usage statement for the Series-Analysis tool is shown below: -fcst file_1 ... file_n | fcst_file_list -obs file_1 ... file_n | obs_file_list [-both file_1 ... file_n | both_file_list] - [-input input_file] + [-aggregate file] [-paired] -out file -config file @@ -59,9 +59,9 @@ Optional Arguments for series_analysis 5. To set both the forecast and observations to the same set of files, use the optional -both file_1 ... file_n | both_file_list option to the same set of files. This is useful when reading the NetCDF matched pair output of the Grid-Stat tool which contains both forecast and observation data. -6. The -input option specifies the path to an existing Series-Analysis output file. When computing statistics for the data provided, Series-Analysis intializes the partial sums (SL1L2, SAL1L2 line types) and contingency table counts (CTC, MCTC, and PCT line types) data using the input_file data. +6. The -aggregate option specifies the path to an existing Series-Analysis output file. When computing statistics for the input forecast and observation data, Series-Analysis aggregates the partial sums (SL1L2, SAL1L2 line types) and contingency table counts (CTC, MCTC, and PCT line types) with data provided in the aggregate file. This option enables Series-Analysis to run iteratively and update existing partial sums, counts, and statistics with new data. -.. 
note:: When the -input option is used, only statistics that are derivable from partial sums and contingency table counts can be requested. +.. note:: When the -aggregate option is used, only statistics that are derivable from partial sums and contingency table counts can be requested. 7. The -paired option indicates that the -fcst and -obs file lists are already paired, meaning there is a one-to-one correspondence between the files in those lists. This option affects how missing data is handled. When -paired is not used, missing or incomplete files result in a runtime error with no output file being created. When -paired is used, missing or incomplete files result in a warning with output being created using the available data. diff --git a/src/tools/core/series_analysis/series_analysis.cc b/src/tools/core/series_analysis/series_analysis.cc index d4292e74fc..c2ec1b7d5c 100644 --- a/src/tools/core/series_analysis/series_analysis.cc +++ b/src/tools/core/series_analysis/series_analysis.cc @@ -124,7 +124,7 @@ static void usage(); static void set_fcst_files(const StringArray &); static void set_obs_files(const StringArray &); static void set_both_files(const StringArray &); -static void set_input(const StringArray &); +static void set_aggregate(const StringArray &); static void set_paired(const StringArray &); static void set_out_file(const StringArray &); static void set_config_file(const StringArray &); @@ -171,14 +171,14 @@ void process_command_line(int argc, char **argv) { cline.set_usage(usage); // Add the options function calls - cline.add(set_fcst_files, "-fcst", -1); - cline.add(set_obs_files, "-obs", -1); - cline.add(set_both_files, "-both", -1); - cline.add(set_input, "-input", 1); - cline.add(set_paired, "-paired", 0); - cline.add(set_config_file, "-config", 1); - cline.add(set_out_file, "-out", 1); - cline.add(set_compress, "-compress", 1); + cline.add(set_fcst_files, "-fcst", -1); + cline.add(set_obs_files, "-obs", -1); + cline.add(set_both_files, "-both", 
-1); + cline.add(set_aggregate, "-aggregate", 1); + cline.add(set_paired, "-paired", 0); + cline.add(set_config_file, "-config", 1); + cline.add(set_out_file, "-out", 1); + cline.add(set_compress, "-compress", 1); // Parse the command line cline.parse(); @@ -826,6 +826,11 @@ void process_scores() { << pd_ptr[i].n_obs << " matched pairs.\n"; } + // TODO: MET #1371 figure out where to read data from the aggr_file + // file to initialize data structures. Maybe store DataPlanes + // in a map so that we only need to load the + // gridded data once and then can read it many times? + // Compute contingency table counts and statistics if(!conf_info.fcst_info[0]->is_prob() && (conf_info.output_stats[STATLineType::fho].n() + @@ -2319,7 +2324,7 @@ void usage() { << "\t-fcst file_1 ... file_n | fcst_file_list\n" << "\t-obs file_1 ... file_n | obs_file_list\n" << "\t[-both file_1 ... file_n | both_file_list]\n" - << "\t[-input input_file]\n" + << "\t[-aggregate file]\n" << "\t[-paired]\n" << "\t-out file\n" << "\t-config file\n" @@ -2342,7 +2347,7 @@ void usage() { << "\t\t\"-both\" sets the \"-fcst\" and \"-obs\" options to " << "the same set of files (optional).\n" - << "\t\t\"-input input_file\" specifies a series_analysis output " + << "\t\t\"-aggregate file\" specifies a series_analysis output " << "file with partial sums and/or contingency table counts to be " << "updated prior to deriving statistics (optional).\n" @@ -2388,8 +2393,8 @@ void set_both_files(const StringArray & a) { //////////////////////////////////////////////////////////////////////// -void set_input(const StringArray & a) { - input_file = a[0]; +void set_aggregate(const StringArray & a) { + aggr_file = a[0]; } //////////////////////////////////////////////////////////////////////// diff --git a/src/tools/core/series_analysis/series_analysis.h b/src/tools/core/series_analysis/series_analysis.h index 330a93870c..5a0b309abe 100644 --- a/src/tools/core/series_analysis/series_analysis.h +++ 
b/src/tools/core/series_analysis/series_analysis.h @@ -73,7 +73,7 @@ static StringArray fcst_files, found_fcst_files; static StringArray obs_files, found_obs_files; static GrdFileType ftype = FileType_None; static GrdFileType otype = FileType_None; -static ConcatString input_file; +static ConcatString aggr_file; static bool paired = false; static int compress_level = -1; From 5f12f30e0b77a00bbde1ba065f4323fbd742a50b Mon Sep 17 00:00:00 2001 From: John Halley Gotway Date: Tue, 30 Jul 2024 14:50:48 -0600 Subject: [PATCH 03/41] Per #1371, work in progress --- .../core/series_analysis/series_analysis.cc | 65 ++++++++++--------- .../core/series_analysis/series_analysis.h | 4 +- 2 files changed, 37 insertions(+), 32 deletions(-) diff --git a/src/tools/core/series_analysis/series_analysis.cc b/src/tools/core/series_analysis/series_analysis.cc index 9cffb790db..3219bd2ccc 100644 --- a/src/tools/core/series_analysis/series_analysis.cc +++ b/src/tools/core/series_analysis/series_analysis.cc @@ -187,11 +187,11 @@ void process_command_line(int argc, char **argv) { // Check for error. There should be zero arguments left. if(cline.n() != 0) usage(); - // Warn about log output + // Recommend logging verbosity level of 3 or less if(mlog.verbosity_level() >= 3) { - mlog << Warning << "\nRunning Series-Analysis at verbosity >= 3 " + mlog << Debug(3) << "Running Series-Analysis at verbosity >= 3 " << "produces excessive log output and can slow the runtime " - << "considerably.\n\n"; + << "considerably.\n"; } // Check that the required arguments have been set. 
@@ -219,6 +219,12 @@ void process_command_line(int argc, char **argv) { << "\"-out\" option.\n\n"; usage(); } + if(aggr_file == out_file) { + mlog << Error << "\nprocess_command_line() -> " + << "the \"-out\" and \"-aggregate\" options cannot be " + << "set to the same file (\"" << aggr_file << "\")!\n\n"; + usage(); + } // Create the default config file name default_config_file = replace_path(default_config_filename); @@ -355,16 +361,17 @@ void process_grid(const Grid &fcst_grid, const Grid &obs_grid) { // Determine the verification grid grid = parse_vx_grid(conf_info.fcst_info[0]->regrid(), &fcst_grid, &obs_grid); - nxy = grid.nx() * grid.ny(); // Process masking regions conf_info.process_masks(grid); // Set the block size, if needed - if(is_bad_data(conf_info.block_size)) conf_info.block_size = nxy; + if(is_bad_data(conf_info.block_size)) { + conf_info.block_size = grid.nxy(); + } // Compute the number of reads required - n_reads = nint(ceil((double) nxy / conf_info.block_size)); + n_reads = nint(ceil((double) grid.nxy() / conf_info.block_size)); mlog << Debug(2) << "Computing statistics using a block size of " @@ -506,7 +513,7 @@ void get_series_data(int i_series, } // Setup the verification grid - if(nxy == 0) process_grid(fcst_grid, obs_grid); + if(!grid.is_set()) process_grid(fcst_grid, obs_grid); // Regrid the forecast, if necessary if(!(fcst_grid == grid)) { @@ -627,7 +634,7 @@ void get_series_entry(int i_series, VarInfo *info, } // If not already done, read the data - if(dp.nx() == 0 && dp.ny() == 0) { + if(dp.is_empty()) { found = read_single_entry(info, found_files[i_series], type, dp, cur_grid); } @@ -688,7 +695,8 @@ bool read_single_entry(VarInfo *info, const ConcatString &cur_file, //////////////////////////////////////////////////////////////////////// void process_scores() { - int i, x, y, i_read, i_series, i_point, i_fcst; + int i, x, y; + int i_point = 0; VarInfo *fcst_info = (VarInfo *) nullptr; VarInfo *obs_info = (VarInfo *) nullptr; 
PairDataPoint *pd_ptr = (PairDataPoint *) nullptr; @@ -704,14 +712,13 @@ void process_scores() { int n_skip_pos = 0; // Loop over the data reads - for(i_read=0; i_read 1 ? i_series : 0); + int i_fcst = (conf_info.get_n_fcst() > 1 ? i_series : 0); // Store the current VarInfo objects fcst_info = conf_info.fcst_info[i_fcst]; @@ -742,7 +749,7 @@ void process_scores() { mlog << Debug(2) << "Processing data pass number " << i_read + 1 << " of " << n_reads << " for grid points " << i_point + 1 << " to " - << min(i_point + conf_info.block_size, nxy) << ".\n"; + << min(i_point + conf_info.block_size, grid.nxy()) << ".\n"; } // Read climatology data for the current series entry @@ -759,14 +766,10 @@ void process_scores() { conf_info.conf.lookup_array(conf_key_obs_climo_stdev_field, false), i_fcst, fcst_dp.valid(), grid); - bool fcmn_flag = (fcmn_dp.nx() == fcst_dp.nx() && - fcmn_dp.ny() == fcst_dp.ny()); - bool fcsd_flag = (fcsd_dp.nx() == fcst_dp.nx() && - fcsd_dp.ny() == fcst_dp.ny()); - bool ocmn_flag = (ocmn_dp.nx() == fcst_dp.nx() && - ocmn_dp.ny() == fcst_dp.ny()); - bool ocsd_flag = (ocsd_dp.nx() == fcst_dp.nx() && - ocsd_dp.ny() == fcst_dp.ny()); + bool fcmn_flag = !fcmn_dp.is_empty(); + bool fcsd_flag = !fcsd_dp.is_empty(); + bool ocmn_flag = !ocmn_dp.is_empty(); + bool ocsd_flag = !ocsd_dp.is_empty(); mlog << Debug(3) << "For " << fcst_info->magic_str() << ", found " @@ -787,7 +790,7 @@ void process_scores() { set_range(obs_dp.lead(), obs_lead_beg, obs_lead_end); // Store matched pairs for each grid point - for(i=0; i stat_data; +// Mapping of aggregate NetCDF variable name to DataPlane +std::map aggr_data; + //////////////////////////////////////////////////////////////////////// // // Miscellaneous Variables @@ -111,7 +114,6 @@ std::map stat_data; // Grid variables static Grid grid; -static int nxy = 0; static int n_reads = 1; // Initialize to at least one pass // Data file factory and input files From ea1b00a47d9a26efe96ca0afca5e3ce8ec344112 Mon Sep 17 00:00:00 
2001 From: John Halley Gotway Date: Thu, 1 Aug 2024 09:15:14 -0600 Subject: [PATCH 04/41] Per #1371, just comments --- src/tools/core/series_analysis/series_analysis.h | 1 - 1 file changed, 1 deletion(-) diff --git a/src/tools/core/series_analysis/series_analysis.h b/src/tools/core/series_analysis/series_analysis.h index 5d0a47b176..5476351dae 100644 --- a/src/tools/core/series_analysis/series_analysis.h +++ b/src/tools/core/series_analysis/series_analysis.h @@ -17,7 +17,6 @@ // 000 12/10/12 Halley Gotway New // 001 09/28/22 Prestopnik MET #2227 Remove namespace std and netCDF from header files // -// //////////////////////////////////////////////////////////////////////// #ifndef __SERIES_ANALYSIS_H__ From d92b2dfbf501e78103bc7f722420119e499b0531 Mon Sep 17 00:00:00 2001 From: John Halley Gotway Date: Thu, 1 Aug 2024 12:15:13 -0600 Subject: [PATCH 05/41] Per #1371, working on aggregating CTC counts --- docs/Users_Guide/series-analysis.rst | 6 +- .../test_unit/xml/unit_series_analysis.xml | 52 ++++- .../core/series_analysis/series_analysis.cc | 208 ++++++++++++++---- .../core/series_analysis/series_analysis.h | 5 +- 4 files changed, 211 insertions(+), 60 deletions(-) diff --git a/docs/Users_Guide/series-analysis.rst b/docs/Users_Guide/series-analysis.rst index ff9e6d389c..fa015d6896 100644 --- a/docs/Users_Guide/series-analysis.rst +++ b/docs/Users_Guide/series-analysis.rst @@ -33,7 +33,7 @@ The usage statement for the Series-Analysis tool is shown below: -fcst file_1 ... file_n | fcst_file_list -obs file_1 ... file_n | obs_file_list [-both file_1 ... file_n | both_file_list] - [-aggregate file] + [-aggr file] [-paired] -out file -config file @@ -59,9 +59,9 @@ Optional Arguments for series_analysis 5. To set both the forecast and observations to the same set of files, use the optional -both file_1 ... file_n | both_file_list option to the same set of files. 
This is useful when reading the NetCDF matched pair output of the Grid-Stat tool which contains both forecast and observation data. -6. The -aggregate option specifies the path to an existing Series-Analysis output file. When computing statistics for the input forecast and observation data, Series-Analysis aggregates the partial sums (SL1L2, SAL1L2 line types) and contingency table counts (CTC, MCTC, and PCT line types) with data provided in the aggregate file. This option enables Series-Analysis to run iteratively and update existing partial sums, counts, and statistics with new data. +6. The -aggr option specifies the path to an existing Series-Analysis output file. When computing statistics for the input forecast and observation data, Series-Analysis aggregates the partial sums (SL1L2, SAL1L2 line types) and contingency table counts (CTC, MCTC, and PCT line types) with data provided in the aggregate file. This option enables Series-Analysis to run iteratively and update existing partial sums, counts, and statistics with new data. -.. note:: When the -aggregate option is used, only statistics that are derivable from partial sums and contingency table counts can be requested. +.. note:: When the -aggr option is used, only statistics that are derivable from partial sums and contingency table counts can be requested. 7. The -paired option indicates that the -fcst and -obs file lists are already paired, meaning there is a one-to-one correspondence between the files in those lists. This option affects how missing data is handled. When -paired is not used, missing or incomplete files result in a runtime error with no output file being created. When -paired is used, missing or incomplete files result in a warning with output being created using the available data. 
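Because aggregation needs every count column to be present in the prior output file, a configuration used with -aggr would request "ALL" for the count-based line types, as the unit-test overrides in this patch do. An illustrative SeriesAnalysisConfig fragment follows; the entry names match the stock config file, but the values shown are examples only.

```
// Sketch of output_stats settings suitable for the -aggr workflow:
// count-based line types request all columns so a later run can
// update them, while derived statistics remain individually selectable.
output_stats = {
   fho    = [ "F_RATE", "O_RATE" ];
   ctc    = [ "ALL" ];
   cts    = [ "CSI", "GSS" ];
   mctc   = [ "ALL" ];
   mcts   = [ "ACC" ];
   cnt    = [ "TOTAL", "ME" ];
   sl1l2  = [ "ALL" ];
   sal1l2 = [];
   pct    = [ "ALL" ];
   pstd   = [];
   pjc    = [];
   prc    = [];
}
```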
diff --git a/internal/test_unit/xml/unit_series_analysis.xml b/internal/test_unit/xml/unit_series_analysis.xml index c1e64416b3..036b1e7607 100644 --- a/internal/test_unit/xml/unit_series_analysis.xml +++ b/internal/test_unit/xml/unit_series_analysis.xml @@ -29,12 +29,12 @@ OBS_FIELD { name = "APCP"; level = [ "A06" ]; } MASK_POLY FHO_STATS "F_RATE", "O_RATE" - CTC_STATS "FY_OY", "FN_ON" + CTC_STATS "ALL" CTS_STATS "CSI", "GSS" - MCTC_STATS "F1_O1", "F2_O2", "F3_O3" + MCTC_STATS "ALL" MCTS_STATS "ACC", "ACC_NCL", "ACC_NCU" CNT_STATS "TOTAL", "ME", "ME_NCL", "ME_NCU" - SL1L2_STATS "FBAR", "OBAR" + SL1L2_STATS "ALL" SAL1L2_STATS PCT_STATS PSTD_STATS @@ -46,22 +46,54 @@ &DATA_DIR_MODEL;/grib1/gfs_hmt/gfs_2012040900_F012.grib \ &DATA_DIR_MODEL;/grib1/gfs_hmt/gfs_2012040900_F018.grib \ &DATA_DIR_MODEL;/grib1/gfs_hmt/gfs_2012040900_F024.grib \ - &DATA_DIR_MODEL;/grib1/gfs_hmt/gfs_2012040900_F030.grib \ - &DATA_DIR_MODEL;/grib1/gfs_hmt/gfs_2012040900_F036.grib \ - &DATA_DIR_MODEL;/grib1/gfs_hmt/gfs_2012040900_F042.grib \ -obs &DATA_DIR_OBS;/stage4_hmt/stage4_2012040906_06h.grib \ &DATA_DIR_OBS;/stage4_hmt/stage4_2012040912_06h.grib \ &DATA_DIR_OBS;/stage4_hmt/stage4_2012040918_06h.grib \ &DATA_DIR_OBS;/stage4_hmt/stage4_2012041000_06h.grib \ - &DATA_DIR_OBS;/stage4_hmt/stage4_2012041006_06h.grib \ + -out &OUTPUT_DIR;/series_analysis/series_analysis_CMD_LINE_APCP_06_2012040900_to_2012041000.nc \ + -config &CONFIG_DIR;/SeriesAnalysisConfig \ + -v 1 + + + &OUTPUT_DIR;/series_analysis/series_analysis_CMD_LINE_APCP_06_2012040900_to_2012041000.nc + + + + + &MET_BIN;/series_analysis + + MODEL GFS + OBTYPE STAGE4 + FCST_CAT_THRESH >0.0, >5.0 + FCST_FIELD { name = "APCP"; level = [ "A06" ]; } + OBS_CAT_THRESH >0.0, >5.0 + OBS_FIELD { name = "APCP"; level = [ "A06" ]; } + MASK_POLY + FHO_STATS "F_RATE", "O_RATE" + CTC_STATS "ALL" + CTS_STATS "CSI", "GSS" + MCTC_STATS "ALL" + MCTS_STATS "ACC", "ACC_NCL", "ACC_NCU" + CNT_STATS "TOTAL", "ME", "ME_NCL", "ME_NCU" + SL1L2_STATS "ALL" + 
SAL1L2_STATS + PCT_STATS + PSTD_STATS + PJC_STATS + PRC_STATS + + \ + -fcst &DATA_DIR_MODEL;/grib1/gfs_hmt/gfs_2012040900_F030.grib \ + &DATA_DIR_MODEL;/grib1/gfs_hmt/gfs_2012040900_F036.grib \ + -obs &DATA_DIR_OBS;/stage4_hmt/stage4_2012041006_06h.grib \ &DATA_DIR_OBS;/stage4_hmt/stage4_2012041012_06h.grib \ - &DATA_DIR_OBS;/stage4_hmt/stage4_2012041018_06h.grib \ - -out &OUTPUT_DIR;/series_analysis/series_analysis_CMD_LINE_APCP_06_2012040900_to_2012041100.nc \ + -aggr &OUTPUT_DIR;/series_analysis/series_analysis_CMD_LINE_APCP_06_2012040900_to_2012041000.nc \ + -out &OUTPUT_DIR;/series_analysis/series_analysis_AGGREGATE_APCP_06_2012040900_to_2012041012.nc \ -config &CONFIG_DIR;/SeriesAnalysisConfig \ -v 1 - &OUTPUT_DIR;/series_analysis/series_analysis_CMD_LINE_APCP_06_2012040900_to_2012041100.nc + &OUTPUT_DIR;/series_analysis/series_analysis_AGGREGATE_APCP_06_2012040900_to_2012041100.nc diff --git a/src/tools/core/series_analysis/series_analysis.cc b/src/tools/core/series_analysis/series_analysis.cc index 3219bd2ccc..252f4843b5 100644 --- a/src/tools/core/series_analysis/series_analysis.cc +++ b/src/tools/core/series_analysis/series_analysis.cc @@ -40,7 +40,6 @@ // //////////////////////////////////////////////////////////////////////// - #include #include #include @@ -57,6 +56,7 @@ #include "main.h" #include "series_analysis.h" +#include "vx_data2d_nc_met.h" #include "vx_statistics.h" #include "vx_nc_util.h" #include "vx_regrid.h" @@ -66,7 +66,6 @@ using namespace std; using namespace netCDF; - //////////////////////////////////////////////////////////////////////// static void process_command_line(int, char **); @@ -84,6 +83,7 @@ static void get_series_entry(int, VarInfo *, const StringArray &, DataPlane &, Grid &); static bool read_single_entry(VarInfo *, const ConcatString &, const GrdFileType, DataPlane &, Grid &); +static DataPlane get_aggr_data(const ConcatString &); static void process_scores(); @@ -93,6 +93,8 @@ static void do_cnt (int, const 
PairDataPoint *); static void do_sl1l2 (int, const PairDataPoint *); static void do_pct (int, const PairDataPoint *); +static void read_aggr_ctc (int, const CTSInfo &, TTContingencyTable &ctc); + static void store_stat_fho (int, const ConcatString &, const CTSInfo &); static void store_stat_ctc (int, const ConcatString &, const CTSInfo &); static void store_stat_cts (int, const ConcatString &, const CTSInfo &); @@ -110,6 +112,8 @@ static void store_stat_all_mctc (int, const MCTSInfo &); static void store_stat_all_sl1l2(int, const SL1L2Info &); static void store_stat_all_pct (int, const PCTInfo &); +static ConcatString build_nc_var_name_ctc(const ConcatString &, const CTSInfo &); + static void setup_nc_file(const VarInfo *, const VarInfo *); static void add_nc_var(const ConcatString &, const ConcatString &, const ConcatString &, const ConcatString &, @@ -125,7 +129,7 @@ static void usage(); static void set_fcst_files(const StringArray &); static void set_obs_files(const StringArray &); static void set_both_files(const StringArray &); -static void set_aggregate(const StringArray &); +static void set_aggr(const StringArray &); static void set_paired(const StringArray &); static void set_out_file(const StringArray &); static void set_config_file(const StringArray &); @@ -172,14 +176,14 @@ void process_command_line(int argc, char **argv) { cline.set_usage(usage); // Add the options function calls - cline.add(set_fcst_files, "-fcst", -1); - cline.add(set_obs_files, "-obs", -1); - cline.add(set_both_files, "-both", -1); - cline.add(set_aggregate, "-aggregate", 1); - cline.add(set_paired, "-paired", 0); - cline.add(set_config_file, "-config", 1); - cline.add(set_out_file, "-out", 1); - cline.add(set_compress, "-compress", 1); + cline.add(set_fcst_files, "-fcst", -1); + cline.add(set_obs_files, "-obs", -1); + cline.add(set_both_files, "-both", -1); + cline.add(set_aggr, "-aggr", 1); + cline.add(set_paired, "-paired", 0); + cline.add(set_config_file, "-config", 1); + 
cline.add(set_out_file, "-out", 1); + cline.add(set_compress, "-compress", 1); // Parse the command line cline.parse(); @@ -221,7 +225,7 @@ void process_command_line(int argc, char **argv) { } if(aggr_file == out_file) { mlog << Error << "\nprocess_command_line() -> " - << "the \"-out\" and \"-aggregate\" options cannot be " + << "the \"-out\" and \"-aggr\" options cannot be " << "set to the same file (\"" << aggr_file << "\")!\n\n"; usage(); } @@ -694,6 +698,52 @@ bool read_single_entry(VarInfo *info, const ConcatString &cur_file, //////////////////////////////////////////////////////////////////////// +DataPlane get_aggr_data(const ConcatString &var_name) { + DataPlane aggr_dp; + bool found = false; + + // Open the aggregate file, if needed + if(!aggr_mtddf) { + aggr_mtddf = mtddf_factory.new_met_2d_data_file(aggr_file.c_str(), FileType_NcMet); + + // Update timing info + /* TODO MET #1371 + set_range(fcst_dp.init(), fcst_init_beg, fcst_init_end); + set_range(fcst_dp.valid(), fcst_valid_beg, fcst_valid_end); + set_range(fcst_dp.lead(), fcst_lead_beg, fcst_lead_end); + set_range(obs_dp.init(), obs_init_beg, obs_init_end); + set_range(obs_dp.valid(), obs_valid_beg, obs_valid_end); + set_range(obs_dp.lead(), obs_lead_beg, obs_lead_end); + */ + } + + // Setup the data request + VarInfoNcMet aggr_info; + aggr_info.set_magic(var_name, "(*,*)"); + + // Attempt to read the gridded data from the current file + if(!aggr_mtddf->data_plane(aggr_info, aggr_dp)) { + mlog << Error << "\nget_aggr_data() -> " + << "Required variable " << var_name << " not found in aggregate file " + << aggr_file << "\n\n"; + exit(1); + } + + // Check that the grid has not changed + if(aggr_mtddf->grid().nx() != grid.nx() || + aggr_mtddf->grid().ny() != grid.ny()) { + mlog << Error << "\nget_aggr_data() -> " + << "the input grid dimensions (" << grid.nx() << ", " << grid.ny() + << ") and aggregate grid dimensions (" << aggr_mtddf->grid().nx() + << ", " << aggr_mtddf->grid().ny() << ") do not 
match!\n\n"; + exit(1); + } + + return aggr_dp; +} + +//////////////////////////////////////////////////////////////////////// + void process_scores() { int i, x, y; int i_point = 0; @@ -848,11 +898,6 @@ void process_scores() { << pd_ptr[i].n_obs << " matched pairs.\n"; } - // TODO: MET #1371 figure out where to read data from the aggr_file - // file to initialize data structures. Maybe store DataPlanes - // in a map so that we only need to load the - // gridded data once and then can read it many times? - // Compute contingency table counts and statistics if(!conf_info.fcst_info[0]->is_prob() && (conf_info.output_stats[STATLineType::fho].n() + @@ -946,7 +991,6 @@ void process_scores() { //////////////////////////////////////////////////////////////////////// void do_cts(int n, const PairDataPoint *pd_ptr) { - int i, j; mlog << Debug(4) << "Computing Categorical Statistics.\n"; @@ -955,20 +999,47 @@ void do_cts(int n, const PairDataPoint *pd_ptr) { CTSInfo *cts_info = new CTSInfo [n_cts]; // Setup CTSInfo objects - for(i=0; in_obs-1); + + // Loop over the thresholds + for(int i=0; i Date: Thu, 1 Aug 2024 13:25:48 -0600 Subject: [PATCH 06/41] Per #1371, work in progress --- .../core/series_analysis/series_analysis.cc | 29 ++++++++++--------- .../core/series_analysis/series_analysis.h | 3 +- 2 files changed, 18 insertions(+), 14 deletions(-) diff --git a/src/tools/core/series_analysis/series_analysis.cc b/src/tools/core/series_analysis/series_analysis.cc index 252f4843b5..5592ec8500 100644 --- a/src/tools/core/series_analysis/series_analysis.cc +++ b/src/tools/core/series_analysis/series_analysis.cc @@ -56,7 +56,6 @@ #include "main.h" #include "series_analysis.h" -#include "vx_data2d_nc_met.h" #include "vx_statistics.h" #include "vx_nc_util.h" #include "vx_regrid.h" @@ -696,6 +695,9 @@ bool read_single_entry(VarInfo *info, const ConcatString &cur_file, return found; } +// TODO MET #1371 figure out how valid data thresholds should be handled +// when reading -aggr 
data + //////////////////////////////////////////////////////////////////////// DataPlane get_aggr_data(const ConcatString &var_name) { @@ -703,8 +705,11 @@ DataPlane get_aggr_data(const ConcatString &var_name) { bool found = false; // Open the aggregate file, if needed - if(!aggr_mtddf) { - aggr_mtddf = mtddf_factory.new_met_2d_data_file(aggr_file.c_str(), FileType_NcMet); + if(!aggr_nc.MetNc) { + + mlog << Debug(2) << "Reading aggregate data file: " << aggr_file << "\n"; + + aggr_nc.open(aggr_file.c_str()); // Update timing info /* TODO MET #1371 @@ -722,20 +727,20 @@ DataPlane get_aggr_data(const ConcatString &var_name) { aggr_info.set_magic(var_name, "(*,*)"); // Attempt to read the gridded data from the current file - if(!aggr_mtddf->data_plane(aggr_info, aggr_dp)) { + if(!aggr_nc.data_plane(aggr_info, aggr_dp)) { mlog << Error << "\nget_aggr_data() -> " - << "Required variable " << var_name << " not found in aggregate file " - << aggr_file << "\n\n"; + << "Required variable \"" << aggr_info.magic_str() << "\"" + << " not found in aggregate file!\n\n"; exit(1); } // Check that the grid has not changed - if(aggr_mtddf->grid().nx() != grid.nx() || - aggr_mtddf->grid().ny() != grid.ny()) { + if(aggr_nc.grid().nx() != grid.nx() || + aggr_nc.grid().ny() != grid.ny()) { mlog << Error << "\nget_aggr_data() -> " << "the input grid dimensions (" << grid.nx() << ", " << grid.ny() - << ") and aggregate grid dimensions (" << aggr_mtddf->grid().nx() - << ", " << aggr_mtddf->grid().ny() << ") do not match!\n\n"; + << ") and aggregate grid dimensions (" << aggr_nc.grid().nx() + << ", " << aggr_nc.grid().ny() << ") do not match!\n\n"; exit(1); } @@ -1028,7 +1033,6 @@ void do_cts(int n, const PairDataPoint *pd_ptr) { read_aggr_ctc(n, cts_info[i], aggr_ctc); // Aggregate past CTC counts with new ones - // TODO MET #1371: rename CTS to CTC? 
cts_info[i].cts += aggr_ctc; // Compute statistics and confidence intervals @@ -1226,7 +1230,7 @@ void read_aggr_ctc(int n, const CTSInfo &cts_info, TTContingencyTable &ctc) { // Initialize - ctc.clear(); + ctc.zero_out(); // Loop over the CTC column names for(int i=0; i Date: Thu, 1 Aug 2024 13:47:45 -0600 Subject: [PATCH 07/41] Per #1371, update timing info using time stamps in the aggr file --- .../core/series_analysis/series_analysis.cc | 48 +++++++++++++++---- 1 file changed, 39 insertions(+), 9 deletions(-) diff --git a/src/tools/core/series_analysis/series_analysis.cc b/src/tools/core/series_analysis/series_analysis.cc index 5592ec8500..835300aebd 100644 --- a/src/tools/core/series_analysis/series_analysis.cc +++ b/src/tools/core/series_analysis/series_analysis.cc @@ -711,15 +711,45 @@ DataPlane get_aggr_data(const ConcatString &var_name) { aggr_nc.open(aggr_file.c_str()); - // Update timing info - /* TODO MET #1371 - set_range(fcst_dp.init(), fcst_init_beg, fcst_init_end); - set_range(fcst_dp.valid(), fcst_valid_beg, fcst_valid_end); - set_range(fcst_dp.lead(), fcst_lead_beg, fcst_lead_end); - set_range(obs_dp.init(), obs_init_beg, obs_init_end); - set_range(obs_dp.valid(), obs_valid_beg, obs_valid_end); - set_range(obs_dp.lead(), obs_lead_beg, obs_lead_end); - */ + // Update timing info based on aggregate file global attributes + ConcatString cs; + + if(get_att_value_string(aggr_nc.MetNc->Nc, "fcst_init_beg", cs)) { + set_range(timestring_to_unix(cs.c_str()), fcst_init_beg, fcst_init_end); + } + if(get_att_value_string(aggr_nc.MetNc->Nc, "fcst_init_end", cs)) { + set_range(timestring_to_unix(cs.c_str()), fcst_init_beg, fcst_init_end); + } + if(get_att_value_string(aggr_nc.MetNc->Nc, "fcst_valid_beg", cs)) { + set_range(timestring_to_unix(cs.c_str()), fcst_valid_beg, fcst_valid_end); + } + if(get_att_value_string(aggr_nc.MetNc->Nc, "fcst_valid_end", cs)) { + set_range(timestring_to_unix(cs.c_str()), fcst_valid_beg, fcst_valid_end); + } + 
if(get_att_value_string(aggr_nc.MetNc->Nc, "fcst_lead_beg", cs)) { + set_range(timestring_to_sec(cs.c_str()), fcst_lead_beg, fcst_lead_end); + } + if(get_att_value_string(aggr_nc.MetNc->Nc, "fcst_lead_end", cs)) { + set_range(timestring_to_sec(cs.c_str()), fcst_lead_beg, fcst_lead_end); + } + if(get_att_value_string(aggr_nc.MetNc->Nc, "obs_init_beg", cs)) { + set_range(timestring_to_unix(cs.c_str()), obs_init_beg, obs_init_end); + } + if(get_att_value_string(aggr_nc.MetNc->Nc, "obs_init_end", cs)) { + set_range(timestring_to_unix(cs.c_str()), obs_init_beg, obs_init_end); + } + if(get_att_value_string(aggr_nc.MetNc->Nc, "obs_valid_beg", cs)) { + set_range(timestring_to_unix(cs.c_str()), obs_valid_beg, obs_valid_end); + } + if(get_att_value_string(aggr_nc.MetNc->Nc, "obs_valid_end", cs)) { + set_range(timestring_to_unix(cs.c_str()), obs_valid_beg, obs_valid_end); + } + if(get_att_value_string(aggr_nc.MetNc->Nc, "obs_lead_beg", cs)) { + set_range(timestring_to_sec(cs.c_str()), obs_lead_beg, obs_lead_end); + } + if(get_att_value_string(aggr_nc.MetNc->Nc, "obs_lead_end", cs)) { + set_range(timestring_to_sec(cs.c_str()), obs_lead_beg, obs_lead_end); + } } // Setup the data request From 6364c9928c03d0553dbb2245461d168c60b6b119 Mon Sep 17 00:00:00 2001 From: John Halley Gotway Date: Thu, 1 Aug 2024 20:01:01 +0000 Subject: [PATCH 08/41] Per #1371, close the aggregate data file --- src/tools/core/series_analysis/series_analysis.cc | 3 +++ 1 file changed, 3 insertions(+) diff --git a/src/tools/core/series_analysis/series_analysis.cc b/src/tools/core/series_analysis/series_analysis.cc index 835300aebd..23706b5365 100644 --- a/src/tools/core/series_analysis/series_analysis.cc +++ b/src/tools/core/series_analysis/series_analysis.cc @@ -2478,6 +2478,9 @@ void clean_up() { nc_out = (NcFile *) nullptr; } + // Close the aggregate NetCDF file + if(aggr_nc.MetNc) aggr_nc.close(); + // Deallocate memory for data files if(fcst_mtddf) { delete fcst_mtddf; fcst_mtddf = nullptr; } 
if(obs_mtddf) { delete obs_mtddf; obs_mtddf = nullptr; } From fe26bf1b2046db396bdb196e7db8937058628f21 Mon Sep 17 00:00:00 2001 From: John Halley Gotway Date: Wed, 7 Aug 2024 17:06:41 -0600 Subject: [PATCH 09/41] Per #1371, define set_event() and set_nonevent() member functions --- src/libcode/vx_statistics/contable.h | 7 ++ src/libcode/vx_statistics/contable_nx2.cc | 90 +++++++++++++++++++---- 2 files changed, 82 insertions(+), 15 deletions(-) diff --git a/src/libcode/vx_statistics/contable.h b/src/libcode/vx_statistics/contable.h index cae46f880a..8c7823c4d7 100644 --- a/src/libcode/vx_statistics/contable.h +++ b/src/libcode/vx_statistics/contable.h @@ -193,6 +193,13 @@ class Nx2ContingencyTable : public ContingencyTable { double threshold(int index) const; // 0 <= index <= Nrows + // + // set counts + // + + void set_event(int row, int count); + void set_nonevent(int row, int count); + // // increment counts // diff --git a/src/libcode/vx_statistics/contable_nx2.cc b/src/libcode/vx_statistics/contable_nx2.cc index 9af0358511..c69ec38545 100644 --- a/src/libcode/vx_statistics/contable_nx2.cc +++ b/src/libcode/vx_statistics/contable_nx2.cc @@ -187,7 +187,8 @@ void Nx2ContingencyTable::set_size(int NR, int NC) if ( NC != 2 ) { - mlog << Error << "\nNx2ContingencyTable::set_size(int, int) -> must have 2 columns!\n\n"; + mlog << Error << "\nNx2ContingencyTable::set_size(int, int) -> " + << "must have 2 columns!\n\n"; exit ( 1 ); @@ -209,7 +210,8 @@ int Nx2ContingencyTable::value_to_row(double t) const if ( !Thresholds ) { - mlog << Error << "\nNx2ContingencyTable::value_to_row(double) const -> thresholds array not set!\n\n"; + mlog << Error << "\nNx2ContingencyTable::value_to_row(double) const -> " + << "thresholds array not set!\n\n"; exit ( 1 ); @@ -246,7 +248,8 @@ void Nx2ContingencyTable::set_thresholds(const double * Values) if ( E->empty() ) { - mlog << Error << "\nNx2ContingencyTable::set_thresholds(const double *) -> table empty!\n\n"; + mlog << Error << 
"\nNx2ContingencyTable::set_thresholds(const double *) -> " + << "table empty!\n\n"; exit ( 1 ); @@ -272,7 +275,8 @@ double Nx2ContingencyTable::threshold(int k) const if ( !Thresholds ) { - mlog << Error << "\nNx2ContingencyTable::threshold(int) const -> no thresholds set!\n\n"; + mlog << Error << "\nNx2ContingencyTable::threshold(int) const -> " + << "no thresholds set!\n\n"; exit ( 1 ); @@ -280,7 +284,8 @@ if ( !Thresholds ) { if ( (k < 0) || (k > Nrows) ) { // there are Nrows + 1 thresholds - mlog << Error << "\nNx2ContingencyTable::threshold(int) const -> range check error\n\n"; + mlog << Error << "\nNx2ContingencyTable::threshold(int) const -> " + << "range check error\n\n"; exit ( 1 ); @@ -290,6 +295,51 @@ return Thresholds[k]; } +//////////////////////////////////////////////////////////////////////// + + +void Nx2ContingencyTable::set_event(int row, int value) + +{ + +if ( row < 0 || row >= Nrows ) { + + mlog << Error << "\nNx2ContingencyTable::set_event(double) -> " + << "bad row index ... " << row << "\n\n"; + + exit ( 1 ); + +} + +set_entry(row, nx2_event_column, value); + +return; + +} + + +//////////////////////////////////////////////////////////////////////// + + +void Nx2ContingencyTable::set_nonevent(int row, int value) + +{ + +if ( row < 0 || row >= Nrows ) { + + mlog << Error << "\nNx2ContingencyTable::set_nonevent(double) -> " + << "bad row index ... " << row << "\n\n"; + + exit ( 1 ); + +} + +set_entry(row, nx2_nonevent_column, value); + +return; + +} + //////////////////////////////////////////////////////////////////////// @@ -304,7 +354,8 @@ r = value_to_row(t); if ( r < 0 ) { - mlog << Error << "\nNx2ContingencyTable::inc_event(double) -> bad value ... " << t << "\n\n"; + mlog << Error << "\nNx2ContingencyTable::inc_event(double) -> " + << "bad value ... " << t << "\n\n"; exit ( 1 ); @@ -330,7 +381,8 @@ r = value_to_row(t); if ( r < 0 ) { - mlog << Error << "\nNx2ContingencyTable::inc_nonevent(double) -> bad value ... 
" << t << "\n\n"; + mlog << Error << "\nNx2ContingencyTable::inc_nonevent(double) -> " + << "bad value ... " << t << "\n\n"; exit ( 1 ); @@ -356,7 +408,8 @@ r = value_to_row(t); if ( r < 0 ) { - mlog << Error << "\nNx2ContingencyTable::event_count_by_thresh(double) -> bad value ... " << t << "\n\n"; + mlog << Error << "\nNx2ContingencyTable::event_count_by_thresh(double) -> " + << "bad value ... " << t << "\n\n"; exit ( 1 ); @@ -384,7 +437,8 @@ r = value_to_row(t); if ( r < 0 ) { - mlog << Error << "\nNx2ContingencyTable::nonevent_count_by_thresh(double) -> bad value ... " << t << "\n\n"; + mlog << Error << "\nNx2ContingencyTable::nonevent_count_by_thresh(double) -> " + << "bad value ... " << t << "\n\n"; exit ( 1 ); @@ -446,7 +500,8 @@ double Nx2ContingencyTable::row_proby(int row) const if ( (row < 0) || (row >= Nrows) ) { - mlog << Error << "\nNx2ContingencyTable::row_proby(int) const -> range check error\n\n"; + mlog << Error << "\nNx2ContingencyTable::row_proby(int) const -> " + << "range check error\n\n"; exit ( 1 ); @@ -693,7 +748,8 @@ double Nx2ContingencyTable::row_calibration(int row) const if ( (row < 0) || (row >= Nrows) ) { - mlog << Error << "\nNx2ContingencyTable::row_calibration(int) const -> range check error\n\n"; + mlog << Error << "\nNx2ContingencyTable::row_calibration(int) const -> " + << "range check error\n\n"; exit ( 1 ); @@ -723,7 +779,8 @@ double Nx2ContingencyTable::row_refinement(int row) const if ( (row < 0) || (row >= Nrows) ) { - mlog << Error << "\nNx2ContingencyTable::row_refinement(int) const -> range check error\n\n"; + mlog << Error << "\nNx2ContingencyTable::row_refinement(int) const -> " + << "range check error\n\n"; exit ( 1 ); @@ -755,7 +812,8 @@ double Nx2ContingencyTable::row_event_likelihood(int row) const if ( (row < 0) || (row >= Nrows) ) { - mlog << Error << "\nNx2ContingencyTable::row_event_likelihood(int) const -> range check error\n\n"; + mlog << Error << "\nNx2ContingencyTable::row_event_likelihood(int) const -> " 
+ << "range check error\n\n"; exit ( 1 ); @@ -784,7 +842,8 @@ double Nx2ContingencyTable::row_nonevent_likelihood(int row) const if ( (row < 0) || (row >= Nrows) ) { - mlog << Error << "\nNx2ContingencyTable::row_nonevent_likelihood(int) const -> range check error\n\n"; + mlog << Error << "\nNx2ContingencyTable::row_nonevent_likelihood(int) const -> " + << "range check error\n\n"; exit ( 1 ); @@ -815,7 +874,8 @@ TTContingencyTable tt; if ( (row < 0) || (row >= Nrows) ) { - mlog << Error << "\nNx2ContingencyTable::ctc_by_row(int) const -> range check error\n\n"; + mlog << Error << "\nNx2ContingencyTable::ctc_by_row(int) const -> " + << "range check error\n\n"; exit ( 1 ); From 7980c3e4551fe7e20af9a88fb806bbcde8bac1e1 Mon Sep 17 00:00:00 2001 From: John Halley Gotway Date: Wed, 7 Aug 2024 17:09:30 -0600 Subject: [PATCH 10/41] Per #1371, add logic to aggregate MCTC and PCT counts --- .../core/series_analysis/series_analysis.cc | 191 ++++++++++++++++-- 1 file changed, 173 insertions(+), 18 deletions(-) diff --git a/src/tools/core/series_analysis/series_analysis.cc b/src/tools/core/series_analysis/series_analysis.cc index 23706b5365..c9cec58495 100644 --- a/src/tools/core/series_analysis/series_analysis.cc +++ b/src/tools/core/series_analysis/series_analysis.cc @@ -92,7 +92,11 @@ static void do_cnt (int, const PairDataPoint *); static void do_sl1l2 (int, const PairDataPoint *); static void do_pct (int, const PairDataPoint *); -static void read_aggr_ctc (int, const CTSInfo &, TTContingencyTable &ctc); +// TODO: MET #1371 need logic to aggregate SL1L2, SAL1L2, and CNT (?) 
+ +static void read_aggr_ctc (int, const CTSInfo &, TTContingencyTable &); +static void read_aggr_mctc (int, const MCTSInfo &, ContingencyTable &); +static void read_aggr_pct (int, const PCTInfo &, Nx2ContingencyTable &); static void store_stat_fho (int, const ConcatString &, const CTSInfo &); static void store_stat_ctc (int, const ConcatString &, const CTSInfo &); @@ -531,7 +535,7 @@ void get_series_data(int i_series, exit(1); } - mlog << Debug(1) + mlog << Debug(3) << "Regridding field " << fcst_info->magic_str() << " to the verification grid.\n"; fcst_dp = met_regrid(fcst_dp, fcst_grid, grid, @@ -551,7 +555,7 @@ void get_series_data(int i_series, exit(1); } - mlog << Debug(1) + mlog << Debug(3) << "Regridding field " << obs_info->magic_str() << " to the verification grid.\n"; obs_dp = met_regrid(obs_dp, obs_grid, grid, @@ -707,7 +711,8 @@ DataPlane get_aggr_data(const ConcatString &var_name) { // Open the aggregate file, if needed if(!aggr_nc.MetNc) { - mlog << Debug(2) << "Reading aggregate data file: " << aggr_file << "\n"; + mlog << Debug(1) + << "Reading aggregate data file: " << aggr_file << "\n"; aggr_nc.open(aggr_file.c_str()); @@ -1045,7 +1050,7 @@ void do_cts(int n, const PairDataPoint *pd_ptr) { } } - // Aggregate past CTC counts + // Aggregate new point data with input CTC counts if(aggr_file.nonempty()) { // Index NumArray to use all points @@ -1135,9 +1140,31 @@ void do_mcts(int n, const PairDataPoint *pd_ptr) { mcts_info.alpha[i] = conf_info.ci_alpha[i]; } + // Aggregate new point data with input MCTC counts + if(aggr_file.nonempty()) { + + // Index NumArray to use all points + NumArray i_na; + i_na.add_seq(0, pd_ptr->n_obs-1); + + // Compute the current MCTSInfo + compute_mctsinfo(*pd_ptr, i_na, false, false, mcts_info); + + // Read the MCTC data to be aggregated + ContingencyTable aggr_ctc; + read_aggr_mctc(n, mcts_info, aggr_ctc); + + // Aggregate past MCTC counts with new ones + mcts_info.cts += aggr_ctc; + + // Compute statistics and 
confidence intervals + mcts_info.compute_stats(); + mcts_info.compute_ci(); + + } // Compute the counts, stats, normal confidence intervals, and // bootstrap confidence intervals - if(conf_info.boot_interval == BootIntervalType::BCA) { + else if(conf_info.boot_interval == BootIntervalType::BCA) { compute_mcts_stats_ci_bca(rng_ptr, *pd_ptr, conf_info.n_boot_rep, mcts_info, true, @@ -1257,17 +1284,15 @@ void do_sl1l2(int n, const PairDataPoint *pd_ptr) { //////////////////////////////////////////////////////////////////////// void read_aggr_ctc(int n, const CTSInfo &cts_info, - TTContingencyTable &ctc) { + TTContingencyTable &aggr_ctc) { // Initialize - ctc.zero_out(); + aggr_ctc.zero_out(); // Loop over the CTC column names for(int i=0; i " + << "the number of MCTC categories do not match (" + << nint(v) << " != " << aggr_ctc.nrows() << ")!\n\n"; + exit(1); + } + // Check the expected correct + else if(c == "EC_VALUE" && !is_bad_data(v) && + !is_eq(v, aggr_ctc.ec_value(), loose_tol)) { + mlog << Error << "\nread_aggr_mctc() -> " + << "the MCTC expected correct values do not match (" + << v << " != " << aggr_ctc.ec_value() << ")!\n\n"; + exit(1); + } + // Populate the MCTC table + else if(check_reg_exp("F[0-9]*_O[0-9]*", c.c_str())) { + StringArray sa(c.split("_")); + int i_row = atoi(sa[0].c_str()+1) - 1; + int i_col = atoi(sa[1].c_str()+1) - 1; + aggr_ctc.set_entry(i_row, i_col, nint(v)); + } + } // end for i + + return; +} + +//////////////////////////////////////////////////////////////////////// + +void read_aggr_pct(int n, const PCTInfo &pct_info, + Nx2ContingencyTable &aggr_pct) { + + // Initialize + aggr_pct = pct_info.pct; + aggr_pct.zero_out(); + + // Get PCT column names + StringArray pct_cols(get_pct_columns(aggr_pct.nrows())); + + // Loop over the PCT colum names + for(int i=0; i " + << "the number of PCT categories do not match (" + << nint(v)+1 << " != " << aggr_pct.nrows() << ")!\n\n"; + exit(1); + } + // Set the event counts + else 
if(check_reg_exp("OY_[0-9]", c.c_str())) { + + // Parse the index value from the column name + int i_row = atoi(strrchr(c.c_str(), '_') + 1) - 1; + aggr_pct.set_event(i_row, nint(v)); + } + // Set the non-event counts + else if(check_reg_exp("ON_[0-9]", c.c_str())) { + + // Parse the index value from the column name + int i_row = atoi(strrchr(c.c_str(), '_') + 1) - 1; + aggr_pct.set_nonevent(i_row, nint(v)); + } + + } // end for i + + return; +} + +//////////////////////////////////////////////////////////////////////// + void do_pct(int n, const PairDataPoint *pd_ptr) { int i, j; @@ -1313,8 +1448,28 @@ void do_pct(int n, const PairDataPoint *pd_ptr) { // Set the current observation threshold pct_info.othresh = conf_info.ocat_ta[i]; + // Aggregate new point data with input PCT counts + if(aggr_file.nonempty()) { + + // Compute the probabilistic counts + compute_pctinfo(*pd_ptr, false, pct_info); + + // Read the PCT data to be aggregated + Nx2ContingencyTable aggr_pct; + read_aggr_pct(n, pct_info, aggr_pct); + + // Aggregate past PCT counts with new ones + pct_info.pct += aggr_pct; + + // Compute statistics and confidence intervals + pct_info.compute_stats(); + pct_info.compute_ci(); + + } // Compute the probabilistic counts and statistics - compute_pctinfo(*pd_ptr, true, pct_info); + else { + compute_pctinfo(*pd_ptr, true, pct_info); + } // Add statistic value for each possible PCT column for(j=0; j Date: Fri, 9 Aug 2024 09:49:35 -0600 Subject: [PATCH 11/41] Merging changes from develop --- docs/Users_Guide/appendixC.rst | 74 +++++---- docs/Users_Guide/point-stat.rst | 34 ++-- src/libcode/vx_stat_out/stat_columns.cc | 4 +- src/libcode/vx_statistics/compute_stats.cc | 8 +- src/libcode/vx_statistics/met_stats.cc | 23 +-- src/libcode/vx_statistics/met_stats.h | 5 +- .../core/series_analysis/series_analysis.cc | 154 +++++++++++++----- .../core/stat_analysis/aggr_stat_line.cc | 66 ++++---- .../core/stat_analysis/parse_stat_line.cc | 4 +- 9 files changed, 232 
insertions(+), 140 deletions(-) diff --git a/docs/Users_Guide/appendixC.rst b/docs/Users_Guide/appendixC.rst index 15c3ab5c2d..a6bbb0fe51 100644 --- a/docs/Users_Guide/appendixC.rst +++ b/docs/Users_Guide/appendixC.rst @@ -616,23 +616,23 @@ Anomaly Correlation Coefficient Called "ANOM_CORR" and "ANOM_CORR_UNCNTR" for centered and uncentered versions in CNT output :numref:`table_PS_format_info_CNT` -The anomaly correlation coefficient is equivalent to the Pearson correlation coefficient, except that both the forecasts and observations are first adjusted according to a climatology value. The anomaly is the difference between the individual forecast or observation and the typical situation, as measured by a climatology (**c**) of some variety. It measures the strength of linear association between the forecast anomalies and observed anomalies. The anomaly correlation coefficient is defined as: +The anomaly correlation coefficient is equivalent to the Pearson correlation coefficient, except that both the forecasts and observations are first adjusted by subtracting their corresponding climatology value. The anomaly is the difference between the individual forecast or observation and the typical situation, as measured by a forecast climatology (:math:`c_f`) and observation climatology (:math:`c_o`). It measures the strength of linear association between the forecast anomalies and observed anomalies. The anomaly correlation coefficient is defined as: -.. math:: \text{Anomaly Correlation} = \frac{\sum(f_i - c)(o_i - c)}{\sqrt{\sum(f_i - c)^2} \sqrt{\sum(o_i -c)^2}} . +.. math:: \text{Anomaly Correlation} = \frac{\sum(f_i - {c_f}_i)(o_i - {c_o}_i)}{\sqrt{\sum(f_i - {c_f}_i)^2} \sqrt{\sum(o_i - {c_o}_i)^2}} . The centered anomaly correlation coefficient (ANOM_CORR) which includes the mean error is defined as: .. only:: latex - .. 
math:: \text{ANOM\_CORR } = \frac{ \overline{[(f - c) - \overline{(f - c)}][(a - c) - \overline{(a - c)}]}}{ \sqrt{ \overline{( (f - c) - \overline{(f - c)})^2} \overline{( (a - c) - \overline{(a - c)})^2}}} + .. math:: \text{ANOM\_CORR } = \frac{ \overline{[(f - c_f) - \overline{(f - c_f)}][(o - c_o) - \overline{(o - c_o)}]}}{ \sqrt{ \overline{( (f - c_f) - \overline{(f - c_f)})^2} \overline{( (o - c_o) - \overline{(o - c_o)})^2}}} .. only:: html - .. math:: \text{ANOM_CORR } = \frac{ \overline{[(f - c) - \overline{(f - c)}][(a - c) - \overline{(a - c)}]}}{ \sqrt{ \overline{( (f - c) - \overline{(f - c)})^2} \overline{( (a - c) - \overline{(a - c)})^2}}} + .. math:: \text{ANOM_CORR } = \frac{ \overline{[(f - c_f) - \overline{(f - c_f)}][(o - c_o) - \overline{(o - c_o)}]}}{ \sqrt{ \overline{( (f - c_f) - \overline{(f - c_f)})^2} \overline{( (o - c_o) - \overline{(o - c_o)})^2}}} The uncentered anomaly correlation coefficient (ANOM_CORR_UNCNTR) which does not include the mean errors is defined as: -.. math:: \text{Anomaly Correlation Raw } = \frac{ \overline{(f - c)(a - c)}}{ \sqrt{\overline{(f - c)^2} \overline{(a - c)^2}}} +.. math:: \text{Anomaly Correlation Raw } = \frac{ \overline{(f - c_f)(o - c_o)}}{ \sqrt{\overline{(f - c_f)^2} \overline{(o - c_o)^2}}} Anomaly correlation can range between -1 and 1; a value of 1 indicates perfect correlation and a value of -1 indicates perfect negative correlation. A value of 0 indicates that the forecast and observed anomalies are not correlated. @@ -650,56 +650,60 @@ The partial sums can be accumulated over individual cases to produce statistics Scalar L1 and L2 Values ----------------------- -Called "FBAR", "OBAR", "FOBAR", "FFBAR", and "OOBAR" in SL1L2 output :numref:`table_PS_format_info_SL1L2` +Called "FBAR", "OBAR", "FOBAR", "FFBAR", "OOBAR", and "MAE" in SL1L2 output :numref:`table_PS_format_info_SL1L2` These statistics are simply the 1st and 2nd moments of the forecasts, observations and errors: .. 
math:: - \text{FBAR} = \text{Mean}(f) = \bar{f} = \frac{1}{n} \sum_{i=1}^n f_i + \text{FBAR} = \text{Mean}(f) = \frac{1}{n} \sum_{i=1}^n f_i - \text{OBAR} = \text{Mean}(o) = \bar{o} = \frac{1}{n} \sum_{i=1}^n o_i + \text{OBAR} = \text{Mean}(o) = \frac{1}{n} \sum_{i=1}^n o_i - \text{FOBAR} = \text{Mean}(fo) = \bar{fo} = \frac{1}{n} \sum_{i=1}^n f_i o_i + \text{FOBAR} = \text{Mean}(fo) = \frac{1}{n} \sum_{i=1}^n f_i o_i - \text{FFBAR} = \text{Mean}(f^2) = \bar{f}^2 = \frac{1}{n} \sum_{i=1}^n f_i^2 + \text{FFBAR} = \text{Mean}(f^2) = \frac{1}{n} \sum_{i=1}^n f_i^2 - \text{OOBAR} = \text{Mean}(o^2) = \bar{o}^2 = \frac{1}{n} \sum_{i=1}^n o_i^2 + \text{OOBAR} = \text{Mean}(o^2) = \frac{1}{n} \sum_{i=1}^n o_i^2 + + \text{MAE} = \text{Mean}(|f - o|) = \frac{1}{n} \sum_{i=1}^n |f_i - o_i| Some of the other statistics for continuous forecasts (e.g., RMSE) can be derived from these moments. Scalar Anomaly L1 and L2 Values ------------------------------- -Called "FABAR", "OABAR", "FOABAR", "FFABAR", "OOABAR" in SAL1L2 output :numref:`table_PS_format_info_SAL1L2` +Called "FABAR", "OABAR", "FOABAR", "FFABAR", "OOABAR", and "MAE" in SAL1L2 output :numref:`table_PS_format_info_SAL1L2` -Computation of these statistics requires a climatological value, c. These statistics are the 1st and 2nd moments of the scalar anomalies. The moments are defined as: +Computation of these statistics requires climatological values, where :math:`c_f` is the forecast climatology value and :math:`c_o` is the observation climatology value. These statistics are the 1st and 2nd moments of the scalar anomalies. The moments are defined as: .. 
math:: - \text{FABAR} = \text{Mean}(f - c) = \bar{f - c} = \frac{1}{n} \sum_{i=1}^n (f_i - c) + \text{FABAR} = \text{Mean}(f - c_f) = \frac{1}{n} \sum_{i=1}^n (f_i - {c_f}_i) + + \text{OABAR} = \text{Mean}(o - c_o) = \frac{1}{n} \sum_{i=1}^n (o_i - {c_o}_i) - \text{OABAR} = \text{Mean}(o - c) = \bar{o - c} = \frac{1}{n} \sum_{i=1}^n (o_i - c) + \text{FOABAR} = \text{Mean}[(f - c_f)(o - c_o)] = \frac{1}{n} \sum_{i=1}^n (f_i - {c_f}_i)(o_i - {c_o}_i) - \text{FOABAR} = \text{Mean}[(f - c)(o - c)] = \bar{(f - c)(o - c)} = \frac{1}{n} \sum_{i=1}^n (f_i - c)(o_i - c) + \text{FFABAR} = \text{Mean}[(f - c_f)^2] = \frac{1}{n} \sum_{i=1}^n (f_i - {c_f}_i)^2 - \text{FFABAR} = \text{Mean}[(f - c)^2] = \bar{(f - c)}^2 = \frac{1}{n} \sum_{i=1}^n (f_i - c)^2 + \text{OOABAR} = \text{Mean}[(o - c_o)^2] = \frac{1}{n} \sum_{i=1}^n (o_i - {c_o}_i)^2 - \text{OOABAR} = \text{Mean}[(o - c)^2] = \bar{(o - c)}^2 = \frac{1}{n} \sum_{i=1}^n (o_i - c)^2 + \text{MAE} = \text{Mean}(|(f - c_f) - (o - c_o)|) = \frac{1}{n} \sum_{i=1}^n |(f_i - {c_f}_i) - (o_i - {c_o}_i)| Vector L1 and L2 Values ----------------------- -Called "UFBAR", "VFBAR", "UOBAR", "VOBAR", "UVFOBAR", "UVFFBAR", "UVOOBAR" in VL1L2 output :numref:`table_PS_format_info_VL1L2` +Called "UFBAR", "VFBAR", "UOBAR", "VOBAR", "UVFOBAR", "UVFFBAR", and "UVOOBAR" in VL1L2 output :numref:`table_PS_format_info_VL1L2` -These statistics are the moments for wind vector values, where **u** is the E-W wind component and **v** is the N-S wind component ( :math:`u_f` is the forecast E-W wind component; :math:`u_o` is the observed E-W wind component; :math:`v_f` is the forecast N-S wind component; and :math:`v_o` is the observed N-S wind component). 
The following measures are computed: +These statistics are the moments for wind vector values, where :math:`u` is the E-W wind component and :math:`v` is the N-S wind component ( :math:`u_f` is the forecast E-W wind component; :math:`u_o` is the observed E-W wind component; :math:`v_f` is the forecast N-S wind component; and :math:`v_o` is the observed N-S wind component). The following measures are computed: .. math:: - \text{UFBAR} = \text{Mean}(u_f) = \bar{u}_f = \frac{1}{n} \sum_{i=1}^n u_{fi} + \text{UFBAR} = \text{Mean}(u_f) = \frac{1}{n} \sum_{i=1}^n u_{fi} - \text{VFBAR} = \text{Mean}(v_f) = \bar{v}_f = \frac{1}{n} \sum_{i=1}^n v_{fi} + \text{VFBAR} = \text{Mean}(v_f) = \frac{1}{n} \sum_{i=1}^n v_{fi} - \text{UOBAR} = \text{Mean}(u_o) = \bar{u}_o = \frac{1}{n} \sum_{i=1}^n u_{oi} + \text{UOBAR} = \text{Mean}(u_o) = \frac{1}{n} \sum_{i=1}^n u_{oi} - \text{VOBAR} = \text{Mean}(v_o) = \bar{v}_o = \frac{1}{n} \sum_{i=1}^n v_{oi} + \text{VOBAR} = \text{Mean}(v_o) = \frac{1}{n} \sum_{i=1}^n v_{oi} \text{UVFOBAR} = \text{Mean}(u_f u_o + v_f v_o) = \frac{1}{n} \sum_{i=1}^n (u_{fi} u_{oi} + v_{fi} v_{oi}) @@ -710,25 +714,27 @@ These statistics are the moments for wind vector values, where **u** is the E-W Vector Anomaly L1 and L2 Values ------------------------------- -Called "UFABAR", "VFABAR", "UOABAR", "VOABAR", "UVFOABAR", "UVFFABAR", "UVOOABAR" in VAL1L2 output :numref:`table_PS_format_info_VAL1L2` +Called "UFABAR", "VFABAR", "UOABAR", "VOABAR", "UVFOABAR", "UVFFABAR", and "UVOOABAR" in VAL1L2 output :numref:`table_PS_format_info_VAL1L2` -These statistics require climatological values for the wind vector components, :math:`u_c \text{ and } v_c`. The measures are defined below: +These statistics require climatological values for the wind vector components, where :math:`{u_c}_f` and :math:`{v_c}_f` are the forecast climatology vectors and :math:`{u_c}_o` and :math:`{v_c}_o` are the observation climatology vectors. The measures are defined below: .. 
math:: - \text{UFABAR} = \text{Mean}(u_f - u_c) = \frac{1}{n} \sum_{i=1}^n (u_{fi} - u_c) + \text{UFABAR} = \text{Mean}(u_f - {u_c}_f) = \frac{1}{n} \sum_{i=1}^n ({u_f}_i - {{u_c}_f}_i) - \text{VFBAR} = \text{Mean}(v_f - v_c) = \frac{1}{n} \sum_{i=1}^n (v_{fi} - v_c) + \text{VFBAR} = \text{Mean}(v_f - {v_c}_f) = \frac{1}{n} \sum_{i=1}^n ({v_f}_i - {{v_c}_f}_i) - \text{UOABAR} = \text{Mean}(u_o - u_c) = \frac{1}{n} \sum_{i=1}^n (u_{oi} - u_c) + \text{UOABAR} = \text{Mean}(u_o - {u_c}_o) = \frac{1}{n} \sum_{i=1}^n ({u_o}_i - {{u_c}_o}_i) - \text{VOABAR} = \text{Mean}(v_o - v_c) = \frac{1}{n} \sum_{i=1}^n (v_{oi} - v_c) + \text{VOABAR} = \text{Mean}(v_o - {v_c}_o) = \frac{1}{n} \sum_{i=1}^n ({v_o}_i - {{v_c}_o}_i) - \text{UVFOABAR} &= \text{Mean}[(u_f - u_c)(u_o - u_c) + (v_f - v_c)(v_o - v_c)] \\ - &= \frac{1}{n} \sum_{i=1}^n (u_{fi} - u_c) + (u_{oi} - u_c) + (v_{fi} - v_c)(v_{oi} - v_c) + \text{UVFOABAR} &= \text{Mean}[(u_f - {u_c}_f)(u_o - {u_c}_o) + (v_f - {v_c}_f)(v_o - {v_c}_o)] \\ + &= \frac{1}{n} \sum_{i=1}^n ({u_f}_i - {{u_c}_f}_i) + ({u_o}_i - {{u_c}_o}_i) + ({v_f}_i - {{v_c}_f}_i)({v_o}_i - {{v_c}_o}_i) - \text{UVFFABAR} = \text{Mean}[(u_f - u_c)^2 + (v_f - v_c)^2] = \frac{1}{n} \sum_{i=1}^n ((u_{fi} - u_c)^2 + (v_{fi} - v_c)^2) + \text{UVFFABAR} &= \text{Mean}[(u_f - {u_c}_f)^2 + (v_f - {v_c}_f)^2] \\ + &= \frac{1}{n} \sum_{i=1}^n (({u_f}_i - {{u_c}_f}_i)^2 + ({v_f}_i - {{v_c}_f}_i)^2) - \text{UVOOABAR} = \text{Mean}[(u_o - u_c)^2 + (v_o - v_c)^2] = \frac{1}{n} \sum_{i=1}^n ((u_{oi} - u_c)^2 + (v_{oi} - v_c)^2) + \text{UVOOABAR} &= \text{Mean}[(u_o - {u_c}_o)^2 + (v_o - {v_c}_o)^2] \\ + &= \frac{1}{n} \sum_{i=1}^n (({u_o}_i - {{u_c}_o}_i)^2 + ({v_o}_i - {{v_c}_o}_i)^2) Gradient Values --------------- diff --git a/docs/Users_Guide/point-stat.rst b/docs/Users_Guide/point-stat.rst index 6c9849511e..70e3847b79 100644 --- a/docs/Users_Guide/point-stat.rst +++ b/docs/Users_Guide/point-stat.rst @@ -1204,7 +1204,7 @@ The first set of header columns are common to 
all of the output files generated - Mean(o²) * - 31 - MAE - - Mean Absolute Error + - Mean(\|f-o\|) .. _table_PS_format_info_SAL1L2: @@ -1223,25 +1223,25 @@ The first set of header columns are common to all of the output files generated - Scalar Anomaly L1L2 line type * - 25 - TOTAL - - Total number of matched triplets of forecast (f), observation (o), and climatological value (c) + - Total number of matched pairs of forecast (f), observation (o), forecast climatology (cf), and observation climatology (co) * - 26 - FABAR - - Mean(f-c) + - Mean(f-cf) * - 27 - OABAR - - Mean(o-c) + - Mean(o-co) * - 28 - FOABAR - - Mean((f-c)*(o-c)) + - Mean((f-cf)*(o-co)) * - 29 - FFABAR - - Mean((f-c)²) + - Mean((f-cf)²) * - 30 - OOABAR - - Mean((o-c)²) + - Mean((o-co)²) * - 31 - MAE - - Mean Absolute Error + - Mean(\|(f-cf)-(o-co)\|) .. _table_PS_format_info_VL1L2: @@ -1318,28 +1318,28 @@ The first set of header columns are common to all of the output files generated - Vector Anomaly L1L2 line type * - 25 - TOTAL - - Total number of matched triplets of forecast winds (uf, vf), observation winds (uo, vo), and climatological winds (uc, vc) + - Total number of matched pairs of forecast winds (uf, vf), observation winds (uo, vo), forecast climatology winds (ucf, vcf), and observation climatology winds (uco, vco) * - 26 - UFABAR - - Mean(uf-uc) + - Mean(uf-ucf) * - 27 - VFABAR - - Mean(vf-vc) + - Mean(vf-vcf) * - 28 - UOABAR - - Mean(uo-uc) + - Mean(uo-uco) * - 29 - VOABAR - - Mean(vo-vc) + - Mean(vo-vco) * - 30 - UVFOABAR - - Mean((uf-uc)*(uo-uc)+(vf-vc)*(vo-vc)) + - Mean((uf-ucf)*(uo-uco)+(vf-vcf)*(vo-vco)) * - 31 - UVFFABAR - - Mean((uf-uc)²+(vf-vc)²) + - Mean((uf-ucf)²+(vf-vcf)²) * - 32 - UVOOABAR - - Mean((uo-uc)²+(vo-vc)²) + - Mean((uo-uco)²+(vo-vco)²) * - 33 - FA_SPEED_BAR - Mean forecast wind speed anomaly @@ -1348,7 +1348,7 @@ The first set of header columns are common to all of the output files generated - Mean observed wind speed anomaly * - 35 - TOTAL_DIR - - Total number of 
matched triplets for which the forecast, observation, and climatological wind directions are well-defined (i.e. non-zero vectors) + - Total number of matched pairs for which the forecast, observation, forecast climatology, and observation climatology wind directions are well-defined (i.e. non-zero vectors) * - 36 - DIRA_ME - Mean wind direction anomaly difference, from -180 to 180 degrees diff --git a/src/libcode/vx_stat_out/stat_columns.cc b/src/libcode/vx_stat_out/stat_columns.cc index 314352a4cf..389665f177 100644 --- a/src/libcode/vx_stat_out/stat_columns.cc +++ b/src/libcode/vx_stat_out/stat_columns.cc @@ -2949,7 +2949,7 @@ void write_sl1l2_cols(const SL1L2Info &sl1l2_info, sl1l2_info.oobar); at.set_entry(r, c+6, // MAE - sl1l2_info.mae); + sl1l2_info.smae); return; } @@ -2985,7 +2985,7 @@ void write_sal1l2_cols(const SL1L2Info &sl1l2_info, sl1l2_info.ooabar); at.set_entry(r, c+6, // MAE - sl1l2_info.mae); + sl1l2_info.samae); return; } diff --git a/src/libcode/vx_statistics/compute_stats.cc b/src/libcode/vx_statistics/compute_stats.cc index 40c4e82589..bbc9e0ac1a 100644 --- a/src/libcode/vx_statistics/compute_stats.cc +++ b/src/libcode/vx_statistics/compute_stats.cc @@ -101,7 +101,7 @@ void compute_cntinfo(const SL1L2Info &s, bool aflag, CNTInfo &cnt_info) { cnt_info.me2.v = cnt_info.me.v * cnt_info.me.v; // Compute mean absolute error - cnt_info.mae.v = s.mae; + cnt_info.mae.v = s.smae; // Compute mean squared error cnt_info.mse.v = ffbar + oobar - 2.0*fobar; @@ -1111,7 +1111,7 @@ void compute_sl1l2_mean(const SL1L2Info *sl1l2_info, int n, sl1l2_mean.obar += sl1l2_info[i].obar; sl1l2_mean.ffbar += sl1l2_info[i].ffbar; sl1l2_mean.oobar += sl1l2_info[i].oobar; - sl1l2_mean.mae += sl1l2_info[i].mae; + sl1l2_mean.smae += sl1l2_info[i].smae; } if(sl1l2_info[i].sacount > 0) { @@ -1121,6 +1121,7 @@ void compute_sl1l2_mean(const SL1L2Info *sl1l2_info, int n, sl1l2_mean.oabar += sl1l2_info[i].oabar; sl1l2_mean.ffabar += sl1l2_info[i].ffabar; sl1l2_mean.ooabar += 
sl1l2_info[i].ooabar; + sl1l2_mean.samae += sl1l2_info[i].samae; } } // end for i @@ -1130,13 +1131,14 @@ void compute_sl1l2_mean(const SL1L2Info *sl1l2_info, int n, sl1l2_mean.obar /= n_sl1l2; sl1l2_mean.ffbar /= n_sl1l2; sl1l2_mean.oobar /= n_sl1l2; - sl1l2_mean.mae /= n_sl1l2; + sl1l2_mean.smae /= n_sl1l2; } if(sl1l2_mean.sacount > 0) { sl1l2_mean.fabar /= n_sal1l2; sl1l2_mean.oabar /= n_sal1l2; sl1l2_mean.ffabar /= n_sal1l2; sl1l2_mean.ooabar /= n_sal1l2; + sl1l2_mean.samae /= n_sal1l2; } return; diff --git a/src/libcode/vx_statistics/met_stats.cc b/src/libcode/vx_statistics/met_stats.cc index 9312867e49..4c679aed83 100644 --- a/src/libcode/vx_statistics/met_stats.cc +++ b/src/libcode/vx_statistics/met_stats.cc @@ -1124,11 +1124,11 @@ SL1L2Info & SL1L2Info::operator+=(const SL1L2Info &c) { s_info.ffbar = (ffbar*scount + c.ffbar*c.scount)/s_info.scount; s_info.oobar = (oobar*scount + c.oobar*c.scount)/s_info.scount; - if(is_bad_data(mae) || is_bad_data(c.mae)) { - s_info.mae = bad_data_double; + if(is_bad_data(smae) || is_bad_data(c.smae)) { + s_info.smae = bad_data_double; } else { - s_info.mae = (mae*scount + c.mae*c.scount)/s_info.scount; + s_info.smae = (smae*scount + c.smae*c.scount)/s_info.scount; } } @@ -1141,11 +1141,11 @@ SL1L2Info & SL1L2Info::operator+=(const SL1L2Info &c) { s_info.ffabar = (ffabar*sacount + c.ffabar*c.sacount)/s_info.sacount; s_info.ooabar = (ooabar*sacount + c.ooabar*c.sacount)/s_info.sacount; - if(is_bad_data(mae) || is_bad_data(c.mae)) { - s_info.mae = bad_data_double; + if(is_bad_data(samae) || is_bad_data(c.samae)) { + s_info.samae = bad_data_double; } else { - s_info.mae = (mae*sacount + c.mae*c.sacount)/s_info.sacount; + s_info.samae = (samae*sacount + c.samae*c.sacount)/s_info.sacount; } } @@ -1170,15 +1170,15 @@ void SL1L2Info::zero_out() { // SL1L2 Quantities fbar = obar = 0.0; fobar = ffbar = oobar = 0.0; + smae = 0.0; scount = 0; // SAL1L2 Quantities fabar = oabar = 0.0; foabar = ffabar = ooabar = 0.0; + samae = 0.0; 
sacount = 0; - mae = 0.0; - return; } @@ -1211,6 +1211,7 @@ void SL1L2Info::assign(const SL1L2Info &c) { fobar = c.fobar; ffbar = c.ffbar; oobar = c.oobar; + smae = c.smae; scount = c.scount; // SAL1L2 Quantities @@ -1219,10 +1220,9 @@ void SL1L2Info::assign(const SL1L2Info &c) { foabar = c.foabar; ffabar = c.ffabar; ooabar = c.ooabar; + samae = c.samae; sacount = c.sacount; - mae = c.mae; - return; } @@ -1272,7 +1272,7 @@ void SL1L2Info::set(const PairDataPoint &pd_all) { fobar += wgt*f*o; ffbar += wgt*f*f; oobar += wgt*o*o; - mae += wgt*fabs(f-o); + smae += wgt*fabs(f-o); scount++; // SAL1L2 sums @@ -1282,6 +1282,7 @@ void SL1L2Info::set(const PairDataPoint &pd_all) { foabar += wgt*(f-fc)*(o-oc); ffabar += wgt*(f-fc)*(f-fc); ooabar += wgt*(o-oc)*(o-oc); + samae += wgt*fabs((f-fc)-(o-oc)); sacount++; } } diff --git a/src/libcode/vx_statistics/met_stats.h b/src/libcode/vx_statistics/met_stats.h index b053266c33..f3bef1a90c 100644 --- a/src/libcode/vx_statistics/met_stats.h +++ b/src/libcode/vx_statistics/met_stats.h @@ -224,17 +224,16 @@ class SL1L2Info { double fbar, obar; double fobar; double ffbar, oobar; + double smae; int scount; // SAL1L2 Quantities double fabar, oabar; double foabar; double ffabar, ooabar; + double samae; int sacount; - // Mean absolute error - double mae; - // Compute sums void set(const PairDataPoint &); diff --git a/src/tools/core/series_analysis/series_analysis.cc b/src/tools/core/series_analysis/series_analysis.cc index c9cec58495..470a8589f6 100644 --- a/src/tools/core/series_analysis/series_analysis.cc +++ b/src/tools/core/series_analysis/series_analysis.cc @@ -93,27 +93,32 @@ static void do_sl1l2 (int, const PairDataPoint *); static void do_pct (int, const PairDataPoint *); // TODO: MET #1371 need logic to aggregate SL1L2, SAL1L2, and CNT (?) 
- -static void read_aggr_ctc (int, const CTSInfo &, TTContingencyTable &); -static void read_aggr_mctc (int, const MCTSInfo &, ContingencyTable &); -static void read_aggr_pct (int, const PCTInfo &, Nx2ContingencyTable &); - -static void store_stat_fho (int, const ConcatString &, const CTSInfo &); -static void store_stat_ctc (int, const ConcatString &, const CTSInfo &); -static void store_stat_cts (int, const ConcatString &, const CTSInfo &); -static void store_stat_mctc (int, const ConcatString &, const MCTSInfo &); -static void store_stat_mcts (int, const ConcatString &, const MCTSInfo &); -static void store_stat_cnt (int, const ConcatString &, const CNTInfo &); -static void store_stat_sl1l2(int, const ConcatString &, const SL1L2Info &); -static void store_stat_pct (int, const ConcatString &, const PCTInfo &); -static void store_stat_pstd (int, const ConcatString &, const PCTInfo &); -static void store_stat_pjc (int, const ConcatString &, const PCTInfo &); -static void store_stat_prc (int, const ConcatString &, const PCTInfo &); - -static void store_stat_all_ctc (int, const CTSInfo &); -static void store_stat_all_mctc (int, const MCTSInfo &); -static void store_stat_all_sl1l2(int, const SL1L2Info &); -static void store_stat_all_pct (int, const PCTInfo &); +// Add a PCT aggregation logic test + +static void read_aggr_ctc (int, const CTSInfo &, TTContingencyTable &); +static void read_aggr_mctc (int, const MCTSInfo &, ContingencyTable &); +static void read_aggr_pct (int, const PCTInfo &, Nx2ContingencyTable &); +static void read_aggr_sl1l2 (int, const SL1L2Info &, SL1L2Info &); // JHG +static void read_aggr_sal1l2 (int, const SL1L2Info &, SL1L2Info &); // JHG + +static void store_stat_fho (int, const ConcatString &, const CTSInfo &); +static void store_stat_ctc (int, const ConcatString &, const CTSInfo &); +static void store_stat_cts (int, const ConcatString &, const CTSInfo &); +static void store_stat_mctc (int, const ConcatString &, const MCTSInfo &); +static void 
store_stat_mcts (int, const ConcatString &, const MCTSInfo &); +static void store_stat_cnt (int, const ConcatString &, const CNTInfo &); +static void store_stat_sl1l2 (int, const ConcatString &, const SL1L2Info &); +static void store_stat_sal1l2(int, const ConcatString &, const SL1L2Info &); +static void store_stat_pct (int, const ConcatString &, const PCTInfo &); +static void store_stat_pstd (int, const ConcatString &, const PCTInfo &); +static void store_stat_pjc (int, const ConcatString &, const PCTInfo &); +static void store_stat_prc (int, const ConcatString &, const PCTInfo &); + +static void store_stat_all_ctc (int, const CTSInfo &); +static void store_stat_all_mctc (int, const MCTSInfo &); +static void store_stat_all_sl1l2 (int, const SL1L2Info &); +static void store_stat_all_sal1l2(int, const SL1L2Info &); +static void store_stat_all_pct (int, const PCTInfo &); static ConcatString build_nc_var_name_ctc(const ConcatString &, const CTSInfo &); @@ -956,13 +961,13 @@ void process_scores() { // Compute continuous statistics if(!conf_info.fcst_info[0]->is_prob() && conf_info.output_stats[STATLineType::cnt].n() > 0) { - do_cnt(i_point+i, &pd_ptr[i]); + do_cnt(i_point+i, &pd_ptr[i]); // JHG work on me } // Compute partial sums if(!conf_info.fcst_info[0]->is_prob() && - (conf_info.output_stats[STATLineType::sl1l2].n() > 0 || - conf_info.output_stats[STATLineType::sal1l2].n() > 0)) { + (conf_info.output_stats[STATLineType::sl1l2].n() + + conf_info.output_stats[STATLineType::sal1l2].n()) > 0) { do_sl1l2(i_point+i, &pd_ptr[i]); } @@ -1276,7 +1281,12 @@ void do_sl1l2(int n, const PairDataPoint *pd_ptr) { for(j=0; j " << "unsupported column name requested \"" << c @@ -2133,6 +2144,65 @@ void store_stat_sl1l2(int n, const ConcatString &col, //////////////////////////////////////////////////////////////////////// +void store_stat_sal1l2(int n, const ConcatString &col, + const SL1L2Info &s_info) { + double v; + + // Set the column name to all upper case + ConcatString c = 
to_upper(col); + + // Handle ALL columns + if(c == all_columns) return store_stat_all_sal1l2(n, s_info); + + // Get the column value + if(c == "TOTAL") { v = (double) s_info.sacount; } + else if(c == "FABAR") { v = s_info.fabar; } + else if(c == "OABAR") { v = s_info.oabar; } + else if(c == "FOABAR") { v = s_info.foabar; } + else if(c == "FFABAR") { v = s_info.ffabar; } + else if(c == "OOABAR") { v = s_info.ooabar; } + else if(c == "MAE") { v = s_info.samae; } + else { + mlog << Error << "\nstore_stat_sal1l2() -> " + << "unsupported column name requested \"" << c + << "\"\n\n"; + exit(1); + } + + // Construct the NetCDF variable name + ConcatString var_name("series_sal1l2_"); + var_name << c; + + // Append threshold information, if supplied + if(s_info.fthresh.get_type() != thresh_na || + s_info.othresh.get_type() != thresh_na) { + var_name << "_fcst" << s_info.fthresh.get_abbr_str() + << "_" << setlogic_to_abbr(conf_info.cnt_logic) + << "_obs" << s_info.othresh.get_abbr_str(); + } + + // Add map for this variable name + if(stat_data.count(var_name) == 0) { + + // Build key + ConcatString lty_stat("SAL1L2_"); + lty_stat << c; + + // Add new map entry + add_nc_var(var_name, c, stat_long_name[lty_stat], + s_info.fthresh.get_str(), + s_info.othresh.get_str(), + bad_data_double); + } + + // Store the statistic value + put_nc_val(n, var_name, (float) v); + + return; +} + +//////////////////////////////////////////////////////////////////////// + void store_stat_pct(int n, const ConcatString &col, const PCTInfo &pct_info) { int i = 0; @@ -2441,6 +2511,14 @@ void store_stat_all_sl1l2(int n, const SL1L2Info &s_info) { //////////////////////////////////////////////////////////////////////// +void store_stat_all_sal1l2(int n, const SL1L2Info &s_info) { + for(int i=0; i Date: Fri, 9 Aug 2024 17:30:05 -0600 Subject: [PATCH 12/41] Per #1371, work in progress aggregating all the line statistics types. 
Still have several issues to address --- src/libcode/vx_statistics/met_stats.cc | 62 +++ src/libcode/vx_statistics/met_stats.h | 6 + .../core/series_analysis/series_analysis.cc | 379 +++++++++++------- 3 files changed, 313 insertions(+), 134 deletions(-) diff --git a/src/libcode/vx_statistics/met_stats.cc b/src/libcode/vx_statistics/met_stats.cc index 4c679aed83..a83e460f64 100644 --- a/src/libcode/vx_statistics/met_stats.cc +++ b/src/libcode/vx_statistics/met_stats.cc @@ -1305,6 +1305,68 @@ void SL1L2Info::set(const PairDataPoint &pd_all) { return; } +//////////////////////////////////////////////////////////////////////// + +void SL1L2Info::set_sl1l2_stat(const string &stat_name, double v) { + + if(stat_name == "TOTAL") scount = nint(v); + else if(stat_name == "FBAR" ) fbar = v; + else if(stat_name == "OBAR" ) obar = v; + else if(stat_name == "FOBAR") fobar = v; + else if(stat_name == "FFBAR") ffbar = v; + else if(stat_name == "OOBAR") oobar = v; + else if(stat_name == "MAE" ) smae = v; + + return; +} + +//////////////////////////////////////////////////////////////////////// + +void SL1L2Info::set_sal1l2_stat(const string &stat_name, double v) { + + if(stat_name == "TOTAL" ) sacount = nint(v); + else if(stat_name == "FABAR" ) fabar = v; + else if(stat_name == "OABAR" ) oabar = v; + else if(stat_name == "FOABAR") foabar = v; + else if(stat_name == "FFABAR") ffabar = v; + else if(stat_name == "OOABAR") ooabar = v; + else if(stat_name == "MAE" ) samae = v; + + return; +} + +//////////////////////////////////////////////////////////////////////// + +double SL1L2Info::get_sl1l2_stat(const string &stat_name) const { + double v = bad_data_double; + + if(stat_name == "TOTAL") v = (double) scount; + else if(stat_name == "FBAR" ) v = fbar; + else if(stat_name == "OBAR" ) v = obar; + else if(stat_name == "FOBAR") v = fobar; + else if(stat_name == "FFBAR") v = ffbar; + else if(stat_name == "OOBAR") v = oobar; + else if(stat_name == "MAE" ) v = smae; + + return v; +} + 
+//////////////////////////////////////////////////////////////////////// + +double SL1L2Info::get_sal1l2_stat(const string &stat_name) const { + double v = bad_data_double; + + if(stat_name == "TOTAL" ) v = (double) sacount; + else if(stat_name == "FABAR" ) v = fabar; + else if(stat_name == "OABAR" ) v = oabar; + else if(stat_name == "FOABAR") v = foabar; + else if(stat_name == "FFABAR") v = ffabar; + else if(stat_name == "OOABAR") v = ooabar; + else if(stat_name == "MAE" ) v = samae; + + return v; +} + //////////////////////////////////////////////////////////////////////// // // Code for class VL1L2Info diff --git a/src/libcode/vx_statistics/met_stats.h b/src/libcode/vx_statistics/met_stats.h index f3bef1a90c..41dddb1398 100644 --- a/src/libcode/vx_statistics/met_stats.h +++ b/src/libcode/vx_statistics/met_stats.h @@ -239,6 +239,12 @@ class SL1L2Info { void zero_out(); void clear(); + + void set_sl1l2_stat (const std::string &, double); + void set_sal1l2_stat(const std::string &, double); + + double get_sl1l2_stat (const std::string &) const; + double get_sal1l2_stat(const std::string &) const; }; //////////////////////////////////////////////////////////////////////// diff --git a/src/tools/core/series_analysis/series_analysis.cc b/src/tools/core/series_analysis/series_analysis.cc index 36c1e4cb57..4920c9f87c 100644 --- a/src/tools/core/series_analysis/series_analysis.cc +++ b/src/tools/core/series_analysis/series_analysis.cc @@ -86,20 +86,25 @@ static DataPlane get_aggr_data(const ConcatString &); static void process_scores(); -static void do_cts (int, const PairDataPoint *); -static void do_mcts (int, const PairDataPoint *); -static void do_cnt (int, const PairDataPoint *); -static void do_sl1l2 (int, const PairDataPoint *); -static void do_pct (int, const PairDataPoint *); - -// TODO: MET #1371 need logic to aggregate SL1L2, SAL1L2, and CNT (?) 
-// Add a PCT aggregation logic test +static void do_categorical (int, const PairDataPoint *); +static void do_multicategory (int, const PairDataPoint *); +static void do_continuous (int, const PairDataPoint *); +static void do_partialsums (int, const PairDataPoint *); +static void do_probabilistic (int, const PairDataPoint *); + +// TODO: MET #1371 +// - Add a PCT aggregation logic test +// - Switch to set_stat() and get_stat() functions +// - Can briercl be aggregated as a weighted average and used for bss? +// - How should valid data thresholds be applied when reading -aggr data? +// - Currently no way to aggregate anom_corr since CNTInfo::set(sl1l2) +// doesn't support it. static void read_aggr_ctc (int, const CTSInfo &, TTContingencyTable &); static void read_aggr_mctc (int, const MCTSInfo &, ContingencyTable &); +static void read_aggr_sl1l2 (int, const SL1L2Info &, SL1L2Info &); +static void read_aggr_sal1l2 (int, const SL1L2Info &, SL1L2Info &); static void read_aggr_pct (int, const PCTInfo &, Nx2ContingencyTable &); -static void read_aggr_sl1l2 (int, const SL1L2Info &, SL1L2Info &); // JHG -static void read_aggr_sal1l2 (int, const SL1L2Info &, SL1L2Info &); // JHG static void store_stat_fho (int, const ConcatString &, const CTSInfo &); static void store_stat_ctc (int, const ConcatString &, const CTSInfo &); @@ -121,6 +126,9 @@ static void store_stat_all_sal1l2(int, const SL1L2Info &); static void store_stat_all_pct (int, const PCTInfo &); static ConcatString build_nc_var_name_ctc(const ConcatString &, const CTSInfo &); +static ConcatString build_nc_var_name_sl1l2(const ConcatString &, const SL1L2Info &); +static ConcatString build_nc_var_name_sal1l2(const ConcatString &, const SL1L2Info &); +static ConcatString build_nc_var_name_cnt(const ConcatString &, const CNTInfo &); static void setup_nc_file(const VarInfo *, const VarInfo *); static void add_nc_var(const ConcatString &, const ConcatString &, @@ -704,9 +712,6 @@ bool read_single_entry(VarInfo *info, 
const ConcatString &cur_file, return found; } -// TODO MET #1371 figure out how valid data thresholds should be handled -// when reading -aggr data - //////////////////////////////////////////////////////////////////////// DataPlane get_aggr_data(const ConcatString &var_name) { @@ -794,7 +799,6 @@ void process_scores() { int i_point = 0; VarInfo *fcst_info = (VarInfo *) nullptr; VarInfo *obs_info = (VarInfo *) nullptr; - PairDataPoint *pd_ptr = (PairDataPoint *) nullptr; DataPlane fcst_dp, obs_dp; const char *method_name = "process_scores() "; @@ -806,9 +810,20 @@ int n_skip_zero = 0; int n_skip_pos = 0; + // Create a vector of PairDataPoint objects + vector<PairDataPoint> pd_block; + pd_block.resize(conf_info.block_size); + for(auto &x : pd_block) x.extend(n_series); + // Loop over the data reads for(int i_read=0; i_read 0) { - do_cts(i_point+i, &pd_ptr[i]); + do_categorical(i_point+i, &pd_block[i]); } // Compute multi-category contingency table counts and statistics if(!conf_info.fcst_info[0]->is_prob() && (conf_info.output_stats[STATLineType::mctc].n() + conf_info.output_stats[STATLineType::mcts].n()) > 0) { - do_mcts(i_point+i, &pd_ptr[i]); + do_multicategory(i_point+i, &pd_block[i]); } // Compute continuous statistics if(!conf_info.fcst_info[0]->is_prob() && conf_info.output_stats[STATLineType::cnt].n() > 0) { - do_cnt(i_point+i, &pd_ptr[i]); // JHG work on me + do_continuous(i_point+i, &pd_block[i]); } // Compute partial sums if(!conf_info.fcst_info[0]->is_prob() && (conf_info.output_stats[STATLineType::sl1l2].n() + conf_info.output_stats[STATLineType::sal1l2].n()) > 0) { - do_sl1l2(i_point+i, &pd_ptr[i]); + do_partialsums(i_point+i, &pd_block[i]); } // Compute probabilistics counts and statistics @@ -977,20 +975,10 @@ void process_scores() { conf_info.output_stats[STATLineType::pstd].n() + conf_info.output_stats[STATLineType::pjc].n() + conf_info.output_stats[STATLineType::prc].n()) > 0) { - do_pct(i_point+i, &pd_ptr[i]); + 
do_probabilistic(i_point+i, &pd_block[i]); } } // end for i - // Erase the data - for(i=0; isubset_pairs_cnt_thresh(cnt_info.fthresh, cnt_info.othresh, - cnt_info.logic); + // Aggregate input pair data with existing partial sums + if(aggr_file.nonempty()) { - // Check for no matched pairs to process - if(pd.n_obs == 0) continue; + // Compute partial sums from the pair data + SL1L2Info s_info; + s_info.fthresh = cnt_info.fthresh; + s_info.othresh = cnt_info.othresh; + s_info.logic = cnt_info.logic; + s_info.set(*pd_ptr); - // Compute the stats, normal confidence intervals, and - // bootstrap confidence intervals - int precip_flag = (conf_info.fcst_info[0]->is_precipitation() && - conf_info.obs_info[0]->is_precipitation()); + // Aggregate partial sums + SL1L2Info aggr_psum; + read_aggr_sl1l2(n, s_info, aggr_psum); + s_info += aggr_psum; - if(conf_info.boot_interval == BootIntervalType::BCA) { - compute_cnt_stats_ci_bca(rng_ptr, pd, - precip_flag, conf_info.rank_corr_flag, - conf_info.n_boot_rep, - cnt_info, conf_info.tmp_dir.c_str()); + // Compute continuous statistics from partial sums + compute_cntinfo(s_info, false, cnt_info); } + // Compute continuous statistics from the pair data else { - compute_cnt_stats_ci_perc(rng_ptr, pd, - precip_flag, conf_info.rank_corr_flag, - conf_info.n_boot_rep, conf_info.boot_rep_prop, - cnt_info, conf_info.tmp_dir.c_str()); + + // Apply continuous filtering thresholds to subset pairs + pd = pd_ptr->subset_pairs_cnt_thresh(cnt_info.fthresh, cnt_info.othresh, + cnt_info.logic); + + // Check for no matched pairs to process + if(pd.n_obs == 0) continue; + + // Compute the stats, normal confidence intervals, and + // bootstrap confidence intervals + int precip_flag = (conf_info.fcst_info[0]->is_precipitation() && + conf_info.obs_info[0]->is_precipitation()); + + if(conf_info.boot_interval == BootIntervalType::BCA) { + compute_cnt_stats_ci_bca(rng_ptr, pd, + precip_flag, conf_info.rank_corr_flag, + conf_info.n_boot_rep, + cnt_info, 
conf_info.tmp_dir.c_str()); + } + else { + compute_cnt_stats_ci_perc(rng_ptr, pd, + precip_flag, conf_info.rank_corr_flag, + conf_info.n_boot_rep, conf_info.boot_rep_prop, + cnt_info, conf_info.tmp_dir.c_str()); + } } // Add statistic value for each possible CNT column @@ -1260,16 +1266,15 @@ void do_cnt(int n, const PairDataPoint *pd_ptr) { //////////////////////////////////////////////////////////////////////// -void do_sl1l2(int n, const PairDataPoint *pd_ptr) { - int i, j; - SL1L2Info s_info; +void do_partialsums(int n, const PairDataPoint *pd_ptr) { mlog << Debug(4) << "Computing Scalar Partial Sums.\n"; - // Loop over the continuous thresholds and compute scalar partial sums - for(i=0; i 0) { + read_aggr_sl1l2(n, s_info, aggr_psum); + s_info += aggr_psum; + } + + // Aggregate SAL1L2 partial sums + if(conf_info.output_stats[STATLineType::sal1l2].n() > 0) { + read_aggr_sal1l2(n, s_info, aggr_psum); + s_info += aggr_psum; + } + } + // Add statistic value for each possible SL1L2 column - for(j=0; j 1) var_name << "_a" << cnt_info.alpha[i]; @@ -2088,7 +2155,6 @@ void store_stat_cnt(int n, const ConcatString &col, void store_stat_sl1l2(int n, const ConcatString &col, const SL1L2Info &s_info) { double v; - ConcatString lty_stat, var_name; // Set the column name to all upper case ConcatString c = to_upper(col); @@ -2112,21 +2178,14 @@ void store_stat_sl1l2(int n, const ConcatString &col, } // Construct the NetCDF variable name - var_name << cs_erase << "series_sl1l2_" << c; - - // Append threshold information, if supplied - if(s_info.fthresh.get_type() != thresh_na || - s_info.othresh.get_type() != thresh_na) { - var_name << "_fcst" << s_info.fthresh.get_abbr_str() - << "_" << setlogic_to_abbr(conf_info.cnt_logic) - << "_obs" << s_info.othresh.get_abbr_str(); - } + ConcatString var_name(build_nc_var_name_sl1l2(c, s_info)); // Add map for this variable name if(stat_data.count(var_name) == 0) { // Build key - lty_stat << "SL1L2_" << c; + ConcatString 
lty_stat("SL1L2_"); + lty_stat << c; // Add new map entry add_nc_var(var_name, c, stat_long_name[lty_stat], @@ -2169,16 +2228,7 @@ void store_stat_sal1l2(int n, const ConcatString &col, } // Construct the NetCDF variable name - ConcatString var_name("series_sal1l2_"); - var_name << c; - - // Append threshold information, if supplied - if(s_info.fthresh.get_type() != thresh_na || - s_info.othresh.get_type() != thresh_na) { - var_name << "_fcst" << s_info.fthresh.get_abbr_str() - << "_" << setlogic_to_abbr(conf_info.cnt_logic) - << "_obs" << s_info.othresh.get_abbr_str(); - } + ConcatString var_name(build_nc_var_name_sal1l2(c, s_info)); // Add map for this variable name if(stat_data.count(var_name) == 0) { @@ -2527,11 +2577,12 @@ void store_stat_all_pct(int n, const PCTInfo &pct_info) { //////////////////////////////////////////////////////////////////////// -ConcatString build_nc_var_name_ctc(const ConcatString &col, const CTSInfo &cts_info) { - ConcatString var_name; +ConcatString build_nc_var_name_ctc(const ConcatString &col, + const CTSInfo &cts_info) { // Append the column name - var_name << "series_ctc_" << col; + ConcatString var_name("series_ctc_"); + var_name << col; // Append threshold information if(cts_info.fthresh == cts_info.othresh) { @@ -2547,6 +2598,66 @@ ConcatString build_nc_var_name_ctc(const ConcatString &col, const CTSInfo &cts_i //////////////////////////////////////////////////////////////////////// +ConcatString build_nc_var_name_sl1l2(const ConcatString &col, + const SL1L2Info &s_info) { + + // Append the column name + ConcatString var_name("series_sl1l2_"); + var_name << col; + + // Append threshold information, if supplied + if(s_info.fthresh.get_type() != thresh_na || + s_info.othresh.get_type() != thresh_na) { + var_name << "_fcst" << s_info.fthresh.get_abbr_str() + << "_" << setlogic_to_abbr(s_info.logic) + << "_obs" << s_info.othresh.get_abbr_str(); + } + + return var_name; +} + 
+//////////////////////////////////////////////////////////////////////// + +ConcatString build_nc_var_name_sal1l2(const ConcatString &col, + const SL1L2Info &s_info) { + + // Append the column name + ConcatString var_name("series_sal1l2_"); + var_name << col; + + // Append threshold information, if supplied + if(s_info.fthresh.get_type() != thresh_na || + s_info.othresh.get_type() != thresh_na) { + var_name << "_fcst" << s_info.fthresh.get_abbr_str() + << "_" << setlogic_to_abbr(s_info.logic) + << "_obs" << s_info.othresh.get_abbr_str(); + } + + return var_name; +} + +//////////////////////////////////////////////////////////////////////// + +ConcatString build_nc_var_name_cnt(const ConcatString &col, + const CNTInfo &cnt_info) { + + // Append the column name + ConcatString var_name("series_cnt_"); + var_name << col; + + // Append threshold information, if supplied + if(cnt_info.fthresh.get_type() != thresh_na || + cnt_info.othresh.get_type() != thresh_na) { + var_name << "_fcst" << cnt_info.fthresh.get_abbr_str() + << "_" << setlogic_to_abbr(cnt_info.logic) + << "_obs" << cnt_info.othresh.get_abbr_str(); + } + + return var_name; +} + +//////////////////////////////////////////////////////////////////////// + void setup_nc_file(const VarInfo *fcst_info, const VarInfo *obs_info) { // Create a new NetCDF file and open it From ca3b2b123800ef5be3e5a7bf52850e294e34a73e Mon Sep 17 00:00:00 2001 From: John Halley Gotway Date: Mon, 12 Aug 2024 17:49:58 -0600 Subject: [PATCH 13/41] Per #1371, switch to using get_stat() functions --- src/libcode/vx_statistics/met_stats.cc | 834 ++++++++++++++++-- src/libcode/vx_statistics/met_stats.h | 28 +- .../core/series_analysis/series_analysis.cc | 825 +++++------------ .../core/stat_analysis/aggr_stat_line.cc | 8 +- .../stat_analysis/skill_score_index_job.cc | 8 +- 5 files changed, 976 insertions(+), 727 deletions(-) diff --git a/src/libcode/vx_statistics/met_stats.cc b/src/libcode/vx_statistics/met_stats.cc index 
a83e460f64..6af346c014 100644 --- a/src/libcode/vx_statistics/met_stats.cc +++ b/src/libcode/vx_statistics/met_stats.cc @@ -425,41 +425,176 @@ void CTSInfo::compute_ci() { //////////////////////////////////////////////////////////////////////// -double CTSInfo::get_stat(const char *stat_name) { +double CTSInfo::get_stat_fho(const string &stat_name) const { double v = bad_data_double; // Find the statistic by name - if(strcmp(stat_name, "TOTAL" ) == 0) v = cts.n(); - else if(strcmp(stat_name, "BASER" ) == 0) v = cts.baser(); - else if(strcmp(stat_name, "FMEAN" ) == 0) v = cts.fmean(); - else if(strcmp(stat_name, "ACC" ) == 0) v = cts.accuracy(); - else if(strcmp(stat_name, "FBIAS" ) == 0) v = cts.fbias(); - else if(strcmp(stat_name, "PODY" ) == 0) v = cts.pod_yes(); - else if(strcmp(stat_name, "PODN" ) == 0) v = cts.pod_no(); - else if(strcmp(stat_name, "POFD" ) == 0) v = cts.pofd(); - else if(strcmp(stat_name, "FAR" ) == 0) v = cts.far(); - else if(strcmp(stat_name, "CSI" ) == 0) v = cts.csi(); - else if(strcmp(stat_name, "GSS" ) == 0) v = cts.gss(); - else if(strcmp(stat_name, "HK" ) == 0) v = cts.hk(); - else if(strcmp(stat_name, "HSS" ) == 0) v = cts.hss(); - else if(strcmp(stat_name, "HSS_EC") == 0) v = cts.gheidke_ec(cts.ec_value()); - else if(strcmp(stat_name, "ODDS" ) == 0) v = cts.odds(); - else if(strcmp(stat_name, "LODDS" ) == 0) v = cts.lodds(); - else if(strcmp(stat_name, "ORSS" ) == 0) v = cts.orss(); - else if(strcmp(stat_name, "EDS" ) == 0) v = cts.eds(); - else if(strcmp(stat_name, "SEDS" ) == 0) v = cts.seds(); - else if(strcmp(stat_name, "EDI" ) == 0) v = cts.edi(); - else if(strcmp(stat_name, "SEDI" ) == 0) v = cts.sedi(); - else if(strcmp(stat_name, "BAGSS" ) == 0) v = cts.bagss(); + if(stat_name == "TOTAL" ) v = cts.n(); + else if(stat_name == "F_RATE") v = cts.f_rate(); + else if(stat_name == "H_RATE") v = cts.h_rate(); + else if(stat_name == "O_RATE") v = cts.o_rate(); else { - mlog << Error << "\nCTSInfo::get_stat() -> " + mlog << Error << 
"\nCTSInfo::get_stat_fho() -> " + << "unknown categorical statistic name \"" << stat_name + << "\"!\n\n"; + exit(1); + } + + // Return bad data for 0 pairs + if(cts.n() == 0 && stat_name != "TOTAL") { + v = bad_data_double; + } + + return v; +} + +//////////////////////////////////////////////////////////////////////// + +double CTSInfo::get_stat_ctc(const string &stat_name) const { + double v = bad_data_double; + + // Find the statistic by name + if(stat_name == "TOTAL" ) v = cts.n(); + else if(stat_name == "FY_OY" ) v = cts.fy_oy(); + else if(stat_name == "FY_ON" ) v = cts.fy_on(); + else if(stat_name == "FN_OY" ) v = cts.fn_oy(); + else if(stat_name == "FN_ON" ) v = cts.fn_on(); + else if(stat_name == "EC_VALUE") v = cts.ec_value(); + else { + mlog << Error << "\nCTSInfo::get_stat_ctc() -> " + << "unknown categorical statistic name \"" << stat_name + << "\"!\n\n"; + exit(1); + } + + // Return bad data for 0 pairs + if(cts.n() == 0 && stat_name != "TOTAL") { + v = bad_data_double; + } + + return v; +} + +//////////////////////////////////////////////////////////////////////// + +double CTSInfo::get_stat_cts(const string &stat_name, int i_alpha) const { + double v = bad_data_double; + + // Range check alpha index + if(i_alpha >= n_alpha && is_ci_stat_name(stat_name)) { + mlog << Error << "\nCTSInfo::get_stat_cts() -> " + << "alpha index out of range (" << i_alpha << " >= " + << n_alpha << ")!\n\n"; + exit(1); + } + + // Find the statistic by name + if(stat_name == "TOTAL" ) v = (double) cts.n(); + else if(stat_name == "BASER" ) v = baser.v; + else if(stat_name == "BASER_NCL" ) v = baser.v_ncl[i_alpha]; + else if(stat_name == "BASER_NCU" ) v = baser.v_ncu[i_alpha]; + else if(stat_name == "BASER_BCL" ) v = baser.v_bcl[i_alpha]; + else if(stat_name == "BASER_BCU" ) v = baser.v_bcu[i_alpha]; + else if(stat_name == "FMEAN" ) v = fmean.v; + else if(stat_name == "FMEAN_NCL" ) v = fmean.v_ncl[i_alpha]; + else if(stat_name == "FMEAN_NCU" ) v = fmean.v_ncu[i_alpha]; + else 
if(stat_name == "FMEAN_BCL" ) v = fmean.v_bcl[i_alpha]; + else if(stat_name == "FMEAN_BCU" ) v = fmean.v_bcu[i_alpha]; + else if(stat_name == "ACC" ) v = acc.v; + else if(stat_name == "ACC_NCL" ) v = acc.v_ncl[i_alpha]; + else if(stat_name == "ACC_NCU" ) v = acc.v_ncu[i_alpha]; + else if(stat_name == "ACC_BCL" ) v = acc.v_bcl[i_alpha]; + else if(stat_name == "ACC_BCU" ) v = acc.v_bcu[i_alpha]; + else if(stat_name == "FBIAS" ) v = fbias.v; + else if(stat_name == "FBIAS_BCL" ) v = fbias.v_bcl[i_alpha]; + else if(stat_name == "FBIAS_BCU" ) v = fbias.v_bcu[i_alpha]; + else if(stat_name == "PODY" ) v = pody.v; + else if(stat_name == "PODY_NCL" ) v = pody.v_ncl[i_alpha]; + else if(stat_name == "PODY_NCU" ) v = pody.v_ncu[i_alpha]; + else if(stat_name == "PODY_BCL" ) v = pody.v_bcl[i_alpha]; + else if(stat_name == "PODY_BCU" ) v = pody.v_bcu[i_alpha]; + else if(stat_name == "PODN" ) v = podn.v; + else if(stat_name == "PODN_NCL" ) v = podn.v_ncl[i_alpha]; + else if(stat_name == "PODN_NCU" ) v = podn.v_ncu[i_alpha]; + else if(stat_name == "PODN_BCL" ) v = podn.v_bcl[i_alpha]; + else if(stat_name == "PODN_BCU" ) v = podn.v_bcu[i_alpha]; + else if(stat_name == "POFD" ) v = pofd.v; + else if(stat_name == "POFD_NCL" ) v = pofd.v_ncl[i_alpha]; + else if(stat_name == "POFD_NCU" ) v = pofd.v_ncu[i_alpha]; + else if(stat_name == "POFD_BCL" ) v = pofd.v_bcl[i_alpha]; + else if(stat_name == "POFD_BCU" ) v = pofd.v_bcu[i_alpha]; + else if(stat_name == "FAR" ) v = far.v; + else if(stat_name == "FAR_NCL" ) v = far.v_ncl[i_alpha]; + else if(stat_name == "FAR_NCU" ) v = far.v_ncu[i_alpha]; + else if(stat_name == "FAR_BCL" ) v = far.v_bcl[i_alpha]; + else if(stat_name == "FAR_BCU" ) v = far.v_bcu[i_alpha]; + else if(stat_name == "CSI" ) v = csi.v; + else if(stat_name == "CSI_NCL" ) v = csi.v_ncl[i_alpha]; + else if(stat_name == "CSI_NCU" ) v = csi.v_ncu[i_alpha]; + else if(stat_name == "CSI_BCL" ) v = csi.v_bcl[i_alpha]; + else if(stat_name == "CSI_BCU" ) v = csi.v_bcu[i_alpha]; + else 
if(stat_name == "GSS" ) v = gss.v; + else if(stat_name == "GSS_BCL" ) v = gss.v_bcl[i_alpha]; + else if(stat_name == "GSS_BCU" ) v = gss.v_bcu[i_alpha]; + else if(stat_name == "HK" ) v = hk.v; + else if(stat_name == "HK_NCL" ) v = hk.v_ncl[i_alpha]; + else if(stat_name == "HK_NCU" ) v = hk.v_ncu[i_alpha]; + else if(stat_name == "HK_BCL" ) v = hk.v_bcl[i_alpha]; + else if(stat_name == "HK_BCU" ) v = hk.v_bcu[i_alpha]; + else if(stat_name == "HSS" ) v = hss.v; + else if(stat_name == "HSS_BCL" ) v = hss.v_bcl[i_alpha]; + else if(stat_name == "HSS_BCU" ) v = hss.v_bcu[i_alpha]; + else if(stat_name == "ODDS" ) v = odds.v; + else if(stat_name == "ODDS_NCL" ) v = odds.v_ncl[i_alpha]; + else if(stat_name == "ODDS_NCU" ) v = odds.v_ncu[i_alpha]; + else if(stat_name == "ODDS_BCL" ) v = odds.v_bcl[i_alpha]; + else if(stat_name == "ODDS_BCU" ) v = odds.v_bcu[i_alpha]; + else if(stat_name == "LODDS" ) v = lodds.v; + else if(stat_name == "LODDS_NCL" ) v = lodds.v_ncl[i_alpha]; + else if(stat_name == "LODDS_NCU" ) v = lodds.v_ncu[i_alpha]; + else if(stat_name == "LODDS_BCL" ) v = lodds.v_bcl[i_alpha]; + else if(stat_name == "LODDS_BCU" ) v = lodds.v_bcu[i_alpha]; + else if(stat_name == "ORSS" ) v = orss.v; + else if(stat_name == "ORSS_NCL" ) v = orss.v_ncl[i_alpha]; + else if(stat_name == "ORSS_NCU" ) v = orss.v_ncu[i_alpha]; + else if(stat_name == "ORSS_BCL" ) v = orss.v_bcl[i_alpha]; + else if(stat_name == "ORSS_BCU" ) v = orss.v_bcu[i_alpha]; + else if(stat_name == "EDS" ) v = eds.v; + else if(stat_name == "EDS_NCL" ) v = eds.v_ncl[i_alpha]; + else if(stat_name == "EDS_NCU" ) v = eds.v_ncu[i_alpha]; + else if(stat_name == "EDS_BCL" ) v = eds.v_bcl[i_alpha]; + else if(stat_name == "EDS_BCU" ) v = eds.v_bcu[i_alpha]; + else if(stat_name == "SEDS" ) v = seds.v; + else if(stat_name == "SEDS_NCL" ) v = seds.v_ncl[i_alpha]; + else if(stat_name == "SEDS_NCU" ) v = seds.v_ncu[i_alpha]; + else if(stat_name == "SEDS_BCL" ) v = seds.v_bcl[i_alpha]; + else if(stat_name == "SEDS_BCU" ) v = 
seds.v_bcu[i_alpha]; + else if(stat_name == "EDI" ) v = edi.v; + else if(stat_name == "EDI_NCL" ) v = edi.v_ncl[i_alpha]; + else if(stat_name == "EDI_NCU" ) v = edi.v_ncu[i_alpha]; + else if(stat_name == "EDI_BCL" ) v = edi.v_bcl[i_alpha]; + else if(stat_name == "EDI_BCU" ) v = edi.v_bcu[i_alpha]; + else if(stat_name == "SEDI" ) v = sedi.v; + else if(stat_name == "SEDI_NCL" ) v = sedi.v_ncl[i_alpha]; + else if(stat_name == "SEDI_NCU" ) v = sedi.v_ncu[i_alpha]; + else if(stat_name == "SEDI_BCL" ) v = sedi.v_bcl[i_alpha]; + else if(stat_name == "SEDI_BCU" ) v = sedi.v_bcu[i_alpha]; + else if(stat_name == "BAGSS" ) v = bagss.v; + else if(stat_name == "BAGSS_BCL" ) v = bagss.v_bcl[i_alpha]; + else if(stat_name == "BAGSS_BCU" ) v = bagss.v_bcu[i_alpha]; + else if(stat_name == "HSS_EC" ) v = hss_ec.v; + else if(stat_name == "HSS_EC_BCL") v = hss_ec.v_bcl[i_alpha]; + else if(stat_name == "HSS_EC_BCU") v = hss_ec.v_bcu[i_alpha]; + else if(stat_name == "EC_VALUE" ) v = cts.ec_value(); + else { + mlog << Error << "\nCTSInfo::get_stat_cts() -> " << "unknown categorical statistic name \"" << stat_name << "\"!\n\n"; exit(1); } // Return bad data for 0 pairs - if(cts.n() == 0 && strcmp(stat_name, "TOTAL") != 0) { + if(cts.n() == 0 && stat_name != "TOTAL") { v = bad_data_double; } @@ -653,6 +788,98 @@ void MCTSInfo::compute_ci() { return; } +//////////////////////////////////////////////////////////////////////// + +double MCTSInfo::get_stat_mctc(const string &stat_name, + ConcatString &col_name) const { + double v = bad_data_double; + col_name = stat_name; + + // Find the statistic by name + if(stat_name == "TOTAL" ) v = (double) cts.total(); + else if(stat_name == "N_CAT" ) v = (double) cts.nrows(); + else if(stat_name == "EC_VALUE") v = cts.ec_value(); + else if(check_reg_exp("F[0-9]*_O[0-9]*", stat_name.c_str())) { + + col_name = "FI_OJ"; + + // Parse column name to retrieve index values + ConcatString cs(stat_name); + StringArray sa = cs.split("_"); + int i = 
atoi(sa[0].c_str()+1) - 1; + int j = atoi(sa[1].c_str()+1) - 1; + + // Range check + if(i < 0 || i >= cts.nrows() || + j < 0 || j >= cts.ncols()) { + mlog << Error << "\nMCTSInfo::get_stat_mctc() -> " + << "range check error for column name requested \"" << stat_name + << "\"\n\n"; + exit(1); + } + + // Retrieve the value + v = (double) cts.entry(i, j); + } + else { + mlog << Error << "\nMCTSInfo::get_stat_mctc() -> " + << "unknown multi-category statistic name \"" << stat_name + << "\"!\n\n"; + exit(1); + } + + return v; +} + +//////////////////////////////////////////////////////////////////////// + +double MCTSInfo::get_stat_mcts(const string &stat_name, int i_alpha) const { + double v = bad_data_double; + + // Range check alpha index + if(i_alpha >= n_alpha && is_ci_stat_name(stat_name)) { + mlog << Error << "\nMCTSInfo::get_stat_mcts() -> " + << "alpha index out of range (" << i_alpha << " >= " + << n_alpha << ")!\n\n"; + exit(1); + } + + // Find the statistic by name + if(stat_name == "TOTAL" ) v = (double) cts.total(); + else if(stat_name == "N_CAT" ) v = (double) cts.nrows(); + else if(stat_name == "ACC" ) v = acc.v; + else if(stat_name == "ACC_NCL" ) v = acc.v_ncl[i_alpha]; + else if(stat_name == "ACC_NCU" ) v = acc.v_ncu[i_alpha]; + else if(stat_name == "ACC_BCL" ) v = acc.v_bcl[i_alpha]; + else if(stat_name == "ACC_BCU" ) v = acc.v_bcu[i_alpha]; + else if(stat_name == "HK" ) v = hk.v; + else if(stat_name == "HK_BCL" ) v = hk.v_bcl[i_alpha]; + else if(stat_name == "HK_BCU" ) v = hk.v_bcu[i_alpha]; + else if(stat_name == "HSS" ) v = hss.v; + else if(stat_name == "HSS_BCL" ) v = hss.v_bcl[i_alpha]; + else if(stat_name == "HSS_BCU" ) v = hss.v_bcu[i_alpha]; + else if(stat_name == "GER" ) v = ger.v; + else if(stat_name == "GER_BCL" ) v = ger.v_bcl[i_alpha]; + else if(stat_name == "GER_BCU" ) v = ger.v_bcu[i_alpha]; + else if(stat_name == "HSS_EC" ) v = hss_ec.v; + else if(stat_name == "HSS_EC_BCL") v = hss_ec.v_bcl[i_alpha]; + else if(stat_name == "HSS_EC_BCU") v = 
hss_ec.v_bcu[i_alpha]; + else if(stat_name == "EC_VALUE" ) v = cts.ec_value(); + else { + mlog << Error << "\nMCTSInfo::get_stat_mcts() -> " + << "unknown multi-category statistic name \"" << stat_name + << "\"!\n\n"; + exit(1); + } + + // Return bad data for 0 pairs + if(cts.total() == 0 && stat_name != "TOTAL") { + v = bad_data_double; + } + + return v; +} + //////////////////////////////////////////////////////////////////////// // // Code for class CNTInfo @@ -1023,51 +1250,130 @@ void CNTInfo::compute_ci() { //////////////////////////////////////////////////////////////////////// -double CNTInfo::get_stat(const char *stat_name) { +double CNTInfo::get_stat_cnt(const string &stat_name, int i_alpha) const { double v = bad_data_double; + // Range check alpha index + if(i_alpha >= n_alpha && is_ci_stat_name(stat_name)) { + mlog << Error << "\nCNTInfo::get_stat_cnt() -> " + << "alpha index out of range (" << i_alpha << " >= " + << n_alpha << ")!\n\n"; + exit(1); + } + // Find the statistic by name - if(strcmp(stat_name, "TOTAL" ) == 0) v = n; - else if(strcmp(stat_name, "FBAR" ) == 0) v = fbar.v; - else if(strcmp(stat_name, "FSTDEV" ) == 0) v = fstdev.v; - else if(strcmp(stat_name, "OBAR" ) == 0) v = obar.v; - else if(strcmp(stat_name, "OSTDEV" ) == 0) v = ostdev.v; - else if(strcmp(stat_name, "PR_CORR" ) == 0) v = pr_corr.v; - else if(strcmp(stat_name, "SP_CORR" ) == 0) v = sp_corr.v; - else if(strcmp(stat_name, "KT_CORR" ) == 0) v = kt_corr.v; - else if(strcmp(stat_name, "RANKS" ) == 0) v = n_ranks; - else if(strcmp(stat_name, "FRANK_TIES" ) == 0) v = frank_ties; - else if(strcmp(stat_name, "ORANK_TIES" ) == 0) v = orank_ties; - else if(strcmp(stat_name, "ME" ) == 0) v = me.v; - else if(strcmp(stat_name, "ESTDEV" ) == 0) v = estdev.v; - else if(strcmp(stat_name, "MBIAS" ) == 0) v = mbias.v; - else if(strcmp(stat_name, "MAE" ) == 0) v = mae.v; - else if(strcmp(stat_name, "MSE" ) == 0) v = mse.v; - else if(strcmp(stat_name, "BCMSE" ) == 0) v = bcmse.v; - else 
if(strcmp(stat_name, "RMSE" ) == 0) v = rmse.v; - else if(strcmp(stat_name, "SI" ) == 0) v = si.v; - else if(strcmp(stat_name, "E10" ) == 0) v = e10.v; - else if(strcmp(stat_name, "E25" ) == 0) v = e25.v; - else if(strcmp(stat_name, "E50" ) == 0) v = e50.v; - else if(strcmp(stat_name, "E75" ) == 0) v = e75.v; - else if(strcmp(stat_name, "E90" ) == 0) v = e90.v; - else if(strcmp(stat_name, "EIQR" ) == 0) v = eiqr.v; - else if(strcmp(stat_name, "MAD " ) == 0) v = mad.v; - else if(strcmp(stat_name, "ANOM_CORR" ) == 0) v = anom_corr.v; - else if(strcmp(stat_name, "ME2" ) == 0) v = me2.v; - else if(strcmp(stat_name, "MSESS" ) == 0) v = msess.v; - else if(strcmp(stat_name, "RMSFA" ) == 0) v = rmsfa.v; - else if(strcmp(stat_name, "RMSOA" ) == 0) v = rmsoa.v; - else if(strcmp(stat_name, "ANOM_CORR_UNCNTR") == 0) v = anom_corr_uncntr.v; + if(stat_name == "TOTAL" ) v = (double) n; + else if(stat_name == "FBAR" ) v = fbar.v; + else if(stat_name == "FBAR_NCL" ) v = fbar.v_ncl[i_alpha]; + else if(stat_name == "FBAR_NCU" ) v = fbar.v_ncu[i_alpha]; + else if(stat_name == "FBAR_BCL" ) v = fbar.v_bcl[i_alpha]; + else if(stat_name == "FBAR_BCU" ) v = fbar.v_bcu[i_alpha]; + else if(stat_name == "FSTDEV" ) v = fstdev.v; + else if(stat_name == "FSTDEV_NCL" ) v = fstdev.v_ncl[i_alpha]; + else if(stat_name == "FSTDEV_NCU" ) v = fstdev.v_ncu[i_alpha]; + else if(stat_name == "FSTDEV_BCL" ) v = fstdev.v_bcl[i_alpha]; + else if(stat_name == "FSTDEV_BCU" ) v = fstdev.v_bcu[i_alpha]; + else if(stat_name == "OBAR" ) v = obar.v; + else if(stat_name == "OBAR_NCL" ) v = obar.v_ncl[i_alpha]; + else if(stat_name == "OBAR_NCU" ) v = obar.v_ncu[i_alpha]; + else if(stat_name == "OBAR_BCL" ) v = obar.v_bcl[i_alpha]; + else if(stat_name == "OBAR_BCU" ) v = obar.v_bcu[i_alpha]; + else if(stat_name == "OSTDEV" ) v = ostdev.v; + else if(stat_name == "OSTDEV_NCL" ) v = ostdev.v_ncl[i_alpha]; + else if(stat_name == "OSTDEV_NCU" ) v = ostdev.v_ncu[i_alpha]; + else if(stat_name == "OSTDEV_BCL" ) v = 
ostdev.v_bcl[i_alpha]; + else if(stat_name == "OSTDEV_BCU" ) v = ostdev.v_bcu[i_alpha]; + else if(stat_name == "PR_CORR" ) v = pr_corr.v; + else if(stat_name == "PR_CORR_NCL" ) v = pr_corr.v_ncl[i_alpha]; + else if(stat_name == "PR_CORR_NCU" ) v = pr_corr.v_ncu[i_alpha]; + else if(stat_name == "PR_CORR_BCL" ) v = pr_corr.v_bcl[i_alpha]; + else if(stat_name == "PR_CORR_BCU" ) v = pr_corr.v_bcu[i_alpha]; + else if(stat_name == "SP_CORR" ) v = sp_corr.v; + else if(stat_name == "KT_CORR" ) v = kt_corr.v; + else if(stat_name == "RANKS" ) v = n_ranks; + else if(stat_name == "FRANK_TIES" ) v = frank_ties; + else if(stat_name == "ORANK_TIES" ) v = orank_ties; + else if(stat_name == "ME" ) v = me.v; + else if(stat_name == "ME_NCL" ) v = me.v_ncl[i_alpha]; + else if(stat_name == "ME_NCU" ) v = me.v_ncu[i_alpha]; + else if(stat_name == "ME_BCL" ) v = me.v_bcl[i_alpha]; + else if(stat_name == "ME_BCU" ) v = me.v_bcu[i_alpha]; + else if(stat_name == "ESTDEV" ) v = estdev.v; + else if(stat_name == "ESTDEV_NCL" ) v = estdev.v_ncl[i_alpha]; + else if(stat_name == "ESTDEV_NCU" ) v = estdev.v_ncu[i_alpha]; + else if(stat_name == "ESTDEV_BCL" ) v = estdev.v_bcl[i_alpha]; + else if(stat_name == "ESTDEV_BCU" ) v = estdev.v_bcu[i_alpha]; + else if(stat_name == "MBIAS" ) v = mbias.v; + else if(stat_name == "MBIAS_BCL" ) v = mbias.v_bcl[i_alpha]; + else if(stat_name == "MBIAS_BCU" ) v = mbias.v_bcu[i_alpha]; + else if(stat_name == "MAE" ) v = mae.v; + else if(stat_name == "MAE_BCL" ) v = mae.v_bcl[i_alpha]; + else if(stat_name == "MAE_BCU" ) v = mae.v_bcu[i_alpha]; + else if(stat_name == "MSE" ) v = mse.v; + else if(stat_name == "MSE_BCL" ) v = mse.v_bcl[i_alpha]; + else if(stat_name == "MSE_BCU" ) v = mse.v_bcu[i_alpha]; + else if(stat_name == "BCMSE" ) v = bcmse.v; + else if(stat_name == "BCMSE_BCL" ) v = bcmse.v_bcl[i_alpha]; + else if(stat_name == "BCMSE_BCU" ) v = bcmse.v_bcu[i_alpha]; + else if(stat_name == "RMSE" ) v = rmse.v; + else if(stat_name == "RMSE_BCL" ) v = 
rmse.v_bcl[i_alpha]; + else if(stat_name == "RMSE_BCU" ) v = rmse.v_bcu[i_alpha]; + else if(stat_name == "SI" ) v = si.v; + else if(stat_name == "SI_BCL" ) v = si.v_bcl[i_alpha]; + else if(stat_name == "SI_BCU" ) v = si.v_bcu[i_alpha]; + else if(stat_name == "E10" ) v = e10.v; + else if(stat_name == "E10_BCL" ) v = e10.v_bcl[i_alpha]; + else if(stat_name == "E10_BCU" ) v = e10.v_bcu[i_alpha]; + else if(stat_name == "E25" ) v = e25.v; + else if(stat_name == "E25_BCL" ) v = e25.v_bcl[i_alpha]; + else if(stat_name == "E25_BCU" ) v = e25.v_bcu[i_alpha]; + else if(stat_name == "E50" ) v = e50.v; + else if(stat_name == "E50_BCL" ) v = e50.v_bcl[i_alpha]; + else if(stat_name == "E50_BCU" ) v = e50.v_bcu[i_alpha]; + else if(stat_name == "E75" ) v = e75.v; + else if(stat_name == "E75_BCL" ) v = e75.v_bcl[i_alpha]; + else if(stat_name == "E75_BCU" ) v = e75.v_bcu[i_alpha]; + else if(stat_name == "E90" ) v = e90.v; + else if(stat_name == "E90_BCL" ) v = e90.v_bcl[i_alpha]; + else if(stat_name == "E90_BCU" ) v = e90.v_bcu[i_alpha]; + else if(stat_name == "EIQR" ) v = eiqr.v; + else if(stat_name == "EIQR_BCL" ) v = eiqr.v_bcl[i_alpha]; + else if(stat_name == "EIQR_BCU" ) v = eiqr.v_bcu[i_alpha]; + else if(stat_name == "MAD" ) v = mad.v; + else if(stat_name == "MAD_BCL" ) v = mad.v_bcl[i_alpha]; + else if(stat_name == "MAD_BCU" ) v = mad.v_bcu[i_alpha]; + else if(stat_name == "ANOM_CORR" ) v = anom_corr.v; + else if(stat_name == "ANOM_CORR_NCL" ) v = anom_corr.v_ncl[i_alpha]; + else if(stat_name == "ANOM_CORR_NCU" ) v = anom_corr.v_ncu[i_alpha]; + else if(stat_name == "ANOM_CORR_BCL" ) v = anom_corr.v_bcl[i_alpha]; + else if(stat_name == "ANOM_CORR_BCU" ) v = anom_corr.v_bcu[i_alpha]; + else if(stat_name == "ME2" ) v = me2.v; + else if(stat_name == "ME2_BCL" ) v = me2.v_bcl[i_alpha]; + else if(stat_name == "ME2_BCU" ) v = me2.v_bcu[i_alpha]; + else if(stat_name == "MSESS" ) v = msess.v; + else if(stat_name == "MSESS_BCL" ) v = msess.v_bcl[i_alpha]; + else if(stat_name == 
"MSESS_BCU" ) v = msess.v_bcu[i_alpha]; + else if(stat_name == "RMSFA" ) v = rmsfa.v; + else if(stat_name == "RMSFA_BCL" ) v = rmsfa.v_bcl[i_alpha]; + else if(stat_name == "RMSFA_BCU" ) v = rmsfa.v_bcu[i_alpha]; + else if(stat_name == "RMSOA" ) v = rmsoa.v; + else if(stat_name == "RMSOA_BCL" ) v = rmsoa.v_bcl[i_alpha]; + else if(stat_name == "RMSOA_BCU" ) v = rmsoa.v_bcu[i_alpha]; + else if(stat_name == "ANOM_CORR_UNCNTR" ) v = anom_corr_uncntr.v; + else if(stat_name == "ANOM_CORR_UNCNTR_BCL") v = anom_corr_uncntr.v_bcl[i_alpha]; + else if(stat_name == "ANOM_CORR_UNCNTR_BCU") v = anom_corr_uncntr.v_bcu[i_alpha]; else { - mlog << Error << "\nCNTInfo::get_stat() -> " + mlog << Error << "\nCNTInfo::get_stat_cnt() -> " << "unknown continuous statistic name \"" << stat_name << "\"!\n\n"; exit(1); } // Return bad data for 0 pairs - if(n == 0 && strcmp(stat_name, "TOTAL") != 0) { + if(n == 0 && stat_name != "TOTAL") { v = bad_data_double; } @@ -1307,7 +1613,7 @@ void SL1L2Info::set(const PairDataPoint &pd_all) { //////////////////////////////////////////////////////////////////////// -void SL1L2Info::set_sl1l2_stat(const string &stat_name, double v) { +void SL1L2Info::set_stat_sl1l2(const string &stat_name, double v) { if(stat_name == "TOTAL") scount = nint(v); else if(stat_name == "FBAR" ) fbar = v; @@ -1316,13 +1622,19 @@ void SL1L2Info::set_sl1l2_stat(const string &stat_name, double v) { else if(stat_name == "FFBAR") ffbar = v; else if(stat_name == "OOBAR") oobar = v; else if(stat_name == "MAE" ) smae = v; + else { + mlog << Error << "\nSL1L2Info::set_stat_sl1l2() -> " + << "unknown scalar partial sum statistic name \"" << stat_name + << "\"!\n\n"; + exit(1); + } return; } //////////////////////////////////////////////////////////////////////// -void SL1L2Info::set_sal1l2_stat(const string &stat_name, double v) { 
+void SL1L2Info::set_stat_sal1l2(const string &stat_name, double v) { if(stat_name == "TOTAL" ) sacount = nint(v); else if(stat_name == "FABAR" ) fabar = v; @@ -1331,13 +1643,19 @@ void SL1L2Info::set_sal1l2_stat(const string &stat_name, double v) { else if(stat_name == "FFABAR") ffabar = v; else if(stat_name == "OOABAR") ooabar = v; else if(stat_name == "MAE" ) samae = v; + else { + mlog << Error << "\nSL1L2Info::set_stat_sal1l2() -> " + << "unknown scalar anomaly partial sum statistic name \"" << stat_name + << "\"!\n\n"; + exit(1); + } return; } //////////////////////////////////////////////////////////////////////// -double SL1L2Info::get_sl1l2_stat(const string &stat_name) const { +double SL1L2Info::get_stat_sl1l2(const string &stat_name) const { double v = bad_data_double; if(stat_name == "TOTAL") v = (double) scount; @@ -1347,13 +1665,19 @@ double SL1L2Info::get_sl1l2_stat(const string &stat_name) const { else if(stat_name == "FFBAR") v = ffbar; else if(stat_name == "OOBAR") v = oobar; else if(stat_name == "MAE" ) v = smae; + else { + mlog << Error << "\nSL1L2Info::get_stat_sl1l2() -> " + << "unknown scalar partial sum statistic name \"" << stat_name + << "\"!\n\n"; + exit(1); + } return v; } //////////////////////////////////////////////////////////////////////// -double SL1L2Info::get_sal1l2_stat(const string &stat_name) const { +double SL1L2Info::get_stat_sal1l2(const string &stat_name) const { double v = bad_data_double; if(stat_name == "TOTAL" ) v = (double) sacount; @@ -1363,6 +1687,12 @@ double SL1L2Info::get_sal1l2_stat(const string &stat_name) const { else if(stat_name == "FFABAR") v = ffabar; else if(stat_name == "OOABAR") v = ooabar; else if(stat_name == "MAE" ) v = samae; + else { + mlog << Error << "\nSL1L2Info::get_stat_sal1l2() -> " + << "unknown scalar anomaly partial sum statistic name \"" << stat_name + << "\"!\n\n"; + exit(1); + } return v; } @@ -2028,40 +2358,99 @@ void VL1L2Info::compute_ci() { return; } 
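The `set_stat_sl1l2()`/`get_stat_sl1l2()` pairs above drive the new `-input` option: Series-Analysis reads SL1L2 columns back out of a previous output file and re-initializes its partial sums from them, which is why unknown column names are now a hard error instead of being silently ignored. A minimal standalone sketch of that name-based set/get pattern (`Sl1l2Sketch` and its members are hypothetical illustration, not MET code, and it throws rather than calling `exit()` so it can be exercised in isolation):

```cpp
#include <stdexcept>
#include <string>

// Sketch of the name-keyed set/get pattern used by SL1L2Info::set_stat_sl1l2()
// and get_stat_sl1l2(): each STAT column name maps to one member, and any
// unrecognized name is rejected loudly.
struct Sl1l2Sketch {
   int    total = 0;
   double fbar  = 0.0;
   double obar  = 0.0;

   void set_stat(const std::string &name, double v) {
      if      (name == "TOTAL") total = static_cast<int>(v);
      else if (name == "FBAR" ) fbar  = v;
      else if (name == "OBAR" ) obar  = v;
      else throw std::invalid_argument("unknown partial sum column: " + name);
   }

   double get_stat(const std::string &name) const {
      if      (name == "TOTAL") return total;
      else if (name == "FBAR" ) return fbar;
      else if (name == "OBAR" ) return obar;
      throw std::invalid_argument("unknown partial sum column: " + name);
   }
};
```

The round trip matters because a value written to an output column must read back into the same member when the file is supplied via `-input`.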
+//////////////////////////////////////////////////////////////////////// + +double VL1L2Info::get_stat_vl1l2(const string &stat_name) const { + double v = bad_data_double; + + // Find the statistic by name + if(stat_name == "TOTAL" ) v = vcount; + else if(stat_name == "UFBAR" ) v = uf_bar; + else if(stat_name == "VFBAR" ) v = vf_bar; + else if(stat_name == "UOBAR" ) v = uo_bar; + else if(stat_name == "VOBAR" ) v = vo_bar; + else if(stat_name == "UVFOBAR" ) v = uvfo_bar; + else if(stat_name == "UVFFBAR" ) v = uvff_bar; + else if(stat_name == "UVOOBAR" ) v = uvoo_bar; + else if(stat_name == "F_SPEED_BAR") v = f_speed_bar; + else if(stat_name == "O_SPEED_BAR") v = o_speed_bar; + else if(stat_name == "TOTAL_DIR" ) v = dcount; + else if(stat_name == "DIR_ME" ) v = dir_bar; + else if(stat_name == "DIR_MAE" ) v = absdir_bar; + else if(stat_name == "DIR_MSE" ) v = dir2_bar; + else { + mlog << Error << "\nVL1L2Info::get_stat_vl1l2() -> " + << "unknown vector partial sums statistic name \"" << stat_name + << "\"!\n\n"; + exit(1); + } + + return v; +} //////////////////////////////////////////////////////////////////////// -double VL1L2Info::get_stat(const char *stat_name) { +double VL1L2Info::get_stat_val1l2(const string &stat_name) const { double v = bad_data_double; - if(strcmp(stat_name, "TOTAL" ) == 0) v = vcount; - else if(strcmp(stat_name, "FBAR" ) == 0) v = FBAR.v; - else if(strcmp(stat_name, "OBAR" ) == 0) v = OBAR.v; - else if(strcmp(stat_name, "FS_RMS" ) == 0) v = FS_RMS.v; - else if(strcmp(stat_name, "OS_RMS" ) == 0) v = OS_RMS.v; - else if(strcmp(stat_name, "MSVE" ) == 0) v = MSVE.v; - else if(strcmp(stat_name, "RMSVE" ) == 0) v = RMSVE.v; - else if(strcmp(stat_name, "FSTDEV" ) == 0) v = FSTDEV.v; - else if(strcmp(stat_name, "OSTDEV" ) == 0) v = OSTDEV.v; - else if(strcmp(stat_name, "FDIR" ) == 0) v = FDIR.v; - else if(strcmp(stat_name, "ODIR" ) == 0) v = ODIR.v; - else if(strcmp(stat_name, "FBAR_SPEED" ) == 0) v = FBAR_SPEED.v; - else if(strcmp(stat_name, 
"OBAR_SPEED" ) == 0) v = OBAR_SPEED.v; - else if(strcmp(stat_name, "VDIFF_SPEED" ) == 0) v = VDIFF_SPEED.v; - else if(strcmp(stat_name, "VDIFF_DIR" ) == 0) v = VDIFF_DIR.v; - else if(strcmp(stat_name, "SPEED_ERR" ) == 0) v = SPEED_ERR.v; - else if(strcmp(stat_name, "SPEED_ABSERR" ) == 0) v = SPEED_ABSERR.v; - else if(strcmp(stat_name, "DIR_ERR" ) == 0) v = DIR_ERR.v; - else if(strcmp(stat_name, "DIR_ABSERR" ) == 0) v = DIR_ABSERR.v; - else if(strcmp(stat_name, "ANOM_CORR" ) == 0) v = ANOM_CORR.v; - else if(strcmp(stat_name, "ANOM_CORR_UNCNTR") == 0) v = ANOM_CORR_UNCNTR.v; - else if(strcmp(stat_name, "DIR_ME" ) == 0) v = DIR_ME.v; - else if(strcmp(stat_name, "DIR_MAE" ) == 0) v = DIR_MAE.v; - else if(strcmp(stat_name, "DIR_MSE" ) == 0) v = DIR_MSE.v; - else if(strcmp(stat_name, "DIR_RMSE" ) == 0) v = DIR_RMSE.v; + // Find the statistic by name + if(stat_name == "TOTAL" ) v = vacount; + else if(stat_name == "UFABAR" ) v = ufa_bar; + else if(stat_name == "VFABAR" ) v = vfa_bar; + else if(stat_name == "UOABAR" ) v = uoa_bar; + else if(stat_name == "VOABAR" ) v = voa_bar; + else if(stat_name == "UVFOABAR" ) v = uvfoa_bar; + else if(stat_name == "UVFFABAR" ) v = uvffa_bar; + else if(stat_name == "UVOOABAR" ) v = uvooa_bar; + else if(stat_name == "FA_SPEED_BAR") v = fa_speed_bar; + else if(stat_name == "OA_SPEED_BAR") v = oa_speed_bar; + else if(stat_name == "TOTAL_DIR" ) v = dacount; + else if(stat_name == "DIRA_ME" ) v = dira_bar; + else if(stat_name == "DIRA_MAE" ) v = absdira_bar; + else if(stat_name == "DIRA_MSE" ) v = dira2_bar; else { - mlog << Error << "\nVL1L2Info::get_stat() -> " - << "unknown continuous statistic name \"" << stat_name + mlog << Error << "\nVL1L2Info::get_stat_val1l2() -> " + << "unknown vector anomaly partial sums statistic name \"" << stat_name + << "\"!\n\n"; + exit(1); + } + + return v; +} + +//////////////////////////////////////////////////////////////////////// + +double VL1L2Info::get_stat_vcnt(const string &stat_name) const { + double v = bad_data_double; + + if(stat_name == "TOTAL" ) v = 
vcount; + else if(stat_name == "FBAR" ) v = FBAR.v; + else if(stat_name == "OBAR" ) v = OBAR.v; + else if(stat_name == "FS_RMS" ) v = FS_RMS.v; + else if(stat_name == "OS_RMS" ) v = OS_RMS.v; + else if(stat_name == "MSVE" ) v = MSVE.v; + else if(stat_name == "RMSVE" ) v = RMSVE.v; + else if(stat_name == "FSTDEV" ) v = FSTDEV.v; + else if(stat_name == "OSTDEV" ) v = OSTDEV.v; + else if(stat_name == "FDIR" ) v = FDIR.v; + else if(stat_name == "ODIR" ) v = ODIR.v; + else if(stat_name == "FBAR_SPEED" ) v = FBAR_SPEED.v; + else if(stat_name == "OBAR_SPEED" ) v = OBAR_SPEED.v; + else if(stat_name == "VDIFF_SPEED" ) v = VDIFF_SPEED.v; + else if(stat_name == "VDIFF_DIR" ) v = VDIFF_DIR.v; + else if(stat_name == "SPEED_ERR" ) v = SPEED_ERR.v; + else if(stat_name == "SPEED_ABSERR" ) v = SPEED_ABSERR.v; + else if(stat_name == "DIR_ERR" ) v = DIR_ERR.v; + else if(stat_name == "DIR_ABSERR" ) v = DIR_ABSERR.v; + else if(stat_name == "ANOM_CORR" ) v = ANOM_CORR.v; + else if(stat_name == "ANOM_CORR_UNCNTR") v = ANOM_CORR_UNCNTR.v; + else if(stat_name == "DIR_ME" ) v = DIR_ME.v; + else if(stat_name == "DIR_MAE" ) v = DIR_MAE.v; + else if(stat_name == "DIR_MSE" ) v = DIR_MSE.v; + else if(stat_name == "DIR_RMSE" ) v = DIR_RMSE.v; + else { + mlog << Error << "\nVL1L2Info::get_stat_vcnt() -> " + << "unknown vector continuous statistic name \"" << stat_name << "\"!\n\n"; exit(1); } @@ -2818,6 +3207,262 @@ void PCTInfo::compute_ci() { return; } +//////////////////////////////////////////////////////////////////////// + +double PCTInfo::get_stat_pct(const string &stat_name, + ConcatString &col_name) const { + int i = 0; + double v = bad_data_double; + col_name = stat_name; + + // Get index value for variable column numbers + if(check_reg_exp("_[0-9]", stat_name.c_str())) { + + // Parse the index value from the column name + i = atoi(strrchr(stat_name.c_str(), '_') + 1) - 1; + + // Range check + if(i < 0 || i >= pct.nrows()) { + mlog << Error << "\nPCTInfo::get_stat_pct() -> " + << "range 
check error for column name requested \"" << stat_name + << "\"\n\n"; + exit(1); + } + } // end if + + // Find the statistic by name + if(stat_name == "TOTAL") { + v = (double) pct.n(); + } + else if(stat_name == "N_THRESH") { + v = (double) pct.nrows() + 1; + } + else if(check_reg_exp("THRESH_[0-9]", stat_name.c_str())) { + v = pct.threshold(i); + col_name = "THRESH_I"; + } + else if(check_reg_exp("OY_[0-9]", stat_name.c_str())){ + v = (double) pct.event_count_by_row(i); + col_name = "OY_I"; + } + else if(check_reg_exp("ON_[0-9]", stat_name.c_str())) { + v = (double) pct.nonevent_count_by_row(i); + col_name = "ON_I"; + } + else { + mlog << Error << "\nPCTInfo::get_stat_pct() -> " + << "unsupported column name requested \"" << stat_name + << "\"\n\n"; + exit(1); + } + + return v; +} + +//////////////////////////////////////////////////////////////////////// + +double PCTInfo::get_stat_pjc(const string &stat_name, + ConcatString &col_name) const { + int i = 0; + double v = bad_data_double; + col_name = stat_name; + + // Get index value for variable column numbers + if(check_reg_exp("_[0-9]", stat_name.c_str())) { + + // Parse the index value from the column name + i = atoi(strrchr(stat_name.c_str(), '_') + 1) - 1; + + // Range check + if(i < 0 || i >= pct.nrows()) { + mlog << Error << "\nPCTInfo::get_stat_pjc() -> " + << "range check error for column name requested \"" << stat_name + << "\"\n\n"; + exit(1); + } + } // end if + + // Find the statistic by name + if(stat_name == "TOTAL") { + v = (double) pct.n(); + } + else if(stat_name == "N_THRESH") { + v = (double) pct.nrows() + 1; + } + else if(check_reg_exp("THRESH_[0-9]", stat_name.c_str())) { + v = pct.threshold(i); + col_name = "THRESH_I"; + } + else if(check_reg_exp("OY_TP_[0-9]", stat_name.c_str())) { + v = pct.event_count_by_row(i)/(double) pct.n(); + col_name = "OY_TP_I"; + } + else if(check_reg_exp("ON_TP_[0-9]", stat_name.c_str())) { + v = pct.nonevent_count_by_row(i)/(double) pct.n(); + col_name = 
"ON_TP_I"; + } + else if(check_reg_exp("CALIBRATION_[0-9]", stat_name.c_str())) { + v = pct.row_calibration(i); + col_name = "CALIBRATION_I"; + } + else if(check_reg_exp("REFINEMENT_[0-9]", stat_name.c_str())) { + v = pct.row_refinement(i); + col_name = "REFINEMENT_I"; + } + else if(check_reg_exp("LIKELIHOOD_[0-9]", stat_name.c_str())) { + v = pct.row_event_likelihood(i); + col_name = "LIKELIHOOD_I"; + } + else if(check_reg_exp("BASER_[0-9]", stat_name.c_str())) { + v = pct.row_obar(i); + col_name = "BASER_I"; + } + else { + mlog << Error << "\nPCTInfo::get_stat_pjc() -> " + << "unsupported column name requested \"" << stat_name + << "\"\n\n"; + exit(1); + } + + // Return bad data for 0 pairs + if(pct.n() == 0 && stat_name != "TOTAL") { + v = bad_data_double; + } + + return v; +} + +//////////////////////////////////////////////////////////////////////// + +double PCTInfo::get_stat_prc(const string &stat_name, + ConcatString &col_name) const { + int i = 0; + double v = bad_data_double; + col_name = stat_name; + TTContingencyTable ct; + + // Get index value for variable column numbers + if(check_reg_exp("_[0-9]", stat_name.c_str())) { + + // Parse the index value from the column name + i = atoi(strrchr(stat_name.c_str(), '_') + 1) - 1; + + // Range check + if(i < 0 || i >= pct.nrows()) { + mlog << Error << "\nPCTInfo::get_stat_prc() -> " + << "range check error for column name requested \"" << stat_name + << "\"\n\n"; + exit(1); + } + + // Get the 2x2 contingency table for this row + ct = pct.ctc_by_row(i); + + } // end if + + // Find the statistic by name + if(stat_name == "TOTAL") { + v = (double) pct.n(); + } + else if(stat_name == "N_THRESH") { + v = (double) pct.nrows() + 1; + } + else if(check_reg_exp("THRESH_[0-9]", stat_name.c_str())) { + v = pct.threshold(i); + col_name = "THRESH_I"; + } + else if(check_reg_exp("PODY_[0-9]", stat_name.c_str())) { + v = ct.pod_yes(); + col_name = "PODY_I"; + } + else if(check_reg_exp("POFD_[0-9]", stat_name.c_str())) { + v = 
ct.pofd(); + col_name = "POFD_I"; + } + else { + mlog << Error << "\nPCTInfo::get_stat_prc() -> " + << "unsupported column name requested \"" << stat_name + << "\"\n\n"; + exit(1); + } + + // Return bad data for 0 pairs + if(pct.n() == 0 && stat_name != "TOTAL") { + v = bad_data_double; + } + + return v; +} + +//////////////////////////////////////////////////////////////////////// + +double PCTInfo::get_stat_pstd(const string &stat_name, + ConcatString &col_name, + int i_alpha) const { + int i = 0; + double v = bad_data_double; + col_name = stat_name; + + // Range check alpha index + if(i_alpha >= n_alpha && is_ci_stat_name(stat_name)) { + mlog << Error << "\nPCTInfo::get_stat_pstd() -> " + << "alpha index out of range (" << i_alpha << " >= " + << n_alpha << ")!\n\n"; + exit(1); + } + + // Get index value for variable column numbers + if(check_reg_exp("_[0-9]", stat_name.c_str())) { + + // Parse the index value from the column name + i = atoi(strrchr(stat_name.c_str(), '_') + 1) - 1; + + // Range check + if(i < 0 || i >= pct.nrows()) { + mlog << Error << "\nPCTInfo::get_stat_pstd() -> " + << "range check error for column name requested \"" << stat_name + << "\"\n\n"; + exit(1); + } + } // end if + + // Find the statistic by name + if(stat_name == "TOTAL" ) v = (double) pct.n(); + else if(stat_name == "N_THRESH" ) v = (double) pct.nrows() + 1; + else if(stat_name == "BASER" ) v = baser.v; + else if(stat_name == "BASER_NCL" ) v = baser.v_ncl[i_alpha]; + else if(stat_name == "BASER_NCU" ) v = baser.v_ncu[i_alpha]; + else if(stat_name == "RELIABILITY") v = pct.reliability(); + else if(stat_name == "RESOLUTION" ) v = pct.resolution(); + else if(stat_name == "UNCERTAINTY") v = pct.uncertainty(); + else if(stat_name == "ROC_AUC" ) v = pct.roc_auc(); + else if(stat_name == "BRIER" ) v = brier.v; + else if(stat_name == "BRIER_NCL" ) v = brier.v_ncl[i_alpha]; + else if(stat_name == "BRIER_NCU" ) v = brier.v_ncu[i_alpha]; + else if(stat_name == "BRIERCL" ) v = briercl.v; + 
else if(stat_name == "BRIERCL_NCL") v = briercl.v_ncl[i_alpha]; + else if(stat_name == "BRIERCL_NCU") v = briercl.v_ncu[i_alpha]; + else if(stat_name == "BSS" ) v = bss; + else if(stat_name == "BSS_SMPL" ) v = bss_smpl; + else if(check_reg_exp("THRESH_[0-9]", stat_name.c_str())) { + v = pct.threshold(i); + col_name = "THRESH_I"; + } + else { + mlog << Error << "\nPCTInfo::get_stat_pstd() -> " + << "unsupported column name requested \"" << stat_name + << "\"\n\n"; + exit(1); + } + + // Return bad data for 0 pairs + if(pct.n() == 0 && stat_name != "TOTAL") { + v = bad_data_double; + } + + return v; +} + //////////////////////////////////////////////////////////////////////// // // Code for class GRADInfo @@ -3689,3 +4334,10 @@ int compute_rank(const DataPlane &dp, DataPlane &dp_rank, double *data_rank, int } //////////////////////////////////////////////////////////////////////// + +bool is_ci_stat_name(const string &stat_name) { + return (stat_name.find("_NC") != string::npos || + stat_name.find("_BC") != string::npos); +} + +//////////////////////////////////////////////////////////////////////// diff --git a/src/libcode/vx_statistics/met_stats.h b/src/libcode/vx_statistics/met_stats.h index 41dddb1398..3d6891f8b2 100644 --- a/src/libcode/vx_statistics/met_stats.h +++ b/src/libcode/vx_statistics/met_stats.h @@ -97,7 +97,9 @@ class CTSInfo { void compute_stats(); void compute_ci(); - double get_stat(const char *); + double get_stat_fho(const std::string &) const; + double get_stat_ctc(const std::string &) const; + double get_stat_cts(const std::string &, int i_alpha=0) const; }; //////////////////////////////////////////////////////////////////////// @@ -136,6 +138,9 @@ class MCTSInfo { void add(double, double, const ClimoPntInfo *cpi = nullptr); void compute_stats(); void compute_ci(); + + double get_stat_mctc(const std::string &, ConcatString &) const; + double get_stat_mcts(const std::string &, int i_alpha=0) const; }; 
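The `is_ci_stat_name()` helper introduced above keys off the `_NC*`/`_BC*` column-name suffixes (normal and bootstrap confidence limits) to decide whether a getter must be given a valid alpha index. A self-contained copy of the same substring test, to illustrate the expected behavior:

```cpp
#include <string>

// Same logic as the is_ci_stat_name() helper added to met_stats.cc in this
// patch: any column name containing "_NC" (normal CI) or "_BC" (bootstrap CI)
// carries confidence limits and therefore requires an alpha index.
bool is_ci_stat_name(const std::string &stat_name) {
   return stat_name.find("_NC") != std::string::npos ||
          stat_name.find("_BC") != std::string::npos;
}
```

Note the check is a plain substring match, so it relies on MET column names never embedding `_NC` or `_BC` outside the CI suffixes.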
 
 ////////////////////////////////////////////////////////////////////////
@@ -192,7 +197,7 @@ class CNTInfo {
    void allocate_n_alpha(int);
    void compute_ci();
 
-   double get_stat(const char *);
+   double get_stat_cnt(const std::string &, int i_alpha=0) const;
 };
 
 ////////////////////////////////////////////////////////////////////////
@@ -240,11 +245,11 @@ class SL1L2Info {
    void zero_out();
    void clear();
 
-   void set_sl1l2_stat (const std::string &, double);
-   void set_sal1l2_stat(const std::string &, double);
+   void set_stat_sl1l2(const std::string &, double);
+   void set_stat_sal1l2(const std::string &, double);
 
-   double get_sl1l2_stat (const std::string &) const;
-   double get_sal1l2_stat(const std::string &) const;
+   double get_stat_sl1l2(const std::string &) const;
+   double get_stat_sal1l2(const std::string &) const;
 };
 
 ////////////////////////////////////////////////////////////////////////
@@ -368,7 +373,9 @@ class VL1L2Info {
    void compute_stats();
    void compute_ci();
 
-   double get_stat(const char *);
+   double get_stat_vl1l2(const std::string &) const;
+   double get_stat_val1l2(const std::string &) const;
+   double get_stat_vcnt(const std::string &) const;
 };
 
 ////////////////////////////////////////////////////////////////////////
@@ -564,6 +571,11 @@ class PCTInfo {
    void set_fthresh(const ThreshArray &);
    void compute_stats();
    void compute_ci();
+
+   double get_stat_pct(const std::string &, ConcatString &) const;
+   double get_stat_pjc(const std::string &, ConcatString &) const;
+   double get_stat_prc(const std::string &, ConcatString &) const;
+   double get_stat_pstd(const std::string &, ConcatString &, int i_alpha=0) const;
 };
 
 ////////////////////////////////////////////////////////////////////////
@@ -743,6 +755,8 @@ extern double compute_ufss(double);
 
 extern int compute_rank(const DataPlane &, DataPlane &, double *, int &);
 
+extern bool is_ci_stat_name(const std::string &);
+
 ////////////////////////////////////////////////////////////////////////
 
 #endif  //  __MET_STATS_H__
diff --git a/src/tools/core/series_analysis/series_analysis.cc b/src/tools/core/series_analysis/series_analysis.cc index 4920c9f87c..a03b00d17e 100644 --- a/src/tools/core/series_analysis/series_analysis.cc +++ b/src/tools/core/series_analysis/series_analysis.cc @@ -94,7 +94,6 @@ static void do_probabilistic (int, const PairDataPoint *); // TODO: MET #1371 // - Add a PCT aggregation logic test -// - Switch to set_stat() and get_stat() functions // - Can briercl be aggregated as a weighted average and used for bss? // - How should valid data thresholds be applied when reading -aggr data? // - Currently no way to aggregate anom_corr since CNTInfo::set(sl1l2) @@ -125,10 +124,21 @@ static void store_stat_all_sl1l2 (int, const SL1L2Info &); static void store_stat_all_sal1l2(int, const SL1L2Info &); static void store_stat_all_pct (int, const PCTInfo &); -static ConcatString build_nc_var_name_ctc(const ConcatString &, const CTSInfo &); -static ConcatString build_nc_var_name_sl1l2(const ConcatString &, const SL1L2Info &); -static ConcatString build_nc_var_name_sal1l2(const ConcatString &, const SL1L2Info &); -static ConcatString build_nc_var_name_cnt(const ConcatString &, const CNTInfo &); +static ConcatString build_nc_var_name_categorical( + STATLineType, const ConcatString &, + const CTSInfo &, double); +static ConcatString build_nc_var_name_multicategory( + STATLineType, const ConcatString &, + double); +static ConcatString build_nc_var_name_partialsums( + STATLineType, const ConcatString &, + const SL1L2Info &); +static ConcatString build_nc_var_name_continuous( + STATLineType, const ConcatString &, + const CNTInfo &, double); +static ConcatString build_nc_var_name_probabilistic( + STATLineType, const ConcatString &, + const PCTInfo &, double); static void setup_nc_file(const VarInfo *, const VarInfo *); static void add_nc_var(const ConcatString &, const ConcatString &, @@ -1328,7 +1338,9 @@ void read_aggr_ctc(int n, const CTSInfo &cts_info, for(int i=0; i " - << 
"unsupported column name requested \"" << c - << "\"\n\n"; - exit(1); - } - // Construct the NetCDF variable name - var_name << cs_erase << "series_fho_" << c; - - // Append threshold information - if(cts_info.fthresh == cts_info.othresh) { - var_name << "_" << cts_info.fthresh.get_abbr_str(); - } - else { - var_name << "_fcst" << cts_info.fthresh.get_abbr_str() - << "_obs" << cts_info.othresh.get_abbr_str(); - } + ConcatString var_name(build_nc_var_name_categorical( + STATLineType::fho, c, + cts_info, bad_data_double)); // Add map for this variable name if(stat_data.count(var_name) == 0) { // Build key - lty_stat << "FHO_" << c; + ConcatString lty_stat("FHO_"); + lty_stat << c; // Add new map entry add_nc_var(var_name, c, stat_long_name[lty_stat], @@ -1638,7 +1634,8 @@ void store_stat_fho(int n, const ConcatString &col, } // Store the statistic value - put_nc_val(n, var_name, (float) v); + put_nc_val(n, var_name, + (float) cts_info.get_stat_fho(c)); return; } @@ -1647,7 +1644,6 @@ void store_stat_fho(int n, const ConcatString &col, void store_stat_ctc(int n, const ConcatString &col, const CTSInfo &cts_info) { - double v; // Set the column name to all upper case ConcatString c = to_upper(col); @@ -1655,22 +1651,10 @@ void store_stat_ctc(int n, const ConcatString &col, // Handle ALL columns if(c == all_columns) return store_stat_all_ctc(n, cts_info); - // Get the column value - if(c == "TOTAL") { v = cts_info.cts.n(); } - else if(c == "FY_OY") { v = cts_info.cts.fy_oy(); } - else if(c == "FY_ON") { v = cts_info.cts.fy_on(); } - else if(c == "FN_OY") { v = cts_info.cts.fn_oy(); } - else if(c == "FN_ON") { v = cts_info.cts.fn_on(); } - else if(c == "EC_VALUE") { v = cts_info.cts.ec_value(); } - else { - mlog << Error << "\nstore_stat_ctc() -> " - << "unsupported column name requested \"" << c - << "\"\n\n"; - exit(1); - } - // Construct the NetCDF variable name - ConcatString var_name(build_nc_var_name_ctc(c, cts_info)); + ConcatString 
var_name(build_nc_var_name_categorical( + STATLineType::ctc, c, + cts_info, bad_data_double)); // Add map for this variable name if(stat_data.count(var_name) == 0) { @@ -1687,7 +1671,8 @@ void store_stat_ctc(int n, const ConcatString &col, } // Store the statistic value - put_nc_val(n, var_name, v); + put_nc_val(n, var_name, + (float) cts_info.get_stat_ctc(c)); return; } @@ -1696,157 +1681,44 @@ void store_stat_ctc(int n, const ConcatString &col, void store_stat_cts(int n, const ConcatString &col, const CTSInfo &cts_info) { - int i; - double v; - ConcatString lty_stat, var_name; - int n_ci = 1; // Set the column name to all upper case ConcatString c = to_upper(col); // Check for columns with normal or bootstrap confidence limits - if(strstr(c.c_str(), "_NC") || strstr(c.c_str(), "_BC")) n_ci = cts_info.n_alpha; - - // Loop over the alpha values, if necessary - for(i=0; i " - << "unsupported column name requested \"" << c - << "\"\n\n"; - exit(1); - } + int n_alpha = 1; + if(is_ci_stat_name(c)) n_alpha = cts_info.n_alpha; - // Construct the NetCDF variable name - var_name << cs_erase << "series_cts_" << c; + // Loop over the alpha values + for(int i_alpha=0; i_alpha 1 ? cts_info.alpha[i_alpha] : bad_data_double); - // Append confidence interval alpha value - if(n_ci > 1) var_name << "_a" << cts_info.alpha[i]; + // Construct the NetCDF variable name + ConcatString var_name(build_nc_var_name_categorical( + STATLineType::cts, c, + cts_info, alpha)); // Add map for this variable name if(stat_data.count(var_name) == 0) { // Build key - lty_stat << "CTS_" << c; + ConcatString lty_stat("CTS_"); + lty_stat << c; // Add new map entry add_nc_var(var_name, c, stat_long_name[lty_stat], cts_info.fthresh.get_str(), cts_info.othresh.get_str(), - (n_ci > 1 ? 
cts_info.alpha[i] : bad_data_double)); + alpha); } // Store the statistic value - put_nc_val(n, var_name, (float) v); + put_nc_val(n, var_name, + (float) cts_info.get_stat_cts(c, i_alpha)); - } // end for i + } // end for i_alpha return; } @@ -1855,58 +1727,28 @@ void store_stat_cts(int n, const ConcatString &col, void store_stat_mctc(int n, const ConcatString &col, const MCTSInfo &mcts_info) { - int i, j; - double v; - ConcatString lty_stat, var_name; - StringArray sa; // Set the column name to all upper case ConcatString c = to_upper(col); - ConcatString d = c; // Handle ALL columns if(c == all_columns) return store_stat_all_mctc(n, mcts_info); - // Get the column value - if(c == "TOTAL") { v = (double) mcts_info.cts.total(); } - else if(c == "N_CAT") { v = (double) mcts_info.cts.nrows(); } - else if(c == "EC_VALUE") { v = mcts_info.cts.ec_value(); } - else if(check_reg_exp("F[0-9]*_O[0-9]*", c.c_str())) { - - d = "FI_OJ"; - - // Parse column name to retrieve index values - sa = c.split("_"); - i = atoi(sa[0].c_str()+1) - 1; - j = atoi(sa[1].c_str()+1) - 1; - - // Range check - if(i < 0 || i >= mcts_info.cts.nrows() || - j < 0 || j >= mcts_info.cts.ncols()) { - mlog << Error << "\nstore_stat_mctc() -> " - << "range check error for column name requested \"" << c - << "\"\n\n"; - exit(1); - } - - // Retrieve the value - v = (double) mcts_info.cts.entry(i, j); - } - else { - mlog << Error << "\nstore_stat_mctc() -> " - << "unsupported column name requested \"" << c - << "\"\n\n"; - exit(1); - } - // Construct the NetCDF variable name - var_name << cs_erase << "series_mctc_" << c; + ConcatString var_name(build_nc_var_name_multicategory( + STATLineType::mctc, c, + bad_data_double)); + + // Store the data value + ConcatString col_name; + float v = (float) mcts_info.get_stat_mctc(c, col_name); // Add map for this variable name if(stat_data.count(var_name) == 0) { // Build key - lty_stat << "MCTC_" << d; + ConcatString lty_stat("MCTC_"); + lty_stat << col_name; // Add 
new map entry add_nc_var(var_name, c, stat_long_name[lty_stat], @@ -1916,7 +1758,7 @@ void store_stat_mctc(int n, const ConcatString &col, } // Store the statistic value - put_nc_val(n, var_name, (float) v); + put_nc_val(n, var_name, v); return; } @@ -1925,71 +1767,44 @@ void store_stat_mctc(int n, const ConcatString &col, void store_stat_mcts(int n, const ConcatString &col, const MCTSInfo &mcts_info) { - int i; - double v; - ConcatString lty_stat, var_name; - int n_ci = 1; // Set the column name to all upper case ConcatString c = to_upper(col); // Check for columns with normal or bootstrap confidence limits - if(strstr(c.c_str(), "_NC") || strstr(c.c_str(), "_BC")) n_ci = mcts_info.n_alpha; - - // Loop over the alpha values, if necessary - for(i=0; i " - << "unsupported column name requested \"" << c - << "\"\n\n"; - exit(1); - } + int n_alpha = 1; + if(is_ci_stat_name(c)) n_alpha = mcts_info.n_alpha; - // Construct the NetCDF variable name - var_name << cs_erase << "series_mcts_" << c; + // Loop over the alpha values + for(int i_alpha=0; i_alpha 1) var_name << "_a" << mcts_info.alpha[i]; + // Store alpha value + double alpha = (n_alpha > 1 ? mcts_info.alpha[i_alpha] : bad_data_double); + + // Construct the NetCDF variable name + ConcatString var_name(build_nc_var_name_multicategory( + STATLineType::mcts, c, + alpha)); // Add map for this variable name if(stat_data.count(var_name) == 0) { // Build key - lty_stat << "MCTS_" << c; + ConcatString lty_stat("MCTS_"); + lty_stat << c; // Add new map entry add_nc_var(var_name, c, stat_long_name[lty_stat], mcts_info.fthresh.get_str(","), mcts_info.othresh.get_str(","), - (n_ci > 1 ? 
mcts_info.alpha[i] : bad_data_double)); + alpha); } // Store the statistic value - put_nc_val(n, var_name, (float) v); + put_nc_val(n, var_name, + (float) mcts_info.get_stat_mcts(c, i_alpha)); - } // end for i + } // end for i_alpha return; } @@ -1998,154 +1813,44 @@ void store_stat_mcts(int n, const ConcatString &col, void store_stat_cnt(int n, const ConcatString &col, const CNTInfo &cnt_info) { - int i; - double v; - ConcatString lty_stat, var_name; - int n_ci = 1; // Set the column name to all upper case ConcatString c = to_upper(col); // Check for columns with normal or bootstrap confidence limits - if(strstr(c.c_str(), "_NC") || strstr(c.c_str(), "_BC")) n_ci = cnt_info.n_alpha; - - // Loop over the alpha values, if necessary - for(i=0; i " - << "unsupported column name requested \"" << c - << "\"\n\n"; - exit(1); - } + int n_alpha = 1; + if(is_ci_stat_name(c)) n_alpha = cnt_info.n_alpha; - // Construct the NetCDF variable name - ConcatString var_name(build_nc_var_name_cnt(c, cnt_info)); + // Loop over the alpha values + for(int i_alpha=0; i_alpha 1) var_name << "_a" << cnt_info.alpha[i]; + // Store alpha value + double alpha = (n_alpha > 1 ? cnt_info.alpha[i_alpha] : bad_data_double); + + // Construct the NetCDF variable name + ConcatString var_name(build_nc_var_name_continuous( + STATLineType::cnt, c, + cnt_info, alpha)); // Add map for this variable name if(stat_data.count(var_name) == 0) { // Build key - lty_stat << "CNT_" << c; + ConcatString lty_stat("CNT_"); + lty_stat << c; // Add new map entry add_nc_var(var_name, c, stat_long_name[lty_stat], cnt_info.fthresh.get_str(), cnt_info.othresh.get_str(), - (n_ci > 1 ? 
cnt_info.alpha[i] : bad_data_double)); + alpha); } // Store the statistic value - put_nc_val(n, var_name, (float) v); + put_nc_val(n, var_name, + (float) cnt_info.get_stat_cnt(c, i_alpha)); - } // end for i + } // end for i_alpha return; } @@ -2154,7 +1859,6 @@ void store_stat_cnt(int n, const ConcatString &col, void store_stat_sl1l2(int n, const ConcatString &col, const SL1L2Info &s_info) { - double v; // Set the column name to all upper case ConcatString c = to_upper(col); @@ -2162,23 +1866,10 @@ void store_stat_sl1l2(int n, const ConcatString &col, // Handle ALL columns if(c == all_columns) return store_stat_all_sl1l2(n, s_info); - // Get the column value - if(c == "TOTAL") { v = (double) s_info.scount; } - else if(c == "FBAR") { v = s_info.fbar; } - else if(c == "OBAR") { v = s_info.obar; } - else if(c == "FOBAR") { v = s_info.fobar; } - else if(c == "FFBAR") { v = s_info.ffbar; } - else if(c == "OOBAR") { v = s_info.oobar; } - else if(c == "MAE") { v = s_info.smae; } - else { - mlog << Error << "\nstore_stat_sl1l2() -> " - << "unsupported column name requested \"" << c - << "\"\n\n"; - exit(1); - } - // Construct the NetCDF variable name - ConcatString var_name(build_nc_var_name_sl1l2(c, s_info)); + ConcatString var_name(build_nc_var_name_partialsums( + STATLineType::sl1l2, c, + s_info)); // Add map for this variable name if(stat_data.count(var_name) == 0) { @@ -2195,7 +1886,8 @@ void store_stat_sl1l2(int n, const ConcatString &col, } // Store the statistic value - put_nc_val(n, var_name, (float) v); + put_nc_val(n, var_name, + (float) s_info.get_stat_sl1l2(c)); return; } @@ -2212,23 +1904,9 @@ void store_stat_sal1l2(int n, const ConcatString &col, // Handle ALL columns if(c == all_columns) return store_stat_all_sal1l2(n, s_info); - // Get the column value - if(c == "TOTAL") { v = (double) s_info.sacount; } - else if(c == "FABAR") { v = s_info.fabar; } - else if(c == "OABAR") { v = s_info.oabar; } - else if(c == "FOABAR") { v = s_info.foabar; } - else if(c == 
"FFABAR") { v = s_info.ffabar; } - else if(c == "OOABAR") { v = s_info.ooabar; } - else if(c == "MAE") { v = s_info.samae; } - else { - mlog << Error << "\nstore_stat_sal1l2() -> " - << "unsupported column name requested \"" << c - << "\"\n\n"; - exit(1); - } - - // Construct the NetCDF variable name - ConcatString var_name(build_nc_var_name_sal1l2(c, s_info)); + ConcatString var_name(build_nc_var_name_partialsums( + STATLineType::sal1l2, c, + s_info)); // Add map for this variable name if(stat_data.count(var_name) == 0) { @@ -2245,7 +1923,8 @@ void store_stat_sal1l2(int n, const ConcatString &col, } // Store the statistic value - put_nc_val(n, var_name, (float) v); + put_nc_val(n, var_name, + (float) s_info.get_stat_sal1l2(c.c_str())); return; } @@ -2254,56 +1933,28 @@ void store_stat_sal1l2(int n, const ConcatString &col, void store_stat_pct(int n, const ConcatString &col, const PCTInfo &pct_info) { - int i = 0; - double v; - ConcatString lty_stat, var_name; // Set the column name to all upper case ConcatString c = to_upper(col); - ConcatString d = c; // Handle ALL columns if(c == all_columns) return store_stat_all_pct(n, pct_info); - // Get index value for variable column numbers - if(check_reg_exp("_[0-9]", c.c_str())) { - - // Parse the index value from the column name - i = atoi(strrchr(c.c_str(), '_') + 1) - 1; - - // Range check - if(i < 0 || i >= pct_info.pct.nrows()) { - mlog << Error << "\nstore_stat_pct() -> " - << "range check error for column name requested \"" << c - << "\"\n\n"; - exit(1); - } - } // end if - - // Get the column value - if(c == "TOTAL") { v = (double) pct_info.pct.n(); } - else if(c == "N_THRESH") { v = (double) pct_info.pct.nrows() + 1; } - else if(check_reg_exp("THRESH_[0-9]", c.c_str())) { v = pct_info.pct.threshold(i); } - else if(check_reg_exp("OY_[0-9]", c.c_str())) { v = (double) pct_info.pct.event_count_by_row(i); - d = "OY_I"; } - else if(check_reg_exp("ON_[0-9]", c.c_str())) { v = (double) 
pct_info.pct.nonevent_count_by_row(i); - d = "ON_I"; } - else { - mlog << Error << "\nstore_stat_pct() -> " - << "unsupported column name requested \"" << c - << "\"\n\n"; - exit(1); - } - // Construct the NetCDF variable name - var_name << cs_erase << "series_pct_" << c - << "_obs" << pct_info.othresh.get_abbr_str(); + ConcatString var_name(build_nc_var_name_probabilistic( + STATLineType::pct, c, + pct_info, bad_data_double)); + + // Store the data value + ConcatString col_name; + float v = (float) pct_info.get_stat_pct(c, col_name); // Add map for this variable name if(stat_data.count(var_name) == 0) { // Build key - lty_stat << "PCT_" << d; + ConcatString lty_stat("PCT_"); + lty_stat << col_name; // Add new map entry add_nc_var(var_name, c, stat_long_name[lty_stat], @@ -2313,7 +1964,7 @@ void store_stat_pct(int n, const ConcatString &col, } // Store the statistic value - put_nc_val(n, var_name, (float) v); + put_nc_val(n, var_name, v); return; } @@ -2322,68 +1973,47 @@ void store_stat_pct(int n, const ConcatString &col, void store_stat_pstd(int n, const ConcatString &col, const PCTInfo &pct_info) { - int i; - double v; - ConcatString lty_stat, var_name; - int n_ci = 1; // Set the column name to all upper case ConcatString c = to_upper(col); // Check for columns with normal or bootstrap confidence limits - if(strstr(c.c_str(), "_NC") || strstr(c.c_str(), "_BC")) n_ci = pct_info.n_alpha; - - // Loop over the alpha values, if necessary - for(i=0; i " - << "unsupported column name requested \"" << c - << "\"\n\n"; - exit(1); - } + int n_alpha = 1; + if(is_ci_stat_name(c)) n_alpha = pct_info.n_alpha; + + // Loop over the alpha values + for(int i_alpha=0; i_alpha 1 ? 
pct_info.alpha[i_alpha] : bad_data_double); // Construct the NetCDF variable name - var_name << cs_erase << "series_pstd_" << c; + ConcatString var_name(build_nc_var_name_probabilistic( + STATLineType::pstd, c, + pct_info, alpha)); - // Append confidence interval alpha value - if(n_ci > 1) var_name << "_a" << pct_info.alpha[i]; + // Store the data value + ConcatString col_name; + float v = (float) pct_info.get_stat_pstd(c, col_name); // Add map for this variable name if(stat_data.count(var_name) == 0) { // Build key - lty_stat << "PSTD_" << c; + ConcatString lty_stat("PSTD_"); + lty_stat << col_name; // Add new map entry add_nc_var(var_name, c, stat_long_name[lty_stat], pct_info.fthresh.get_str(","), pct_info.othresh.get_str(), - (n_ci > 1 ? pct_info.alpha[i] : bad_data_double)); + alpha); } // Store the statistic value - put_nc_val(n, var_name, (float) v); + put_nc_val(n, var_name, v); - } // end for i + } // end for i_alpha return; } @@ -2392,66 +2022,25 @@ void store_stat_pstd(int n, const ConcatString &col, void store_stat_pjc(int n, const ConcatString &col, const PCTInfo &pct_info) { - int i = 0; - int tot; - double v; - ConcatString lty_stat, var_name; // Set the column name to all upper case ConcatString c = to_upper(col); - ConcatString d = c; - - // Get index value for variable column numbers - if(check_reg_exp("_[0-9]", c.c_str())) { - - // Parse the index value from the column name - i = atoi(strrchr(c.c_str(), '_') + 1) - 1; - - // Range check - if(i < 0 || i >= pct_info.pct.nrows()) { - mlog << Error << "\nstore_stat_pjc() -> " - << "range check error for column name requested \"" << c - << "\"\n\n"; - exit(1); - } - } // end if - - // Store the total count - tot = pct_info.pct.n(); - - // Get the column value - if(c == "TOTAL") { v = (double) tot; } - else if(c == "N_THRESH") { v = (double) pct_info.pct.nrows() + 1; } - else if(check_reg_exp("THRESH_[0-9]", c.c_str())) { v = pct_info.pct.threshold(i); - d = "THRESH_I"; } - else 
if(check_reg_exp("OY_TP_[0-9]", c.c_str())) { v = pct_info.pct.event_count_by_row(i)/(double) tot; - d = "OY_TP_I"; } - else if(check_reg_exp("ON_TP_[0-9]", c.c_str())) { v = pct_info.pct.nonevent_count_by_row(i)/(double) tot; - d = "ON_TP_I"; } - else if(check_reg_exp("CALIBRATION_[0-9]", c.c_str())) { v = pct_info.pct.row_calibration(i); - d = "CALIBRATION_I"; } - else if(check_reg_exp("REFINEMENT_[0-9]", c.c_str())) { v = pct_info.pct.row_refinement(i); - d = "REFINEMENT_I"; } - else if(check_reg_exp("LIKELIHOOD_[0-9]", c.c_str())) { v = pct_info.pct.row_event_likelihood(i); - d = "LIKELIHOOD_I"; } - else if(check_reg_exp("BASER_[0-9]", c.c_str())) { v = pct_info.pct.row_obar(i); - d = "BASER_I"; } - else { - mlog << Error << "\nstore_stat_pjc() -> " - << "unsupported column name requested \"" << c - << "\"\n\n"; - exit(1); - } // Construct the NetCDF variable name - var_name << cs_erase << "series_pjc_" << c - << "_obs" << pct_info.othresh.get_abbr_str(); + ConcatString var_name(build_nc_var_name_probabilistic( + STATLineType::pjc, c, + pct_info, bad_data_double)); + + // Store the data value + ConcatString col_name; + float v = (float) pct_info.get_stat_pct(c, col_name); // Add map for this variable name if(stat_data.count(var_name) == 0) { // Build key - lty_stat << "PJC_" << d; + ConcatString lty_stat("PJC_"); + lty_stat << col_name; // Add new map entry add_nc_var(var_name, c, stat_long_name[lty_stat], @@ -2461,7 +2050,7 @@ void store_stat_pjc(int n, const ConcatString &col, } // Store the statistic value - put_nc_val(n, var_name, (float) v); + put_nc_val(n, var_name, v); return; } @@ -2470,55 +2059,25 @@ void store_stat_pjc(int n, const ConcatString &col, void store_stat_prc(int n, const ConcatString &col, const PCTInfo &pct_info) { - int i = 0; - double v; - ConcatString lty_stat, var_name; - TTContingencyTable ct; // Set the column name to all upper case ConcatString c = to_upper(col); - ConcatString d = c; - // Get index value for variable column 
numbers - if(check_reg_exp("_[0-9]", c.c_str())) { - - // Parse the index value from the column name - i = atoi(strrchr(c.c_str(), '_') + 1) - 1; - - // Range check - if(i < 0 || i >= pct_info.pct.nrows()) { - mlog << Error << "\nstore_stat_prc() -> " - << "range check error for column name requested \"" << c - << "\"\n\n"; - exit(1); - } - - // Get the 2x2 contingency table for this row - ct = pct_info.pct.ctc_by_row(i); + // Construct the NetCDF variable name + ConcatString var_name(build_nc_var_name_probabilistic( + STATLineType::prc, c, + pct_info, bad_data_double)); - } // end if - - // Get the column value - if(c == "TOTAL") { v = (double) pct_info.pct.n(); } - else if(c == "N_THRESH") { v = (double) pct_info.pct.nrows() + 1; } - else if(check_reg_exp("THRESH_[0-9]", c.c_str())) { v = pct_info.pct.threshold(i); - d = "THRESH_I"; } - else if(check_reg_exp("PODY_[0-9]", c.c_str())) { v = ct.pod_yes(); - d = "PODY_I"; } - else if(check_reg_exp("POFD_[0-9]", c.c_str())) { v = ct.pofd(); - d = "POFD_I"; } - else { - mlog << Error << "\nstore_stat_prc() -> " - << "unsupported column name requested \"" << c - << "\"\n\n"; - exit(1); - } + // Store the data value + ConcatString col_name; + float v = (float) pct_info.get_stat_pct(c, col_name); // Add map for this variable name if(stat_data.count(var_name) == 0) { // Build key - lty_stat << "PRC_" << d; + ConcatString lty_stat("PRC_"); + lty_stat << col_name; // Add new map entry add_nc_var(var_name, c, stat_long_name[lty_stat], @@ -2528,7 +2087,7 @@ void store_stat_prc(int n, const ConcatString &col, } // Store the statistic value - put_nc_val(n, var_name, (float) v); + put_nc_val(n, var_name, v); return; } @@ -2577,12 +2136,13 @@ void store_stat_all_pct(int n, const PCTInfo &pct_info) { //////////////////////////////////////////////////////////////////////// -ConcatString build_nc_var_name_ctc(const ConcatString &col, - const CTSInfo &cts_info) { +ConcatString build_nc_var_name_categorical( + STATLineType lt, const 
ConcatString &col, + const CTSInfo &cts_info, double alpha) { // Append the column name - ConcatString var_name("series_ctc_"); - var_name << col; + ConcatString var_name("series_"); + var_name << to_lower(statlinetype_to_string(lt)) << "_" << col; // Append threshold information if(cts_info.fthresh == cts_info.othresh) { @@ -2593,37 +2153,37 @@ ConcatString build_nc_var_name_ctc(const ConcatString &col, << "_obs" << cts_info.othresh.get_abbr_str(); } + // Append confidence interval alpha value + if(!is_bad_data(alpha)) var_name << "_a" << alpha; + return var_name; } //////////////////////////////////////////////////////////////////////// -ConcatString build_nc_var_name_sl1l2(const ConcatString &col, - const SL1L2Info &s_info) { +ConcatString build_nc_var_name_multicategory( + STATLineType lt, const ConcatString &col, + double alpha) { // Append the column name - ConcatString var_name("series_sl1l2_"); - var_name << col; + ConcatString var_name("series_"); + var_name << to_lower(statlinetype_to_string(lt)) << "_" << col; - // Append threshold information, if supplied - if(s_info.fthresh.get_type() != thresh_na || - s_info.othresh.get_type() != thresh_na) { - var_name << "_fcst" << s_info.fthresh.get_abbr_str() - << "_" << setlogic_to_abbr(s_info.logic) - << "_obs" << s_info.othresh.get_abbr_str(); - } + // Append confidence interval alpha value + if(!is_bad_data(alpha)) var_name << "_a" << alpha; return var_name; } //////////////////////////////////////////////////////////////////////// -ConcatString build_nc_var_name_sal1l2(const ConcatString &col, - const SL1L2Info &s_info) { +ConcatString build_nc_var_name_partialsums( + STATLineType lt, const ConcatString &col, + const SL1L2Info &s_info) { // Append the column name - ConcatString var_name("series_sal1l2_"); - var_name << col; + ConcatString var_name("series_"); + var_name << to_lower(statlinetype_to_string(lt)) << "_" << col; // Append threshold information, if supplied if(s_info.fthresh.get_type() != thresh_na 
|| @@ -2638,12 +2198,13 @@ ConcatString build_nc_var_name_sal1l2(const ConcatString &col, //////////////////////////////////////////////////////////////////////// -ConcatString build_nc_var_name_cnt(const ConcatString &col, - const CNTInfo &cnt_info) { +ConcatString build_nc_var_name_continuous( + STATLineType lt, const ConcatString &col, + const CNTInfo &cnt_info, double alpha) { // Append the column name - ConcatString var_name("series_cnt_"); - var_name << col; + ConcatString var_name("series_"); + var_name << to_lower(statlinetype_to_string(lt)) << "_" << col; // Append threshold information, if supplied if(cnt_info.fthresh.get_type() != thresh_na || @@ -2653,6 +2214,28 @@ ConcatString build_nc_var_name_cnt(const ConcatString &col, << "_obs" << cnt_info.othresh.get_abbr_str(); } + // Append confidence interval alpha value + if(!is_bad_data(alpha)) var_name << "_a" << alpha; + + return var_name; +} + +//////////////////////////////////////////////////////////////////////// + +ConcatString build_nc_var_name_probabilistic( + STATLineType lt, const ConcatString &col, + const PCTInfo &pct_info, double alpha) { + + // Append the column name + ConcatString var_name("series_"); + var_name << to_lower(statlinetype_to_string(lt)) << "_" << col; + + // Append the observation threshold + var_name << "_obs" << pct_info.othresh.get_abbr_str(); + + // Append confidence interval alpha value + if(!is_bad_data(alpha)) var_name << "_a" << alpha; + return var_name; } diff --git a/src/tools/core/stat_analysis/aggr_stat_line.cc b/src/tools/core/stat_analysis/aggr_stat_line.cc index 441cbe5e07..b9f551e1bf 100644 --- a/src/tools/core/stat_analysis/aggr_stat_line.cc +++ b/src/tools/core/stat_analysis/aggr_stat_line.cc @@ -731,19 +731,19 @@ void aggr_summary_lines(LineDataFile &f, STATAnalysisJob &job, // if((line.type() == STATLineType::fho || line.type() == STATLineType::ctc) && lty == STATLineType::cts) { - v = cts_info.get_stat(req_col[i].c_str()); + v = 
cts_info.get_stat_cts(req_col[i].c_str());
             w = cts_info.cts.n();
          }
          else if(line.type() == STATLineType::sl1l2 &&
                  lty == STATLineType::cnt) {
-            v = cnt_info.get_stat(req_col[i].c_str());
+            v = cnt_info.get_stat_cnt(req_col[i].c_str());
             w = cnt_info.n;
          }
          else if(line.type() == STATLineType::sal1l2 &&
                  lty == STATLineType::cnt) {
-            v = cnt_info.get_stat(req_col[i].c_str());
+            v = cnt_info.get_stat_cnt(req_col[i].c_str());
             w = cnt_info.n;
          }
          else if(line.type() == STATLineType::vl1l2 &&
                  lty == STATLineType::vcnt) {
-            v = vl1l2_info.get_stat(req_col[i].c_str());
+            v = vl1l2_info.get_stat_vcnt(req_col[i].c_str());
             w = (is_vector_dir_stat(line.type(), req_col[i].c_str()) ?
                  vl1l2_info.dcount : vl1l2_info.vcount);
diff --git a/src/tools/core/stat_analysis/skill_score_index_job.cc b/src/tools/core/stat_analysis/skill_score_index_job.cc
index 9651e50a0d..13b837cac2 100644
--- a/src/tools/core/stat_analysis/skill_score_index_job.cc
+++ b/src/tools/core/stat_analysis/skill_score_index_job.cc
@@ -246,16 +246,16 @@ SSIDXData SSIndexJobInfo::compute_ss_index() {
       // Continuous stats
       if(job_lt[i] == STATLineType::sl1l2) {
          compute_cntinfo(fcst_sl1l2[i], 0, fcst_cnt);
-         fcst_stat = fcst_cnt.get_stat(fcst_job[i].column[0].c_str());
+         fcst_stat = fcst_cnt.get_stat_cnt(fcst_job[i].column[0].c_str());
          compute_cntinfo(ref_sl1l2[i], 0, ref_cnt);
-         ref_stat = ref_cnt.get_stat(fcst_job[i].column[0].c_str());
+         ref_stat = ref_cnt.get_stat_cnt(fcst_job[i].column[0].c_str());
       }
       // Categorical stats
       else if(job_lt[i] == STATLineType::ctc) {
          fcst_cts[i].compute_stats();
-         fcst_stat = fcst_cts[i].get_stat(fcst_job[i].column[0].c_str());
+         fcst_stat = fcst_cts[i].get_stat_cts(fcst_job[i].column[0].c_str());
          ref_cts[i].compute_stats();
-         ref_stat = ref_cts[i].get_stat(fcst_job[i].column[0].c_str());
+         ref_stat = ref_cts[i].get_stat_cts(fcst_job[i].column[0].c_str());
       }
       else {
          mlog << Error << "\nSSIndexJobInfo::compute_ss_index() -> "

From 48eb23c69b97a4f71c66bd55bc012ffad86b43bb Mon Sep 17 00:00:00 2001
From: John Halley Gotway
Date: Mon, 12 Aug 2024 19:06:52 -0600
Subject: [PATCH 14/41] Per #1371, work in progress. More consolidation

---
 src/libcode/vx_statistics/met_stats.cc        | 103 +++-
 src/libcode/vx_statistics/met_stats.h         |   8 +-
 .../core/series_analysis/series_analysis.cc   | 528 +++++-------------
 .../core/stat_analysis/aggr_stat_line.cc      |   4 +-
 .../stat_analysis/skill_score_index_job.cc    |   4 +-
 5 files changed, 265 insertions(+), 382 deletions(-)

diff --git a/src/libcode/vx_statistics/met_stats.cc b/src/libcode/vx_statistics/met_stats.cc
index 6af346c014..2d69fa672b 100644
--- a/src/libcode/vx_statistics/met_stats.cc
+++ b/src/libcode/vx_statistics/met_stats.cc
@@ -425,6 +425,38 @@ void CTSInfo::compute_ci() {
 
 ////////////////////////////////////////////////////////////////////////
 
+void CTSInfo::set_stat_ctc(const string &stat_name, double v) {
+
+   if(stat_name == "FY_OY")         cts.set_fy_oy(nint(v));
+   else if(stat_name == "FY_ON")    cts.set_fy_on(nint(v));
+   else if(stat_name == "FN_OY")    cts.set_fn_oy(nint(v));
+   else if(stat_name == "FN_ON")    cts.set_fn_on(nint(v));
+   else if(stat_name == "EC_VALUE") cts.set_ec_value(v);
+
+   return;
+}
+
+////////////////////////////////////////////////////////////////////////
+
+double CTSInfo::get_stat(STATLineType lt, const string &stat_name, int i_alpha) const {
+   double v = bad_data_double;
+
+   // Get statistic by line type
+   if(lt == STATLineType::fho)      v = get_stat_fho(stat_name);
+   else if(lt == STATLineType::ctc) v = get_stat_ctc(stat_name);
+   else if(lt == STATLineType::cts) v = get_stat_cts(stat_name, i_alpha);
+   else {
+      mlog << Error << "\nCTSInfo::get_stat() -> "
+           << "unexpected line type \"" << statlinetype_to_string(lt)
+           << "\"!\n\n";
+      exit(1);
+   }
+
+   return v;
+}
+
+////////////////////////////////////////////////////////////////////////
+
 double CTSInfo::get_stat_fho(const string &stat_name) const {
    double v = bad_data_double;
 
@@ -790,6 +822,30 @@ void MCTSInfo::compute_ci() {
 
 ////////////////////////////////////////////////////////////////////////
 
+double MCTSInfo::get_stat(STATLineType lt,
+                          const string &stat_name,
+                          ConcatString &col_name,
+                          int i_alpha) const {
+   double v = bad_data_double;
+
+   // Initialize
+   col_name = stat_name;
+
+   // Get statistic by line type
+   if(lt == STATLineType::mctc)      v = get_stat_mctc(stat_name, col_name);
+   else if(lt == STATLineType::mcts) v = get_stat_mcts(stat_name, i_alpha);
+   else {
+      mlog << Error << "\nMCTSInfo::get_stat() -> "
+           << "unexpected line type \"" << statlinetype_to_string(lt)
+           << "\"!\n\n";
+      exit(1);
+   }
+
+   return v;
+}
+
+////////////////////////////////////////////////////////////////////////
+
 double MCTSInfo::get_stat_mctc(const string &stat_name,
                                ConcatString &col_name) const {
    double v = bad_data_double;
@@ -1250,12 +1306,12 @@ void CNTInfo::compute_ci() {
 
 ////////////////////////////////////////////////////////////////////////
 
-double CNTInfo::get_stat_cnt(const string &stat_name, int i_alpha) const {
+double CNTInfo::get_stat(const string &stat_name, int i_alpha) const {
    double v = bad_data_double;
 
    // Range check alpha index
    if(i_alpha >= n_alpha && is_ci_stat_name(stat_name)) {
-      mlog << Error << "\nCNTInfo::get_stat_cnt() -> "
+      mlog << Error << "\nCNTInfo::get_stat() -> "
            << "alpha index out of range (" << i_alpha << " >= "
            << n_alpha << ")!\n\n";
       exit(1);
@@ -1366,7 +1422,7 @@ double CNTInfo::get_stat_cnt(const string &stat_name, int i_alpha) const {
    else if(stat_name == "SI_BCL"    ) v = si.v_bcl[i_alpha];
    else if(stat_name == "SI_BCU"    ) v = si.v_bcu[i_alpha];
    else {
-      mlog << Error << "\nCNTInfo::get_stat_cnt() -> "
+      mlog << Error << "\nCNTInfo::get_stat() -> "
           << "unknown continuous statistic name \"" << stat_name
           << "\"!\n\n";
      exit(1);
@@ -1655,6 +1711,24 @@ void SL1L2Info::set_stat_sal1l2(const string &stat_name, double v) {
 
 ////////////////////////////////////////////////////////////////////////
 
+double SL1L2Info::get_stat(STATLineType lt, const string &stat_name) const {
+   double v = bad_data_double;
+
+   // Get statistic by line type
+   if(lt == STATLineType::sl1l2)       v = get_stat_sl1l2(stat_name);
+   else if(lt == STATLineType::sal1l2) v = get_stat_sal1l2(stat_name);
+   else {
+      mlog << Error << "\nSL1L2Info::get_stat() -> "
+           << "unexpected line type \"" << statlinetype_to_string(lt)
+           << "\"!\n\n";
+      exit(1);
+   }
+
+   return v;
+}
+
+////////////////////////////////////////////////////////////////////////
+
 double SL1L2Info::get_stat_sl1l2(const string &stat_name) const {
    double v = bad_data_double;
 
@@ -3209,6 +3283,29 @@ void PCTInfo::compute_ci() {
 
 ////////////////////////////////////////////////////////////////////////
 
+double PCTInfo::get_stat(STATLineType lt,
+                         const string &stat_name,
+                         ConcatString &col_name,
+                         int i_alpha) const {
+   double v = bad_data_double;
+
+   // Get statistic by line type
+   if(lt == STATLineType::pct)       v = get_stat_pct(stat_name, col_name);
+   else if(lt == STATLineType::pjc)  v = get_stat_pjc(stat_name, col_name);
+   else if(lt == STATLineType::prc)  v = get_stat_prc(stat_name, col_name);
+   else if(lt == STATLineType::pstd) v = get_stat_pstd(stat_name, col_name, i_alpha);
+   else {
+      mlog << Error << "\nPCTInfo::get_stat() -> "
+           << "unexpected line type \"" << statlinetype_to_string(lt)
+           << "\"!\n\n";
+      exit(1);
+   }
+
+   return v;
+}
+
+////////////////////////////////////////////////////////////////////////
+
 double PCTInfo::get_stat_pct(const string &stat_name,
                              ConcatString &col_name) const {
    int i = 0;
diff --git a/src/libcode/vx_statistics/met_stats.h b/src/libcode/vx_statistics/met_stats.h
index 3d6891f8b2..5b2939b74d 100644
--- a/src/libcode/vx_statistics/met_stats.h
+++ b/src/libcode/vx_statistics/met_stats.h
@@ -97,6 +97,9 @@ class CTSInfo {
    void compute_stats();
    void compute_ci();
 
+   void set_stat_ctc(const std::string &, double);
+
+   double get_stat(STATLineType, const std::string &, int i_alpha=0) const;
    double get_stat_fho(const std::string &) const;
    double get_stat_ctc(const std::string &)
const; double get_stat_cts(const std::string &, int i_alpha=0) const; @@ -139,6 +142,7 @@ class MCTSInfo { void compute_stats(); void compute_ci(); + double get_stat(STATLineType, const std::string &, ConcatString &, int i_alpha=0) const; double get_stat_mctc(const std::string &, ConcatString &) const; double get_stat_mcts(const std::string &, int i_alpha=0) const; }; @@ -197,7 +201,7 @@ class CNTInfo { void allocate_n_alpha(int); void compute_ci(); - double get_stat_cnt(const std::string &, int i_alpha=0) const; + double get_stat(const std::string &, int i_alpha=0) const; }; //////////////////////////////////////////////////////////////////////// @@ -248,6 +252,7 @@ class SL1L2Info { void set_stat_sl1l2(const std::string &, double); void set_stat_sal1l2(const std::string &, double); + double get_stat(STATLineType, const std::string &) const; double get_stat_sl1l2(const std::string &) const; double get_stat_sal1l2(const std::string &) const; }; @@ -572,6 +577,7 @@ class PCTInfo { void compute_stats(); void compute_ci(); + double get_stat(STATLineType, const std::string &, ConcatString &, int i_alpha=0) const; double get_stat_pct(const std::string &, ConcatString &) const; double get_stat_pjc(const std::string &, ConcatString &) const; double get_stat_prc(const std::string &, ConcatString &) const; diff --git a/src/tools/core/series_analysis/series_analysis.cc b/src/tools/core/series_analysis/series_analysis.cc index a03b00d17e..c2b837d841 100644 --- a/src/tools/core/series_analysis/series_analysis.cc +++ b/src/tools/core/series_analysis/series_analysis.cc @@ -99,24 +99,27 @@ static void do_probabilistic (int, const PairDataPoint *); // - Currently no way to aggregate anom_corr since CNTInfo::set(sl1l2) // doesn't support it. 
-static void read_aggr_ctc (int, const CTSInfo &, TTContingencyTable &); -static void read_aggr_mctc (int, const MCTSInfo &, ContingencyTable &); +static void read_aggr_ctc (int, const CTSInfo &, CTSInfo &); +static void read_aggr_mctc (int, const MCTSInfo &, MCTSInfo &); static void read_aggr_sl1l2 (int, const SL1L2Info &, SL1L2Info &); static void read_aggr_sal1l2 (int, const SL1L2Info &, SL1L2Info &); -static void read_aggr_pct (int, const PCTInfo &, Nx2ContingencyTable &); - -static void store_stat_fho (int, const ConcatString &, const CTSInfo &); -static void store_stat_ctc (int, const ConcatString &, const CTSInfo &); -static void store_stat_cts (int, const ConcatString &, const CTSInfo &); -static void store_stat_mctc (int, const ConcatString &, const MCTSInfo &); -static void store_stat_mcts (int, const ConcatString &, const MCTSInfo &); -static void store_stat_cnt (int, const ConcatString &, const CNTInfo &); -static void store_stat_sl1l2 (int, const ConcatString &, const SL1L2Info &); -static void store_stat_sal1l2(int, const ConcatString &, const SL1L2Info &); -static void store_stat_pct (int, const ConcatString &, const PCTInfo &); -static void store_stat_pstd (int, const ConcatString &, const PCTInfo &); -static void store_stat_pjc (int, const ConcatString &, const PCTInfo &); -static void store_stat_prc (int, const ConcatString &, const PCTInfo &); +static void read_aggr_pct (int, const PCTInfo &, PCTInfo &); + +static void store_stat_categorical(int, + STATLineType, const ConcatString &, + const CTSInfo &); +static void store_stat_multicategory(int, + STATLineType, const ConcatString &, + const MCTSInfo &); +static void store_stat_partialsums(int, + STATLineType, const ConcatString &, + const SL1L2Info &); +static void store_stat_continuous(int, + STATLineType, const ConcatString &, + const CNTInfo &); +static void store_stat_probabilistic(int, + STATLineType, const ConcatString &, + const PCTInfo &); static void store_stat_all_ctc (int, const 
CTSInfo &); static void store_stat_all_mctc (int, const MCTSInfo &); @@ -1064,11 +1067,11 @@ void do_categorical(int n, const PairDataPoint *pd_ptr) { compute_ctsinfo(*pd_ptr, i_na, false, false, cts_info[i]); // Read the CTC data to be aggregated - TTContingencyTable aggr_ctc; - read_aggr_ctc(n, cts_info[i], aggr_ctc); + CTSInfo aggr_cts; + read_aggr_ctc(n, cts_info[i], aggr_cts); // Aggregate CTC counts - cts_info[i].cts += aggr_ctc; + cts_info[i].cts += aggr_cts.cts; // Compute statistics and confidence intervals cts_info[i].compute_stats(); @@ -1096,20 +1099,23 @@ void do_categorical(int n, const PairDataPoint *pd_ptr) { // Add statistic value for each possible FHO column for(int j=0; j " << "the number of MCTC categories do not match (" - << nint(v) << " != " << aggr_ctc.nrows() << ")!\n\n"; + << nint(v) << " != " << aggr_mcts.cts.nrows() << ")!\n\n"; exit(1); } // Check the expected correct else if(c == "EC_VALUE" && !is_bad_data(v) && - !is_eq(v, aggr_ctc.ec_value(), loose_tol)) { + !is_eq(v, aggr_mcts.cts.ec_value(), loose_tol)) { mlog << Error << "\nread_aggr_mctc() -> " << "the MCTC expected correct values do not match (" - << v << " != " << aggr_ctc.ec_value() << ")!\n\n"; + << v << " != " << aggr_mcts.cts.ec_value() << ")!\n\n"; exit(1); } // Populate the MCTC table @@ -1467,7 +1478,7 @@ void read_aggr_mctc(int n, const MCTSInfo &mcts_info, StringArray sa(c.split("_")); int i_row = atoi(sa[0].c_str()+1) - 1; int i_col = atoi(sa[1].c_str()+1) - 1; - aggr_ctc.set_entry(i_row, i_col, nint(v)); + aggr_mcts.cts.set_entry(i_row, i_col, nint(v)); } } // end for i @@ -1477,21 +1488,23 @@ void read_aggr_mctc(int n, const MCTSInfo &mcts_info, //////////////////////////////////////////////////////////////////////// void read_aggr_pct(int n, const PCTInfo &pct_info, - Nx2ContingencyTable &aggr_pct) { + PCTInfo &aggr_pct) { // Initialize - aggr_pct = pct_info.pct; - aggr_pct.zero_out(); + aggr_pct.pct = pct_info.pct; + aggr_pct.pct.zero_out(); // Get PCT column 
names - StringArray pct_cols(get_pct_columns(aggr_pct.nrows())); + StringArray pct_cols(get_pct_columns(aggr_pct.pct.nrows())); // Loop over the PCT colum names for(int i=0; i " << "the number of PCT categories do not match (" - << nint(v)+1 << " != " << aggr_pct.nrows() << ")!\n\n"; + << nint(v)+1 << " != " << aggr_pct.pct.nrows() << ")!\n\n"; exit(1); } // Set the event counts @@ -1514,14 +1527,14 @@ void read_aggr_pct(int n, const PCTInfo &pct_info, // Parse the index value from the column name int i_row = atoi(strrchr(c.c_str(), '_') + 1) - 1; - aggr_pct.set_event(i_row, nint(v)); + aggr_pct.pct.set_event(i_row, nint(v)); } // Set the non-event counts else if(check_reg_exp("ON_[0-9]", c.c_str())) { // Parse the index value from the column name int i_row = atoi(strrchr(c.c_str(), '_') + 1) - 1; - aggr_pct.set_nonevent(i_row, nint(v)); + aggr_pct.pct.set_nonevent(i_row, nint(v)); } } // end for i @@ -1559,11 +1572,11 @@ void do_probabilistic(int n, const PairDataPoint *pd_ptr) { compute_pctinfo(*pd_ptr, false, pct_info); // Read the PCT data to be aggregated - Nx2ContingencyTable aggr_pct; + PCTInfo aggr_pct; read_aggr_pct(n, pct_info, aggr_pct); // Aggregate PCT counts - pct_info.pct += aggr_pct; + pct_info.pct += aggr_pct.pct; // Zero out the climatology PCT table which cannot be aggregated pct_info.climo_pct.zero_out(); @@ -1580,26 +1593,30 @@ void do_probabilistic(int n, const PairDataPoint *pd_ptr) { // Add statistic value for each possible PCT column for(j=0; j Date: Tue, 13 Aug 2024 09:17:47 -0600 Subject: [PATCH 15/41] Per #1371, correct expected output file name --- internal/test_unit/xml/unit_series_analysis.xml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/internal/test_unit/xml/unit_series_analysis.xml b/internal/test_unit/xml/unit_series_analysis.xml index 036b1e7607..997b1cadcc 100644 --- a/internal/test_unit/xml/unit_series_analysis.xml +++ b/internal/test_unit/xml/unit_series_analysis.xml @@ -93,7 +93,7 @@ -v 1 - 
&OUTPUT_DIR;/series_analysis/series_analysis_AGGREGATE_APCP_06_2012040900_to_2012041100.nc + &OUTPUT_DIR;/series_analysis/series_analysis_AGGREGATE_APCP_06_2012040900_to_2012041012.nc From a6ffe0668b14cfd8c492b7499d83387c7686d38e Mon Sep 17 00:00:00 2001 From: John Halley Gotway Date: Tue, 13 Aug 2024 10:01:12 -0600 Subject: [PATCH 16/41] Per #1371, consistent regridding log messages and fix the Series-Analysis PairDataPoint object handling logic. --- src/libcode/vx_statistics/read_climo.cc | 4 +-- .../core/series_analysis/series_analysis.cc | 35 ++++++++++--------- 2 files changed, 21 insertions(+), 18 deletions(-) diff --git a/src/libcode/vx_statistics/read_climo.cc b/src/libcode/vx_statistics/read_climo.cc index 8e43749a8d..cecd96a029 100644 --- a/src/libcode/vx_statistics/read_climo.cc +++ b/src/libcode/vx_statistics/read_climo.cc @@ -221,9 +221,9 @@ void read_climo_file(const char *climo_file, GrdFileType ctype, // Regrid, if needed if(!(mtddf->grid() == vx_grid)) { - mlog << Debug(2) << "Regridding " << clm_ut_cs << " \"" + mlog << Debug(2) << "Regridding climatology " << clm_ut_cs << " \"" << info->magic_str() - << "\" climatology field to the verification grid.\n"; + << "\" to the verification grid.\n"; dp = met_regrid(clm_dpa[i], mtddf->grid(), vx_grid, regrid_info); } diff --git a/src/tools/core/series_analysis/series_analysis.cc b/src/tools/core/series_analysis/series_analysis.cc index c2b837d841..8906c72384 100644 --- a/src/tools/core/series_analysis/series_analysis.cc +++ b/src/tools/core/series_analysis/series_analysis.cc @@ -561,8 +561,8 @@ void get_series_data(int i_series, exit(1); } - mlog << Debug(3) - << "Regridding field " << fcst_info->magic_str() + mlog << Debug(2) + << "Regridding forecast " << fcst_info->magic_str() << " to the verification grid.\n"; fcst_dp = met_regrid(fcst_dp, fcst_grid, grid, fcst_info->regrid()); @@ -581,8 +581,8 @@ void get_series_data(int i_series, exit(1); } - mlog << Debug(3) - << "Regridding field " << 
obs_info->magic_str() + mlog << Debug(2) + << "Regridding observation " << obs_info->magic_str() << " to the verification grid.\n"; obs_dp = met_regrid(obs_dp, obs_grid, grid, obs_info->regrid()); @@ -813,6 +813,7 @@ void process_scores() { VarInfo *fcst_info = (VarInfo *) nullptr; VarInfo *obs_info = (VarInfo *) nullptr; DataPlane fcst_dp, obs_dp; + vector<PairDataPoint> pd_block; const char *method_name = "process_scores() "; // Climatology mean and standard deviation @@ -823,20 +824,9 @@ int n_skip_zero = 0; int n_skip_pos = 0; - // Create a vector of PairDataPoint objects - vector<PairDataPoint> pd_block; - pd_block.resize(conf_info.block_size); - for(auto &x : pd_block) x.extend(n_series); - // Loop over the data reads for(int i_read=0; i_read Date: Tue, 13 Aug 2024 14:16:35 -0600 Subject: [PATCH 17/41] Per #1371, check the return status when opening the aggregate file. --- src/tools/core/series_analysis/series_analysis.cc | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/src/tools/core/series_analysis/series_analysis.cc b/src/tools/core/series_analysis/series_analysis.cc index 8906c72384..6bd5a72c54 100644 --- a/src/tools/core/series_analysis/series_analysis.cc +++ b/src/tools/core/series_analysis/series_analysis.cc @@ -737,7 +737,12 @@ DataPlane get_aggr_data(const ConcatString &var_name) { mlog << Debug(1) << "Reading aggregate data file: " << aggr_file << "\n"; - aggr_nc.open(aggr_file.c_str()); + if(!aggr_nc.open(aggr_file.c_str())) { + mlog << Error << "\nget_aggr_data() -> " + << "unable to open the aggregate NetCDF file \"" + << aggr_file << "\"\n\n"; + exit(1); + } // Update timing info based on aggregate file global attributes ConcatString cs; From 0aec0ca225000512273b4f6d91d646a6d8bd9726 Mon Sep 17 00:00:00 2001 From: John Halley Gotway Date: Tue, 13 Aug 2024 14:22:43 -0600 Subject: [PATCH 18/41] Per #1371, fix prc/pjc typo --- src/tools/core/series_analysis/series_analysis.cc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff
--git a/src/tools/core/series_analysis/series_analysis.cc b/src/tools/core/series_analysis/series_analysis.cc index 6bd5a72c54..5417a6dd23 100644 --- a/src/tools/core/series_analysis/series_analysis.cc +++ b/src/tools/core/series_analysis/series_analysis.cc @@ -1615,7 +1615,7 @@ void do_probabilistic(int n, const PairDataPoint *pd_ptr) { // Add statistic value for each possible PJC column for(j=0; j Date: Tue, 13 Aug 2024 15:00:25 -0600 Subject: [PATCH 19/41] Per #1371, fix the series_analysis PCT aggregation logic and add a test to unit_series_analysis.xml to demonstrate. --- .../test_unit/xml/unit_series_analysis.xml | 66 +++++++++++++++---- src/libcode/vx_statistics/met_stats.cc | 4 +- .../core/series_analysis/series_analysis.cc | 16 ++--- 3 files changed, 59 insertions(+), 27 deletions(-) diff --git a/internal/test_unit/xml/unit_series_analysis.xml b/internal/test_unit/xml/unit_series_analysis.xml index 997b1cadcc..98e39cd5b8 100644 --- a/internal/test_unit/xml/unit_series_analysis.xml +++ b/internal/test_unit/xml/unit_series_analysis.xml @@ -59,7 +59,7 @@ - + &MET_BIN;/series_analysis MODEL GFS @@ -88,12 +88,12 @@ -obs &DATA_DIR_OBS;/stage4_hmt/stage4_2012041006_06h.grib \ &DATA_DIR_OBS;/stage4_hmt/stage4_2012041012_06h.grib \ -aggr &OUTPUT_DIR;/series_analysis/series_analysis_CMD_LINE_APCP_06_2012040900_to_2012041000.nc \ - -out &OUTPUT_DIR;/series_analysis/series_analysis_AGGREGATE_APCP_06_2012040900_to_2012041012.nc \ + -out &OUTPUT_DIR;/series_analysis/series_analysis_AGGR_CMD_LINE_APCP_06_2012040900_to_2012041012.nc \ -config &CONFIG_DIR;/SeriesAnalysisConfig \ -v 1 - &OUTPUT_DIR;/series_analysis/series_analysis_AGGREGATE_APCP_06_2012040900_to_2012041012.nc + &OUTPUT_DIR;/series_analysis/series_analysis_AGGR_CMD_LINE_APCP_06_2012040900_to_2012041012.nc @@ -101,18 +101,12 @@ echo "&DATA_DIR_MODEL;/grib1/sref/sref_2012040821_F009.grib \ &DATA_DIR_MODEL;/grib1/sref/sref_2012040821_F015.grib \ &DATA_DIR_MODEL;/grib1/sref/sref_2012040821_F021.grib \ - 
&DATA_DIR_MODEL;/grib1/sref/sref_2012040821_F027.grib \ - &DATA_DIR_MODEL;/grib1/sref/sref_2012040821_F033.grib \ - &DATA_DIR_MODEL;/grib1/sref/sref_2012040821_F039.grib \ - &DATA_DIR_MODEL;/grib1/sref/sref_2012040821_F045.grib" \ + &DATA_DIR_MODEL;/grib1/sref/sref_2012040821_F027.grib" \ > &OUTPUT_DIR;/series_analysis/input_fcst_file_list; \ echo "&DATA_DIR_OBS;/stage4_hmt/stage4_2012040906_06h.grib \ &DATA_DIR_OBS;/stage4_hmt/stage4_2012040912_06h.grib \ &DATA_DIR_OBS;/stage4_hmt/stage4_2012040918_06h.grib \ - &DATA_DIR_OBS;/stage4_hmt/stage4_2012041000_06h.grib \ - &DATA_DIR_OBS;/stage4_hmt/stage4_2012041006_06h.grib \ - &DATA_DIR_OBS;/stage4_hmt/stage4_2012041012_06h.grib \ - &DATA_DIR_OBS;/stage4_hmt/stage4_2012041018_06h.grib" \ + &DATA_DIR_OBS;/stage4_hmt/stage4_2012041000_06h.grib" \ > &OUTPUT_DIR;/series_analysis/input_obs_file_list; \ &MET_BIN;/series_analysis @@ -131,7 +125,7 @@ CNT_STATS SL1L2_STATS SAL1L2_STATS - PCT_STATS "OY_1", "ON_1" + PCT_STATS "ALL" PSTD_STATS "TOTAL", "ROC_AUC", "BRIER", "BRIER_NCL", "BRIER_NCU" PJC_STATS "CALIBRATION_1", "REFINEMENT_1" PRC_STATS "PODY_1", "POFD_1" @@ -139,12 +133,56 @@ \ -fcst &OUTPUT_DIR;/series_analysis/input_fcst_file_list \ -obs &OUTPUT_DIR;/series_analysis/input_obs_file_list \ - -out &OUTPUT_DIR;/series_analysis/series_analysis_FILE_LIST_PROB_APCP_06_2012040900_to_2012041100.nc \ + -out &OUTPUT_DIR;/series_analysis/series_analysis_FILE_LIST_PROB_APCP_06_2012040900_to_2012041000.nc \ + -config &CONFIG_DIR;/SeriesAnalysisConfig \ + -v 1 + + + &OUTPUT_DIR;/series_analysis/series_analysis_FILE_LIST_PROB_APCP_06_2012040900_to_2012041000.nc + + + + + echo "&DATA_DIR_MODEL;/grib1/sref/sref_2012040821_F033.grib \ + &DATA_DIR_MODEL;/grib1/sref/sref_2012040821_F039.grib \ + &DATA_DIR_MODEL;/grib1/sref/sref_2012040821_F045.grib" \ + > &OUTPUT_DIR;/series_analysis/aggregate_fcst_file_list; \ + echo "&DATA_DIR_OBS;/stage4_hmt/stage4_2012041006_06h.grib \ + &DATA_DIR_OBS;/stage4_hmt/stage4_2012041012_06h.grib \ + 
&DATA_DIR_OBS;/stage4_hmt/stage4_2012041018_06h.grib" \ + > &OUTPUT_DIR;/series_analysis/aggregate_obs_file_list; \ + &MET_BIN;/series_analysis + + MODEL SREF + OBTYPE STAGE4 + FCST_CAT_THRESH >=0.00, >=0.25, >=0.50, >=0.75, >=1.00 + FCST_FIELD { name = "PROB"; level = "A06"; prob = { name = "APCP"; thresh_lo = 0.25; }; } + OBS_CAT_THRESH >0.25 + OBS_FIELD { name = "APCP"; level = "A06"; } + MASK_POLY + FHO_STATS + CTC_STATS + CTS_STATS + MCTC_STATS + MCTS_STATS + CNT_STATS + SL1L2_STATS + SAL1L2_STATS + PCT_STATS "ALL" + PSTD_STATS "TOTAL", "ROC_AUC", "BRIER", "BRIER_NCL", "BRIER_NCU" + PJC_STATS "CALIBRATION_1", "REFINEMENT_1" + PRC_STATS "PODY_1", "POFD_1" + + \ + -fcst &OUTPUT_DIR;/series_analysis/aggregate_fcst_file_list \ + -obs &OUTPUT_DIR;/series_analysis/aggregate_obs_file_list \ + -aggr &OUTPUT_DIR;/series_analysis/series_analysis_FILE_LIST_PROB_APCP_06_2012040900_to_2012041000.nc \ + -out &OUTPUT_DIR;/series_analysis/series_analysis_AGGR_FILE_LIST_PROB_APCP_06_2012040900_to_2012041018.nc \ -config &CONFIG_DIR;/SeriesAnalysisConfig \ -v 1 - &OUTPUT_DIR;/series_analysis/series_analysis_FILE_LIST_PROB_APCP_06_2012040900_to_2012041100.nc + &OUTPUT_DIR;/series_analysis/series_analysis_AGGR_FILE_LIST_PROB_APCP_06_2012040900_to_2012041018.nc diff --git a/src/libcode/vx_statistics/met_stats.cc b/src/libcode/vx_statistics/met_stats.cc index 2d69fa672b..ba16258fa0 100644 --- a/src/libcode/vx_statistics/met_stats.cc +++ b/src/libcode/vx_statistics/met_stats.cc @@ -3318,8 +3318,8 @@ double PCTInfo::get_stat_pct(const string &stat_name, // Parse the index value from the column name i = atoi(strrchr(stat_name.c_str(), '_') + 1) - 1; - // Range check - if(i < 0 || i >= pct.nrows()) { + // Range check (allow THRESH_N for N == nrows) + if(i < 0 || i > pct.nrows()) { mlog << Error << "\nPCTInfo::get_stat_pct() -> " << "range check error for column name requested \"" << stat_name << "\"\n\n"; diff --git a/src/tools/core/series_analysis/series_analysis.cc 
b/src/tools/core/series_analysis/series_analysis.cc index 5417a6dd23..5c150fb259 100644 --- a/src/tools/core/series_analysis/series_analysis.cc +++ b/src/tools/core/series_analysis/series_analysis.cc @@ -92,13 +92,6 @@ static void do_continuous (int, const PairDataPoint *); static void do_partialsums (int, const PairDataPoint *); static void do_probabilistic (int, const PairDataPoint *); -// TODO: MET #1371 -// - Add a PCT aggregation logic test -// - Can briercl be aggregated as a weighted average and used for bss? -// - How should valid data thresholds be applied when reading -aggr data? -// - Currently no way to aggregate anom_corr since CNTInfo::set(sl1l2) -// doesn't support it. - static void read_aggr_ctc (int, const CTSInfo &, CTSInfo &); static void read_aggr_mctc (int, const MCTSInfo &, MCTSInfo &); static void read_aggr_sl1l2 (int, const SL1L2Info &, SL1L2Info &); @@ -1503,7 +1496,7 @@ void read_aggr_pct(int n, const PCTInfo &pct_info, aggr_pct.pct.zero_out(); // Get PCT column names - StringArray pct_cols(get_pct_columns(aggr_pct.pct.nrows())); + StringArray pct_cols(get_pct_columns(aggr_pct.pct.nrows()+1)); // Loop over the PCT column names for(int i=0; i " - << "the number of PCT categories do not match (" - << nint(v)+1 << " != " << aggr_pct.pct.nrows() << ")!\n\n"; + << "the number of PCT thresholds do not match (" + << nint(v) << " != " << aggr_pct.pct.nrows()+1 + << ")!\n\n"; exit(1); } // Set the event counts From 8f7a80ca17ea5a1a8e50628e0ba8064bf4d8e462 Mon Sep 17 00:00:00 2001 From: John Halley Gotway Date: Tue, 13 Aug 2024 15:31:58 -0600 Subject: [PATCH 20/41] Per #1371, resolve a few SonarQube findings --- .../core/series_analysis/series_analysis.cc | 49 +++++++++---------- 1 file changed, 22 insertions(+), 27 deletions(-) diff --git a/src/tools/core/series_analysis/series_analysis.cc b/src/tools/core/series_analysis/series_analysis.cc index 5c150fb259..4b1dff302f 100644 --- a/src/tools/core/series_analysis/series_analysis.cc +++ 
b/src/tools/core/series_analysis/series_analysis.cc @@ -618,7 +618,6 @@ void get_series_entry(int i_series, VarInfo *info, const GrdFileType type, StringArray &found_files, DataPlane &dp, Grid &cur_grid) { - int i, j; bool found = false; // Initialize @@ -628,10 +627,10 @@ void get_series_entry(int i_series, VarInfo *info, if(found_files[i_series].length() == 0) { // Loop through the file list - for(i=0; i " << "Could not find data for " << info->magic_str() << " in file list:\n"; - for(i=0; i pd_block; const char *method_name = "process_scores() "; @@ -844,16 +844,16 @@ void process_scores() { // block_size is defined in get_series_data() if(pd_block.size() == 0) { pd_block.resize(conf_info.block_size); - for(auto &x : pd_block) x.extend(n_series); + for(auto &pd : pd_block) pd.extend(n_series); } // Beginning of each data pass if(i_series == 0) { // Re-initialize the PairDataPoint objects - for(auto &x : pd_block) { - x.erase(); - x.set_climo_cdf_info_ptr(&conf_info.cdf_info); + for(auto &pd : pd_block) { + pd.erase(); + pd.set_climo_cdf_info_ptr(&conf_info.cdf_info); } // Starting grid point @@ -903,7 +903,7 @@ void process_scores() { set_range(obs_dp.lead(), obs_lead_beg, obs_lead_end); // Store matched pairs for each grid point - for(i=0; i Date: Tue, 13 Aug 2024 17:39:02 -0600 Subject: [PATCH 21/41] Per #1371, make use of range-based for loop, as recommeded by SonarQube --- .../core/series_analysis/series_analysis.cc | 56 +++++++++---------- 1 file changed, 27 insertions(+), 29 deletions(-) diff --git a/src/tools/core/series_analysis/series_analysis.cc b/src/tools/core/series_analysis/series_analysis.cc index 4b1dff302f..3f3fe385d8 100644 --- a/src/tools/core/series_analysis/series_analysis.cc +++ b/src/tools/core/series_analysis/series_analysis.cc @@ -842,7 +842,7 @@ void process_scores() { // Initialize PairDataPoint vector, if needed // block_size is defined in get_series_data() - if(pd_block.size() == 0) { + if(pd_block.empty()) { 
pd_block.resize(conf_info.block_size); for(auto &pd : pd_block) pd.extend(n_series); } @@ -1347,10 +1347,10 @@ void read_aggr_ctc(int n, const CTSInfo &cts_info, // Initialize aggr_cts.cts.zero_out(); - // Loop over the CTC column names - for(int i=0; i Date: Thu, 15 Aug 2024 12:59:25 -0600 Subject: [PATCH 22/41] Per #1371, update series-analysis to apply the valid data threshold properly using the old aggregate data and the new pair data. --- .../core/series_analysis/series_analysis.cc | 228 +++++++++++------- .../core/series_analysis/series_analysis.h | 10 +- 2 files changed, 146 insertions(+), 92 deletions(-) diff --git a/src/tools/core/series_analysis/series_analysis.cc b/src/tools/core/series_analysis/series_analysis.cc index 3f3fe385d8..c5b0bb1c6c 100644 --- a/src/tools/core/series_analysis/series_analysis.cc +++ b/src/tools/core/series_analysis/series_analysis.cc @@ -82,7 +82,9 @@ static void get_series_entry(int, VarInfo *, const StringArray &, DataPlane &, Grid &); static bool read_single_entry(VarInfo *, const ConcatString &, const GrdFileType, DataPlane &, Grid &); -static DataPlane get_aggr_data(const ConcatString &); + +static void open_aggr_file(); +static DataPlane read_aggr_data_plane(const ConcatString &); static void process_scores(); @@ -92,6 +94,7 @@ static void do_continuous (int, const PairDataPoint *); static void do_partialsums (int, const PairDataPoint *); static void do_probabilistic (int, const PairDataPoint *); +static int read_aggr_total (int); static void read_aggr_ctc (int, const CTSInfo &, CTSInfo &); static void read_aggr_mctc (int, const MCTSInfo &, MCTSInfo &); static void read_aggr_sl1l2 (int, const SL1L2Info &, SL1L2Info &); @@ -305,35 +308,35 @@ void process_command_line(int argc, char **argv) { // - Observation file list if(conf_info.get_n_fcst() > 1) { series_type = SeriesType::Fcst_Conf; - n_series = conf_info.get_n_fcst(); + n_series_pair = conf_info.get_n_fcst(); mlog << Debug(1) << "Series defined by the \"fcst.field\" 
configuration entry " - << "of length " << n_series << ".\n"; + << "of length " << n_series_pair << ".\n"; } else if(conf_info.get_n_obs() > 1) { series_type = SeriesType::Obs_Conf; - n_series = conf_info.get_n_obs(); + n_series_pair = conf_info.get_n_obs(); mlog << Debug(1) << "Series defined by the \"obs.field\" configuration entry " - << "of length " << n_series << ".\n"; + << "of length " << n_series_pair << ".\n"; } else if(fcst_files.n() > 1) { series_type = SeriesType::Fcst_Files; - n_series = fcst_files.n(); + n_series_pair = fcst_files.n(); mlog << Debug(1) << "Series defined by the forecast file list of length " - << n_series << ".\n"; + << n_series_pair << ".\n"; } else if(obs_files.n() > 1) { series_type = SeriesType::Obs_Files; - n_series = obs_files.n(); + n_series_pair = obs_files.n(); mlog << Debug(1) << "Series defined by the observation file list of length " - << n_series << ".\n"; + << n_series_pair << ".\n"; } else { series_type = SeriesType::Fcst_Conf; - n_series = 1; + n_series_pair = 1; mlog << Debug(1) << "The \"fcst.field\" and \"obs.field\" configuration entries " << "and the \"-fcst\" and \"-obs\" command line options " @@ -354,24 +357,24 @@ void process_command_line(int argc, char **argv) { } // The number of files must match the series length. 
- if(fcst_files.n() != n_series) { + if(fcst_files.n() != n_series_pair) { mlog << Error << "\nprocess_command_line() -> " << "when using the \"-paired\" command line option, the " << "the file list length (" << fcst_files.n() - << ") and series length (" << n_series + << ") and series length (" << n_series_pair << ") must match.\n\n"; usage(); } // Set the series file names to the input file lists - for(i=0; imagic_str() + << n_series_pair << ": " << fcst_info->magic_str() << " versus " << obs_info->magic_str() << "\n"; // Switch on the series type @@ -719,62 +722,68 @@ bool read_single_entry(VarInfo *info, const ConcatString &cur_file, //////////////////////////////////////////////////////////////////////// -DataPlane get_aggr_data(const ConcatString &var_name) { - DataPlane aggr_dp; - - // Open the aggregate file, if needed - if(!aggr_nc.MetNc) { +void open_aggr_file() { - mlog << Debug(1) - << "Reading aggregate data file: " << aggr_file << "\n"; + mlog << Debug(1) + << "Reading aggregate data file: " << aggr_file << "\n"; - if(!aggr_nc.open(aggr_file.c_str())) { - mlog << Error << "\nget_aggr_data() -> " - << "unable to open the aggregate NetCDF file \"" - << aggr_file << "\"\n\n"; - exit(1); - } + if(!aggr_nc.open(aggr_file.c_str())) { + mlog << Error << "\nopen_aggr_file() -> " + << "unable to open the aggregate NetCDF file \"" + << aggr_file << "\"\n\n"; + exit(1); + } - // Update timing info based on aggregate file global attributes - ConcatString cs; + // Update timing info based on aggregate file global attributes + ConcatString cs; - if(get_att_value_string(aggr_nc.MetNc->Nc, "fcst_init_beg", cs)) { - set_range(timestring_to_unix(cs.c_str()), fcst_init_beg, fcst_init_end); - } - if(get_att_value_string(aggr_nc.MetNc->Nc, "fcst_init_end", cs)) { - set_range(timestring_to_unix(cs.c_str()), fcst_init_beg, fcst_init_end); - } - if(get_att_value_string(aggr_nc.MetNc->Nc, "fcst_valid_beg", cs)) { - set_range(timestring_to_unix(cs.c_str()), fcst_valid_beg, 
fcst_valid_end); - } - if(get_att_value_string(aggr_nc.MetNc->Nc, "fcst_valid_end", cs)) { - set_range(timestring_to_unix(cs.c_str()), fcst_valid_beg, fcst_valid_end); - } - if(get_att_value_string(aggr_nc.MetNc->Nc, "fcst_lead_beg", cs)) { - set_range(timestring_to_sec(cs.c_str()), fcst_lead_beg, fcst_lead_end); - } - if(get_att_value_string(aggr_nc.MetNc->Nc, "fcst_lead_end", cs)) { - set_range(timestring_to_sec(cs.c_str()), fcst_lead_beg, fcst_lead_end); - } - if(get_att_value_string(aggr_nc.MetNc->Nc, "obs_init_beg", cs)) { - set_range(timestring_to_unix(cs.c_str()), obs_init_beg, obs_init_end); - } - if(get_att_value_string(aggr_nc.MetNc->Nc, "obs_init_end", cs)) { - set_range(timestring_to_unix(cs.c_str()), obs_init_beg, obs_init_end); - } - if(get_att_value_string(aggr_nc.MetNc->Nc, "obs_valid_beg", cs)) { - set_range(timestring_to_unix(cs.c_str()), obs_valid_beg, obs_valid_end); - } - if(get_att_value_string(aggr_nc.MetNc->Nc, "obs_valid_end", cs)) { - set_range(timestring_to_unix(cs.c_str()), obs_valid_beg, obs_valid_end); - } - if(get_att_value_string(aggr_nc.MetNc->Nc, "obs_lead_beg", cs)) { - set_range(timestring_to_sec(cs.c_str()), obs_lead_beg, obs_lead_end); - } - if(get_att_value_string(aggr_nc.MetNc->Nc, "obs_lead_end", cs)) { - set_range(timestring_to_sec(cs.c_str()), obs_lead_beg, obs_lead_end); - } + if(get_att_value_string(aggr_nc.MetNc->Nc, "fcst_init_beg", cs)) { + set_range(timestring_to_unix(cs.c_str()), fcst_init_beg, fcst_init_end); + } + if(get_att_value_string(aggr_nc.MetNc->Nc, "fcst_init_end", cs)) { + set_range(timestring_to_unix(cs.c_str()), fcst_init_beg, fcst_init_end); + } + if(get_att_value_string(aggr_nc.MetNc->Nc, "fcst_valid_beg", cs)) { + set_range(timestring_to_unix(cs.c_str()), fcst_valid_beg, fcst_valid_end); + } + if(get_att_value_string(aggr_nc.MetNc->Nc, "fcst_valid_end", cs)) { + set_range(timestring_to_unix(cs.c_str()), fcst_valid_beg, fcst_valid_end); + } + if(get_att_value_string(aggr_nc.MetNc->Nc, "fcst_lead_beg", 
cs)) { + set_range(timestring_to_sec(cs.c_str()), fcst_lead_beg, fcst_lead_end); + } + if(get_att_value_string(aggr_nc.MetNc->Nc, "fcst_lead_end", cs)) { + set_range(timestring_to_sec(cs.c_str()), fcst_lead_beg, fcst_lead_end); + } + if(get_att_value_string(aggr_nc.MetNc->Nc, "obs_init_beg", cs)) { + set_range(timestring_to_unix(cs.c_str()), obs_init_beg, obs_init_end); + } + if(get_att_value_string(aggr_nc.MetNc->Nc, "obs_init_end", cs)) { + set_range(timestring_to_unix(cs.c_str()), obs_init_beg, obs_init_end); } + if(get_att_value_string(aggr_nc.MetNc->Nc, "obs_valid_beg", cs)) { + set_range(timestring_to_unix(cs.c_str()), obs_valid_beg, obs_valid_end); + } + if(get_att_value_string(aggr_nc.MetNc->Nc, "obs_valid_end", cs)) { + set_range(timestring_to_unix(cs.c_str()), obs_valid_beg, obs_valid_end); + } + if(get_att_value_string(aggr_nc.MetNc->Nc, "obs_lead_beg", cs)) { + set_range(timestring_to_sec(cs.c_str()), obs_lead_beg, obs_lead_end); + } + if(get_att_value_string(aggr_nc.MetNc->Nc, "obs_lead_end", cs)) { + set_range(timestring_to_sec(cs.c_str()), obs_lead_beg, obs_lead_end); + } + + // Store the aggregate series length + n_series_aggr = get_int_var(aggr_nc.MetNc->Nc, n_series_var_name, 0); + + return; +} + +//////////////////////////////////////////////////////////////////////// + +DataPlane read_aggr_data_plane(const ConcatString &var_name) { + DataPlane aggr_dp; // Setup the data request VarInfoNcMet aggr_info; @@ -782,16 +791,16 @@ DataPlane get_aggr_data(const ConcatString &var_name) { // Attempt to read the gridded data from the current file if(!aggr_nc.data_plane(aggr_info, aggr_dp)) { - mlog << Error << "\nget_aggr_data() -> " + mlog << Error << "\nread_aggr_data_plane() -> " << "Required variable \"" << aggr_info.magic_str() << "\"" - << " not found in aggregate file!\n\n"; + << " not found in the aggregate file!\n\n"; exit(1); } // Check that the grid has not changed if(aggr_nc.grid().nx() != grid.nx() || aggr_nc.grid().ny() != grid.ny()) { - mlog 
<< Error << "\nget_aggr_data() -> " + mlog << Error << "\nread_aggr_data_plane() -> " << "the input grid dimensions (" << grid.nx() << ", " << grid.ny() << ") and aggregate grid dimensions (" << aggr_nc.grid().nx() << ", " << aggr_nc.grid().ny() << ") do not match!\n\n"; @@ -818,15 +827,18 @@ void process_scores() { DataPlane fcmn_dp, fcsd_dp; DataPlane ocmn_dp, ocsd_dp; + // Open the aggregate file, if needed + if(aggr_file.nonempty()) open_aggr_file(); + // Number of points skipped due to valid data threshold - int n_skip_zero = 0; - int n_skip_pos = 0; + int n_skip_zero_vld = 0; + int n_skip_some_vld = 0; // Loop over the data reads for(int i_read=0; i_read 1 ? i_series : 0); @@ -844,7 +856,7 @@ void process_scores() { // block_size is defined in get_series_data() if(pd_block.empty()) { pd_block.resize(conf_info.block_size); - for(auto &pd : pd_block) pd.extend(n_series); + for(auto &pd : pd_block) pd.extend(n_series_pair); } // Beginning of each data pass @@ -892,7 +904,7 @@ void process_scores() { << (ocsd_flag ? 0 : 1) << " standard deviation field(s).\n"; // Setup the output NetCDF file on the first pass - if(nc_out == (NcFile *) 0) setup_nc_file(fcst_info, obs_info); + if(!nc_out) setup_nc_file(fcst_info, obs_info); // Update timing info set_range(fcst_dp.init(), fcst_init_beg, fcst_init_end); @@ -935,16 +947,21 @@ void process_scores() { // Determine x,y location DefaultTO.one_to_two(grid.nx(), grid.ny(), i_point+i, x, y); + // Compute the total number of valid points and series length + int n_valid = pd_block[i].f_na.n() + + (aggr_file.empty() ? 
0 : read_aggr_total(i_point+1)); + int n_series = n_series_pair + n_series_aggr; + // Check for the required number of matched pairs - if(pd_block[i].f_na.n()/(double) n_series < conf_info.vld_data_thresh) { + if(n_valid / (double) n_series < conf_info.vld_data_thresh) { mlog << Debug(4) << "[" << i+1 << " of " << conf_info.block_size << "] Skipping point (" << x << ", " << y << ") with " - << pd_block[i].f_na.n() << " matched pairs.\n"; + << n_valid << " of " << n_series << " valid matched pairs.\n"; // Keep track of the number of points skipped - if(pd_block[i].f_na.n() == 0) n_skip_zero++; - else n_skip_pos++; + if(n_valid == 0) n_skip_zero_vld++; + else n_skip_some_vld++; continue; } @@ -952,7 +969,7 @@ void process_scores() { mlog << Debug(4) << "[" << i+1 << " of " << conf_info.block_size << "] Processing point (" << x << ", " << y << ") with " - << pd_block[i].n_obs << " matched pairs.\n"; + << n_valid << " of " << n_series << " valid matched pairs.\n"; } // Compute contingency table counts and statistics @@ -1012,15 +1029,15 @@ void process_scores() { // Print summary counts mlog << Debug(2) << "Finished processing statistics for " - << grid.nxy() - n_skip_zero - n_skip_pos << " of " + << grid.nxy() - n_skip_zero_vld - n_skip_some_vld << " of " << grid.nxy() << " grid points.\n" - << "Skipped " << n_skip_zero << " of " << grid.nxy() + << "Skipped " << n_skip_zero_vld << " of " << grid.nxy() << " points with no valid data.\n" - << "Skipped " << n_skip_pos << " of " << grid.nxy() + << "Skipped " << n_skip_some_vld << " of " << grid.nxy() << " points that did not meet the valid data threshold.\n"; // Print config file suggestions about missing data - if(n_skip_pos > 0 && conf_info.vld_data_thresh == 1.0) { + if(n_skip_some_vld > 0 && conf_info.vld_data_thresh == 1.0) { mlog << Debug(2) << "Some points skipped due to missing data:\n" << "Consider decreasing \"vld_thresh\" in the config file " @@ -1341,6 +1358,38 @@ void do_partialsums(int n, const PairDataPoint 
*pd_ptr) { //////////////////////////////////////////////////////////////////////// +int read_aggr_total(int n) { + + // Read TOTAL data, if needed + if(aggr_data.count(total_name) == 0) { + + // Retrieve all the aggregate file variable names + StringArray aggr_var_names; + get_var_names(aggr_nc.MetNc->Nc, &aggr_var_names); + + // Search for one containing TOTAL + for(int i=0; i " + << "No variable containing \"" << total_name << "\"" + << " found in the aggregate file!\n\n"; + exit(1); + } + } + + // Return the TOTAL count for the current point + return nint(aggr_data[total_name].buf()[n]); +} + +//////////////////////////////////////////////////////////////////////// + void read_aggr_ctc(int n, const CTSInfo &cts_info, CTSInfo &aggr_cts) { @@ -1357,7 +1406,7 @@ void read_aggr_ctc(int n, const CTSInfo &cts_info, // Read aggregate data, if needed if(aggr_data.count(var_name) == 0) { - aggr_data[var_name] = get_aggr_data(var_name); + aggr_data[var_name] = read_aggr_data_plane(var_name); } // Populate the CTC table @@ -1385,7 +1434,7 @@ void read_aggr_sl1l2(int n, const SL1L2Info &s_info, // Read aggregate data, if needed if(aggr_data.count(var_name) == 0) { - aggr_data[var_name] = get_aggr_data(var_name); + aggr_data[var_name] = read_aggr_data_plane(var_name); } // Populate the partial sums @@ -1413,7 +1462,7 @@ void read_aggr_sal1l2(int n, const SL1L2Info &s_info, // Read aggregate data, if needed if(aggr_data.count(var_name) == 0) { - aggr_data[var_name] = get_aggr_data(var_name); + aggr_data[var_name] = read_aggr_data_plane(var_name); } // Populate the partial sums @@ -1446,7 +1495,7 @@ void read_aggr_mctc(int n, const MCTSInfo &mcts_info, // Read aggregate data, if needed if(aggr_data.count(var_name) == 0) { - aggr_data[var_name] = get_aggr_data(var_name); + aggr_data[var_name] = read_aggr_data_plane(var_name); } // Get the n-th value @@ -1503,7 +1552,7 @@ void read_aggr_pct(int n, const PCTInfo &pct_info, // Read aggregate data, if needed
if(aggr_data.count(var_name) == 0) { - aggr_data[var_name] = get_aggr_data(var_name); + aggr_data[var_name] = read_aggr_data_plane(var_name); } // Get the n-th value @@ -2052,9 +2101,10 @@ void setup_nc_file(const VarInfo *fcst_info, const VarInfo *obs_info) { if (deflate_level < 0) deflate_level = conf_info.get_compression_level(); // Add the series length variable - NcVar var = add_var(nc_out, "n_series", ncInt, deflate_level); + NcVar var = add_var(nc_out, n_series_var_name, ncInt, deflate_level); add_att(&var, "long_name", "length of series"); + int n_series = n_series_pair + n_series_aggr; if(!put_nc_data(&var, &n_series)) { mlog << Error << "\nsetup_nc_file() -> " << "error writing the series length variable.\n\n"; diff --git a/src/tools/core/series_analysis/series_analysis.h b/src/tools/core/series_analysis/series_analysis.h index a741169bab..7086d12c77 100644 --- a/src/tools/core/series_analysis/series_analysis.h +++ b/src/tools/core/series_analysis/series_analysis.h @@ -61,6 +61,9 @@ static const char * default_config_filename = "MET_BASE/config/SeriesAnalysisConfig_default"; static const char * all_columns = "ALL"; +static const char * n_series_var_name = "n_series"; + +static const char * total_name = "TOTAL"; //////////////////////////////////////////////////////////////////////// // @@ -91,7 +94,7 @@ static SeriesAnalysisConfInfo conf_info; //////////////////////////////////////////////////////////////////////// // Output NetCDF file -static netCDF::NcFile *nc_out = (netCDF::NcFile *) nullptr; +static netCDF::NcFile *nc_out = nullptr; static netCDF::NcDim lat_dim; static netCDF::NcDim lon_dim ; @@ -123,7 +126,7 @@ static Met2dDataFile *obs_mtddf = nullptr; static MetNcMetDataFile aggr_nc; // Pointer to the random number generator to be used -static gsl_rng *rng_ptr = (gsl_rng *) nullptr; +static gsl_rng *rng_ptr = nullptr; // Enumeration of ways that a series can be defined enum class SeriesType { @@ -136,7 +139,8 @@ enum class SeriesType { static 
SeriesType series_type = SeriesType::None; // Series length -static int n_series = 0; +static int n_series_pair = 0; // Input pair data series +static int n_series_aggr = 0; // Input aggr series // Range of timing values encountered in the data static unixtime fcst_init_beg = (unixtime) 0; From 47db1ce4a1220274548a09f27507de6c4339b8d4 Mon Sep 17 00:00:00 2001 From: John Halley Gotway Date: Thu, 15 Aug 2024 16:39:01 -0600 Subject: [PATCH 23/41] Per #1371, update series_analysis to buffer data and write it all at once instead of storing data value by value for each point. --- .../core/series_analysis/series_analysis.cc | 203 +++++++++--------- .../core/series_analysis/series_analysis.h | 18 +- 2 files changed, 118 insertions(+), 103 deletions(-) diff --git a/src/tools/core/series_analysis/series_analysis.cc b/src/tools/core/series_analysis/series_analysis.cc index c5b0bb1c6c..738952ea93 100644 --- a/src/tools/core/series_analysis/series_analysis.cc +++ b/src/tools/core/series_analysis/series_analysis.cc @@ -140,10 +140,10 @@ static ConcatString build_nc_var_name_probabilistic( const PCTInfo &, double); static void setup_nc_file(const VarInfo *, const VarInfo *); -static void add_nc_var(const ConcatString &, const ConcatString &, - const ConcatString &, const ConcatString &, - const ConcatString &, double); -static void put_nc_val(int, const ConcatString &, float); +static void add_stat_data(const ConcatString &, const ConcatString &, + const ConcatString &, const ConcatString &, + const ConcatString &, double); +static void write_stat_data(); static void set_range(const unixtime &, unixtime &, unixtime &); static void set_range(const int &, int &, int &); @@ -607,7 +607,8 @@ void get_series_data(int i_series, << cs << "\n\n"; } else { - mlog << Debug(3) << cs << "\n"; + mlog << Debug(3) + << cs << "\n"; } } @@ -1012,6 +1013,9 @@ void process_scores() { } // end for i_read + // Write the computed statistics + write_stat_data(); + // Add time range information to the 
global NetCDF attributes add_att(nc_out, "fcst_init_beg", (string)unix_to_yyyymmdd_hhmmss(fcst_init_beg)); add_att(nc_out, "fcst_init_end", (string)unix_to_yyyymmdd_hhmmss(fcst_init_end)); @@ -1053,7 +1057,8 @@ void process_scores() { void do_categorical(int n, const PairDataPoint *pd_ptr) { - mlog << Debug(4) << "Computing Categorical Statistics.\n"; + mlog << Debug(4) + << "Computing Categorical Statistics.\n"; // Allocate objects to store categorical statistics int n_cts = conf_info.fcat_ta.n(); @@ -1147,7 +1152,8 @@ void do_categorical(int n, const PairDataPoint *pd_ptr) { void do_multicategory(int n, const PairDataPoint *pd_ptr) { - mlog << Debug(4) << "Computing Multi-Category Statistics.\n"; + mlog << Debug(4) + << "Computing Multi-Category Statistics.\n"; // Object to store multi-category statistics MCTSInfo mcts_info; @@ -1223,7 +1229,8 @@ void do_continuous(int n, const PairDataPoint *pd_ptr) { CNTInfo cnt_info; PairDataPoint pd; - mlog << Debug(4) << "Computing Continuous Statistics.\n"; + mlog << Debug(4) + << "Computing Continuous Statistics.\n"; // Process each filtering threshold for(int i=0; i 0) add_att(d.var, "name", (string)name); - if(long_name.length() > 0) add_att(d.var, "long_name", (string)long_name); - if(fcst_thresh.length() > 0) add_att(d.var, "fcst_thresh", (string)fcst_thresh); - if(obs_thresh.length() > 0) add_att(d.var, "obs_thresh", (string)obs_thresh); - if(!is_bad_data(alpha)) add_att(d.var, "alpha", alpha); - - // Store the new NcVarData object in the map - stat_data[var_name] = d; +void add_stat_data(const ConcatString &var_name, + const ConcatString &name, + const ConcatString &long_name, + const ConcatString &fcst_thresh, + const ConcatString &obs_thresh, + double alpha) { + + NcVarData data; + data.dp.set_size(grid.nx(), grid.ny()); + data.name = name; + data.long_name = long_name; + data.fcst_thresh = fcst_thresh; + data.obs_thresh = obs_thresh; + data.alpha = alpha; + + // Store the new NcVarData object + stat_data[var_name] 
= data; + stat_data_keys.push_back(var_name); return; } //////////////////////////////////////////////////////////////////////// -void put_nc_val(int n, const ConcatString &var_name, float v) { - int x, y; +void write_stat_data() { - // Determine x,y location - DefaultTO.one_to_two(grid.nx(), grid.ny(), n, x, y); + mlog << Debug(2) + << "Writing " << stat_data_keys.size() + << " output variables.\n"; - // Check for key in the map - if(stat_data.count(var_name) == 0) { - mlog << Error << "\nput_nc_val() -> " - << "variable name \"" << var_name - << "\" does not exist in the map.\n\n"; - exit(1); + int deflate_level = compress_level; + if(deflate_level < 0) deflate_level = conf_info.get_compression_level(); + + // Allocate memory to store data values for each grid point + float *data = new float [grid.nx()*grid.ny()]; + + // Write output for each stat_data map entry + for(auto &key : stat_data_keys) { + + NcVarData *ptr = &stat_data[key]; + + // Add a new variable to the NetCDF file + NcVar nc_var = add_var(nc_out, key, ncFloat, lat_dim, lon_dim, deflate_level); + + // Add variable attributes + add_att(&nc_var, "_FillValue", bad_data_float); + add_att(&nc_var, "name", ptr->name); + add_att(&nc_var, "long_name", ptr->long_name); + if(ptr->fcst_thresh.length() > 0) add_att(&nc_var, "fcst_thresh", ptr->fcst_thresh); + if(ptr->obs_thresh.length() > 0) add_att(&nc_var, "obs_thresh", ptr->obs_thresh); + if(!is_bad_data(ptr->alpha)) add_att(&nc_var, "alpha", ptr->alpha); + + // Store the data + for(int x=0; xdp(x, y); + } // end for y + } // end for x + + // Write out the data + if(!put_nc_data_with_dims(&nc_var, &data[0], grid.ny(), grid.nx())) { + mlog << Error << "\nwrite_stat_data() -> " + << "error writing \"" << key << "\" data to the output file.\n\n"; + exit(1); + } } - // Get the NetCDF variable to be written - NcVar *var = stat_data[var_name].var; - - long offsets[2]; - long lengths[2]; - offsets[0] = y; - offsets[1] = x; - lengths[0] = 1; - lengths[1] = 1; - - // 
Store the current value - if(!put_nc_data(var, &v, lengths, offsets)) { - mlog << Error << "\nput_nc_val() -> " - << "error writing to variable " << var_name - << " for point (" << x << ", " << y << ").\n\n"; - exit(1); - } + // Clean up + if(data) { delete [] data; data = (float *) nullptr; } return; } @@ -2213,17 +2227,12 @@ void set_range(const int &t, int &beg, int &end) { void clean_up() { - // Deallocate NetCDF variable for each map entry - map::const_iterator it; - for(it=stat_data.begin(); it!=stat_data.end(); it++) { - if(it->second.var) { delete it->second.var; } - } - // Close the output NetCDF file if(nc_out) { // List the NetCDF file after it is finished - mlog << Debug(1) << "Output file: " << out_file << "\n"; + mlog << Debug(1) + << "Output file: " << out_file << "\n"; delete nc_out; nc_out = (NcFile *) nullptr; diff --git a/src/tools/core/series_analysis/series_analysis.h b/src/tools/core/series_analysis/series_analysis.h index 7086d12c77..73b2f3d6f6 100644 --- a/src/tools/core/series_analysis/series_analysis.h +++ b/src/tools/core/series_analysis/series_analysis.h @@ -95,19 +95,25 @@ static SeriesAnalysisConfInfo conf_info; // Output NetCDF file static netCDF::NcFile *nc_out = nullptr; -static netCDF::NcDim lat_dim; -static netCDF::NcDim lon_dim ; +static netCDF::NcDim lat_dim; +static netCDF::NcDim lon_dim ; -// Structure to store computed statistics and corresponding metadata +// Structure to store computed statistics struct NcVarData { - netCDF::NcVar * var; // Pointer to NetCDF variable + DataPlane dp; + std::string name; + std::string long_name; + std::string fcst_thresh; + std::string obs_thresh; + double alpha; }; // Mapping of NetCDF variable name to computed statistic -std::map stat_data; +std::map stat_data; +std::vector stat_data_keys; // Mapping of aggregate NetCDF variable name to DataPlane -std::map aggr_data; +std::map aggr_data; //////////////////////////////////////////////////////////////////////// // From 
def857cd7501d202e4fa3f630bb0c7281225c9ff Mon Sep 17 00:00:00 2001 From: John Halley Gotway Date: Fri, 16 Aug 2024 12:35:28 -0600 Subject: [PATCH 24/41] Per #1371, add useful error message when required aggregation variables are not present in the input -aggr file. --- .../core/series_analysis/series_analysis.cc | 141 ++++++++++-------- 1 file changed, 80 insertions(+), 61 deletions(-) diff --git a/src/tools/core/series_analysis/series_analysis.cc b/src/tools/core/series_analysis/series_analysis.cc index 738952ea93..0db39b53d1 100644 --- a/src/tools/core/series_analysis/series_analysis.cc +++ b/src/tools/core/series_analysis/series_analysis.cc @@ -84,7 +84,8 @@ static bool read_single_entry(VarInfo *, const ConcatString &, const GrdFileType, DataPlane &, Grid &); static void open_aggr_file(); -static DataPlane read_aggr_data_plane(const ConcatString &); +static DataPlane read_aggr_data_plane(const ConcatString &, + const STATLineType aggr_lt=STATLineType::none); static void process_scores(); @@ -783,7 +784,8 @@ void open_aggr_file() { //////////////////////////////////////////////////////////////////////// -DataPlane read_aggr_data_plane(const ConcatString &var_name) { +DataPlane read_aggr_data_plane(const ConcatString &var_name, + STATLineType aggr_lt) { DataPlane aggr_dp; // Setup the data request @@ -795,6 +797,13 @@ DataPlane read_aggr_data_plane(const ConcatString &var_name) { mlog << Error << "\nread_aggr_data_plane() -> " << "Required variable \"" << aggr_info.magic_str() << "\"" << " not found in the aggregate file!\n\n"; + if(aggr_lt != STATLineType::none) { + mlog << Error + << "Recommend recreating \"" << aggr_file + << "\" to request that \"" << all_columns << "\" " + << statlinetype_to_string(aggr_lt) + << " columns be written.\n\n"; + } exit(1); } @@ -1414,7 +1423,9 @@ void read_aggr_ctc(int n, const CTSInfo &cts_info, // Read aggregate data, if needed if(aggr_data.count(var_name) == 0) { - aggr_data[var_name] = read_aggr_data_plane(var_name); + 
aggr_data[var_name] = read_aggr_data_plane( + var_name, + STATLineType::ctc); } // Populate the CTC table @@ -1426,62 +1437,6 @@ void read_aggr_ctc(int n, const CTSInfo &cts_info, //////////////////////////////////////////////////////////////////////// -void read_aggr_sl1l2(int n, const SL1L2Info &s_info, - SL1L2Info &aggr_psum) { - - // Initialize - aggr_psum.zero_out(); - - // Loop over the SL1L2 columns - for(auto &col : sl1l2_columns) { - - ConcatString c(to_upper(col)); - ConcatString var_name(build_nc_var_name_partialsums( - STATLineType::sl1l2, c, - s_info)); - - // Read aggregate data, if needed - if(aggr_data.count(var_name) == 0) { - aggr_data[var_name] = read_aggr_data_plane(var_name); - } - - // Populate the partial sums - aggr_psum.set_stat_sl1l2(col, aggr_data[var_name].buf()[n]); - } - - return; -} - -//////////////////////////////////////////////////////////////////////// - -void read_aggr_sal1l2(int n, const SL1L2Info &s_info, - SL1L2Info &aggr_psum) { - - // Initialize - aggr_psum.zero_out(); - - // Loop over the SAL1L2 columns - for(auto &col : sal1l2_columns) { - - ConcatString c(to_upper(col)); - ConcatString var_name(build_nc_var_name_partialsums( - STATLineType::sal1l2, c, - s_info)); - - // Read aggregate data, if needed - if(aggr_data.count(var_name) == 0) { - aggr_data[var_name] = read_aggr_data_plane(var_name); - } - - // Populate the partial sums - aggr_psum.set_stat_sal1l2(col, aggr_data[var_name].buf()[n]); - } - - return; -} - -//////////////////////////////////////////////////////////////////////// - void read_aggr_mctc(int n, const MCTSInfo &mcts_info, MCTSInfo &aggr_mcts) { @@ -1503,7 +1458,9 @@ void read_aggr_mctc(int n, const MCTSInfo &mcts_info, // Read aggregate data, if needed if(aggr_data.count(var_name) == 0) { - aggr_data[var_name] = read_aggr_data_plane(var_name); + aggr_data[var_name] = read_aggr_data_plane( + var_name, + STATLineType::mctc); } // Get the n-th value @@ -1539,6 +1496,66 @@ void read_aggr_mctc(int n, const 
MCTSInfo &mcts_info, //////////////////////////////////////////////////////////////////////// +void read_aggr_sl1l2(int n, const SL1L2Info &s_info, + SL1L2Info &aggr_psum) { + + // Initialize + aggr_psum.zero_out(); + + // Loop over the SL1L2 columns + for(auto &col : sl1l2_columns) { + + ConcatString c(to_upper(col)); + ConcatString var_name(build_nc_var_name_partialsums( + STATLineType::sl1l2, c, + s_info)); + + // Read aggregate data, if needed + if(aggr_data.count(var_name) == 0) { + aggr_data[var_name] = read_aggr_data_plane( + var_name, + STATLineType::sl1l2); + } + + // Populate the partial sums + aggr_psum.set_stat_sl1l2(col, aggr_data[var_name].buf()[n]); + } + + return; +} + +//////////////////////////////////////////////////////////////////////// + +void read_aggr_sal1l2(int n, const SL1L2Info &s_info, + SL1L2Info &aggr_psum) { + + // Initialize + aggr_psum.zero_out(); + + // Loop over the SAL1L2 columns + for(auto &col : sal1l2_columns) { + + ConcatString c(to_upper(col)); + ConcatString var_name(build_nc_var_name_partialsums( + STATLineType::sal1l2, c, + s_info)); + + // Read aggregate data, if needed + if(aggr_data.count(var_name) == 0) { + aggr_data[var_name] = read_aggr_data_plane( + var_name, + STATLineType::sal1l2); + } + + // Populate the partial sums + aggr_psum.set_stat_sal1l2(col, aggr_data[var_name].buf()[n]); + } + + return; +} + +//////////////////////////////////////////////////////////////////////// + void read_aggr_pct(int n, const PCTInfo &pct_info, PCTInfo &aggr_pct) { @@ -1560,7 +1577,9 @@ void read_aggr_pct(int n, const PCTInfo &pct_info, // Read aggregate data, if needed if(aggr_data.count(var_name) == 0) { - aggr_data[var_name] = read_aggr_data_plane(var_name); + aggr_data[var_name] = read_aggr_data_plane( + var_name, + STATLineType::pct); } // Get the n-th value From 2ea28a0e820130f39a61b00c3f05acb418a4c611 Mon Sep 17 00:00:00 2001 From: John Halley Gotway Date: Fri, 16 Aug 2024 15:15:12 -0600 Subject: [PATCH 25/41] Per #1371, 
print a Debug(2) message listing the aggregation fields being read. --- src/tools/core/series_analysis/series_analysis.cc | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/src/tools/core/series_analysis/series_analysis.cc b/src/tools/core/series_analysis/series_analysis.cc index 0db39b53d1..8f044b7979 100644 --- a/src/tools/core/series_analysis/series_analysis.cc +++ b/src/tools/core/series_analysis/series_analysis.cc @@ -792,6 +792,10 @@ DataPlane read_aggr_data_plane(const ConcatString &var_name, VarInfoNcMet aggr_info; aggr_info.set_magic(var_name, "(*,*)"); + mlog << Debug(2) + << "Reading aggregation \"" + << aggr_info.magic_str() << "\" field.\n"; + // Attempt to read the gridded data from the current file if(!aggr_nc.data_plane(aggr_info, aggr_dp)) { mlog << Error << "\nread_aggr_data_plane() -> " From cbba99be976ba2bb6193ad5109750165e93558ad Mon Sep 17 00:00:00 2001 From: John Halley Gotway Date: Sun, 18 Aug 2024 22:49:23 -0600 Subject: [PATCH 26/41] Per #1371, correct operator+= logic in met_stats.cc for SL1L2Info, VL1L2Info, and NBRCNTInfo. The metadata settings, like fthresh and othresh, were not being passed to the output. 
--- src/libcode/vx_statistics/met_stats.cc | 13 ++++++------- 1 file changed, 6 insertions(+), 7 deletions(-) diff --git a/src/libcode/vx_statistics/met_stats.cc b/src/libcode/vx_statistics/met_stats.cc index ba16258fa0..ec5801b4a3 100644 --- a/src/libcode/vx_statistics/met_stats.cc +++ b/src/libcode/vx_statistics/met_stats.cc @@ -1475,7 +1475,8 @@ SL1L2Info & SL1L2Info::operator=(const SL1L2Info &c) { //////////////////////////////////////////////////////////////////////// SL1L2Info & SL1L2Info::operator+=(const SL1L2Info &c) { - SL1L2Info s_info; + SL1L2Info s_info = *this; + s_info.zero_out(); s_info.scount = scount + c.scount; @@ -1810,11 +1811,8 @@ VL1L2Info & VL1L2Info::operator=(const VL1L2Info &c) { //////////////////////////////////////////////////////////////////////// VL1L2Info & VL1L2Info::operator+=(const VL1L2Info &c) { - VL1L2Info v_info; - - // Store alpha values - v_info.allocate_n_alpha(n_alpha); - for(int i=0; i Date: Mon, 19 Aug 2024 08:59:52 -0600 Subject: [PATCH 27/41] Per #1371, the DataPlane for the computed statistics should be initialized to a field of bad data values rather than the default value of 0. Otherwise, 0's are reported for stats at grid points with no data when they should really be reported as bad data!
--- src/tools/core/series_analysis/series_analysis.cc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/tools/core/series_analysis/series_analysis.cc b/src/tools/core/series_analysis/series_analysis.cc index 8f044b7979..63d23fe3e6 100644 --- a/src/tools/core/series_analysis/series_analysis.cc +++ b/src/tools/core/series_analysis/series_analysis.cc @@ -2156,7 +2156,7 @@ void add_stat_data(const ConcatString &var_name, double alpha) { NcVarData data; - data.dp.set_size(grid.nx(), grid.ny()); + data.dp.set_size(grid.nx(), grid.ny(), bad_data_double); data.name = name; data.long_name = long_name; data.fcst_thresh = fcst_thresh; From 1fd1f91d18f3c187713effa6fde21a30f6216b72 Mon Sep 17 00:00:00 2001 From: John Halley Gotway Date: Wed, 21 Aug 2024 14:48:09 -0600 Subject: [PATCH 28/41] Per #1371, update logic of the compute_cntinfo() function so that CNT statistics can be derived from a single SL1L2Info object containing both scalar and scalar anomaly partial sums. These changes enable CNT:ANOM_CORR to be aggregated in the Series-Analysis tool. 
--- src/libcode/vx_stat_out/stat_columns.cc | 2 +- src/libcode/vx_statistics/compute_stats.cc | 161 ++++++++---------- src/libcode/vx_statistics/compute_stats.h | 3 +- src/libcode/vx_statistics/met_stats.cc | 38 +++++ src/libcode/vx_statistics/met_stats.h | 9 +- .../core/series_analysis/series_analysis.cc | 10 +- .../core/stat_analysis/aggr_stat_line.cc | 4 +- .../stat_analysis/skill_score_index_job.cc | 4 +- .../core/stat_analysis/stat_analysis_job.cc | 7 +- 9 files changed, 133 insertions(+), 105 deletions(-) diff --git a/src/libcode/vx_stat_out/stat_columns.cc b/src/libcode/vx_stat_out/stat_columns.cc index 389665f177..d4354083df 100644 --- a/src/libcode/vx_stat_out/stat_columns.cc +++ b/src/libcode/vx_stat_out/stat_columns.cc @@ -4636,7 +4636,7 @@ void write_ssvar_cols(const PairDataEnsemble *pd_ptr, int i, // cnt_info.allocate_n_alpha(1); cnt_info.alpha[0] = alpha; - compute_cntinfo(pd_ptr->ssvar_bins[i].sl1l2_info, 0, cnt_info); + compute_cntinfo(pd_ptr->ssvar_bins[i].sl1l2_info, cnt_info); // // Ensemble spread/skill variance bins diff --git a/src/libcode/vx_statistics/compute_stats.cc b/src/libcode/vx_statistics/compute_stats.cc index bbc9e0ac1a..3b4e7400dd 100644 --- a/src/libcode/vx_statistics/compute_stats.cc +++ b/src/libcode/vx_statistics/compute_stats.cc @@ -27,112 +27,97 @@ using namespace std; const int detailed_debug_level = 5; - //////////////////////////////////////////////////////////////////////// -void compute_cntinfo(const SL1L2Info &s, bool aflag, CNTInfo &cnt_info) { - double fbar, obar, ffbar, fobar, oobar, den; - int n; +void compute_cntinfo(const SL1L2Info &s, CNTInfo &cnt_info) { + + // Initialize statistics + cnt_info.zero_out(); - // Set the quantities that can't be derived from SL1L2Info to bad data - cnt_info.sp_corr.set_bad_data(); - cnt_info.kt_corr.set_bad_data(); - cnt_info.e10.set_bad_data(); - cnt_info.e25.set_bad_data(); - cnt_info.e50.set_bad_data(); - cnt_info.e75.set_bad_data(); - cnt_info.e90.set_bad_data(); - 
cnt_info.eiqr.set_bad_data(); - cnt_info.mad.set_bad_data(); - cnt_info.n_ranks = 0; - cnt_info.frank_ties = 0; - cnt_info.orank_ties = 0; - - // Get partial sums - n = (aflag ? s.sacount : s.scount); - fbar = (aflag ? s.fabar : s.fbar); - obar = (aflag ? s.oabar : s.obar); - fobar = (aflag ? s.foabar : s.fobar); - ffbar = (aflag ? s.ffabar : s.ffbar); - oobar = (aflag ? s.ooabar : s.oobar); + // Check for consistent counts + if(s.scount > 0 && s.sacount > 0 && + s.scount != s.sacount) { + mlog << Error << "\ncompute_cntinfo() -> " + << "the scalar partial sum and scalar anomaly partial sum " + << "counts are both non-zero but do not match (" + << s.scount << " != " << s.sacount << ").\n\n"; + exit(1); + } // Number of matched pairs + int n = max(s.scount, s.sacount); cnt_info.n = n; - // Forecast mean and standard deviation - cnt_info.fbar.v = fbar; - cnt_info.fstdev.v = compute_stdev(fbar*n, ffbar*n, n); - - // Observation mean and standard deviation - cnt_info.obar.v = obar; - cnt_info.ostdev.v = compute_stdev(obar*n, oobar*n, n); - - // Multiplicative bias - cnt_info.mbias.v = (is_eq(obar, 0.0) ? 
bad_data_double : fbar/obar); - - // Correlation coefficient - - // Handle SAL1L2 data - if(aflag) { - cnt_info.pr_corr.v = bad_data_double; - cnt_info.anom_corr.v = compute_corr( fbar*n, obar*n, - ffbar*n, oobar*n, - fobar*n, n); - cnt_info.rmsfa.v = sqrt(ffbar); - cnt_info.rmsoa.v = sqrt(oobar); - cnt_info.anom_corr_uncntr.v = compute_anom_corr_uncntr(ffbar, oobar, - fobar); - } - // Handle SL1L2 data - else { - cnt_info.pr_corr.v = compute_corr( fbar*n, obar*n, - ffbar*n, oobar*n, - fobar*n, n); - cnt_info.anom_corr.v = bad_data_double; - cnt_info.rmsfa.v = bad_data_double; - cnt_info.rmsoa.v = bad_data_double; - cnt_info.anom_corr_uncntr.v = bad_data_double; - } + // Process scalar partial sum statistics + if(s.scount > 0) { - // Compute mean error - cnt_info.me.v = fbar - obar; + // Forecast mean and standard deviation + cnt_info.fbar.v = s.fbar; + cnt_info.fstdev.v = compute_stdev(s.fbar*n, s.ffbar*n, n); - // Compute mean error squared - cnt_info.me2.v = cnt_info.me.v * cnt_info.me.v; + // Observation mean and standard deviation + cnt_info.obar.v = s.obar; + cnt_info.ostdev.v = compute_stdev(s.obar*n, s.oobar*n, n); - // Compute mean absolute error - cnt_info.mae.v = s.smae; + // Multiplicative bias + cnt_info.mbias.v = (is_eq(s.obar, 0.0) ? 
bad_data_double : s.fbar/s.obar); - // Compute mean squared error - cnt_info.mse.v = ffbar + oobar - 2.0*fobar; + // Correlation coefficient + cnt_info.pr_corr.v = compute_corr( s.fbar*n, s.obar*n, + s.ffbar*n, s.oobar*n, + s.fobar*n, n); - // Compute mean squared error skill score - den = cnt_info.ostdev.v * cnt_info.ostdev.v; - if(!is_eq(den, 0.0)) { - cnt_info.msess.v = 1.0 - cnt_info.mse.v / den; - } - else { - cnt_info.msess.v = bad_data_double; - } + // Compute mean error + cnt_info.me.v = s.fbar - s.obar; - // Compute standard deviation of the mean error - cnt_info.estdev.v = compute_stdev(cnt_info.me.v*n, - cnt_info.mse.v*n, n); + // Compute mean error squared + cnt_info.me2.v = cnt_info.me.v * cnt_info.me.v; - // Compute bias corrected mean squared error (decomposition of MSE) - cnt_info.bcmse.v = cnt_info.mse.v - (fbar - obar)*(fbar - obar); + // Compute mean absolute error + cnt_info.mae.v = s.smae; - // Compute root mean squared error - cnt_info.rmse.v = sqrt(cnt_info.mse.v); + // Compute mean squared error + cnt_info.mse.v = s.ffbar + s.oobar - 2.0*s.fobar; - // Compute Scatter Index (SI) - if(!is_eq(cnt_info.obar.v, 0.0)) { - cnt_info.si.v = cnt_info.rmse.v / cnt_info.obar.v; + // Compute mean squared error skill score + double den = cnt_info.ostdev.v * cnt_info.ostdev.v; + if(!is_eq(den, 0.0)) { + cnt_info.msess.v = 1.0 - cnt_info.mse.v / den; + } + else { + cnt_info.msess.v = bad_data_double; + } + + // Compute standard deviation of the mean error + cnt_info.estdev.v = compute_stdev(cnt_info.me.v*n, + cnt_info.mse.v*n, n); + + // Compute bias corrected mean squared error (decomposition of MSE) + cnt_info.bcmse.v = cnt_info.mse.v - (s.fbar - s.obar)*(s.fbar - s.obar); + + // Compute root mean squared error + cnt_info.rmse.v = sqrt(cnt_info.mse.v); + + // Compute Scatter Index (SI) + if(!is_eq(cnt_info.obar.v, 0.0)) { + cnt_info.si.v = cnt_info.rmse.v / cnt_info.obar.v; + } + else { + cnt_info.si.v = bad_data_double; + } } - else { - cnt_info.si.v = 
bad_data_double; + + // Process scalar anomaly partial sum statistics + if(s.sacount > 0) { + cnt_info.anom_corr.v = compute_corr( s.fabar*n, s.oabar*n, + s.ffabar*n, s.ooabar*n, + s.foabar*n, n); + cnt_info.rmsfa.v = sqrt(s.ffabar); + cnt_info.rmsoa.v = sqrt(s.ooabar); + cnt_info.anom_corr_uncntr.v = compute_anom_corr_uncntr(s.ffabar, s.ooabar, + s.foabar); } - + // Compute normal confidence intervals cnt_info.compute_ci(); diff --git a/src/libcode/vx_statistics/compute_stats.h b/src/libcode/vx_statistics/compute_stats.h index 556afcd1e8..1649cdcec2 100644 --- a/src/libcode/vx_statistics/compute_stats.h +++ b/src/libcode/vx_statistics/compute_stats.h @@ -24,8 +24,7 @@ // //////////////////////////////////////////////////////////////////////// -extern void compute_cntinfo(const SL1L2Info &, bool, CNTInfo &); - +extern void compute_cntinfo(const SL1L2Info &, CNTInfo &); extern void compute_cntinfo(const PairDataPoint &, const NumArray &, bool, bool, bool, CNTInfo &); extern void compute_i_cntinfo(const PairDataPoint &, int, diff --git a/src/libcode/vx_statistics/met_stats.cc b/src/libcode/vx_statistics/met_stats.cc index ec5801b4a3..0ab9a05188 100644 --- a/src/libcode/vx_statistics/met_stats.cc +++ b/src/libcode/vx_statistics/met_stats.cc @@ -985,6 +985,44 @@ void CNTInfo::init_from_scratch() { //////////////////////////////////////////////////////////////////////// +void CNTInfo::zero_out() { + + fbar.set_bad_data(); + fstdev.set_bad_data(); + obar.set_bad_data(); + ostdev.set_bad_data(); + pr_corr.set_bad_data(); + sp_corr.set_bad_data(); + kt_corr.set_bad_data(); + anom_corr.set_bad_data(); + rmsfa.set_bad_data(); + rmsoa.set_bad_data(); + anom_corr_uncntr.set_bad_data(); + me.set_bad_data(); + me2.set_bad_data(); + estdev.set_bad_data(); + mbias.set_bad_data(); + mae.set_bad_data(); + mse.set_bad_data(); + msess.set_bad_data(); + bcmse.set_bad_data(); + rmse.set_bad_data(); + si.set_bad_data(); + e10.set_bad_data(); + e25.set_bad_data(); + e50.set_bad_data(); + 
e75.set_bad_data(); + e90.set_bad_data(); + eiqr.set_bad_data(); + mad.set_bad_data(); + + n_ranks = frank_ties = orank_ties = 0; + + return; +} + +//////////////////////////////////////////////////////////////////////// + void CNTInfo::clear() { n = 0; diff --git a/src/libcode/vx_statistics/met_stats.h b/src/libcode/vx_statistics/met_stats.h index 5b2939b74d..c968697dd7 100644 --- a/src/libcode/vx_statistics/met_stats.h +++ b/src/libcode/vx_statistics/met_stats.h @@ -197,7 +197,9 @@ class CNTInfo { int n_ranks, frank_ties, orank_ties; + void zero_out(); void clear(); + void allocate_n_alpha(int); void compute_ci(); @@ -370,9 +372,9 @@ class VL1L2Info { // Compute sums void set(const PairDataPoint &, const PairDataPoint &); - - void clear(); + void zero_out(); + void clear(); void allocate_n_alpha(int); void compute_stats(); @@ -520,8 +522,9 @@ class ISCInfo { double baser; double fbias; - void clear(); void zero_out(); + void clear(); + void allocate_n_scale(int); void compute_isc(); void compute_isc(int); diff --git a/src/tools/core/series_analysis/series_analysis.cc b/src/tools/core/series_analysis/series_analysis.cc index 63d23fe3e6..8f4b54199c 100644 --- a/src/tools/core/series_analysis/series_analysis.cc +++ b/src/tools/core/series_analysis/series_analysis.cc @@ -1272,13 +1272,19 @@ void do_continuous(int n, const PairDataPoint *pd_ptr) { s_info.logic = cnt_info.logic; s_info.set(*pd_ptr); - // Aggregate partial sums + // Aggregate scalar partial sums SL1L2Info aggr_psum; read_aggr_sl1l2(n, s_info, aggr_psum); s_info += aggr_psum; + // Aggregate scalar anomaly partial sums + if(conf_info.output_stats[STATLineType::cnt].has("ANOM_CORR")) { + read_aggr_sal1l2(n, s_info, aggr_psum); + s_info += aggr_psum; + } + // Compute continuous statistics from partial sums - compute_cntinfo(s_info, false, cnt_info); + compute_cntinfo(s_info, cnt_info); } // Compute continuous statistics from the pair data else { diff --git a/src/tools/core/stat_analysis/aggr_stat_line.cc 
b/src/tools/core/stat_analysis/aggr_stat_line.cc index eef0420779..6c0a7add52 100644 --- a/src/tools/core/stat_analysis/aggr_stat_line.cc +++ b/src/tools/core/stat_analysis/aggr_stat_line.cc @@ -706,7 +706,7 @@ void aggr_summary_lines(LineDataFile &f, STATAnalysisJob &job, else if(do_cnt && (line.type() == STATLineType::sl1l2 || line.type() == STATLineType::sal1l2)) { parse_sl1l2_line(line, sl1l2_info); - compute_cntinfo(sl1l2_info, 0, cnt_info); + compute_cntinfo(sl1l2_info, cnt_info); } // @@ -1519,7 +1519,7 @@ void aggr_psum_lines(LineDataFile &f, STATAnalysisJob &job, // // Compute the stats for the current time // - compute_cntinfo(cur_sl1l2, 0, cur_cnt); + compute_cntinfo(cur_sl1l2, cur_cnt); // // Append the stats diff --git a/src/tools/core/stat_analysis/skill_score_index_job.cc b/src/tools/core/stat_analysis/skill_score_index_job.cc index 9ffd63dc6d..88bc450900 100644 --- a/src/tools/core/stat_analysis/skill_score_index_job.cc +++ b/src/tools/core/stat_analysis/skill_score_index_job.cc @@ -245,9 +245,9 @@ SSIDXData SSIndexJobInfo::compute_ss_index() { // Continuous stats if(job_lt[i] == STATLineType::sl1l2) { - compute_cntinfo(fcst_sl1l2[i], 0, fcst_cnt); + compute_cntinfo(fcst_sl1l2[i], fcst_cnt); fcst_stat = fcst_cnt.get_stat(fcst_job[i].column[0].c_str()); - compute_cntinfo(ref_sl1l2[i], 0, ref_cnt); + compute_cntinfo(ref_sl1l2[i], ref_cnt); ref_stat = ref_cnt.get_stat(fcst_job[i].column[0].c_str()); } // Categorical stats diff --git a/src/tools/core/stat_analysis/stat_analysis_job.cc b/src/tools/core/stat_analysis/stat_analysis_job.cc index 3492309da1..b3a9eb12cb 100644 --- a/src/tools/core/stat_analysis/stat_analysis_job.cc +++ b/src/tools/core/stat_analysis/stat_analysis_job.cc @@ -1876,10 +1876,7 @@ void write_job_aggr_psum(STATAnalysisJob &job, STATLineType lt, // // Compute CNTInfo statistics from the aggregated partial sums // - if(it->second.sl1l2_info.scount > 0) - compute_cntinfo(it->second.sl1l2_info, 0, it->second.cnt_info); - else - 
compute_cntinfo(it->second.sl1l2_info, 1, it->second.cnt_info); + compute_cntinfo(it->second.sl1l2_info, it->second.cnt_info); if(job.stat_out) { write_cnt_cols(it->second.cnt_info, 0, job.stat_at, @@ -2610,7 +2607,7 @@ void write_job_aggr_ssvar(STATAnalysisJob &job, STATLineType lt, // // Compute CNTInfo statistics from the aggregated partial sums // - compute_cntinfo(bin_it->second.sl1l2_info, 0, cnt_info); + compute_cntinfo(bin_it->second.sl1l2_info, cnt_info); // // Write the output STAT line From 83bf7dafb428f65ee307970c5f39f9e109fc1e60 Mon Sep 17 00:00:00 2001 From: John Halley Gotway Date: Thu, 22 Aug 2024 12:37:10 -0600 Subject: [PATCH 29/41] Per #1371, fix logic of climo log message. --- src/tools/core/series_analysis/series_analysis.cc | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/src/tools/core/series_analysis/series_analysis.cc b/src/tools/core/series_analysis/series_analysis.cc index 3a351cb9ea..1e2d32b474 100644 --- a/src/tools/core/series_analysis/series_analysis.cc +++ b/src/tools/core/series_analysis/series_analysis.cc @@ -918,10 +918,10 @@ void process_scores() { mlog << Debug(3) << "For " << fcst_info->magic_str() << ", found " - << (fcmn_flag ? 0 : 1) << " forecast climatology mean and " - << (fcsd_flag ? 0 : 1) << " standard deviation field(s), and " - << (ocmn_flag ? 0 : 1) << " observation climatology mean and " - << (ocsd_flag ? 0 : 1) << " standard deviation field(s).\n"; + << (fcmn_flag ? 1 : 0) << " forecast climatology mean and " + << (fcsd_flag ? 1 : 0) << " standard deviation field(s), and " + << (ocmn_flag ? 1 : 0) << " observation climatology mean and " + << (ocsd_flag ? 
1 : 0) << " standard deviation field(s).\n"; // Setup the output NetCDF file on the first pass if(!nc_out) setup_nc_file(fcst_info, obs_info); From 3eabea8dfb03fc910f66750cebf6d2143c9c0c52 Mon Sep 17 00:00:00 2001 From: John Halley Gotway Date: Thu, 22 Aug 2024 14:01:59 -0600 Subject: [PATCH 30/41] Per #1371, this is actually related to MET #2924. In compute_pctinfo(), use obs climo data first, if provided; if not, use fcst climo data. --- src/libcode/vx_statistics/compute_stats.cc | 23 +++++++++++++++++----- 1 file changed, 18 insertions(+), 5 deletions(-) diff --git a/src/libcode/vx_statistics/compute_stats.cc b/src/libcode/vx_statistics/compute_stats.cc index 3b4e7400dd..743c6b3ae0 100644 --- a/src/libcode/vx_statistics/compute_stats.cc +++ b/src/libcode/vx_statistics/compute_stats.cc @@ -733,7 +733,7 @@ void compute_i_mctsinfo(const PairDataPoint &pd, int skip, } //////////////////////////////////////////////////////////////////////// - +// JHG maybe cprob_in should be removed? void compute_pctinfo(const PairDataPoint &pd, bool pstd_flag, PCTInfo &pct_info, const NumArray *cprob_in) { int i, n_thresh, n_pair; @@ -757,10 +757,23 @@ void compute_pctinfo(const PairDataPoint &pd, bool pstd_flag, // Use input climatological probabilities or derive them if(cmn_flag) { - if(cprob_in) climo_prob = *cprob_in; - else climo_prob = derive_climo_prob(pd.cdf_info_ptr, - pd.ocmn_na, pd.ocsd_na, - pct_info.othresh); + + // Use climatological probabilities directly, if supplied + if(cprob_in) { + climo_prob = *cprob_in; + } + // Use observation climatology data, if available + else if(pd.ocmn_na.n() > 0) { + climo_prob = derive_climo_prob(pd.cdf_info_ptr, + pd.ocmn_na, pd.ocsd_na, + pct_info.othresh); + } + // Otherwise, try using forecast climatology data + else { + climo_prob = derive_climo_prob(pd.cdf_info_ptr, + pd.fcmn_na, pd.fcsd_na, + pct_info.othresh); + } } // From 1c7a03ffcbbd2eedd2e13a7c1d2399824ff3a558 Mon Sep 17 00:00:00 2001 From: John Halley Gotway Date: Fri,
23 Aug 2024 11:36:46 -0600 Subject: [PATCH 31/41] Per #1371, fix indexing bug (+i instead of +1) when checking the valid data count. Also update the logic of read_aggr_total() to return a count of 0 for bad data. --- src/tools/core/series_analysis/series_analysis.cc | 14 +++++++++++--- 1 file changed, 11 insertions(+), 3 deletions(-) diff --git a/src/tools/core/series_analysis/series_analysis.cc b/src/tools/core/series_analysis/series_analysis.cc index 1e2d32b474..20cf5f7981 100644 --- a/src/tools/core/series_analysis/series_analysis.cc +++ b/src/tools/core/series_analysis/series_analysis.cc @@ -779,6 +779,9 @@ void open_aggr_file() { // Store the aggregate series length n_series_aggr = get_int_var(aggr_nc.MetNc->Nc, n_series_var_name, 0); + mlog << Debug(3) + << "Aggregation series has length " << n_series_aggr << ".\n"; + return; } @@ -969,7 +972,7 @@ void process_scores() { // Compute the total number of valid points and series length int n_valid = pd_block[i].f_na.n() + - (aggr_file.empty() ? 0 : read_aggr_total(i_point+1)); + (aggr_file.empty() ?
0 : read_aggr_total(i_point+i)); int n_series = n_series_pair + n_series_aggr; // Check for the required number of matched pairs @@ -1028,6 +1031,7 @@ void process_scores() { conf_info.output_stats[STATLineType::prc].n()) > 0) { do_probabilistic(i_point+i, &pd_block[i]); } + } // end for i } // end for i_read @@ -1412,13 +1416,17 @@ int read_aggr_total(int n) { if(aggr_data.count(total_name) == 0) { mlog << Error << "\nread_aggr_total() -> " << "No variable containing \"" << total_name << "\"" - << " not found in the aggregate file!\n\n"; + << " found in the aggregate file!\n\n"; exit(1); } } + // Replace bad data with a total count of 0 + int total = nint(aggr_data[total_name].buf()[n]); + if(is_bad_data(total)) total = 0; + // Return the TOTAL count for the current point - return nint(aggr_data[total_name].buf()[n]); + return total; } //////////////////////////////////////////////////////////////////////// From 457c1caed7f3399097f4ff6b87f2f03ffb447b55 Mon Sep 17 00:00:00 2001 From: John Halley Gotway Date: Fri, 23 Aug 2024 13:12:12 -0600 Subject: [PATCH 32/41] Per #1371, add logic to aggregate the PSTD BRIERCL and BSS statistics in the do_climo_brier() function. Tested manually to confirm that it works. 
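The weighted-average aggregation added in do_climo_brier() can be sketched in isolation. This is a minimal stand-alone illustration of the math only; the free functions below are hypothetical and are not part of MET:

```cpp
#include <cassert>
#include <cmath>

// Combine the pair and aggregate climatology Brier scores (BRIERCL) as a
// weighted average, weighting each value by its TOTAL count.
double aggr_briercl(double briercl_pair, int total_pair,
                    double briercl_aggr, int total_aggr) {
   return (total_pair * briercl_pair +
           total_aggr * briercl_aggr) /
          (total_pair + total_aggr);
}

// Recompute the Brier skill score from the aggregated BRIERCL:
// BSS = 1 - BRIER / BRIERCL
double brier_skill_score(double brier, double briercl) {
   return 1.0 - (brier / briercl);
}
```

In the patch itself, bad-data values and zero total counts are screened out before this computation; the sketch assumes valid inputs.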
--- .../core/series_analysis/series_analysis.cc | 81 +++++++++++++++---- 1 file changed, 64 insertions(+), 17 deletions(-) diff --git a/src/tools/core/series_analysis/series_analysis.cc b/src/tools/core/series_analysis/series_analysis.cc index 20cf5f7981..87004f76d2 100644 --- a/src/tools/core/series_analysis/series_analysis.cc +++ b/src/tools/core/series_analysis/series_analysis.cc @@ -85,7 +85,7 @@ static bool read_single_entry(VarInfo *, const ConcatString &, static void open_aggr_file(); static DataPlane read_aggr_data_plane(const ConcatString &, - const STATLineType aggr_lt=STATLineType::none); + const char *suggestion=nullptr); static void process_scores(); @@ -94,6 +94,7 @@ static void do_multicategory (int, const PairDataPoint *); static void do_continuous (int, const PairDataPoint *); static void do_partialsums (int, const PairDataPoint *); static void do_probabilistic (int, const PairDataPoint *); +static void do_climo_brier (int, double, int, PCTInfo &); static int read_aggr_total (int); static void read_aggr_ctc (int, const CTSInfo &, CTSInfo &); @@ -788,7 +789,7 @@ void open_aggr_file() { //////////////////////////////////////////////////////////////////////// DataPlane read_aggr_data_plane(const ConcatString &var_name, - STATLineType aggr_lt) { + const char *suggestion) { DataPlane aggr_dp; // Setup the data request @@ -804,12 +805,11 @@ DataPlane read_aggr_data_plane(const ConcatString &var_name, mlog << Error << "\nread_aggr_data_plane() -> " << "Required variable \"" << aggr_info.magic_str() << "\"" << " not found in the aggregate file!\n\n"; - if(aggr_lt != STATLineType::none) { + if(suggestion) { mlog << Error << "Recommend recreating \"" << aggr_file - << "\" to request that \"" << all_columns << "\" " - << statlinetype_to_string(aggr_lt) - << " columns be written.\n\n"; + << "\" to request that " << suggestion + << " column(s) be written.\n\n"; } exit(1); } @@ -1448,8 +1448,7 @@ void read_aggr_ctc(int n, const CTSInfo &cts_info, // Read 
aggregate data, if needed if(aggr_data.count(var_name) == 0) { aggr_data[var_name] = read_aggr_data_plane( - var_name, - STATLineType::ctc); + var_name, "ALL CTC"); } // Populate the CTC table @@ -1483,8 +1482,7 @@ void read_aggr_mctc(int n, const MCTSInfo &mcts_info, // Read aggregate data, if needed if(aggr_data.count(var_name) == 0) { aggr_data[var_name] = read_aggr_data_plane( - var_name, - STATLineType::mctc); + var_name, "ALL MCTC"); } // Get the n-th value @@ -1537,8 +1535,7 @@ void read_aggr_sl1l2(int n, const SL1L2Info &s_info, // Read aggregate data, if needed if(aggr_data.count(var_name) == 0) { aggr_data[var_name] = read_aggr_data_plane( - var_name, - STATLineType::sl1l2); + var_name, "ALL SL1L2"); } // Populate the partial sums @@ -1567,8 +1564,7 @@ void read_aggr_sal1l2(int n, const SL1L2Info &s_info, // Read aggregate data, if needed if(aggr_data.count(var_name) == 0) { aggr_data[var_name] = read_aggr_data_plane( - var_name, - STATLineType::sal1l2); + var_name, "ALL SAL1L2"); } // Populate the partial sums @@ -1602,8 +1598,7 @@ void read_aggr_pct(int n, const PCTInfo &pct_info, // Read aggregate data, if needed if(aggr_data.count(var_name) == 0) { aggr_data[var_name] = read_aggr_data_plane( - var_name, - STATLineType::pct); + var_name, "ALL PCT"); } // Get the n-th value @@ -1674,13 +1669,21 @@ void do_probabilistic(int n, const PairDataPoint *pd_ptr) { // Aggregate PCT counts pct_info.pct += aggr_pct.pct; - // Zero out the climatology PCT table which cannot be aggregated + // The climatology PCT table cannot be aggregated since the counts + // are not written to the output. Store the pair climo brier score + // before zeroing the PCT table. 
+ double briercl_pair = pct_info.climo_pct.brier_score(); pct_info.climo_pct.zero_out(); // Compute statistics and confidence intervals pct_info.compute_stats(); pct_info.compute_ci(); + // Custom logic for the climatology Brier Score + if(conf_info.output_stats[STATLineType::pstd].has("BRIERCL") || + conf_info.output_stats[STATLineType::pstd].has("BSS")) { + do_climo_brier(n, briercl_pair, n_series_pair, pct_info); + } } // Compute the probabilistic counts and statistics else { @@ -1721,6 +1724,50 @@ void do_probabilistic(int n, const PairDataPoint *pd_ptr) { //////////////////////////////////////////////////////////////////////// +void do_climo_brier(int n, double briercl_pair, + int total_pair, PCTInfo &pct_info) { + + // Aggregate the climatology brier score as a weighted + // average and recompute the brier skill score + + if(is_bad_data(briercl_pair) || total_pair == 0) return; + + // Construct the NetCDF variable name + ConcatString var_name(build_nc_var_name_probabilistic( + STATLineType::pstd, "BRIERCL", + pct_info, bad_data_double)); + + // Read aggregate data, if needed + if(aggr_data.count(var_name) == 0) { + aggr_data[var_name] = read_aggr_data_plane( + var_name, "the BRIERCL PSTD"); + } + + // Get the n-th BRIERCL value + double briercl_aggr = aggr_data[var_name].buf()[n]; + int total_aggr = read_aggr_total(n); + + // Aggregate BRIERCL as a weighted average + if(!is_bad_data(briercl_pair) && + !is_bad_data(briercl_aggr) && + (total_pair + total_aggr) > 0) { + + pct_info.briercl.v = (total_pair * briercl_pair + + total_aggr * briercl_aggr) / + (total_pair + total_aggr); + + // Compute the brier skill score + if(!is_bad_data(pct_info.brier.v) && + !is_bad_data(pct_info.briercl.v)) { + pct_info.bss = 1.0 - (pct_info.brier.v / pct_info.briercl.v); + } + } + + return; +} + +//////////////////////////////////////////////////////////////////////// + void store_stat_categorical(int n, STATLineType lt, const ConcatString &col, const CTSInfo &cts_info) { From 
c0a1a0ba309ceba6fb59f20ca4bfeadd5622e44a Mon Sep 17 00:00:00 2001 From: John Halley Gotway Date: Fri, 23 Aug 2024 14:43:12 -0600 Subject: [PATCH 33/41] Per #1371, switch to using string literals to satisfy SonarQube --- .../core/series_analysis/series_analysis.cc | 95 ++++++++++--------- 1 file changed, 49 insertions(+), 46 deletions(-) diff --git a/src/tools/core/series_analysis/series_analysis.cc b/src/tools/core/series_analysis/series_analysis.cc index 87004f76d2..187f8be93e 100644 --- a/src/tools/core/series_analysis/series_analysis.cc +++ b/src/tools/core/series_analysis/series_analysis.cc @@ -229,31 +229,32 @@ void process_command_line(int argc, char **argv) { if(fcst_files.n() == 0) { mlog << Error << "\nprocess_command_line() -> " << "the forecast file list must be set using the " - << "\"-fcst\" or \"-both\" option.\n\n"; + << R"("-fcst" or "-both" option.\n\n)"; usage(); } if(obs_files.n() == 0) { mlog << Error << "\nprocess_command_line() -> " << "the observation file list must be set using the " - << "\"-obs\" or \"-both\" option.\n\n"; + << R"("-obs" or "-both" option.\n\n)"; usage(); } if(config_file.length() == 0) { mlog << Error << "\nprocess_command_line() -> " << "the configuration file must be set using the " - << "\"-config\" option.\n\n"; + << R"("-config" option.\n\n)"; usage(); } if(out_file.length() == 0) { mlog << Error << "\nprocess_command_line() -> " << "the output NetCDF file must be set using the " - << "\"-out\" option.\n\n"; + << R"("-out" option.\n\n)"; usage(); } if(aggr_file == out_file) { mlog << Error << "\nprocess_command_line() -> " - << "the \"-out\" and \"-aggr\" options cannot be " - << "set to the same file (\"" << aggr_file << "\")!\n\n"; + << R"(the "-out" and "-aggr" options cannot be )" + << R"(set to the same file (")" << aggr_file + << R"(")!\n\n)"; usage(); } @@ -293,9 +294,9 @@ void process_command_line(int argc, char **argv) { // List the lengths of the series options mlog << Debug(1) - << "Length of configuration 
\"fcst.field\" = " + << R"(Length of configuration "fcst.field" = )" << conf_info.get_n_fcst() << "\n" - << "Length of configuration \"obs.field\" = " + << R"(Length of configuration "obs.field" = )" << conf_info.get_n_obs() << "\n" << "Length of forecast file list = " << fcst_files.n() << "\n" @@ -312,14 +313,14 @@ void process_command_line(int argc, char **argv) { series_type = SeriesType::Fcst_Conf; n_series_pair = conf_info.get_n_fcst(); mlog << Debug(1) - << "Series defined by the \"fcst.field\" configuration entry " + << R"(Series defined by the "fcst.field" configuration entry )" << "of length " << n_series_pair << ".\n"; } else if(conf_info.get_n_obs() > 1) { series_type = SeriesType::Obs_Conf; n_series_pair = conf_info.get_n_obs(); mlog << Debug(1) - << "Series defined by the \"obs.field\" configuration entry " + << R"(Series defined by the "obs.field" configuration entry )" << "of length " << n_series_pair << ".\n"; } else if(fcst_files.n() > 1) { @@ -340,8 +341,8 @@ void process_command_line(int argc, char **argv) { series_type = SeriesType::Fcst_Conf; n_series_pair = 1; mlog << Debug(1) - << "The \"fcst.field\" and \"obs.field\" configuration entries " - << "and the \"-fcst\" and \"-obs\" command line options " + << R"(The "fcst.field" and "obs.field" configuration entries )" + << R"(and the "-fcst" and "-obs" command line options )" << "all have length one.\n"; } @@ -351,7 +352,7 @@ void process_command_line(int argc, char **argv) { // The number of forecast and observation files must match. if(fcst_files.n() != obs_files.n()) { mlog << Error << "\nprocess_command_line() -> " - << "when using the \"-paired\" command line option, the " + << R"(when using the "-paired" command line option, the )" << "number of forecast (" << fcst_files.n() << ") and observation (" << obs_files.n() << ") files must match.\n\n"; @@ -361,7 +362,7 @@ void process_command_line(int argc, char **argv) { // The number of files must match the series length. 
if(fcst_files.n() != n_series_pair) { mlog << Error << "\nprocess_command_line() -> " - << "when using the \"-paired\" command line option, the " + << R"(when using the "-paired" command line option, the )" << "the file list length (" << fcst_files.n() << ") and series length (" << n_series_pair << ") must match.\n\n"; @@ -416,7 +417,7 @@ void process_grid(const Grid &fcst_grid, const Grid &obs_grid) { << "\nA block size of " << conf_info.block_size << " for a " << grid.nx() << " x " << grid.ny() << " grid requires " << n_reads << " passes through the data which will be slow.\n" - << "Consider increasing \"block_size\" in the configuration " + << R"(Consider increasing "block_size" in the configuration )" << "file based on available memory.\n\n"; } @@ -443,8 +444,8 @@ Met2dDataFile *get_mtddf(const StringArray &file_list, // Read first valid file if(!(mtddf = mtddf_factory.new_met_2d_data_file(file_list[i].c_str(), type))) { - mlog << Error << "\nTrouble reading data file \"" - << file_list[i] << "\"\n\n"; + mlog << Error << "\nTrouble reading data file: " + << file_list[i] << "\n\n"; exit(1); } @@ -555,7 +556,7 @@ void get_series_data(int i_series, << "disabled:\n" << fcst_grid.serialize() << " !=\n" << grid.serialize() << "\nSpecify regridding logic in the config file " - << "\"regrid\" section.\n\n"; + << R"("regrid" section.\n\n)"; exit(1); } @@ -575,7 +576,7 @@ void get_series_data(int i_series, << "disabled:\n" << obs_grid.serialize() << " !=\n" << grid.serialize() << "\nSpecify regridding logic in the config file " - << "\"regrid\" section.\n\n"; + << R"("regrid" section.\n\n)"; exit(1); } @@ -732,8 +733,8 @@ void open_aggr_file() { if(!aggr_nc.open(aggr_file.c_str())) { mlog << Error << "\nopen_aggr_file() -> " - << "unable to open the aggregate NetCDF file \"" - << aggr_file << "\"\n\n"; + << "unable to open the aggregate NetCDF file: " + << aggr_file << "\n\n"; exit(1); } @@ -797,18 +798,19 @@ DataPlane read_aggr_data_plane(const ConcatString &var_name, 
aggr_info.set_magic(var_name, "(*,*)"); mlog << Debug(2) - << "Reading aggregation \"" - << aggr_info.magic_str() << "\" field.\n"; + << R"(Reading aggregation ")" + << aggr_info.magic_str() + << R"(" field.\n)"; // Attempt to read the gridded data from the current file if(!aggr_nc.data_plane(aggr_info, aggr_dp)) { mlog << Error << "\nread_aggr_data_plane() -> " - << "Required variable \"" << aggr_info.magic_str() << "\"" - << " not found in the aggregate file!\n\n"; + << R"(Required variable ")" << aggr_info.magic_str() + << R"(" not found in the aggregate file!\n\n)"; if(suggestion) { mlog << Error - << "Recommend recreating \"" << aggr_file - << "\" to request that " << suggestion + << R"(Recommend recreating ")" << aggr_file + << R"(" to request that )" << suggestion << " column(s) be written.\n\n"; } exit(1); @@ -1067,9 +1069,9 @@ void process_scores() { if(n_skip_some_vld > 0 && conf_info.vld_data_thresh == 1.0) { mlog << Debug(2) << "Some points skipped due to missing data:\n" - << "Consider decreasing \"vld_thresh\" in the config file " + << R"(Consider decreasing "vld_thresh" in the config file )" << "to include more points.\n" - << "Consider requesting \"TOTAL\" from \"output_stats\" " + << R"(Consider requesting "TOTAL" from "output_stats" )" << "in the config file to see the valid data counts.\n"; } @@ -1415,8 +1417,8 @@ int read_aggr_total(int n) { // Check for a match if(aggr_data.count(total_name) == 0) { mlog << Error << "\nread_aggr_total() -> " - << "No variable containing \"" << total_name << "\"" - << " found in the aggregate file!\n\n"; + << R"(No variable containing ")" << total_name + << R"(" found in the aggregate file!\n\n)"; exit(1); } } @@ -2278,7 +2280,8 @@ void write_stat_data() { // Write out the data if(!put_nc_data_with_dims(&nc_var, &data[0], grid.ny(), grid.nx())) { mlog << Error << "\nwrite_stat_data() -> " - << "error writing \"" << key << "\" data to the output file.\n\n"; + << R"(error writing ")" << key + << R"(" data to the
output file.\n\n)"; exit(1); } } @@ -2360,41 +2363,41 @@ void usage() { << "\t[-v level]\n" << "\t[-compress level]\n\n" - << "\twhere\t\"-fcst file_1 ... file_n\" are the gridded " + << R"(\twhere\t"-fcst file_1 ... file_n" are the gridded )" << "forecast files to be used (required).\n" - << "\t\t\"-fcst fcst_file_list\" is an ASCII file containing " + << R"(\t\t"-fcst fcst_file_list" is an ASCII file containing )" << "a list of gridded forecast files to be used (required).\n" - << "\t\t\"-obs file_1 ... file_n\" are the gridded " + << R"(\t\t"-obs file_1 ... file_n" are the gridded )" << "observation files to be used (required).\n" - << "\t\t\"-obs obs_file_list\" is an ASCII file containing " + << R"(\t\t"-obs obs_file_list" is an ASCII file containing )" << "a list of gridded observation files to be used (required).\n" - << "\t\t\"-both\" sets the \"-fcst\" and \"-obs\" options to " + << R"(\t\t"-both" sets the "-fcst" and "-obs" options to )" << "the same set of files (optional).\n" - << "\t\t\"-aggr file\" specifies a series_analysis output " + << R"(\t\t"-aggr file" specifies a series_analysis output )" << "file with partial sums and/or contingency table counts to be " << "updated prior to deriving statistics (optional).\n" - << "\t\t\"-paired\" to indicate that the input -fcst and -obs " + << R"(\t\t"-paired" to indicate that the input -fcst and -obs )" << "file lists are already paired (optional).\n" - << "\t\t\"-out file\" is the NetCDF output file containing " + << R"(\t\t"-out file" is the NetCDF output file containing )" << "computed statistics (required).\n" - << "\t\t\"-config file\" is a SeriesAnalysisConfig file " + << R"(\t\t"-config file" is a SeriesAnalysisConfig file )" << "containing the desired configuration settings (required).\n" - << "\t\t\"-log file\" outputs log messages to the specified " + << R"(\t\t"-log file" outputs log messages to the specified )" << "file (optional).\n" - << "\t\t\"-v level\" overrides the default level of logging 
(" + << R"(\t\t"-v level" overrides the default level of logging ()" << mlog.verbosity_level() << ") (optional).\n" - << "\t\t\"-compress level\" overrides the compression level of NetCDF variable (" + << R"(\t\t"-compress level" overrides the compression level of NetCDF variable ()" << conf_info.get_compression_level() << ") (optional).\n\n" << flush; exit(1); @@ -2464,8 +2467,8 @@ void parse_long_names() { f_in.open(file_name.c_str()); if(!f_in) { mlog << Error << "\nparse_long_names() -> " - << "can't open the ASCII file \"" << file_name - << "\" for reading\n\n"; + << R"(can't open the ASCII file ") << file_name + << R"(" for reading\n\n)"; exit(1); } From 38e7a677a12cccb3262234355b70bd01fceb1780 Mon Sep 17 00:00:00 2001 From: John Halley Gotway Date: Fri, 23 Aug 2024 14:45:58 -0600 Subject: [PATCH 34/41] Per #1371, update series_analysis tests in unit_climatology_1.0deg.xml to demonstrate aggregating climo-based stats. --- .../config/SeriesAnalysisConfig_climo | 10 +- .../config/SeriesAnalysisConfig_climo_prob | 2 +- .../test_unit/xml/unit_climatology_1.0deg.xml | 96 +++++++++++++++---- 3 files changed, 82 insertions(+), 26 deletions(-) diff --git a/internal/test_unit/config/SeriesAnalysisConfig_climo b/internal/test_unit/config/SeriesAnalysisConfig_climo index 3728482541..f19bac7a20 100644 --- a/internal/test_unit/config/SeriesAnalysisConfig_climo +++ b/internal/test_unit/config/SeriesAnalysisConfig_climo @@ -132,13 +132,13 @@ vld_thresh = 0.5; // output_stats = { fho = [ "TOTAL", "F_RATE", "H_RATE", "O_RATE" ]; - ctc = [ ]; + ctc = [ "ALL" ]; cts = [ ]; mctc = [ ]; - mcts = [ "ACC" ]; - cnt = [ "TOTAL", "RMSE", "ANOM_CORR" ]; - sl1l2 = [ ]; - sal1l2 = [ ]; + mcts = [ ]; + cnt = [ "TOTAL", "RMSE", "ANOM_CORR", "RMSFA", "RMSOA" ]; + sl1l2 = [ "ALL" ]; + sal1l2 = [ "ALL" ]; pct = [ ]; pstd = [ ]; pjc = [ ]; diff --git a/internal/test_unit/config/SeriesAnalysisConfig_climo_prob b/internal/test_unit/config/SeriesAnalysisConfig_climo_prob index 
8b55c508d3..149062dc41 100644 --- a/internal/test_unit/config/SeriesAnalysisConfig_climo_prob +++ b/internal/test_unit/config/SeriesAnalysisConfig_climo_prob @@ -148,7 +148,7 @@ output_stats = { cnt = [ ]; sl1l2 = [ ]; sal1l2 = [ ]; - pct = [ ]; + pct = [ "ALL" ]; pstd = [ "TOTAL", "ROC_AUC", "BRIER", "BRIERCL", "BSS", "BSS_SMPL" ]; pjc = [ ]; prc = [ ]; diff --git a/internal/test_unit/xml/unit_climatology_1.0deg.xml b/internal/test_unit/xml/unit_climatology_1.0deg.xml index a07d47ff6e..fcd6b59668 100644 --- a/internal/test_unit/xml/unit_climatology_1.0deg.xml +++ b/internal/test_unit/xml/unit_climatology_1.0deg.xml @@ -154,20 +154,18 @@ &OUTPUT_DIR;/climatology_1.0deg/stat_analysis_MPR_to_PSTD.stat - +--!> &MET_BIN;/series_analysis CLIMO_MEAN_FILE_LIST "&DATA_DIR_CLIMO;/NCEP_NCAR_40YR_1.0deg/cmean_1d.19590409", - "&DATA_DIR_CLIMO;/NCEP_NCAR_40YR_1.0deg/cmean_1d.19590410", - "&DATA_DIR_CLIMO;/NCEP_NCAR_40YR_1.0deg/cmean_1d.19590411" + "&DATA_DIR_CLIMO;/NCEP_NCAR_40YR_1.0deg/cmean_1d.19590410" CLIMO_STDEV_FILE_LIST "&DATA_DIR_CLIMO;/NCEP_NCAR_40YR_1.0deg/cstdv_1d.19590409", - "&DATA_DIR_CLIMO;/NCEP_NCAR_40YR_1.0deg/cstdv_1d.19590410", - "&DATA_DIR_CLIMO;/NCEP_NCAR_40YR_1.0deg/cstdv_1d.19590411" + "&DATA_DIR_CLIMO;/NCEP_NCAR_40YR_1.0deg/cstdv_1d.19590410" @@ -175,11 +173,9 @@ -fcst &DATA_DIR_MODEL;/grib2/gfs/gfs_2012040900_F012.grib2 \ &DATA_DIR_MODEL;/grib2/gfs/gfs_2012040900_F024.grib2 \ &DATA_DIR_MODEL;/grib2/gfs/gfs_2012040900_F036.grib2 \ - &DATA_DIR_MODEL;/grib2/gfs/gfs_2012040900_F048.grib2 \ -obs &DATA_DIR_MODEL;/grib2/gfsanl/gfsanl_4_20120409_1200_000.grb2 \ &DATA_DIR_MODEL;/grib2/gfsanl/gfsanl_4_20120410_0000_000.grb2 \ &DATA_DIR_MODEL;/grib2/gfsanl/gfsanl_4_20120410_1200_000.grb2 \ - &DATA_DIR_MODEL;/grib2/gfsanl/gfsanl_4_20120411_0000_000.grb2 \ -paired \ -out &OUTPUT_DIR;/climatology_1.0deg/series_analysis_GFS_CLIMO_1.0DEG.nc \ -config &CONFIG_DIR;/SeriesAnalysisConfig_climo \ @@ -190,25 +186,84 @@ + + &MET_BIN;/series_analysis + + CLIMO_MEAN_FILE_LIST + 
"&DATA_DIR_CLIMO;/NCEP_NCAR_40YR_1.0deg/cmean_1d.19590411" + + + CLIMO_STDEV_FILE_LIST + "&DATA_DIR_CLIMO;/NCEP_NCAR_40YR_1.0deg/cstdv_1d.19590411" + + + + \ + -fcst &DATA_DIR_MODEL;/grib2/gfs/gfs_2012040900_F048.grib2 \ + -obs &DATA_DIR_MODEL;/grib2/gfsanl/gfsanl_4_20120411_0000_000.grb2 \ + -paired \ + -aggr &OUTPUT_DIR;/climatology_1.0deg/series_analysis_GFS_CLIMO_1.0DEG.nc \ + -out &OUTPUT_DIR;/climatology_1.0deg/series_analysis_GFS_CLIMO_1.0DEG_AGGR.nc \ + -config &CONFIG_DIR;/SeriesAnalysisConfig_climo \ + -v 2 + + + &OUTPUT_DIR;/climatology_1.0deg/series_analysis_GFS_CLIMO_1.0DEG_AGGR.nc + + + echo "&DATA_DIR_MODEL;/grib2/sref_pr/sref_prob_2012040821_F003.grib2 \ &DATA_DIR_MODEL;/grib2/sref_pr/sref_prob_2012040821_F009.grib2 \ &DATA_DIR_MODEL;/grib2/sref_pr/sref_prob_2012040821_F015.grib2 \ - &DATA_DIR_MODEL;/grib2/sref_pr/sref_prob_2012040821_F021.grib2 \ - &DATA_DIR_MODEL;/grib2/sref_pr/sref_prob_2012040821_F027.grib2 \ - &DATA_DIR_MODEL;/grib2/sref_pr/sref_prob_2012040821_F033.grib2 \ - &DATA_DIR_MODEL;/grib2/sref_pr/sref_prob_2012040821_F039.grib2 \ - &DATA_DIR_MODEL;/grib2/sref_pr/sref_prob_2012040821_F045.grib2" \ - > &OUTPUT_DIR;/climatology_1.0deg/input_fcst_file_list; \ + &DATA_DIR_MODEL;/grib2/sref_pr/sref_prob_2012040821_F021.grib2" \ + > &OUTPUT_DIR;/climatology_1.0deg/20120409_fcst_file_list; \ echo "&DATA_DIR_MODEL;/grib2/gfsanl/gfsanl_4_20120409_0000_000.grb2 \ &DATA_DIR_MODEL;/grib2/gfsanl/gfsanl_4_20120409_0600_000.grb2 \ &DATA_DIR_MODEL;/grib2/gfsanl/gfsanl_4_20120409_1200_000.grb2 \ - &DATA_DIR_MODEL;/grib2/gfsanl/gfsanl_4_20120409_1800_000.grb2 \ - &DATA_DIR_MODEL;/grib2/gfsanl/gfsanl_4_20120410_0000_000.grb2 \ + &DATA_DIR_MODEL;/grib2/gfsanl/gfsanl_4_20120409_1800_000.grb2" \ + > &OUTPUT_DIR;/climatology_1.0deg/20120409_obs_file_list; \ + &MET_BIN;/series_analysis + + DAY_INTERVAL 1 + HOUR_INTERVAL 6 + CLIMO_MEAN_FILE_LIST + "&DATA_DIR_CLIMO;/NCEP_NCAR_40YR_1.0deg/cmean_1d.19590409", + 
"&DATA_DIR_CLIMO;/NCEP_NCAR_40YR_1.0deg/cmean_1d.19590410", + "&DATA_DIR_CLIMO;/NCEP_NCAR_40YR_1.0deg/cmean_1d.19590411" + + + CLIMO_STDEV_FILE_LIST + "&DATA_DIR_CLIMO;/NCEP_NCAR_40YR_1.0deg/cstdv_1d.19590409", + "&DATA_DIR_CLIMO;/NCEP_NCAR_40YR_1.0deg/cstdv_1d.19590410", + "&DATA_DIR_CLIMO;/NCEP_NCAR_40YR_1.0deg/cstdv_1d.19590411" + + + + \ + -fcst &OUTPUT_DIR;/climatology_1.0deg/20120409_fcst_file_list \ + -obs &OUTPUT_DIR;/climatology_1.0deg/20120409_obs_file_list \ + -paired \ + -out &OUTPUT_DIR;/climatology_1.0deg/series_analysis_PROB_CLIMO_1.0DEG.nc \ + -config &CONFIG_DIR;/SeriesAnalysisConfig_climo_prob \ + -v 2 + + + &OUTPUT_DIR;/climatology_1.0deg/series_analysis_PROB_CLIMO_1.0DEG.nc + + + + + echo "&DATA_DIR_MODEL;/grib2/sref_pr/sref_prob_2012040821_F027.grib2 \ + &DATA_DIR_MODEL;/grib2/sref_pr/sref_prob_2012040821_F033.grib2 \ + &DATA_DIR_MODEL;/grib2/sref_pr/sref_prob_2012040821_F039.grib2 \ + &DATA_DIR_MODEL;/grib2/sref_pr/sref_prob_2012040821_F045.grib2" \ + > &OUTPUT_DIR;/climatology_1.0deg/20120410_fcst_file_list; \ + echo "&DATA_DIR_MODEL;/grib2/gfsanl/gfsanl_4_20120410_0000_000.grb2 \ &DATA_DIR_MODEL;/grib2/gfsanl/gfsanl_4_20120410_0600_000.grb2 \ &DATA_DIR_MODEL;/grib2/gfsanl/gfsanl_4_20120410_1200_000.grb2 \ &DATA_DIR_MODEL;/grib2/gfsanl/gfsanl_4_20120410_1800_000.grb2" \ - > &OUTPUT_DIR;/climatology_1.0deg/input_obs_file_list; \ + > &OUTPUT_DIR;/climatology_1.0deg/20120410_obs_file_list; \ &MET_BIN;/series_analysis DAY_INTERVAL 1 @@ -227,15 +282,16 @@ \ - -fcst &OUTPUT_DIR;/climatology_1.0deg/input_fcst_file_list \ - -obs &OUTPUT_DIR;/climatology_1.0deg/input_obs_file_list \ + -fcst &OUTPUT_DIR;/climatology_1.0deg/20120410_fcst_file_list \ + -obs &OUTPUT_DIR;/climatology_1.0deg/20120410_obs_file_list \ -paired \ - -out &OUTPUT_DIR;/climatology_1.0deg/series_analysis_PROB_CLIMO_1.0DEG.nc \ + -aggr &OUTPUT_DIR;/climatology_1.0deg/series_analysis_PROB_CLIMO_1.0DEG.nc \ + -out 
&OUTPUT_DIR;/climatology_1.0deg/series_analysis_PROB_CLIMO_1.0DEG_AGGR.nc \ -config &CONFIG_DIR;/SeriesAnalysisConfig_climo_prob \ -v 2 - &OUTPUT_DIR;/climatology_1.0deg/series_analysis_PROB_CLIMO_1.0DEG.nc + &OUTPUT_DIR;/climatology_1.0deg/series_analysis_PROB_CLIMO_1.0DEG_AGGR.nc From f0a5eb73152fbb476859ec876d5cf2e259c254de Mon Sep 17 00:00:00 2001 From: John Halley Gotway Date: Fri, 23 Aug 2024 15:09:51 -0600 Subject: [PATCH 35/41] Per #1371, remove extra comment --- src/libcode/vx_statistics/compute_stats.cc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/libcode/vx_statistics/compute_stats.cc b/src/libcode/vx_statistics/compute_stats.cc index 743c6b3ae0..c0d988efd3 100644 --- a/src/libcode/vx_statistics/compute_stats.cc +++ b/src/libcode/vx_statistics/compute_stats.cc @@ -733,7 +733,7 @@ void compute_i_mctsinfo(const PairDataPoint &pd, int skip, } //////////////////////////////////////////////////////////////////////// -// JHG maybe cprob_in should be removed? 
+ void compute_pctinfo(const PairDataPoint &pd, bool pstd_flag, PCTInfo &pct_info, const NumArray *cprob_in) { int i, n_thresh, n_pair; From 972f86711ed88369db5005b633e4ec020b0d6c7b Mon Sep 17 00:00:00 2001 From: John Halley Gotway Date: Fri, 23 Aug 2024 16:55:26 -0600 Subject: [PATCH 36/41] Per #1371, skip writing the PCT THRESH_i columns to the Series-Analysis output since they are not used --- src/tools/core/series_analysis/series_analysis.cc | 2 ++ 1 file changed, 2 insertions(+) diff --git a/src/tools/core/series_analysis/series_analysis.cc b/src/tools/core/series_analysis/series_analysis.cc index 187f8be93e..b494442239 100644 --- a/src/tools/core/series_analysis/series_analysis.cc +++ b/src/tools/core/series_analysis/series_analysis.cc @@ -2052,6 +2052,8 @@ void store_stat_all_sal1l2(int n, const SL1L2Info &s_info) { void store_stat_all_pct(int n, const PCTInfo &pct_info) { StringArray pct_cols(get_pct_columns(pct_info.pct.nrows() + 1)); for(int i=0; i Date: Mon, 26 Aug 2024 08:18:29 -0600 Subject: [PATCH 37/41] Per #1371, fix the R string literals to remove \t and \n escape sequences. 
--- .../core/series_analysis/series_analysis.cc | 60 +++++++++++-------- 1 file changed, 36 insertions(+), 24 deletions(-) diff --git a/src/tools/core/series_analysis/series_analysis.cc b/src/tools/core/series_analysis/series_analysis.cc index b494442239..733b2823a5 100644 --- a/src/tools/core/series_analysis/series_analysis.cc +++ b/src/tools/core/series_analysis/series_analysis.cc @@ -229,32 +229,32 @@ void process_command_line(int argc, char **argv) { if(fcst_files.n() == 0) { mlog << Error << "\nprocess_command_line() -> " << "the forecast file list must be set using the " - << R"("-fcst" or "-both" option.\n\n)"; + << R"("-fcst" or "-both" option.)" << "\n\n"; usage(); } if(obs_files.n() == 0) { mlog << Error << "\nprocess_command_line() -> " << "the observation file list must be set using the " - << R"("-obs" or "-both" option.\n\n)"; + << R"("-obs" or "-both" option.)" << "\n\n"; usage(); } if(config_file.length() == 0) { mlog << Error << "\nprocess_command_line() -> " << "the configuration file must be set using the " - << R"("-config" option.\n\n)"; + << R"("-config" option.)" << "\n\n"; usage(); } if(out_file.length() == 0) { mlog << Error << "\nprocess_command_line() -> " << "the output NetCDF file must be set using the " - << R"("-out" option.\n\n)"; + << R"("-out" option.)" << "\n\n"; usage(); } if(aggr_file == out_file) { mlog << Error << "\nprocess_command_line() -> " << R"(the "-out" and "-aggr" options cannot be )" << R"(set to the same file (")" << aggr_file - << R"(")!\n\n)"; + << R"(")!)" << "\n\n"; usage(); } @@ -556,7 +556,7 @@ void get_series_data(int i_series, << "disabled:\n" << fcst_grid.serialize() << " !=\n" << grid.serialize() << "\nSpecify regridding logic in the config file " - << R"("regrid" section.\n\n)"; + << R"("regrid" section.)" << "\n\n"; exit(1); } @@ -576,7 +576,7 @@ void get_series_data(int i_series, << "disabled:\n" << obs_grid.serialize() << " !=\n" << grid.serialize() << "\nSpecify regridding logic in the config file " - 
<< R"("regrid" section.\n\n)"; + << R"("regrid" section.)" << "\n\n"; exit(1); } @@ -800,13 +800,13 @@ DataPlane read_aggr_data_plane(const ConcatString &var_name, mlog << Debug(2) << R"(Reading aggregation ")" << aggr_info.magic_str() - << R"(" field.\n)"; + << R"(" field.)" << "\n"; // Attempt to read the gridded data from the current file if(!aggr_nc.data_plane(aggr_info, aggr_dp)) { mlog << Error << "\nread_aggr_data_plane() -> " << R"(Required variable ")" << aggr_info.magic_str() - << R"(" not found in the aggregate file!\n\n)"; + << R"(" not found in the aggregate file!)" << "\n\n"; if(suggestion) { mlog << Error << R"(Recommend recreating ")" << aggr_file @@ -1418,7 +1418,7 @@ int read_aggr_total(int n) { if(aggr_data.count(total_name) == 0) { mlog << Error << "\nread_aggr_total() -> " << R"(No variable containing ")" << total_name - << R"(" "found in the aggregate file!\n\n)"; + << R"(" "found in the aggregate file!)" << "\n\n"; exit(1); } } @@ -2283,7 +2283,7 @@ void write_stat_data() { if(!put_nc_data_with_dims(&nc_var, &data[0], grid.ny(), grid.nx())) { mlog << Error << "\nwrite_stat_data() -> " << R"(error writing ")" << key - << R"(" data to the output file.\n\n)"; + << R"(" data to the output file.)" << "\n\n"; exit(1); } } @@ -2365,41 +2365,53 @@ void usage() { << "\t[-v level]\n" << "\t[-compress level]\n\n" - << R"(\twhere\t"-fcst file_1 ... file_n" are the gridded )" + << "\twhere\t" + << R"("-fcst file_1 ... file_n" are the gridded )" << "forecast files to be used (required).\n" - << R"(\t\t"-fcst fcst_file_list" is an ASCII file containing )" + << "\t\t" + << R"("-fcst fcst_file_list" is an ASCII file containing )" << "a list of gridded forecast files to be used (required).\n" - << R"(\t\t"-obs file_1 ... file_n" are the gridded )" + << "\t\t" + << R"("-obs file_1 ... 
file_n" are the gridded )" << "observation files to be used (required).\n" - << R"(\t\t"-obs obs_file_list" is an ASCII file containing )" + << "\t\t" + << R"("-obs obs_file_list" is an ASCII file containing )" << "a list of gridded observation files to be used (required).\n" - << R"(\t\t"-both" sets the "-fcst" and "-obs" options to )" + << "\t\t" + << R"("-both" sets the "-fcst" and "-obs" options to )" << "the same set of files (optional).\n" - << R"(\t\t"-aggr file" specifies a series_analysis output )" + << "\t\t" + << R"("-aggr file" specifies a series_analysis output )" << "file with partial sums and/or contingency table counts to be " << "updated prior to deriving statistics (optional).\n" - << R"(\t\t"-paired" to indicate that the input -fcst and -obs )" + << "\t\t" + << R"("-paired" to indicate that the input -fcst and -obs )" << "file lists are already paired (optional).\n" - << R"(\t\t"-out file" is the NetCDF output file containing )" + << "\t\t" + << R"("-out file" is the NetCDF output file containing )" << "computed statistics (required).\n" - << R"(\t\t"-config file" is a SeriesAnalysisConfig file )" + << "\t\t" + << R"("-config file" is a SeriesAnalysisConfig file )" << "containing the desired configuration settings (required).\n" - << R"(\t\t"-log file" outputs log messages to the specified )" + << "\t\t" + << R"("-log file" outputs log messages to the specified )" << "file (optional).\n" - << R"(\t\t"-v level" overrides the default level of logging ()" + << "\t\t" + << R"("-v level" overrides the default level of logging ()" << mlog.verbosity_level() << ") (optional).\n" - << R"(\t\t"-compress level" overrides the compression level of NetCDF variable ()" + << "\t\t" + << R"("-compress level" overrides the compression level of NetCDF variable ()" << conf_info.get_compression_level() << ") (optional).\n\n" << flush; exit(1); @@ -2470,7 +2482,7 @@ void parse_long_names() { if(!f_in) { mlog << Error << "\nparse_long_names() -> " << R"(can't open the 
ASCII file ") << file_name - << R"(" for reading\n\n)"; + << R"(" for reading!)" << "\n\n"; exit(1); } From 12c1eecf49a6ddaec8482bc91e0dd666fb3f0366 Mon Sep 17 00:00:00 2001 From: John Halley Gotway Date: Mon, 26 Aug 2024 08:23:01 -0600 Subject: [PATCH 38/41] Per #1371, update the read_aggr_data_plane() suggestion strings. --- src/tools/core/series_analysis/series_analysis.cc | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/src/tools/core/series_analysis/series_analysis.cc b/src/tools/core/series_analysis/series_analysis.cc index 733b2823a5..36350c8acd 100644 --- a/src/tools/core/series_analysis/series_analysis.cc +++ b/src/tools/core/series_analysis/series_analysis.cc @@ -1450,7 +1450,7 @@ void read_aggr_ctc(int n, const CTSInfo &cts_info, // Read aggregate data, if needed if(aggr_data.count(var_name) == 0) { aggr_data[var_name] = read_aggr_data_plane( - var_name, "ALL CTC"); + var_name, R"("ALL" CTC)"); } // Populate the CTC table @@ -1484,7 +1484,7 @@ void read_aggr_mctc(int n, const MCTSInfo &mcts_info, // Read aggregate data, if needed if(aggr_data.count(var_name) == 0) { aggr_data[var_name] = read_aggr_data_plane( - var_name, "ALL MCTC"); + var_name, R"("ALL" MCTC)"); } // Get the n-th value @@ -1537,7 +1537,7 @@ void read_aggr_sl1l2(int n, const SL1L2Info &s_info, // Read aggregate data, if needed if(aggr_data.count(var_name) == 0) { aggr_data[var_name] = read_aggr_data_plane( - var_name, "ALL SL1L2"); + var_name, R"("ALL" SL1L2)"); } // Populate the partial sums @@ -1566,7 +1566,7 @@ void read_aggr_sal1l2(int n, const SL1L2Info &s_info, // Read aggregate data, if needed if(aggr_data.count(var_name) == 0) { aggr_data[var_name] = read_aggr_data_plane( - var_name, "ALL SAL1L2"); + var_name, R"("ALL" SAL1L2)"); } // Populate the partial sums @@ -1600,7 +1600,7 @@ void read_aggr_pct(int n, const PCTInfo &pct_info, // Read aggregate data, if needed if(aggr_data.count(var_name) == 0) { aggr_data[var_name] = read_aggr_data_plane( - 
var_name, "ALL PCT"); + var_name, R"("ALL" PCT)"); } // Get the n-th value @@ -1742,7 +1742,7 @@ void do_climo_brier(int n, double briercl_pair, // Read aggregate data, if needed if(aggr_data.count(var_name) == 0) { aggr_data[var_name] = read_aggr_data_plane( - var_name, "the BRIERCL PSTD"); + var_name, R"(the "BRIERCL" PSTD)"); } // Get the n-th BRIERCL value From 4f78b2672737defecd6264ab7ab086d903dcb130 Mon Sep 17 00:00:00 2001 From: John Halley Gotway Date: Mon, 26 Aug 2024 08:34:19 -0600 Subject: [PATCH 39/41] Per #1371, ignore unneeded PCT 'THRESH_' variables both when reading and writing ALL PCT columns. --- src/tools/core/series_analysis/series_analysis.cc | 3 +++ 1 file changed, 3 insertions(+) diff --git a/src/tools/core/series_analysis/series_analysis.cc b/src/tools/core/series_analysis/series_analysis.cc index 36350c8acd..79b46710fe 100644 --- a/src/tools/core/series_analysis/series_analysis.cc +++ b/src/tools/core/series_analysis/series_analysis.cc @@ -1591,6 +1591,9 @@ void read_aggr_pct(int n, const PCTInfo &pct_info, // Loop over the PCT columns for(int i=0; i Date: Thu, 29 Aug 2024 10:07:37 -0600 Subject: [PATCH 40/41] Per #1371, update the test named series_analysis_AGGR_CMD_LINE to include data for the F42 lead time that had previously been included for the same run in the develop branch. Note however that the timestamps in the output file for the develop branch (2012040900_to_2012041100) were wrong and have been corrected here (2012040900_to_2012041018) to match the actual data. 
--- internal/test_unit/xml/unit_series_analysis.xml | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/internal/test_unit/xml/unit_series_analysis.xml b/internal/test_unit/xml/unit_series_analysis.xml index 98e39cd5b8..96cda729dd 100644 --- a/internal/test_unit/xml/unit_series_analysis.xml +++ b/internal/test_unit/xml/unit_series_analysis.xml @@ -85,15 +85,17 @@ \ -fcst &DATA_DIR_MODEL;/grib1/gfs_hmt/gfs_2012040900_F030.grib \ &DATA_DIR_MODEL;/grib1/gfs_hmt/gfs_2012040900_F036.grib \ + &DATA_DIR_MODEL;/grib1/gfs_hmt/gfs_2012040900_F042.grib \ -obs &DATA_DIR_OBS;/stage4_hmt/stage4_2012041006_06h.grib \ &DATA_DIR_OBS;/stage4_hmt/stage4_2012041012_06h.grib \ + &DATA_DIR_OBS;/stage4_hmt/stage4_2012041018_06h.grib \ -aggr &OUTPUT_DIR;/series_analysis/series_analysis_CMD_LINE_APCP_06_2012040900_to_2012041000.nc \ - -out &OUTPUT_DIR;/series_analysis/series_analysis_AGGR_CMD_LINE_APCP_06_2012040900_to_2012041012.nc \ + -out &OUTPUT_DIR;/series_analysis/series_analysis_AGGR_CMD_LINE_APCP_06_2012040900_to_2012041018.nc \ -config &CONFIG_DIR;/SeriesAnalysisConfig \ -v 1 - &OUTPUT_DIR;/series_analysis/series_analysis_AGGR_CMD_LINE_APCP_06_2012040900_to_2012041012.nc + &OUTPUT_DIR;/series_analysis/series_analysis_AGGR_CMD_LINE_APCP_06_2012040900_to_2012041018.nc From 15607006f3d9cd1892c8483b59f310bb83238eb0 Mon Sep 17 00:00:00 2001 From: John Halley Gotway Date: Thu, 29 Aug 2024 10:15:11 -0600 Subject: [PATCH 41/41] Per #1371, update the -aggr note to warn users about slow runtimes --- docs/Users_Guide/series-analysis.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/Users_Guide/series-analysis.rst b/docs/Users_Guide/series-analysis.rst index fa015d6896..ed9f5578ab 100644 --- a/docs/Users_Guide/series-analysis.rst +++ b/docs/Users_Guide/series-analysis.rst @@ -61,7 +61,7 @@ Optional Arguments for series_analysis 6. The -aggr option specifies the path to an existing Series-Analysis output file. 
When computing statistics for the input forecast and observation data, Series-Analysis aggregates the partial sums (SL1L2, SAL1L2 line types) and contingency table counts (CTC, MCTC, and PCT line types) with data provided in the aggregate file. This option enables Series-Analysis to run iteratively and update existing partial sums, counts, and statistics with new data. -.. note:: When the -aggr option is used, only statistics that are derivable from partial sums and contingency table counts can be requested. +.. note:: When the -aggr option is used, only statistics that are derivable from partial sums and contingency table counts can be requested. Runtimes are generally much slower when aggregating data since doing so requires reading and writing many additional NetCDF variables containing the scalar partial sums and contingency table counts. 7. The -paired option indicates that the -fcst and -obs file lists are already paired, meaning there is a one-to-one correspondence between the files in those lists. This option affects how missing data is handled. When -paired is not used, missing or incomplete files result in a runtime error with no output file being created. When -paired is used, missing or incomplete files result in a warning with output being created using the available data.