Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhance TC-Stat to write the RIRW job CTC/CTS output to a .stat output file #2425

Closed
7 of 20 tasks
JohnHalleyGotway opened this issue Jan 26, 2023 · 21 comments · Fixed by #2536
Closed
7 of 20 tasks

Enhance TC-Stat to write the RIRW job CTC/CTS output to a .stat output file #2425

JohnHalleyGotway opened this issue Jan 26, 2023 · 21 comments · Fixed by #2536
Assignees
Labels
MET: Tropical Cyclone Tools priority: high High Priority requestor: NOAA/GSL NOAA Global Systems Laboratory type: new feature Make it do something new
Milestone

Comments

@JohnHalleyGotway
Copy link
Collaborator

JohnHalleyGotway commented Jan 26, 2023

Describe the New Feature

The RIRW job in TC-Stat applies a categorical verification approach to verify rapid intensification or weakening events identified in A-Decks and B-Decks. The job can write CTC/CTS/MPR data to the output. However, that output is only written to the terminal or redirected to a log file. And the CTC/CTS line written do NOT contain the full 21 header columns common to all other .stat line types.

This task to add support for the -out_stat job command option to the TC-Stat tool. When supplied, reformat the custom header columns for the RIRW job type into the existing 21 .stat header columns. The challenge is finding an intuitive and logical place for each piece of metadata.

This is requested by NOAA/GSL for use during the 2023 hurricane season.

Acceptance Testing

List input data types and sources.
Describe tests required for new functionality.

Time Estimate

Estimate the amount of work required here.
Issues should represent approximately 1 to 3 days of work.

Sub-Issues

Consider breaking the new feature down into sub-issues.
No sub-issues required

Relevant Deadlines

List relevant project deadlines here or state NONE.

Funding Source

2792542

Define the Metadata

Assignee

  • Select engineer(s) or no engineer required
  • Select scientist(s) or no scientist required

Labels

  • Select component(s)
  • Select priority
  • Select requestor(s)

Projects and Milestone

  • Select Repository and/or Organization level Project(s) or add alert: NEED PROJECT ASSIGNMENT label
  • Select Milestone as the next official version or Future Versions

Define Related Issue(s)

Consider the impact to the other METplus components.

New Feature Checklist

See the METplus Workflow for details.

  • Complete the issue definition above, including the Time Estimate and Funding source.
  • Fork this repository or create a branch of develop.
    Branch name: feature_<Issue Number>_<Description>
  • Complete the development and test your changes.
  • Add/update log messages for easier debugging.
  • Add/update unit tests.
  • Add/update documentation.
  • Push local changes to GitHub.
  • Submit a pull request to merge into develop.
    Pull request: feature <Issue Number> <Description>
  • Define the pull request metadata, as permissions allow.
    Select: Reviewer(s) and Development issues
    Select: Repository level development cycle Project for the next official release
    Select: Milestone as the next official version
  • Iterate until the reviewer(s) accept and merge your changes.
  • Delete your fork or branch.
  • Close this issue.
@JohnHalleyGotway JohnHalleyGotway added type: new feature Make it do something new requestor: NOAA/GSL NOAA Global Systems Laboratory alert: NEED ACCOUNT KEY Need to assign an account key to this issue alert: NEED CYCLE ASSIGNMENT Need to assign to a release development cycle MET: Tropical Cyclone Tools priority: high High Priority labels Jan 26, 2023
@JohnHalleyGotway JohnHalleyGotway added this to the MET 11.1.0 milestone Jan 26, 2023
@JohnHalleyGotway JohnHalleyGotway changed the title Enhance TC-Stat to write the CTC/CTS output from the RIRW job to a real .stat output file Enhance TC-Stat to write the RIRW job CTC/CTS output to a real .stat output file Jan 26, 2023
@JohnHalleyGotway
Copy link
Collaborator Author

Seth and John met on 3/30/23 to discuss this issue.

Recommend running the following Stat-Analysis job on seneca to demonstrate the desired functionality:

cd /d1/projects/MET/MET_regression/develop/NB20230330/MET-develop/test_output
../bin/stat_analysis \
-lookin met_test_scripts/grid_stat/grid_stat_APCP_24_240000L_20050808_000000V.stat \
-job aggregate_stat -line_type CTC -out_line_type CTS,CTC \
-vx_mask DTC165,DTC166 -by FCST_THRESH -out_stat ~/out.stat

And here's the similar tc_stat job to be enhanced to support the -out_stat ~/out.stat option:

../bin/tc_stat -lookin tc_pairs/alal2010.tcst -job rirw -rirw_window 12 -rirw_thresh '<=-15' -out_line_type CTC,CTS,MPR -out ~/out.txt

See the usage statement: https://met.readthedocs.io/en/latest/Users_Guide/tc-stat.html#practical-information
-out is an option for tc_stat that is specified once and used for the output from ALL jobs specified in the config file.
-out_stat is a job-specific option that can be specified separately for each job.

Found this unexpected use of -stat_row in tc_stat_job.cc. This is probably a bug that should be replaced with -out_stat:

1433    if(StatFile.length() > 0)
1434       // JHG: Probably a bug... s << "-stat_row " << StatFile << " ";
1435       s << "-out_stat " << StatFile << " ";

Look in stat_analysis_job.cc at this code. Lines 1328-1330 likely already exist in tc_stat, but you'll need to add logic similar to lines 1323-1326:

1319       //
1320       // CTC output line
1321       //
1322       else if(lt == stat_ctc) {
1323          if(job.stat_out) {
1324             write_ctc_cols(it->second.cts_info, job.stat_at,
1325                            job.stat_row++, n_header_columns);
1326          }
1327          else {
1328             at.set_entry(r, c++, (string)"CTC:");
1329             write_case_cols(it->first, at, r, c);
1330             write_ctc_cols(it->second.cts_info, at, r++, c);
1331          }
1332       }

Note in tc_stat_job.h, that StatFile is the path of the file to be written while StatOut is the actual output file stream to be written. Note that this is named stat_out for Stat-Analysis jobs:

      // Derived output statistics
      ConcatString StatFile;             // File name for output statistics
      std::ofstream    *StatOut;         // Output statistics file stream

@JohnHalleyGotway
Copy link
Collaborator Author

Psuedo code from our meeting about this on 4/6/23:

  • Add to TCStatJob class AsciiTable stat_at and int stat_row similar to the StatJob class.
  • Define TCStatJob::setup_stat_file() similar to StatJob::setup_stat_file() but only need logic for CTC and CTS output line types for now.
  • Define TCStatJobRIRW::do_stat_output() (see below)

Initial goal is to generate a .stat output file with all the header columns left empty. Once that's done, meet to discuss how they should actually be populated.

void TCStatJobRIRW::do_stat_output(ostream &out) {
   map<ConcatString,RIRWMapData,cs_cmp>::iterator it;
   int r;
   StatHdrColumns shc;
   AsciiTable stat_at;

   //
   // JHG... need to define TCStatJob::setup_stat_file() similar to STATAnalysisJob::setup_stat_file(...)
   // Also add 'AsciiTable stat_at' and 'int stat_row' to the TCStatJob base class.
   //
   setup_stat_file(1 + RIRWMap.size());

   // Will need some shc settings here
   shc.set_desc(na_str);
   shc.fcst_var("RIRW");

   mlog << Debug(2) << "Computing output for "
        << (int) RIRWMap.size() << " case(s).\n";

   //
   // Loop through the map
   //
   for(it = RIRWMap.begin(), r=1; it != RIRWMap.end(); it++) {

      //
      // Write the output STAT header columns
      //
      // TODO: actually populate the header columns in the shc object
      // shc = ...;
      // Will need other shc settings here
     shc.set_???(na_str);

      // move the setting of alpha down into the logic for each line type
      /*
      if(OutLineType.has(stat_ctc_str)) shc.set_alpha(job.out_alpha);
         write_header_cols(shc, job.stat_at, job.stat_row);
      }*/

      //
      // Initialize
      //
      c = 0;

      //
      // CTC output line
      //
      if(OutLineType.has(stat_ctc_str)) {
         shc.set_alpha(na_str);
         write_ctc_cols(it->second.Info, stat_at,
                        stat_row++, n_header_columns);
      }

      //
      // CTS output line
      //
      if(OutLineType.has(stat_cts_str)) {

         //
         // Store the alpha information in the CTSInfo object
         //
         it->second.Info.allocate_n_alpha(1);
         it->second.Info.alpha[0] = job.out_alpha; // JHG, not sure if out_alpha is configurable right now?
         shc.set_alpha(job.out_alpha);

         //
         // Compute the stats and confidence intervals for this
         // CTSInfo object
         //
         it->second.Info.compute_stats();
         it->second.Info.compute_ci();

         //
         // Write the data line
         //
         write_cts_cols(it->second.Info, 0, stat_at,
                        stat_row++, n_header_columns);
      }
   } // end for it

   return;
}

sethlinden pushed a commit that referenced this issue Apr 7, 2023
…t_row and prototype for setup_stat_file(). SL ci-skip-all
sethlinden pushed a commit that referenced this issue Apr 10, 2023
…_stat_output. In progress. SL ci-skip-all
sethlinden pushed a commit that referenced this issue Apr 12, 2023
JohnHalleyGotway added a commit that referenced this issue Apr 12, 2023
@sethlinden sethlinden removed the alert: NEED ACCOUNT KEY Need to assign an account key to this issue label Apr 13, 2023
sethlinden pushed a commit that referenced this issue Apr 14, 2023
…W:do_job to produce out file and stat file appropriately. SL ci-skip-all
sethlinden pushed a commit that referenced this issue Apr 17, 2023
…the file-stream object. Now getting output in the out.stat file. SL ci-skip-all
sethlinden pushed a commit that referenced this issue Apr 18, 2023
…at_output. Modified RIRWMapData struct. SL ci-skip-all
sethlinden pushed a commit that referenced this issue Apr 20, 2023
…RIRW process_pair() to get the new vars: Init, Lead, Valid. Worked on filling in StatHeader object in do_stat_output. SL
sethlinden pushed a commit that referenced this issue Apr 27, 2023
sethlinden pushed a commit that referenced this issue Apr 27, 2023
sethlinden pushed a commit that referenced this issue May 3, 2023
…of the shc (StatHdrColumns) object. SL ci-skip-all
@JohnHalleyGotway
Copy link
Collaborator Author

Command on seneca for testing:

./tc_stat \
-lookin /d1/projects/MET/MET_regression/develop/NB20230503/MET-develop/test_output/tc_pairs/alal2010.tcst \
-job rirw -rirw_window 12 -rirw_thresh '<=-15' -out_line_type CTC -out_stat ./out.stat

sethlinden pushed a commit that referenced this issue May 5, 2023
…ut_stat option, also added example to rirw jobs. SL
JohnHalleyGotway added a commit that referenced this issue May 8, 2023
…d stating the number of header columns as being 24. Always want to avoid those sorts of specifics because if/when that number changes, we'll almost definitely forget to update it here.wq
JohnHalleyGotway added a commit that referenced this issue May 8, 2023
JohnHalleyGotway added a commit that referenced this issue May 10, 2023
Co-authored-by: Seth Linden <linden@seneca.rap.ucar.edu>
Co-authored-by: John Halley Gotway <johnhg@ucar.edu>
@JohnHalleyGotway JohnHalleyGotway removed the alert: NEED CYCLE ASSIGNMENT Need to assign to a release development cycle label May 11, 2023
@JohnHalleyGotway JohnHalleyGotway linked a pull request May 11, 2023 that will close this issue
15 tasks
@jvigh
Copy link
Contributor

jvigh commented May 12, 2023

@sethlinden, I've finally had a chance to look through this. Overall, this seems reasonable.

A few comments/questions, some of which may be naïve:

  1. If BASIN is used as the VX_MASK when no other mask is supplied, will it support masking on multiple basins (e.g., if a user wanted to just run over the NHC basins, could they specify VX_MASK as AL,EP,CP)? Or is there some other way to specify this in the config?
  2. (possibly naïve comment): What exactly is being computed in this example? Where are the models "P43I,SPC3,GFSI,GHMI,HWFI,GFNI,NGPI,DSHP,LGEM,AHWI,BCLP' being set? Is this set just what is found in the file? Is event equalization being used? I'm just trying to understand what the CTC is from, and how that is getting configured in tc_stat.
  3. I mentioned this before, but it could be quite useful if the tool could support/provide a VX_MASK value based on the filter parameters (e.g., over water-only, land and water, over water and >100 km from land, etc.). That way, results could be compared between two verification protocols. But I don't know tcstat well enough -- does it already do this?

@JohnHalleyGotway
Copy link
Collaborator Author

JohnHalleyGotway commented May 15, 2023

Note that on 5/15/2023, @OliviaOstwald-NOAA asked a question during the NOAA METplus User telecon that resulted in #2542. So she's actively running the TC-Stat -job rirw job and would make for a good person to test this functionality.

Recommend that we tag @OliviaOstwald-NOAA and @mollybsmith-noaa in the MET-11.1.0-rc1 release acceptance testing step for #2425 and #2542.

JohnHalleyGotway added a commit that referenced this issue May 15, 2023
…2425 by adding the  option. That produces output for both the AHWI and BCLP model names. The BCLP contingency table is empty which triggers this runtime error. Tweaking this newly added tests and confirming that it now runs without error demonstrate the fix.
sethlinden pushed a commit that referenced this issue May 15, 2023
* Per #2542, in the ContingencyTable class, update gheidke(), gheidke_ec(), gkuiper(), and gerrity() to return bad data for empty contingency tables rather than erroring out.

* Per #2542, modify TC-Stat config file job number 8 that was added for #2425 by adding the  option. That produces output for both the AHWI and BCLP model names. The BCLP contingency table is empty which triggers this runtime error. Tweaking this newly added tests and confirming that it now runs without error demonstrate the fix.
@JohnHalleyGotway
Copy link
Collaborator Author

JohnHalleyGotway commented May 15, 2023

@jvigh regarding your questions from last week, here are some brief answers:

  1. When running a TC-Stat job, you have 2 choices for filtering:
    • You can filter based in the contents of the columns in the input data it reads. For example, -amodel OFCL only retains lines where the AMODEL columns contains OFCL. The input data filtering options basically comprise the top part of the TC-Stat config file. And each of those config file filtering options has a corresponding job command option. For example, providing -amodel OFCL on the command line is the same as setting amodel = [ "OFCL" ]; in the config file.
    • You can also filter spatially using out_init_mask or out_valid_mask to decide which lat/lon track point's should be retained.
  2. Yes, if you don't provide a filter to specify which models should be used, then all of them found in the input are used by default. No, not event equalized. That logic is turned off by default but can be enabled using event_equal = TRUE; in the config file or -event_equal on the command line.
  3. The water-only and landfall config options can be used to stratify which track points are retained.

If you're interested in learning more about TC-Stat, I'd suggest scheduling a working session where we step through some examples. We could demonstrate the existing functionality, and you can decide if those are sufficient to answer the questions you have. But I think working through concrete examples would be the most productive.

I'll leave it to you to schedule that if you'd like.

@OliviaOstwald-NOAA
Copy link

Hi John,

I'm not sure if this also applies to me, but I would be very interested in scheduling a working session to learn more about TC-Stat if possible.

Thanks!
-Olivia

@jvigh
Copy link
Contributor

jvigh commented May 18, 2023

@JohnHalleyGotway Thanks very much for the explanation. It's been a while since I've worked with TC-Stat, so it's helpful to have that context. I'm not sure when I'll have time in the immediate future for a work session, but if you end up having one with Olivia, please invite me.

@OliviaOstwald-NOAA
Copy link

@JohnHalleyGotway Hi John, I have been working on producing RIRW stats and have a question: is there a way to produce these stats for the entire basin? Right now, I am only getting stat output files for individual storms. Thanks!

@JohnHalleyGotway
Copy link
Collaborator Author

@OliviaOstwald-NOAA, yes certainly. RIRW stats should be able to be computed over multiple storms in a basin. It really all depends on what data you're passing to the TC-Stat tool. If the input is a single storm then that's all you get in the output. If the input is all storms in the basin for one or more years, then that's what you'll get in the output.

Honestly, there's lots of details and options to consider...

  • Take a close look at how you're calling TC-Stat and consider how much data is being passed as input in each call to the tool.
  • Take a look at the filtering options you're using in the TC-Stat config file. Perhaps the job you're running is restricting the analysis to a single storm?
  • Take a look to see if/how you're using the -by job command option. That stratifies results based on the contents of the columns you specify. For example, with -by STORM_ID you'd get output reported separately for each unique storm id found in the input.
  • And even if you do have RIRW CTC stats reported separately for each storm, you could run Stat-Analysis to aggregate multiple ones together. The CTC lines that are written by TC-Stat (or will be in MET version 11.1.0) can be further aggregated by Stat-Analysis and/or loaded into METviewer for further aggregation and plotting.

Give that some thought and let me know if any of the tools or config options don't work as you'd expect.

@OliviaOstwald-NOAA
Copy link

@JohnHalleyGotway Thanks so much! I'll review this and let you know. Enjoy the holiday weekend!

@JohnHalleyGotway JohnHalleyGotway changed the title Enhance TC-Stat to write the RIRW job CTC/CTS output to a real .stat output file Enhance TC-Stat to write the RIRW job CTC/CTS output to a .stat output file Jun 13, 2023
@mollybsmith-noaa
Copy link
Contributor

I've spent most of Saturday trying to test this issue for the upcoming release, and I couldn't get it to work. It's entirely possible that this is user error on my part, but if so then the syntax is not clear. In my TCStat config, I replaced this working line:
TC_STAT_JOB_ARGS = -job rirw -line_type TCMPR -rirw_time 24 -rirw_exact false -rirw_thresh ge30 -dump_row {OUTPUT_BASE}/{ENV[MODEL]}/{ENV[YYYYMMDDHH]}/summary/tc_rirw_job.tcst
with this new line:
TC_STAT_JOB_ARGS = -job rirw -line_type TCMPR -rirw_time 24 -rirw_exact false -rirw_thresh ge30 -out_line_type CTC -out_stat {OUTPUT_BASE}/{ENV[MODEL]}/{ENV[YYYYMMDDHH]}/ctc/tc_rirw_ctc.stat
and it now does not work. Apologies if I did something wrong. The log file is on Jet at /home/role.amb-verif/tcmet/verif/output/tc_stat_rt/nhc/logs/metplus.log.2023070821122 and the configuration file is at /home/role.amb-verif/tcmet/config/metplus/TCStat_nhc_jet_rt_test_rc2.conf.

@mollybsmith-noaa
Copy link
Contributor

Could this be an equivalent issue to the removal of the LOOP_ORDER parameter in TCPairs? At the time I understood that to be TCPairs specific, but maybe it applies to TCStat as well?

@georgemccabe
Copy link
Collaborator

@mollybsmith-noaa, It looks like the log file with the timestamp you listed is gone and there are newer log files in that directory. I see some successful runs. Were you able to resolve this issue already?

@mollybsmith-noaa
Copy link
Contributor

mollybsmith-noaa commented Jul 10, 2023 via email

@georgemccabe
Copy link
Collaborator

I don't think it is related to the same issue re: LOOP_ORDER.

It looks like the -out_stat job argument was somewhat recently added to TCStat. I see logic in the TCStat wrapper to check for -dump_row in the job args and create the parent directory if it does not already exist. There is no logic to handle this in the same way for the new -out_stat argument.

You could test that this is the issue by creating the directory that will contain the -out_stat file before running TCStat to see if that resolves the issue.

@mollybsmith-noaa
Copy link
Contributor

mollybsmith-noaa commented Jul 10, 2023 via email

@georgemccabe
Copy link
Collaborator

That's my guess but I don't know for sure without seeing if creating the directory fixes the problem or seeing an updated log file.

@mollybsmith-noaa
Copy link
Contributor

This worked perfectly, thank you. The testing is successful.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
MET: Tropical Cyclone Tools priority: high High Priority requestor: NOAA/GSL NOAA Global Systems Laboratory type: new feature Make it do something new
Projects
No open projects
Status: Done
Development

Successfully merging a pull request may close this issue.

6 participants