Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhance Multivariate MODE to generate object statistics for each input field requested by the user. #1283

Closed
12 of 21 tasks
dwfncar opened this issue Mar 24, 2020 · 57 comments · Fixed by #2569
Closed
12 of 21 tasks
Assignees
Labels
MET: Feature Verification priority: high High Priority requestor: DTC/T&E General DTC Testing and Evaluation work type: enhancement Improve something that it is currently doing
Milestone

Comments

@dwfncar
Copy link
Contributor

dwfncar commented Mar 24, 2020

Describe the Enhancement

I am repurposing this existing issue to better define the multi-variate MODE enhancements that are needed.

  1. While the existing MODEMultivarConfig_default file exists in the repository, it is NOT actually read by Multi-Variate MODE! In all cases, MODEConfig_default is read instead. Need to understand why, and either update MODE to read it in the right circumstances or just remove it from the repo.

  2. If MODEMultivarConfig_default is retained, modify the nc_pairs_flag setting in it to specify it as a dictionary, just like it is specified in MODEConfig_default. This is needed to make the logic of the MODE wrapper clear and consistent.

  3. Make large updates to the multi-variate logic, as described below.

  • As of MET version 10.1.0, multi-variate MODE generates output NetCDF files named f_super.nc and o_super.nc. However, it does not actually write out any object statistics.
  • Recommend that it continue to write out the super object information, but write out both fields to the standard MODE NetCDF output file instead (i.e. *_obj.nc).
  • Add a new config option that is an array of booleans with the same length as the number of forecast and observation fields: multivar_intensity_flag = [ TRUE, TRUE, FALSE ]; (default will be FALSE for all)
  • After generating the super objects, loop through the multivar_intensity_flag booleans:
    • If all multivar_intensity_flag booleans are false, run MODE once on the binary 0/1 super objects to generate geometric object statistics and write them to the output "*_obj.txt" file.
    • Otherwise, for each multivar_intensity_flag boolean flag that is true, run MODE once substituting in the raw intensity values for the corresponding fcst/obs fields. The object definition has already been performed, but we're just regenerating the statistics using the intensities for those fields. Write the output to the "*_obj.txt" output file. For any FALSE booleans, the intensity statistics will contain missing values.
    • So the "*_obj.txt" may now contain output for multiple fields with different intensity values and potentially total interest values. Need to differentiate between them in the output in the FCST_VAR and OBS_VAR columns. Recommend including the forecast and observation field names in those output columns.
    • Be sure to check the nc_pairs_flag option and write intensity values to the NetCDF file, as requested by the user.
    • Add a new config option that allows the user to change the forecast and observation field names. Currently the super object output name is ‘super’, but users may be running multivariate MODE on multiple complex events (dryline/blizzard), and it will be beneficial to have a way to differentiate between the output
      • Add multivar_name=""; to change the name of the multivariate mode output as listed in the ascii file output, for example. (default will be "super")
      • For example, if we define multivar_name = "blizzard"; and we request intensity output for visibility, our ascii file(s) should append this to the FCST_VAR and OBS_VAR names e.g. blizzard_vis/vis_blizzard (preferred order?)
        • If intensity isn’t requested for any field, the name would be blizzard for the geometric object statistics?
  • Other considerations
    • Determine whether there should be separate MODE output files for each field where intensity statistics are requested, or all output contained in a single file
      • Do what makes sense, when thinking about parallelization
    • No need to keep mode runs from the individual fields - cuts down on processing time
    • Keep this info in memory for continuing the mvmode process

Time Estimate

Estimate the amount of work required here.
Issues should represent approximately 1 to 3 days of work.

Sub-Issues

Consider breaking the enhancement down into sub-issues.

  • Add a checkbox for each sub-issue here.

Relevant Deadlines

List relevant project deadlines here or state NONE.

Funding Source

2700042 WPC

Define the Metadata

Assignee

  • Select engineer(s) or no engineer required
  • Select scientist(s) or no scientist required

Labels

  • Select component(s)
  • Select priority
  • Select requestor(s)

Projects and Milestone

  • Select Repository and/or Organization level Project(s) or add alert: NEED PROJECT ASSIGNMENT label
  • Select Milestone as the next official version or Future Versions

Define Related Issue(s)

Consider the impact to the other METplus components.

Enhancement Checklist

See the METplus Workflow for details.

  • Complete the issue definition above, including the Time Estimate and Funding Source.
  • Fork this repository or create a branch of develop.
    Branch name: feature_<Issue Number>_<Description>
  • Complete the development and test your changes.
  • Add/update log messages for easier debugging.
  • Add/update unit tests.
  • Add/update documentation.
  • Push local changes to GitHub.
  • Submit a pull request to merge into develop.
    Pull request: feature <Issue Number> <Description>
  • Define the pull request metadata, as permissions allow.
    Select: Reviewer(s) and Linked issues
    Select: Repository level development cycle Project for the next official release
    Select: Milestone as the next official version
  • Iterate until the reviewer(s) accept and merge your changes.
  • Delete your fork or branch.
  • Close this issue.
@dwfncar dwfncar added the type: task An actionable item of work label Mar 24, 2020
@dwfncar dwfncar self-assigned this Mar 24, 2020
@dwfncar dwfncar assigned dwfncar and lindsayrblank and unassigned dwfncar Mar 24, 2020
@lindsayrblank lindsayrblank added this to the MET 9.1 milestone Mar 27, 2020
@JohnHalleyGotway JohnHalleyGotway modified the milestones: MET 9.1, MET 10.0 Jun 8, 2020
@JohnHalleyGotway JohnHalleyGotway added the alert: NEED CYCLE ASSIGNMENT Need to assign to a release development cycle label Sep 10, 2020
@JohnHalleyGotway JohnHalleyGotway assigned rgbullock and unassigned dwfncar Nov 5, 2020
@JohnHalleyGotway JohnHalleyGotway removed the alert: NEED CYCLE ASSIGNMENT Need to assign to a release development cycle label Nov 5, 2020
@JohnHalleyGotway JohnHalleyGotway changed the title MODE-Multivar super objects Enhance MODE-Multivar to run entirely in memory instead of spawning individual processes for each call to MODE. Dec 3, 2020
@JohnHalleyGotway JohnHalleyGotway changed the title Enhance MODE-Multivar to run entirely in memory instead of spawning individual processes for each call to MODE. Enhance Multivariate MODE to run entirely in memory instead of spawning individual processes for each call to MODE. Dec 3, 2020
@davidalbo
Copy link
Contributor

A visual would be good @hertneky.

@davidalbo
Copy link
Contributor

@hertneky @JohnHalleyGotway .. reality check. Is this correct?

For an individual input:
Run the mode convolution algorithm at the convolution threshold
Run mode convolution again at the merge threshold (when merge flag = BOTH or THRESH), independent of the first run.

So you could for example end up with 2 objects from the first run, but 1 from the second run (that contains both of the first run objects within it.

@hertneky
Copy link
Contributor

hertneky commented May 5, 2023

The first part sounds good to me.

I am having trouble with the last sentence. If 2 super objects are identified in the 1st run, then you would still have 2 super objects in the 2nd run, but they may or may not be merged (identified as a cluster). Am I understanding that correctly?

@davidalbo
Copy link
Contributor

davidalbo commented May 5, 2023

Untitled presentation

Hi @hertneky, This is before the superobject step, it's the individual input data.. The inner ones are the 'simple' objects (is that the right terminology?) that come from using the conv_thresh, the outer one is the simple object that comes from using a more generous threshold, the merge_thresh. For example:

{ name="WIND"; level="Z10"; conv_radius = 5; conv_thresh = >=8.9408; merge_thresh = >=6.7056; merge_flag = THRESH; },

Two objects when >=8.9408, one object when >= 6.7056.

@davidalbo
Copy link
Contributor

Untitled presentation (1)

As opposed to this where the objects get bigger but do not actually mege.

@hertneky
Copy link
Contributor

hertneky commented May 8, 2023

@davidalbo Yes these images are correct for merging. I just want to reiterate that the merge_thresh will only be used for merging. They don't make the super objects bigger in the end. The super objects are the same after the 2nd pass, but may or may not be considered part of a cluster. After the determining the super object clusters using the super_merge_thresh_object, that larger super_merge_thresh_object is not saved in any output.

@davidalbo
Copy link
Contributor

@hertneky understood. I've finally gotten the 'super_merge_thresh_object's to get created and I put in some code to output them as netCDF and look at them, and they seem reasonable. I did run into a problem with the first input in your original example. For the forecast the field is PrecipFlag, and you did not specify a merge_thresh or merge_flag. The code uses a default in that case, which is merge_flag=THRESH and merge_thresh = ">=1.25". This was causing problems because then my code tried to make bigger objects using these settings, which makes no sense. I added merge_flag=NONE explicitly for those fields, and that fixed it. I'd have thought the default would be merge_flag=NONE, but... does not seem to be the case.

@hertneky
Copy link
Contributor

@davidalbo So when merge_flag is not set, it defaults to THRESH with a value of >=1.25. Is that just for the multivariate case or does that work for single MODE as well? I would have thought the default would be NONE if not specified as well. It doesn't make sense to me to have it default to THRESH with some arbitrary value, but perhaps there was some reason behind that. I don't know.
CSNOW is binary 0/1 and PrecipFlag is categorical. So a merge thresh >=1.25 would seem problematic. We did discuss that if no merge thresh were set for a variable, then the conv_thresh would be used to create the super merge object. I see various scenarios for merge_thresh:

  1. user sets merge_flag to NONE for all variables - no merging of the super objects is performed
  2. user sets merge_flag to THRESH for all variables - merging of super objects is performed using merge_thresh of each field (assumes user has selected a threshold for merge_thresh for each field)
  3. user sets merge_flag for some of the variables, but not all - for variables where merge_flag was not set, set it to the same as the other variables? If using THRESH, set the merge_thresh = conv_thresh for that variable

@davidalbo
Copy link
Contributor

@hertneky What you say for at least 2 and 3 is what I've been implementing. Right now I'm at the exact place where I'm trying to do the merging for option 2 or 3. I think I need to be reminded exactly what the merge algorithm should do, and how that compares to what happens for merging in the non-multivar mode situation. This is so I can use existing code as much as possible. I might need a tag up, but you can try a comment if you want.

@hertneky
Copy link
Contributor

@davidalbo A tag-up may be needed. So the algorithm..

  1. Uses the same multivar_logic as creating the super-object(s) to create super-merge-object(s)
  2. Super-objects within super-merge-objects are merged (called cluster objects)
    • For singular MODE, objects within merge-objects are merged (called cluster objects)
  3. Objects not merged are simple objects
  4. The remaining logic is the same for matching simple/cluster objects (if matching is requested)
    The big difference is that with mvMODE, the merge_thresh of multiple variables is used to create the super-merge-object. If this doesn't answer your question and you need something more specific, we can tag-up.
    I am also on the hook for creating some diagrams for documentation purposes. I will do that next.

@davidalbo
Copy link
Contributor

@hertneky is it true that simple objects produced by mode have unique integer numbers associated with them? It must be true based on the output that gets produced. I need some way to tell if a super object is within a particular super merge object. The code as is right now represents super objects as binary yes/no on a grid, so there is no information about individual objects. If you agree that unique integers must be in there somewhere, I'll start backing up in the design and looking for those integers.

@hertneky
Copy link
Contributor

Yes, to visualize this for example, one of the outputs in the nc file is fcst/obs_obj_id. These are IDs assigned to each simple object. If there are 4 objects, then the unique IDs are 1-4 (left image).
For merged objects, the output is called fcst/obs_cluster_id. Objects in a cluster due to merging will share the same ID, single objects not merged have a value of -1 (right image).
Colorbars at bottom show the values. In the right image, the large object was merged with the tiny object next to it an both objects share the ID of 1 and the bottom 2 objects were not merged and have a value of -1.

This output is from the individual MODE run for visibility. This should also be applied to the super objects.

In the object ASCII output, the IDS are also listed (to the right of the images). The 1st 4 lines correspond to the images of the FCST objects. The 1st column is the simple object IDs and the 2nd column is the clusters. Objects 1/2 were clustered (merged), objects 3/4 were not part of a cluster (denoted by CF000).

Screenshot 2023-05-11 at 3 10 37 PM

@davidalbo
Copy link
Contributor

do_fcst_convolution();
do_fcst_thresholding();
do_fcst_filtering();
do_fcst_splitting();

@davidalbo
Copy link
Contributor

davidalbo commented May 30, 2023

Notes for discussion today may 30:

when merge_flag is not set, it defaults to THRESH with a value of >=1.25. This seems wrong and should be change for both multivar mode and single var mode to... 'no merging'. Correct?
Answer: Keep default as is. When it is multivariate mode, check that eaCh field has merge_flag explicitly set and error exit if not.

For the '2nd pass' (individual inputs masked to be inside super objects), which of these steps can be skipped?
do_fcst_convolution();
do_fcst_thresholding();
do_fcst_filtering();
do_fcst_splitting();
Solution: Only do the splitting. Needed to copy some data into place as is (within the engine code) to get it to work

Did I implement this, which is also part of the 2nd pass, and is it correct?:
Various scenarios for merge_thresh in multivar mode:

  1. user sets merge_flag to NONE for all variables - no merging of the super objects is performed
  2. user sets merge_flag to THRESH for all variables - merging of super objects is performed using merge_thresh of each field (assumes user has selected a threshold for merge_thresh for each field)
  3. user sets merge_flag for some of the variables, but not all - for variables where merge_flag was not set, set it to the same as the other variables? If using THRESH, set the merge_thresh = conv_thresh for that variable

I'd like to talk through the expected behavior for the case where all the intensity flags are set FALSE. It would be a mode-like run for simple objects that have no intensities, they are just binary yes/no.

@davidalbo
Copy link
Contributor

@hertneky what about merging when settings are all FALSE for intensities? It can be done and should give the same results as the 'not all FALSE' option. Does it make sense scientifically to merge?

@hertneky
Copy link
Contributor

@davidalbo It still makes sense to merge objects, even if no object intensity info is requested. The merging should be the same as if the user requested intensities.

@davidalbo
Copy link
Contributor

davidalbo commented Jun 2, 2023

@JohnHalleyGotway @hertneky I'm modifying the unit tests. With intensity flags set TRUE, I'm getting output file names like the following, all in one output directory:

mode_Fcst_ALPHA(*,*)_Obs_ALPHA(*,*)_300000L_20120410_180000V_060000A_obj.txt

The unit test mode config file has:

fcst = {

   field = [

      {
         name  = "ALPHA";
         level = "(*,*)";
         merge_flag = NONE;
      },

Previously (the way the unit test is set up now) the output files went to numbered subdirectories with file names, for example, this is the same file with the old name:

00/mode_300000L_20120410_180000V_060000A_obj.txt

I'm wondering about (*,*) in file names.

@davidalbo
Copy link
Contributor

@JohnHalleyGotway eventually I'm going to want two unit tests, one with [TRUE, TRUE, TRUE] one with [FALSE, FALSE, FALSE]. The expected output files need to be put somewhere for the unit test to be able to compare them to the test output. The output file names and organization has changed, so all of that needs updating. THis is the one piece that's not in the MET repo, is that correct?

@davidalbo
Copy link
Contributor

@hertneky Could you have different multivar_logic for forecasts and for obs? Maybe a different number of input fields for obs than forecasts? What would that do to the intensity flags? If intensity flag 2 and 3 are TRUE, right now it is fcst field 2 is compared to obs field 2, fcst field 3 compared to obs field 3. Is it good enough for now to require equal number of fcst and obs inputs, and the same multivar_logic for fcst and obs? And the comparisons as indicated?

@davidalbo davidalbo linked a pull request Jun 12, 2023 that will close this issue
15 tasks
JohnHalleyGotway added a commit that referenced this issue Jun 12, 2023
…nfig_constants.h. Make each error message consistently begin with 1 newline and end with 2. Use lookup_bool_array() defined in dictionary.cc and remove the redefinition in mode_conf_info.cc
JohnHalleyGotway added a commit that referenced this issue Jun 16, 2023
JohnHalleyGotway added a commit that referenced this issue Jun 16, 2023
Co-authored-by: John Halley Gotway <johnhg@ucar.edu>
Co-authored-by: Dave Albo <dave@seneca.rap.ucar.edu>
@hertneky
Copy link
Contributor

hertneky commented Jul 5, 2023

Release Acceptance Testing Summary

Version: METplus-5.1.0-rc1
Date: July 5, 2023
Location: cheyenne
Status (PASS/FAIL): FAIL
Description: Ran blizzard test case. Two issues: 1) When setting MODE_MULTIVAR_INTENSITY = FALSE,TRUE,TRUE in the conf, the result is 'WARNING: multivar_frontend() -> empty multivar intensity array, setting to all FALSE' (in the log, I see... export METPLUS_MULTIVAR_INTENSITY="multivar_intensity = [FALSE, TRUE, TRUE];") and 2) Setting MODE_FCST/OBS_MULTIVAR_LEVEL = L0 does not seem to replace the NA in the filename or within the FCST_LEV column in the output (e.g. mode_Fcst_bliz_NA_Obs_bliz_NA_HRRR_vs_ANALYSIS_PrecipFlag_L0_210000L_20210201_210000V_000000A_obj.txt; I do not see this being exported in the log)
Test conducted in: /glade/work/hertneky/tmp/met_rc_test

@georgemccabe
Copy link
Collaborator

georgemccabe commented Jul 5, 2023

@hertneky, we discussed this on Slack but for traceability on this thread here is what we discovered:

  1. The MET variable name is actually multivar_intensity_flag, not multivar_intensity as described in the MET documentation. PR Bugfix #2235 rename multivar_itensity to multivar_intensity_flag METplus#2236 will fix this bug.
  2. It appears the rename of multivar_units to multivar_level was made after METplus 5.1.0-rc1 was created, so it did not properly set the variable. This can also be tested using the branch from the above PR.

@hertneky
Copy link
Contributor

hertneky commented Jul 6, 2023

Release Acceptance Testing Summary

Version: METplus-5.1.0-rc1
Date: July 6, 2023
Location: cheyenne
Status (PASS/FAIL): PASS
Description: The issues mentioned in the previous testing summary have been fixed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
MET: Feature Verification priority: high High Priority requestor: DTC/T&E General DTC Testing and Evaluation work type: enhancement Improve something that it is currently doing
Projects
No open projects
Status: Done
Development

Successfully merging a pull request may close this issue.

8 participants