Enhance TC-Gen to verify NHC tropical weather outlook shapefiles. Refine logic to prevent rounding shapefile points to the nearest grid point. #1810

Closed
JohnHalleyGotway opened this issue May 24, 2021 · 5 comments · Fixed by #2005 or #2086
Labels
MET: Probability Verification requestor: DTC/T&E General DTC Testing and Evaluation work type: new feature Make it do something new

@JohnHalleyGotway

JohnHalleyGotway commented May 24, 2021

Describe the New Feature

tc_gen_probabilistic_algorithm_v2.pdf

Note: During development, an issue with the handling of the (lat, lon) shapefile points was discovered. When applying them to a grid, the points were converted from (lat, lon) to grid (x, y) and then rounded to the nearest grid point. This rounding is unnecessary and has been removed in this feature branch. Skipping it produces a more accurate result, but it also changes the output of shapefile masking in gen_vx_mask.
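To illustrate why the rounding matters, here is a minimal Python sketch (not MET's implementation) that converts the vertices of a thin, hypothetical outlook area to fractional grid (x, y) on an assumed 1-degree lat/lon grid with its lower-left corner at (0N, 60W), then compares the polygon area (shoelace formula) with and without rounding. The grid definition, the vertices, and the function names are all hypothetical.

```python
# Illustrative sketch only: the regular lat/lon grid, the polygon, and the
# function names are hypothetical, not MET code.

def latlon_to_grid(lat, lon, lat_ll=0.0, lon_ll=-60.0, dlat=1.0, dlon=1.0):
    """Convert (lat, lon) to fractional grid (x, y) on a regular lat/lon grid."""
    return ((lon - lon_ll) / dlon, (lat - lat_ll) / dlat)

def polygon_area(points):
    """Shoelace formula: area of a simple polygon, in grid-cell units."""
    total = 0.0
    for i, (x1, y1) in enumerate(points):
        x2, y2 = points[(i + 1) % len(points)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

# A thin, hypothetical outlook area given as (lat, lon) vertices.
vertices = [(10.1, -49.8), (10.3, -46.6), (10.4, -48.4)]

exact = [latlon_to_grid(lat, lon) for lat, lon in vertices]
rounded = [(round(x), round(y)) for x, y in exact]

print(polygon_area(exact))    # about 0.34 grid cells
print(polygon_area(rounded))  # 0.0: every vertex snaps to row y = 10
```

With rounding, all three vertices snap to the same grid row and the shape collapses to a line, so a mask built from the rounded polygon could miss the region entirely.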

Please see the attached slides, which illustrate the 2 main changes required for TC-Genesis verification. This issue describes the second of those 2 enhancements: enhance MET to verify the NHC tropical weather outlook files.

The logic for this verification is described in the attached PDF. The task is to read a custom ASCII file that summarizes, for each disturbance, the probability forecasts issued through time. If the disturbance eventually developed into a storm, the corresponding BEST track id is listed for that storm. If not, a sequence of 9's replaces that BEST track id.

For each line, the columns indicate the probability of storm development within 48, 120, and 168 hours, although those timesteps are hard-coded and NOT actually included in the metadata anywhere. Note that earlier versions of these files only had columns for 48 and 120 hours. So the tools should support 1, 2, or 3 numeric columns of probabilities, followed by a column for forecaster initials. The result should be probabilistic contingency table counts and statistics.
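A minimal Python sketch of reading one line of such a file. The actual column layout is not described here, so this assumes a hypothetical whitespace-separated layout of BEST track id, then 1 to 3 integer percent probabilities, then forecaster initials. The layout, the function name `parse_outlook_line`, and the sample ids are assumptions.

```python
# Lead times are hard-coded because the files do not carry them.
LEAD_HOURS = (48, 120, 168)

def parse_outlook_line(line):
    """Parse a line with the hypothetical layout 'BEST_ID P1 [P2 [P3]] INITIALS'."""
    tokens = line.split()
    if len(tokens) < 3:
        raise ValueError(f"too few columns: {line!r}")
    best_id, *probs, initials = tokens
    if not 1 <= len(probs) <= len(LEAD_HOURS):
        raise ValueError(f"expected 1 to 3 probability columns: {line!r}")
    # A run of 9's in place of the BEST track id means no storm developed.
    developed = set(best_id) != {"9"}
    return {
        "best_id": best_id if developed else None,
        # Map the 1st, 2nd, and 3rd probabilities to 48, 120, and 168 hours.
        "probs": {hours: int(p) for hours, p in zip(LEAD_HOURS, probs)},
        "initials": initials,
    }

print(parse_outlook_line("99999999 10 20 XYZ"))
# {'best_id': None, 'probs': {48: 10, 120: 20}, 'initials': 'XYZ'}
```

Older files with only the 48- and 120-hour columns parse the same way as newer 3-column files; only the length of `probs` differs.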

@halperin-erau has provided some sample data containing these tropical weather outlook summary files, which are only used internally within NHC. There are 2 reasonable implementation options... enhance tc_gen to process these files via a new command line option, or create an entirely new tool for this novel data format.

The advantage of the former is reusing many of the config options that would be needed. The advantage of the latter is avoiding confusion for users outside of NHC, who won't have access to data in this format anyway.

However, as of May 2021, the format of these files is still under development. In the existing and historical versions, both the (lat, lon) location and the valid timestamps are absent. If any of those columns are missing from the input data, tc_gen should print a warning message and ignore that input.

Be sure to subset output by basin, time window, and perhaps forecaster initials.

Acceptance Testing

List input data types and sources.
Describe tests required for new functionality.

Time Estimate

Estimate the amount of work required here.
Issues should represent approximately 1 to 3 days of work.

Sub-Issues

Consider breaking the new feature down into sub-issues.

  • Add a checkbox for each sub-issue here.

Relevant Deadlines

The project for 7790901 technically ends in August 2021. However, @halperin-erau plans to request a 6-month no-cost extension.

Funding Source

7790901

Define the Metadata

Assignee

  • Select engineer(s) or no engineer required
  • Select scientist(s) or no scientist required

Labels

  • Select component(s)
  • Select priority
  • Select requestor(s)

Projects and Milestone

  • Select Repository and/or Organization level Project(s) or add alert: NEED PROJECT ASSIGNMENT label
  • Select Milestone as the next official version or Future Versions

Define Related Issue(s)

Consider the impact to the other METplus components.

New Feature Checklist

See the METplus Workflow for details.

  • Complete the issue definition above, including the Time Estimate and Funding source.
  • Fork this repository or create a branch of develop.
    Branch name: feature_<Issue Number>_<Description>
  • Complete the development and test your changes.
  • Add/update log messages for easier debugging.
  • Add/update unit tests.
  • Add/update documentation.
  • Push local changes to GitHub.
  • Submit a pull request to merge into develop.
    Pull request: feature <Issue Number> <Description>
  • Define the pull request metadata, as permissions allow.
    Select: Reviewer(s) and Linked issues
    Select: Repository level development cycle Project for the next official release
    Select: Milestone as the next official version
  • Iterate until the reviewer(s) accept and merge your changes.
  • Delete your fork or branch.
  • Close this issue.
@JohnHalleyGotway

From @halperin-erau on 6/7/21:

I contacted NHC about verifying the GIS shapefiles or the text files for the TWO probabilistic genesis forecasts. They were interested in verifying the TWO genesis forecasts from the GIS shapefiles. It sounds like that would be a new capability for them to use if we add it to TC-Gen.

So, I propose that we verify the GIS shapefiles using the following logic:
  • Read each "areas" shape/layer in the shapefile.
  • For each area:
    • Read the forecast genesis probability information.
    • Determine in which basin the area occurs.
    • Use the BEST/b-decks to determine whether the BEST genesis point (user-defined as the first "TD", "TS", etc.) is within the TWO area AND whether the BEST genesis time is within X hours of the TWO issuance time.
      • If yes, the forecast is a HIT.
      • If no, the forecast is a FALSE ALARM.
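The HIT / FALSE ALARM decision above could be sketched as follows. This is a minimal Python illustration, not TC-Gen code: it uses a standard even-odd ray-casting point-in-polygon test in (lat, lon) space, assumes the area does not cross the dateline or a pole, and treats "within X hours" as the half-open window (issue time, issue time + X hours]. The function names and sample values are hypothetical.

```python
from datetime import datetime, timedelta

def point_in_polygon(lat, lon, polygon):
    """Even-odd ray casting; polygon is a list of (lat, lon) vertices.
    Assumes the shape does not cross the dateline or a pole."""
    inside = False
    n = len(polygon)
    for i in range(n):
        lat1, lon1 = polygon[i]
        lat2, lon2 = polygon[(i + 1) % n]
        # Does this edge straddle the line of constant latitude through the point?
        if (lat1 > lat) != (lat2 > lat):
            # Longitude at which the edge crosses that latitude.
            lon_cross = lon1 + (lat - lat1) * (lon2 - lon1) / (lat2 - lat1)
            if lon < lon_cross:
                inside = not inside
    return inside

def score_area(polygon, issue_time, genesis_events, window_hours):
    """HIT if any BEST genesis (time, lat, lon) is inside the area and within
    window_hours after issue_time; otherwise FALSE ALARM."""
    end = issue_time + timedelta(hours=window_hours)
    for gen_time, lat, lon in genesis_events:
        if issue_time < gen_time <= end and point_in_polygon(lat, lon, polygon):
            return "HIT"
    return "FALSE ALARM"

square = [(10.0, -50.0), (10.0, -40.0), (20.0, -40.0), (20.0, -50.0)]
events = [(datetime(2021, 8, 7, 0), 15.0, -45.0)]
print(score_area(square, datetime(2021, 8, 6, 0), events, 48))  # HIT
```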
John, I think you already downloaded a .zip file from NHC with the shapefile information. Additional sample data are available in the TWO archive:

https://www.nhc.noaa.gov/archive/xgtwo/gtwo_archive_list.php?basin=atl

I'm happy to have another telecon to discuss if it would be helpful. I indicated to NHC that if we verify the shapefiles, we may drop support for the TWO text files (i.e., component #2 in the PowerPoint we went over during the last telecon).

@JohnHalleyGotway JohnHalleyGotway changed the title Enhance MET to verify the NHC tropical weather outlook files. Enhance MET to verify the graphical tropical weather outlook shapefiles. Oct 28, 2021
@JohnHalleyGotway

Here's a screenshot from:
https://www.nhc.noaa.gov/archive/xgtwo/gtwo_archive.php?current_issuance=202109030233&basin=atlc&fdays=5

[Screenshot of the GTWO graphic at the URL above, taken 2021-11-22]

The corresponding shapefiles, found in the following archive:
https://www.nhc.noaa.gov/archive/xgtwo/atl/202109030233/gtwo_shapefiles.zip
contain:

  • gtwo_areas... has the (lat, lon) points for the boundary of each region.
  • gtwo_lines... unclear; assume it is not needed/useful.
  • gtwo_points... has the (lat, lon) location of the 'X' for each shape.
  • two_atl_text_202109022339.rtf is an AT basin description:
    Shape 1:
      * Formation chance through 48 hours...low...10 percent.
      * Formation chance through 5 days...low...20 percent.
    Shape 2:
      * Formation chance through 48 hours...low...30 percent.
      * Formation chance through 5 days...low...30 percent.
  • two_pac_text_202109022339.rtf is an EP basin description:
    Shape 1:
      * Formation chance through 48 hours...low...near 0 percent.
      * Formation chance through 5 days...low...20 percent.

This corresponds to the contents of the dbf file (gtwo_areas_202109022339.dbf):

Record 0 ...
|     = "Atlantic"
|     = "1"
|     = "10%"
|     = "Low"
|     = "20%"
|     = "Low"
Record 1 ...
|     = "Atlantic"
|     = "2"
|     = "30%"
|     = "Low"
|     = "30%"
|     = "Low"
Record 2 ...
|     = "East Pacific"
|     = "1"
|     = "0%"
|     = "Low"
|     = "20%"
|     = "Low"

So assume dbf probability 1 is for 48 hours and probability 2 is for 5 days.

JohnHalleyGotway added a commit that referenced this issue Nov 22, 2021
@JohnHalleyGotway

Per @halperin-erau, if only 1 probability is listed, assume it's for 2 days. The second probability is for 5 days, and the third is for 7 days.
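A minimal Python sketch of mapping one dbf record to lead times under this convention. The dbf field names are not shown in the dump above, so the sketch works on the values by position (basin, area number, then probability/risk pairs). That positional layout, the function name, and the conversion of "10%" to 0.10 are assumptions.

```python
# 1st, 2nd, and 3rd probabilities are for 2, 5, and 7 days.
LEAD_HOURS = (48, 120, 168)

def parse_area_record(values):
    """Parse one record laid out as [basin, area, prob1, risk1, prob2, risk2, ...]."""
    basin, area, *pairs = values
    if len(pairs) % 2 != 0 or not 2 <= len(pairs) <= 2 * len(LEAD_HOURS):
        raise ValueError(f"unexpected record layout: {values}")
    probs = {}
    for lead, i in zip(LEAD_HOURS, range(0, len(pairs), 2)):
        pct, risk = pairs[i], pairs[i + 1]
        # "10%" -> 0.1, keeping the categorical risk ("Low", etc.) alongside.
        probs[lead] = (int(pct.rstrip("%")) / 100.0, risk)
    return basin, int(area), probs

print(parse_area_record(["Atlantic", "1", "10%", "Low", "20%", "Low"]))
# ('Atlantic', 1, {48: (0.1, 'Low'), 120: (0.2, 'Low')})
```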

@JohnHalleyGotway

JohnHalleyGotway commented Dec 3, 2021

From @halperin-erau:
I was thinking more about the verification logic for the gtwo_area*.shp files, and I think it will be simpler than the logic we used for the DEV/OPS methods.

Unlike the deterministic forecasts and the probabilistic forecasts in e-deck format, the gtwo_area*.shp files do not have a specific forecast genesis point location or valid time. Therefore, we cannot use the same matching logic that we employed for the DEV/OPS methods. Instead, I suggest the following:

  • Define genesis as the first entry of some user-defined categories in the b-decks (e.g., "TD", "TS" as we have done for DEV/OPS).

  • For each genesis area:

    • Record the forecast init/issuance time
    • Search the b-decks to determine if genesis occurred within the area shape AND within 0-2/5/7 days of the forecast init/issuance time
      • If yes --> HIT
      • If no --> FALSE ALARM
    • Include a post-genesis discard flag for cases where genesis occurs within the area shape at or before the forecast init/issuance time. This scenario could occur if NHC moves the genesis time earlier in their post-season analysis. For example, if NHC operationally declares genesis on 2021-08-15_12Z, then there will be gtwo_area*.shp files for 2021-08-15_12Z, 2021-08-15_06Z, etc. However, if NHC moves the genesis time to 2021-08-15_06Z in their post-season analysis, then the gtwo_area*.shp files at 2021-08-15_12Z and 2021-08-15_06Z should be discarded (if flag is set to TRUE).
    • This could become logically challenging if there are genesis events that occur in the same area of the shape, but well before the GTWO issuance time. For example, consider the red genesis area in the GTWO below:

[Image: GTWO graphic showing the red genesis area]

  • Let's say that genesis occurs within the red area at 12.0N, 30.0W on 2021-08-07_00Z (i.e., a forecast HIT within the 2/5/7-day windows). However, if the post-genesis discard flag is set to TRUE, and genesis also occurred within the red area earlier in the season (e.g., at 15.0N, 35.0W on 2021-07-01_00Z), then this forecast could be erroneously discarded. Therefore, I suggest that there is a temporal window that only discards GTWO forecasts if genesis occurs within the shape/area 0-2 days before the GTWO forecast init/issuance time.
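The refined logic, including the 0-2 day post-genesis discard window, might look like the following minimal Python sketch. It assumes the spatial test (whether each BEST genesis point lies inside the area) has already been done, so it only takes the genesis times that fall inside the area. The function and parameter names, and the choice to apply the discard check before scoring, are assumptions rather than TC-Gen's actual interface.

```python
from datetime import datetime, timedelta

def classify_gtwo_area(issue_time, genesis_times_in_area, lead_days,
                       discard_flag=True, discard_days=2):
    """Return "DISCARD", "HIT", or "FALSE ALARM" for one area and lead window.

    genesis_times_in_area holds the BEST track genesis times whose genesis
    points fall inside the area (spatial matching is assumed done elsewhere).
    """
    if discard_flag:
        # Only genesis at, or up to discard_days before, the issuance time
        # triggers a discard; genesis earlier in the season is ignored.
        earliest = issue_time - timedelta(days=discard_days)
        if any(earliest <= t <= issue_time for t in genesis_times_in_area):
            return "DISCARD"
    # Genesis after issuance and within the 2/5/7-day window is a HIT.
    latest = issue_time + timedelta(days=lead_days)
    if any(issue_time < t <= latest for t in genesis_times_in_area):
        return "HIT"
    return "FALSE ALARM"

# The red-area scenario: genesis on 2021-08-07_00Z plus an earlier-season
# genesis on 2021-07-01_00Z. The 2021-08-06_00Z issuance time is hypothetical.
red_area = [datetime(2021, 7, 1, 0), datetime(2021, 8, 7, 0)]
print(classify_gtwo_area(datetime(2021, 8, 6, 0), red_area, lead_days=2))  # HIT
```

Because the discard window only looks back 2 days, the July genesis event does not cause the August forecast to be discarded.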

Let me know if you have any questions or concerns about this logic.

JohnHalleyGotway added a commit that referenced this issue Jan 5, 2022
…esis probabilities. It was a copy/paste bug.
@JohnHalleyGotway

JohnHalleyGotway commented Jan 5, 2022

@halperin-erau I did some more digging and think that my logic is not yet sufficient, but I'd like you to confirm.

I have 2 questions.

(1) Should I add logic to get rid of apparent "duplicates"?

Running with all NHC 2021 shapefiles from the atl, cpac, and epac basins, tc_gen processes a total of 3913 shapes. However, I believe that only 1164 of them are unique. By way of example, here's some log output showing the following shape appearing in 5 different files (see below for more details):

Atlantic basin shape 1 has 129 points with latitudes from 35.5587 to 45.7149 and longitudes from -54.7273 to -38.81.

I don't know what triggers NHC to publish (and re-publish) these shapes, but the data suggests that all shapes ACTIVE at that time are included. So I recommend that I add logic to avoid these duplicates, making sure not to score them more than once.
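One way to drop these duplicates is to key each shape on its basin and exact vertex list and keep only the first occurrence. The minimal Python sketch below uses hypothetical names and is not the GenShapeInfo classes. Whether copies with different issue times should count as duplicates is part of the open question here, so the sketch leaves that as an option.

```python
def dedupe_shapes(shapes, include_issue_time=True):
    """Return shapes in input order, keeping only the first occurrence of each.

    Each shape is a dict with "basin", "issue_time", and "points" (a list of
    (lat, lon) tuples). Two shapes match when their basin and vertex lists are
    identical and, if include_issue_time is True, their issue times also match.
    """
    seen = set()
    unique = []
    for shape in shapes:
        key = (shape["basin"], tuple(shape["points"]))
        if include_issue_time:
            key += (shape["issue_time"],)
        if key not in seen:
            seen.add(key)
            unique.append(shape)
    return unique

# Hypothetical stand-ins for the 5 copies of "Atlantic basin shape 1" in the
# log below: 4 share issue time 20211111_000000 and 1 is 6 hours later.
# The vertex list is a placeholder, not the real 129-point boundary.
points = [(35.5587, -54.7273), (45.7149, -54.7273), (45.7149, -38.81)]
copies = [{"basin": "Atlantic", "issue_time": t, "points": points}
          for t in ["20211111_000000"] * 4 + ["20211111_060000"]]
print(len(dedupe_shapes(copies)))                            # 2
print(len(dedupe_shapes(copies, include_issue_time=False)))  # 1
```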

(2) The current logic rounds the hours and minutes UP to the next hour and interprets that as the "issue" time. So when looking for a BEST track genesis match, we round up to the next hour and then look for genesis events within 48 and 120 hours of that time. Is this good logic or should I be using the actual hours and minutes listed without any rounding?

Thanks,
John

FYI:
Here's the complete log showing the same shape in 5 files (2 in atl, 2 in epac, and 1 in cpac). Note that 4 of them have the same rounded issue time of 20211111_000000 and the last one is for 6 hours later (20211111_060000):

DEBUG 4: [File 752 of 2372]: Found 1 records with issue time 20211111_000000 in file "/Volumes/d1/projects/MET/MET_unit_test/MET_test_input/tc_data/genesis/shapes/atl/202111102341/gtwo_areas_202111102340.shp".
DEBUG 5:   Atlantic basin shape 1 has 129 points with latitudes from 35.5587 to 45.7149 and longitudes from -54.7273 to -38.81.
DEBUG 5:     50% probability of 48-hour genesis has NO MATCH in the BEST track.
DEBUG 5:     50% probability of 120-hour genesis has NO MATCH in the BEST track.
--
--
DEBUG 4: [File 753 of 2372]: Found 1 records with issue time 20211111_000000 in file "/Volumes/d1/projects/MET/MET_unit_test/MET_test_input/tc_data/genesis/shapes/atl/202111102353/gtwo_areas_202111102353.shp".
DEBUG 5:   Atlantic basin shape 1 has 129 points with latitudes from 35.5587 to 45.7149 and longitudes from -54.7273 to -38.81.
DEBUG 5:     50% probability of 48-hour genesis has NO MATCH in the BEST track.
DEBUG 5:     50% probability of 120-hour genesis has NO MATCH in the BEST track.
--
--
DEBUG 4: [File 1477 of 2372]: Found 1 records with issue time 20211111_000000 in file "/Volumes/d1/projects/MET/MET_unit_test/MET_test_input/tc_data/genesis/shapes/cpac/202111110018/gtwo_areas_202111102353.shp".
DEBUG 5:   Atlantic basin shape 1 has 129 points with latitudes from 35.5587 to 45.7149 and longitudes from -54.7273 to -38.81.
DEBUG 5:     50% probability of 48-hour genesis has NO MATCH in the BEST track.
DEBUG 5:     50% probability of 120-hour genesis has NO MATCH in the BEST track.
--
--
DEBUG 4: [File 2291 of 2372]: Found 1 records with issue time 20211111_000000 in file "/Volumes/d1/projects/MET/MET_unit_test/MET_test_input/tc_data/genesis/shapes/epac/202111110430/gtwo_areas_202111102353.shp".
DEBUG 5:   Atlantic basin shape 1 has 129 points with latitudes from 35.5587 to 45.7149 and longitudes from -54.7273 to -38.81.
DEBUG 5:     50% probability of 48-hour genesis has NO MATCH in the BEST track.
DEBUG 5:     50% probability of 120-hour genesis has NO MATCH in the BEST track.
--
--
DEBUG 4: [File 2292 of 2372]: Found 1 records with issue time 20211111_060000 in file "/Volumes/d1/projects/MET/MET_unit_test/MET_test_input/tc_data/genesis/shapes/epac/202111110512/gtwo_areas_202111110511.shp".
DEBUG 5:   Atlantic basin shape 1 has 129 points with latitudes from 35.5587 to 45.7149 and longitudes from -54.7273 to -38.81.
DEBUG 5:     50% probability of 48-hour genesis has NO MATCH in the BEST track.
DEBUG 5:     50% probability of 120-hour genesis has NO MATCH in the BEST track.
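Both rounding options in question (2) can be expressed as rounding the file-name timestamp up to a multiple of N hours: N = 1 gives the next hour, and N = 6 gives the next synoptic time (00, 06, 12, or 18 UTC), which the Jan 6 commit message below appears to adopt. The minimal Python sketch below assumes the issue time is taken from the gtwo_areas_YYYYMMDDHHMM.shp file name rather than the directory name, which is consistent with the issue times in the log above. The function names are hypothetical.

```python
import re
from datetime import datetime, timedelta

def ceil_to_hours(t, step_hours):
    """Round t UP to the next multiple of step_hours past 00Z (exact multiples are kept)."""
    midnight = t.replace(hour=0, minute=0, second=0, microsecond=0)
    step = timedelta(hours=step_hours)
    n_steps = -(-(t - midnight) // step)  # ceiling division of timedeltas
    return midnight + n_steps * step

def issue_time_from_filename(path, step_hours=6):
    """Parse YYYYMMDDHHMM from a gtwo_areas_*.shp file name and round it up."""
    match = re.search(r"gtwo_areas_(\d{12})\.shp$", path)
    if match is None:
        raise ValueError(f"no timestamp in {path!r}")
    return ceil_to_hours(datetime.strptime(match.group(1), "%Y%m%d%H%M"), step_hours)

print(issue_time_from_filename("cpac/202111110018/gtwo_areas_202111102353.shp"))
# 2021-11-11 00:00:00
print(issue_time_from_filename("epac/202111110512/gtwo_areas_202111110511.shp"))
# 2021-11-11 06:00:00
```

For these files both choices happen to give the same answer; they differ for timestamps such as 07:30, which round to 08:00 with N = 1 but to 12:00 with N = 6.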

JohnHalleyGotway added a commit that referenced this issue Jan 6, 2022
…, 06, 12, or 18) rather than the nearest hour.
JohnHalleyGotway added a commit that referenced this issue Jan 7, 2022
JohnHalleyGotway added a commit that referenced this issue Jan 7, 2022
JohnHalleyGotway added a commit that referenced this issue Jan 7, 2022
…c_util library files. These classes are needed by TC-Gen to store all the genesis shapes it reads. We need the ability to check for and either ignore or update duplicates. Verifying each shape separately as it is read does not suffice.
JohnHalleyGotway added a commit that referenced this issue Jan 7, 2022
…andle the new GenShapeInfo objects. TCGenVxOpt applies the configuration options to filter them, and Nx2 contingency tables can be populated with them.
JohnHalleyGotway added a commit that referenced this issue Jan 7, 2022
…efile vx. Rather than processing each shape as it's read, store them in the new GenShapeInfo and GenShapeInfo array classes. While reading them, check for duplicates. After storing all the unduplicated shapes in GenShapeInfo, apply each config file filter to subset and score them.
@JohnHalleyGotway JohnHalleyGotway linked a pull request Jan 7, 2022 that will close this issue
@JohnHalleyGotway JohnHalleyGotway changed the title Enhance MET to verify the graphical tropical weather outlook shapefiles. Enhance TC-Gen to verify the graphical tropical weather outlook shapefiles and fix logic to not round shapefile points to the nearest grid point. Jan 11, 2022
@JohnHalleyGotway JohnHalleyGotway changed the title Enhance TC-Gen to verify the graphical tropical weather outlook shapefiles and fix logic to not round shapefile points to the nearest grid point. Enhance TCGen to verify NHC graphical tropical weather outlook shapefiles. Fix logic to prevent rounding shapefile points to the nearest grid point. Jan 15, 2022
@JohnHalleyGotway JohnHalleyGotway changed the title Enhance TCGen to verify NHC graphical tropical weather outlook shapefiles. Fix logic to prevent rounding shapefile points to the nearest grid point. Enhance TCGen to verify NHC tropical weather outlook shapefiles. Fix logic to prevent rounding shapefile points to the nearest grid point. Jan 15, 2022
@JohnHalleyGotway JohnHalleyGotway changed the title Enhance TCGen to verify NHC tropical weather outlook shapefiles. Fix logic to prevent rounding shapefile points to the nearest grid point. Enhance TCGen to verify NHC tropical weather outlook shapefiles. Refine logic to prevent rounding shapefile points to the nearest grid point. Jan 15, 2022
JohnHalleyGotway added a commit that referenced this issue Mar 4, 2022
… multiple inputs. Its call to AsciiTable::expand() was not correct. Added logic to determine the actual required output size.
@JohnHalleyGotway JohnHalleyGotway linked a pull request Mar 4, 2022 that will close this issue
@JohnHalleyGotway JohnHalleyGotway changed the title Enhance TCGen to verify NHC tropical weather outlook shapefiles. Refine logic to prevent rounding shapefile points to the nearest grid point. Enhance TC-Gen to verify NHC tropical weather outlook shapefiles. Refine logic to prevent rounding shapefile points to the nearest grid point. Mar 10, 2022