Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature 1673 gbeta #1857

Merged
merged 41 commits into from
Aug 18, 2021
Merged
Show file tree
Hide file tree
Changes from 40 commits
Commits
Show all changes
41 commits
Select commit Hold shift + click to select a range
e423644
Per #1673, add a distance_map.beta_value entry to all the Grid-Stat c…
JohnHalleyGotway Jul 8, 2021
caff652
Per #1673, enhance Grid-Stat to parse the new beta_value option and s…
JohnHalleyGotway Jul 8, 2021
f1d85c9
Per #1673, set the user-defined beta_value when processing the distan…
JohnHalleyGotway Jul 8, 2021
b3bbc0f
Per #1673, add 4 new DMAP columns write them. Just writing bad data v…
JohnHalleyGotway Jul 8, 2021
69f82df
Per #1673, enable beta_value to be set to a default bad data value.
JohnHalleyGotway Jul 8, 2021
c89ad5c
Per #1673 and @ericgilleland instruction (https://github.com/dtcenter…
JohnHalleyGotway Jul 9, 2021
c38a348
Per #1673, fix typo that was causing the code not to compile.
JohnHalleyGotway Jul 9, 2021
c65728a
Merge branch 'develop' into feature_1673_gbeta
JohnHalleyGotway Jul 9, 2021
5d757dd
Merge branch 'develop' into feature_1673_gbeta
JohnHalleyGotway Jul 16, 2021
1915982
Per #1673, update MET User's Guide about G/GBETA.
JohnHalleyGotway Jul 16, 2021
7cb41d5
Per #1673, report BETA_VALUE = TOTAL / 2 in the output instead of NA,…
JohnHalleyGotway Jul 16, 2021
028fe9c
Per #1673, merge in latest changes from develop.
JohnHalleyGotway Jul 23, 2021
35167fa
Per #1673, remove descriptions of Grid-Stat-specific config options f…
JohnHalleyGotway Jul 23, 2021
fbe6eb1
Per #1673, do not reset the beta_value in DMAP::clear().
JohnHalleyGotway Jul 23, 2021
6cc7695
Per #1673, log the DMAP options only after the beta_value has been de…
JohnHalleyGotway Jul 23, 2021
9e79c11
Per #1673, correct the default beta_value setting as N*N / 2.0.
JohnHalleyGotway Jul 23, 2021
50a5127
Per #1673, trying to format G-beta well.
JohnHalleyGotway Jul 23, 2021
6afa6d7
Per #1673, trying to format G-beta well.
JohnHalleyGotway Jul 23, 2021
2b104e7
Per #1673, beta is formatted as \beta and not \Beta in latex.
JohnHalleyGotway Jul 23, 2021
667d74d
Per #1673, beta is formatted as \beta and not \Beta in latex.
JohnHalleyGotway Jul 23, 2021
26477de
Per #1673, switch to defining beta_value as a function of the input g…
JohnHalleyGotway Jul 26, 2021
d4f04e5
Per #1673, push initial appendix C updates for GBETA.
JohnHalleyGotway Jul 28, 2021
1ca54ea
Per #1673, working on G and GBETA equations.
JohnHalleyGotway Jul 28, 2021
b994b67
Per #1673, working on G and GBETA equations.
JohnHalleyGotway Jul 28, 2021
9cb4c5f
Per 1673, update GBETA details in appendix C.
JohnHalleyGotway Jul 29, 2021
0493fc0
Per #1673, add Eric's info about selecting beta.
JohnHalleyGotway Jul 30, 2021
af8dc4b
Per #1673, update reference for Gilleland-2019 to Gilleland_PartI-202…
JohnHalleyGotway Aug 2, 2021
92168b8
Per 1673, fix typo.
JohnHalleyGotway Aug 2, 2021
e64b74e
Per #1673, making changes requested by Eric.
JohnHalleyGotway Aug 2, 2021
a34c89f
Per #1673, making changes requested by Eric.
JohnHalleyGotway Aug 2, 2021
fab997b
Per #1673, update references based on Eric's feedback.
JohnHalleyGotway Aug 2, 2021
63fac37
Update appendixD.rst (#1869)
ericgilleland Aug 3, 2021
0c90285
Update appendixD.rst
JohnHalleyGotway Aug 3, 2021
ee11fc1
Per #1673, making the formatting of Gilleland-2020 part I and II refe…
JohnHalleyGotway Aug 4, 2021
22e2da8
Per #1673, update Grid-Stat docs to clarify that GBETA is only comput…
JohnHalleyGotway Aug 4, 2021
f1b1317
Per #1673, add DataPlane::n_good_data() function to return a count va…
JohnHalleyGotway Aug 4, 2021
d36789b
Per #1673, update vx_statistics library code to only set gbeta/beta_v…
JohnHalleyGotway Aug 4, 2021
e299004
Per #1673, update grid_stat to mask out bad data between the fcst/obs…
JohnHalleyGotway Aug 4, 2021
0d329b2
Per #1673, since GBETA is only reported when VX_MASK = FULL, updating…
JohnHalleyGotway Aug 5, 2021
4e9f83e
Merge branch 'develop' into feature_1673_gbeta
JohnHalleyGotway Aug 5, 2021
2a16601
Merge branch 'develop' into feature_1673_gbeta
JohnHalleyGotway Aug 18, 2021
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions met/data/config/GridStatConfig_default
Original file line number Diff line number Diff line change
Expand Up @@ -202,6 +202,7 @@ distance_map = {
baddeley_max_dist = NA;
fom_alpha = 0.1;
zhu_weight = 0.5;
beta_value(n) = n * n / 2.0;
}

////////////////////////////////////////////////////////////////////////////////
Expand Down
2 changes: 1 addition & 1 deletion met/data/table_files/met_header_columns_V10.1.txt
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,7 @@ V10.1 : STAT : NBRCNT : VERSION MODEL DESC FCST_LEAD FCST_VALID_BEG FCST_VALID
V10.1 : STAT : NBRCTC : VERSION MODEL DESC FCST_LEAD FCST_VALID_BEG FCST_VALID_END OBS_LEAD OBS_VALID_BEG OBS_VALID_END FCST_VAR FCST_UNITS FCST_LEV OBS_VAR OBS_UNITS OBS_LEV OBTYPE VX_MASK INTERP_MTHD INTERP_PNTS FCST_THRESH OBS_THRESH COV_THRESH ALPHA LINE_TYPE TOTAL FY_OY FY_ON FN_OY FN_ON
V10.1 : STAT : NBRCTS : VERSION MODEL DESC FCST_LEAD FCST_VALID_BEG FCST_VALID_END OBS_LEAD OBS_VALID_BEG OBS_VALID_END FCST_VAR FCST_UNITS FCST_LEV OBS_VAR OBS_UNITS OBS_LEV OBTYPE VX_MASK INTERP_MTHD INTERP_PNTS FCST_THRESH OBS_THRESH COV_THRESH ALPHA LINE_TYPE TOTAL BASER BASER_NCL BASER_NCU BASER_BCL BASER_BCU FMEAN FMEAN_NCL FMEAN_NCU FMEAN_BCL FMEAN_BCU ACC ACC_NCL ACC_NCU ACC_BCL ACC_BCU FBIAS FBIAS_BCL FBIAS_BCU PODY PODY_NCL PODY_NCU PODY_BCL PODY_BCU PODN PODN_NCL PODN_NCU PODN_BCL PODN_BCU POFD POFD_NCL POFD_NCU POFD_BCL POFD_BCU FAR FAR_NCL FAR_NCU FAR_BCL FAR_BCU CSI CSI_NCL CSI_NCU CSI_BCL CSI_BCU GSS GSS_BCL GSS_BCU HK HK_NCL HK_NCU HK_BCL HK_BCU HSS HSS_BCL HSS_BCU ODDS ODDS_NCL ODDS_NCU ODDS_BCL ODDS_BCU LODDS LODDS_NCL LODDS_NCU LODDS_BCL LODDS_BCU ORSS ORSS_NCL ORSS_NCU ORSS_BCL ORSS_BCU EDS EDS_NCL EDS_NCU EDS_BCL EDS_BCU SEDS SEDS_NCL SEDS_NCU SEDS_BCL SEDS_BCU EDI EDI_NCL EDI_NCU EDI_BCL EDI_BCU SEDI SEDI_NCL SEDI_NCU SEDI_BCL SEDI_BCU BAGSS BAGSS_BCL BAGSS_BCU
V10.1 : STAT : GRAD : VERSION MODEL DESC FCST_LEAD FCST_VALID_BEG FCST_VALID_END OBS_LEAD OBS_VALID_BEG OBS_VALID_END FCST_VAR FCST_UNITS FCST_LEV OBS_VAR OBS_UNITS OBS_LEV OBTYPE VX_MASK INTERP_MTHD INTERP_PNTS FCST_THRESH OBS_THRESH COV_THRESH ALPHA LINE_TYPE TOTAL FGBAR OGBAR MGBAR EGBAR S1 S1_OG FGOG_RATIO DX DY
V10.1 : STAT : DMAP : VERSION MODEL DESC FCST_LEAD FCST_VALID_BEG FCST_VALID_END OBS_LEAD OBS_VALID_BEG OBS_VALID_END FCST_VAR FCST_UNITS FCST_LEV OBS_VAR OBS_UNITS OBS_LEV OBTYPE VX_MASK INTERP_MTHD INTERP_PNTS FCST_THRESH OBS_THRESH COV_THRESH ALPHA LINE_TYPE TOTAL FY OY FBIAS BADDELEY HAUSDORFF MED_FO MED_OF MED_MIN MED_MAX MED_MEAN FOM_FO FOM_OF FOM_MIN FOM_MAX FOM_MEAN ZHU_FO ZHU_OF ZHU_MIN ZHU_MAX ZHU_MEAN
V10.1 : STAT : DMAP : VERSION MODEL DESC FCST_LEAD FCST_VALID_BEG FCST_VALID_END OBS_LEAD OBS_VALID_BEG OBS_VALID_END FCST_VAR FCST_UNITS FCST_LEV OBS_VAR OBS_UNITS OBS_LEV OBTYPE VX_MASK INTERP_MTHD INTERP_PNTS FCST_THRESH OBS_THRESH COV_THRESH ALPHA LINE_TYPE TOTAL FY OY FBIAS BADDELEY HAUSDORFF MED_FO MED_OF MED_MIN MED_MAX MED_MEAN FOM_FO FOM_OF FOM_MIN FOM_MAX FOM_MEAN ZHU_FO ZHU_OF ZHU_MIN ZHU_MAX ZHU_MEAN G GBETA BETA_VALUE
V10.1 : STAT : ORANK : VERSION MODEL DESC FCST_LEAD FCST_VALID_BEG FCST_VALID_END OBS_LEAD OBS_VALID_BEG OBS_VALID_END FCST_VAR FCST_UNITS FCST_LEV OBS_VAR OBS_UNITS OBS_LEV OBTYPE VX_MASK INTERP_MTHD INTERP_PNTS FCST_THRESH OBS_THRESH COV_THRESH ALPHA LINE_TYPE TOTAL INDEX OBS_SID OBS_LAT OBS_LON OBS_LVL OBS_ELV OBS PIT RANK N_ENS_VLD (N_ENS) ENS_[0-9]* OBS_QC ENS_MEAN CLIMO_MEAN SPREAD ENS_MEAN_OERR SPREAD_OERR SPREAD_PLUS_OERR CLIMO_STDEV
V10.1 : STAT : PCT : VERSION MODEL DESC FCST_LEAD FCST_VALID_BEG FCST_VALID_END OBS_LEAD OBS_VALID_BEG OBS_VALID_END FCST_VAR FCST_UNITS FCST_LEV OBS_VAR OBS_UNITS OBS_LEV OBTYPE VX_MASK INTERP_MTHD INTERP_PNTS FCST_THRESH OBS_THRESH COV_THRESH ALPHA LINE_TYPE TOTAL (N_THRESH) THRESH_[0-9]* OY_[0-9]* ON_[0-9]*
V10.1 : STAT : PJC : VERSION MODEL DESC FCST_LEAD FCST_VALID_BEG FCST_VALID_END OBS_LEAD OBS_VALID_BEG OBS_VALID_END FCST_VAR FCST_UNITS FCST_LEV OBS_VAR OBS_UNITS OBS_LEV OBTYPE VX_MASK INTERP_MTHD INTERP_PNTS FCST_THRESH OBS_THRESH COV_THRESH ALPHA LINE_TYPE TOTAL (N_THRESH) THRESH_[0-9]* OY_TP_[0-9]* ON_TP_[0-9]* CALIBRATION_[0-9]* REFINEMENT_[0-9]* LIKELIHOOD_[0-9]* BASER_[0-9]*
Expand Down
25 changes: 24 additions & 1 deletion met/docs/Users_Guide/appendixC.rst
Original file line number Diff line number Diff line change
Expand Up @@ -1134,7 +1134,7 @@ A mathematical metric, :math:`m(A,B)\geq 0`, must have the following three prope

The first establishes that a perfect score is zero and that the only way to obtain a perfect score is if the two sets are identical according to the metric. The second requirement ensures that the order by which the two sets are evaluated will not change the result. The third property ensures that if *C* is closer to *A* than *B* is to *A*, then :math:`m(A,C) < m(A,B)`.

It has been argued in :ref:`Gilleland (2019) <Gilleland-2019>` that the second property of symmetry is not necessarily an important quality to have for a summary measure for verification purposes because lack of symmetry allows for information about false alarms and misses.
It has been argued in :ref:`Gilleland (2017) <Gilleland-2017>` that the second property of symmetry is not necessarily an important quality to have for a summary measure for verification purposes because lack of symmetry allows for information about false alarms and misses.

The results of the distance map verification approaches that are included in the Grid-Stat tool are summarized using a variety of measures. These measures include Baddeley's :math:`\Delta` Metric, the Hausdorff Distance, the Mean-error Distance, Pratt's Figure of Merit, and Zhu's Measure. Their equations are listed below.

Expand Down Expand Up @@ -1205,6 +1205,29 @@ where MED *(A,B)* is as in the Mean-error distance, *N* is the total number of g

The range for ZHU is 0 to infinity, with a score of 0 indicating a perfect forecast.

.. _App_C-gbeta:

:math:`G` and :math:`G_\beta`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Called "G" and "GBETA" in the DMAP output :numref:`table_GS_format_info_DMAP`

See :numref:`grid-stat_gbeta` for a description.

Let :math:`y = {y_1}{y_2}` where :math:`y_1 = n_A + n_B - 2n_{AB}`, and :math:`y_2 = MED(A,B) \cdot n_B + MED(B,A) \cdot n_A`, with the mean-error distance (:math:`MED`) as described above, and where :math:`n_{A}`, :math:`n_{B}`, and :math:`n_{AB}` are the number of events within event areas *A*, *B*, and the intersection of *A* and *B*, respectively.

The :math:`G` performance measure is given by

.. math:: G(A,B) = y^{1/3}

and the :math:`G_\beta` performance measure is given by

.. math:: G_\beta(A,B) = max\{1-\frac{y}{\beta}, 0\}

where :math:`\beta > 0` is a user-chosen parameter with a default value of :math:`n^2 / 2.0` with :math:`n` equal to the number of points in the domain. The square-root of :math:`G` will give units of grid points, where :math:`y^{1/3}` gives units of grid points squared.

The range for :math:`G_\beta` is 0 to 1, with a score of 1 indicating a perfect forecast.

Calculating Percentiles
_______________________

Expand Down
5 changes: 3 additions & 2 deletions met/docs/Users_Guide/appendixD.rst
Original file line number Diff line number Diff line change
Expand Up @@ -77,8 +77,9 @@ All other verification scores with CIs in MET must be obtained through bootstrap

5. Calculate CIs for the parameters directly from the sample (see text below for more details)

Typically, a simple random sample is taken for step 2, and that is how it is done in MET. As an example of what happens in this step, suppose our sample is :math:`X_1,X_2,X_3,X_4`. Then, one possible replicate might be :math:`X_2,X_2,X_2,X_4`. Usually one samples :math:`m = n` points in this step, but there are cases where one should use :math:`m < n`. For example, when the underlying distribution is heavy-tailed, one should use a smaller size m than n (e.g., the closest integer value to the square root of the original sample size).
Typically, a simple random sample is taken for step 2, and that is how it is done in MET. As an example of what happens in this step, suppose our sample is :math:`X_1,X_2,X_3,X_4`. Then, one possible replicate might be :math:`X_2,X_2,X_2,X_4`. Usually one samples :math:`m = n` points in this step, but there are cases where one should use :math:`m < n`. For example, when the underlying distribution is heavy-tailed, one should use a smaller size m than n (e.g., the closest integer value to the square root of the original sample size). See :ref:`Gilleland (2020, part II) <Gilleland_PartII-2020>` for considerably more information about the issues with estimators that follow a heavy tailed distribution and the closely related issue of bootstrapping extreme-valued estimators, such as the maximum, in the atmospheric science domain.

There are numerous ways to construct CIs from the sample obtained in step 4. MET allows for two of these procedures: the percentile and the BCa. The percentile is the most commonly known method, and the simplest to understand. It is merely the :math:`\alpha / 2` and :math:`1 - \alpha / 2` percentiles from the sample of statistics. Unfortunately, however, it has been shown that this interval is too optimistic in practice (i.e., it doesn't have accurate coverage). One solution is to use the BCa method, which is very accurate, but it is also computationally intensive. This method adjusts for bias and non-constant variance, and yields the percentile interval in the event that the sample is unbiased with constant variance.

If there is dependency in the sample, then it is prudent to account for this dependency in some way. One method that does not make a lot of assumptions is circular block bootstrapping. This is not currently implemented in MET, but will be available in a future release. At that time, the method will be explained more fully here, but until then consult :ref:`Gilleland (2010) <Gilleland-2010>` for more details.
If there is dependency in the sample, then it is prudent to account for this dependency in some way. :ref:`Gilleland (2010) <Gilleland-2010>` describes the bootstrap procedure, along with the above-mentioned parametric methods, in more detail specifically for the verification application. If there is dependency in the sample, then it is prudent to account for this dependency in some way (see :ref:`Gilleland (2020, part I) <Gilleland_PartI-2020>` part I for an in-depth discussion of bootstrapping in the competing forecast verification domain). One method that is particularly appropriate for serially dependent data is the circular block resampling procedure for step 2.

132 changes: 1 addition & 131 deletions met/docs/Users_Guide/config_options.rst
Original file line number Diff line number Diff line change
Expand Up @@ -413,7 +413,7 @@ e.g. model = "GFS";

model = "WRF";

.._desc:
.. _desc:

:ref:`desc <desc>`

Expand Down Expand Up @@ -1615,136 +1615,6 @@ This dictionary may include the following entries:
];
}

.. _nbrhd:

:ref:`nbrhd <nbrhd>`

The "nbrhd" entry is a dictionary that is very similar to the "interp"
entry. It specifies information for computing neighborhood statistics in
Grid-Stat. This dictionary may include the following entries:

* The "field" entry specifies to which field(s) the computation of
fractional coverage should be applied. Grid-Stat processes each
combination of categorical threshold and neighborhood width to
derive the fractional coverage fields from which neighborhood
statistics are calculated. Users who have computed fractional
coverage fields outside of MET can use this option to disable
these computations. Instead, the raw input values will be
used directly to compute neighborhood statistics:

* "BOTH" to compute fractional coverage for both the forecast
and the observation fields (default).

* "FCST" to only process the forecast field.

* "OBS" to only process the observation field.

* "NONE" to process neither field.

* The "vld_thresh" entry is described above.

* The "shape" entry defines the shape of the neighborhood.
Valid values are "SQUARE" or "CIRCLE"

* The "width" entry is as described above, and must be odd.

* The "cov_thresh" entry is an array of thresholds to be used when
computing categorical statistics for the neighborhood fractional
coverage field.

.. code-block:: none

nbrhd = {
field = BOTH;
vld_thresh = 1.0;
shape = SQUARE;
width = [ 1 ];
cov_thresh = [ >=0.5 ];
}

.. _fourier:

:ref:`fourier <fourier>`

The "fourier" entry is a dictionary which specifies the application of the
Fourier decomposition method. It consists of two arrays of the same length
which define the beginning and ending wave numbers to be included. If the
arrays have length zero, no Fourier decomposition is applied. For each array
entry, the requested Fourier decomposition is applied to the forecast and
observation fields. The beginning and ending wave numbers are indicated in
the MET ASCII output files by the INTERP_MTHD column (e.g. WV1_0-3 for waves
0 to 3 or WV1_10 for only wave 10). This 1-dimensional Fourier decomposition
is computed along the Y-dimension only (i.e. the columns of data). It is only
defined when each grid point contains valid data. If either input field
contains missing data, no Fourier decomposition is computed.

The available wave numbers start at 0 (the mean across each row of data)
and end at (Nx+1)/2 (the finest level of detail), where Nx is the X-dimension
of the verification grid:

* The "wave_1d_beg" entry is an array of integers specifying the first
wave number to be included.

* The "wave_1d_end" entry is an array of integers specifying the last
wave number to be included.

.. code-block:: none

fourier = {
wave_1d_beg = [ 0, 4, 10 ];
wave_1d_end = [ 3, 9, 20 ];
}

.. _gradient:

:ref:`gradient <gradient>`

The "gradient" entry is a dictionary which specifies the number and size of
gradients to be computed. The "dx" and "dy" entries specify the size of the
gradients in grid units in the X and Y dimensions, respectively. dx and dy
are arrays of integers (positive or negative) which must have the same
length, and the GRAD output line type will be computed separately for each
entry. When computing gradients, the value at the (x, y) grid point is
replaced by the value at the (x+dx, y+dy) grid point minus the value at
(x, y).

This configuration option may be set separately in each "obs.field" entry.

.. code-block:: none

gradient = {
dx = [ 1 ];
dy = [ 1 ];
}

.. _distance_map:

:ref:`distance_map <distance_map>`

The "distance_map" entry is a dictionary containing options related to the
distance map statistics in the DMAP output line type. The "baddeley_p" entry
is an integer specifying the exponent used in the Lp-norm when computing the
Baddeley Delta metric. The "baddeley_max_dist" entry is a floating point
number specifying the maximum allowable distance for each distance map. Any
distances larger than this number will be reset to this constant. A value of
NA indicates that no maximum distance value should be used. The "fom_alpha"
entry is a floating point number specifying the scaling constant to be used
when computing Pratt's Figure of Merit. The "zhu_weight" specifies a value
between 0 and 1 to define the importance of the RMSE of the binary fields
(i.e. amount of overlap) versus the mean-error distance (MED). The default
value of 0.5 gives equal weighting.

This configuration option may be set separately in each "obs.field" entry.

.. code-block:: none

distance_map = {
baddeley_p = 2;
baddeley_max_dist = NA;
fom_alpha = 0.1;
zhu_weight = 0.5;
}

.. _land_mask:

:ref:`land_mask <land_mask>`
Expand Down
Binary file added met/docs/Users_Guide/figure/grid-stat_fig6.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading