Definition of Nused and N_trusted within obs_diag metrics #759
I had to remind myself that trusted obs are defined for obs_diag, not for the assimilation. So it's possible for the number used for the metric calculation to be larger than the number assimilated. I've almost never used trusted obs, but I see their value.

I find knowing the number of obs actually assimilated to be useful for judging whether an assimilation is healthy. It also tells me whether the statistics being reported are reliable; a small number tells me "don't pay attention to this". This is usually confirmed by a statistic's time series jumping up and down.

I'm not a fan of labeling obs "trusted" just to make the results look the way I expect, or look good. If I truly trust all of them, that's a great position to be in. Otherwise, I could be giving too much weight to obs that shouldn't have it, which could distort my conclusions. I'd usually rather see the spin-up period and the noise caused by small numbers of obs being assimilated.

Including more obs in the metric has two effects: it reduces the noise because of the larger number of obs, but it may increase the rmse or bias because it adds obs that are farther from the ensemble into the measurement. It would be tricky to disentangle the two, so we're left not knowing how much of those metrics comes from each effect.

Since the number used in the calculation can be a sum of trusted and non-trusted obs (I haven't confirmed this in the code yet), I think that two numbers should be reported: the number used in the displayed metric and the number of them that are trusted. That's in addition to the "number available". Then it's the responsibility of the user to know that "trusted obs" means the obs were assimilated regardless of outlier threshold, etc., and that the other obs fit within the user's restrictions. Unfortunately, this means adding another symbol to the pictures, so plot_rmse_evolution_xxx.m would have:

We might want switches to turn off one or more of the "number of" symbols. I think these should be in the plotting routines, so that we don't need to rerun obs_diag and the plotters to see pictures with(out) these symbols. This issue is related to issue #558 and discussion #438. The general question is "how do we communicate the way that obs are used or ignored?"
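As a hedged illustration of the two-numbers-plus-available proposal above, here is a minimal Python sketch; the field names (`type`, `assimilated`) and the helper `summarize_counts` are hypothetical, not actual obs_diag variables:

```python
# Hypothetical sketch of the three per-type counts proposed above:
# number available, number used in the displayed metric, and how many
# of those are trusted. Field names are illustrative, not obs_diag's.
def summarize_counts(obs_list, trusted_types):
    n_available = len(obs_list)
    n_used = sum(ob['assimilated'] or ob['type'] in trusted_types
                 for ob in obs_list)  # trusted obs count even if rejected
    n_trusted = sum(ob['type'] in trusted_types for ob in obs_list)
    return n_available, n_used, n_trusted

obs = [{'type': 'RADIOSONDE_TEMPERATURE', 'assimilated': True},
       {'type': 'RADIOSONDE_TEMPERATURE', 'assimilated': False},
       {'type': 'SOIL_MOISTURE',          'assimilated': False}]
print(summarize_counts(obs, trusted_types={'SOIL_MOISTURE'}))  # (3, 2, 1)
```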
Thanks for your input @kdraeder. I wanted to make sure we were on the same page in terms of the definition of 'trusted obs'. I agree with your first statement that "trusted obs are defined for obs_diag, not for the assimilation". This means that assigning obs as trusted is simply a post-processing decision about how the metrics are calculated and has no impact on which observations were actually assimilated.

I am trying to reconcile that with the other statement that "I'm not a fan of labeling obs 'trusted' just to make the results look like I expect or good." I find that setting obs to trusted actually decreases the apparent skill of the assimilation, because it accounts for observations that may not have been assimilated at all. Computing statistics on a subset of observations that changes over time can, in my experience, produce noisy and unintuitive behavior when plotting the RMSE and bias time series (e.g. plot_rmse_evolution_xxx.m). On the other hand, using trusted obs so that statistics are computed on a fixed set of observations (assuming the number of observations is relatively constant through time), especially for assimilations where there is systematic bias between the ensemble estimate and the observations, seems like a more intuitive calculation.

In terms of statistics, I think we need the number of obs available, the number of obs assimilated, and then potentially a flag that indicates whether the obs were trusted. I might be oversimplifying, but if obs are trusted, then Nmetric = obs available; if obs are not trusted, then Nmetric = obs assimilated.
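That closing rule can be restated as a short sketch (Python; the names `n_metric`, `n_available`, and `n_assimilated` are illustrative, not DART identifiers):

```python
# Restating the rule above for a single observation type;
# names are illustrative, not DART identifiers.
def n_metric(is_trusted, n_available, n_assimilated):
    # Trusted: every available ob enters the statistics.
    # Not trusted: only obs the filter actually assimilated do.
    return n_available if is_trusted else n_assimilated
```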
The "I'm not a fan..." comment was somewhat leftover from when I hadn't remembered that "trusted" is a post-assimilation tool. It's good to read that designating them as "trusted" often decreases the apparent skill. Now that I'm thinking about it more I see that the trusted obs that weren't assimilated are similar to withheld observations, and can be used as a quality check. And I had forgetten that the obs space plots are for 1 ob type for each picture, so your suggestion of simply having a flag about the obs being trusted is more appropriate (and easier) than my suggestion of an additional curve. I suppose the flag could be using a different symbol for the "available obs" - bigger, brighter, more authoritative - and described in the legend as "trusted". |
What's the issue?
The definition of `Nused` within the obs_diag documentation is: "The number of observations that were assimilated." However, this is only true if `N_trusted` is false for all observations. If some observation types are trusted, `Nused` is actually: "The number of observations that are included in the obs_diag metrics calculation."
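A small invented example of the discrepancy (the numbers and names are illustrative only):

```python
# Invented numbers for one observation type in one time bin:
n_available   = 100   # obs present in the obs_sequence file
n_assimilated = 80    # obs that actually passed QC / outlier threshold
# Documented definition: Nused == n_assimilated == 80.
# If the type is trusted, obs_diag instead reports Nused == 100,
# the number of obs entering the metrics calculation.
```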
Where did you find the issue?
CLM-DART users have reported it, and this is consistent with my understanding.
What needs to be fixed?
At a minimum, the definition of `Nused` should be fixed to state: "The number of observations included in the metrics calculation," or something to that effect. This issue might also extend to plotting figures such as plot_rmse_evolution_xxx.m, if `Nused` is the variable used to calculate `*=assimilated`. Not sure yet.

Suggestions for improvement
The bigger question is: what was the intent of the `Nused` metric, and what do users find most useful? I don't find the number of observations considered in the metric calculation particularly useful, and maybe the original intent of the variable was to define `Nassim` instead of `Nused`. It's not clear to me whether this is a coding error or a documentation error.

In general, I do find the use of `N_trusted` observations to be very helpful, because oftentimes observations are rejected not because they are untrustworthy, but because of outlier threshold rejection criteria based on systematic errors between the obs and the model state. As a result, as the model state is adjusted closer to the observations, more observations are considered in the metrics calculation, which can cause unexpected temporal behavior in the rmse and bias statistics. Sometimes setting the observations to trusted is more intuitive, because the temporal statistics of rmse and bias make more sense -- and are based on the same number (spatial region) of observations at all times. In my opinion, the fact that the default behavior is for the metrics to consider a 'moving window' of observations is sub-optimal.
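To make the 'moving window' concrete, here is a schematic Python sketch of outlier-threshold rejection; the exact formula should be checked against the DART filter documentation, so treat this as an assumption:

```python
import math

# Schematic of outlier-threshold rejection; check the DART filter
# documentation for the exact form used in the code.
def is_rejected(obs_value, prior_mean, prior_var, obs_err_var,
                outlier_threshold=3.0):
    # Innovation scaled by the expected spread of (obs - prior).
    ratio = abs(obs_value - prior_mean) / math.sqrt(prior_var + obs_err_var)
    return ratio > outlier_threshold

# As assimilation pulls prior_mean toward the obs, the ratio shrinks,
# previously rejected obs begin to pass, and the set of obs entering the
# (non-trusted) metrics grows: the 'moving window' described above.
```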
Anything else we should know?
Maybe use this thread as an opportunity to suggest new or different DART statistics, especially now that we hope to make a transition to the pyDART code.