Definition of Nused and N_trusted within obs_diag metrics #759
I had to remind myself that trusted obs are defined for obs_diag, not for the assimilation. So it's possible for the number used for the metric calculation to be larger than the number assimilated. I've almost never used trusted obs, but I see their value.

I find knowing the number of obs actually assimilated to be useful for judging whether an assimilation is healthy. It also tells me whether the statistics being reported are reliable; a small number tells me "don't pay attention to this". This is usually confirmed by a statistic's time series jumping up and down.

I'm not a fan of labeling obs "trusted" just to make the results look the way I expect, or look good. If I truly trust all of them, that's a great position to be in. Otherwise, I could be giving too much weight to obs that shouldn't have it, which could distort my conclusions. I'd usually rather see the spin-up period and the noise caused by small numbers of obs being assimilated.

Including more obs in the metric has two effects: it reduces the noise because of the larger number of obs, but it may increase the rmse or bias because it adds obs that are farther from the ensemble into the measurement. It would be tricky to disentangle the two, so we're left not knowing how much of those metrics comes from each effect.

Since the number used in the calculation can be a sum of trusted and non-trusted obs (I haven't confirmed this in the code yet), I think that two numbers should be reported: the number used in the displayed metric and the number of them that are trusted. That's in addition to the "number available". Then it's the responsibility of the user to know that "trusted obs" means the obs were assimilated regardless of outlier threshold, etc., and that the other obs fit within the user's restrictions. Unfortunately, this means adding another symbol to the pictures, so plot_rmse_evolution_xxx.m would have:

We might want switches to turn off one or more of the "number of" symbols. I think these should be in the plotting routines, so that we don't need to rerun obs_diag and the plotters to see pictures with(out) these symbols. This issue is related to issue #558 and discussion #438. The general question is "how do we communicate the way that obs are used or ignored?"
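As a hedged illustration of the two-numbers-plus-available proposal above, here is a minimal Python sketch; the field names (`type`, `assimilated`) and the helper `summarize_counts` are hypothetical, not actual obs_diag variables:

```python
# Hypothetical sketch of the three per-type counts proposed above:
# number available, number used in the displayed metric, and how many
# of those are trusted. Field names are illustrative, not obs_diag's.
def summarize_counts(obs_list, trusted_types):
    n_available = len(obs_list)
    n_used = sum(ob['assimilated'] or ob['type'] in trusted_types
                 for ob in obs_list)  # trusted obs count even if rejected
    n_trusted = sum(ob['type'] in trusted_types for ob in obs_list)
    return n_available, n_used, n_trusted

obs = [{'type': 'RADIOSONDE_TEMPERATURE', 'assimilated': True},
       {'type': 'RADIOSONDE_TEMPERATURE', 'assimilated': False},
       {'type': 'SOIL_MOISTURE',          'assimilated': False}]
print(summarize_counts(obs, trusted_types={'SOIL_MOISTURE'}))  # (3, 2, 1)
```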
Thanks for your input @kdraeder. I wanted to make sure we were on the same page in terms of the definition of 'trusted obs'. I agree with your first statement that "trusted obs are defined for obs_diag, not for the assimilation". This means that assigning obs as trusted is simply a post-processing decision about how the metrics are calculated and has no impact on which observations were actually assimilated.

I am trying to reconcile that with the other statement that "I'm not a fan of labeling obs 'trusted' just to make the results look like I expect or good." I find that setting obs to trusted actually decreases the apparent skill of the assimilation, because it accounts for observations that may not have been assimilated at all. Computing statistics on a subset of observations that changes over time can, in my experience, produce noisy and unintuitive behavior when plotting the RMSE and bias time series (e.g. plot_rmse_evolution_xxx.m). On the other hand, using trusted obs so that statistics are computed on a fixed set of observations (assuming the number of observations is relatively constant through time), especially for assimilations where there is systematic bias between the ensemble estimate and the observations, seems like a more intuitive calculation.

In terms of statistics, I think we need the number of obs available, the number of obs assimilated, and then potentially a flag that indicates whether the obs were trusted. I might be oversimplifying, but if obs are trusted, then Nmetric = obs available; if obs are not trusted, then Nmetric = obs assimilated.
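That closing rule can be restated as a short sketch (Python; the names `n_metric`, `n_available`, and `n_assimilated` are illustrative, not DART identifiers):

```python
# Restating the rule above for a single observation type;
# names are illustrative, not DART identifiers.
def n_metric(is_trusted, n_available, n_assimilated):
    # Trusted: every available ob enters the statistics.
    # Not trusted: only obs the filter actually assimilated do.
    return n_available if is_trusted else n_assimilated
```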
The "I'm not a fan..." comment was somewhat leftover from when I hadn't remembered that "trusted" is a post-assimilation tool. It's good to read that designating them as "trusted" often decreases the apparent skill. Now that I'm thinking about it more I see that the trusted obs that weren't assimilated are similar to withheld observations, and can be used as a quality check. And I had forgetten that the obs space plots are for 1 ob type for each picture, so your suggestion of simply having a flag about the obs being trusted is more appropriate (and easier) than my suggestion of an additional curve. I suppose the flag could be using a different symbol for the "available obs" - bigger, brighter, more authoritative - and described in the legend as "trusted". |
What's the issue?
The definition of `Nused` within the obs_diag documentation is: "The number of observations that were assimilated." However, this is only true if `N_trusted` is false for all observations. If some observation types are trusted, `Nused` is actually: "The number of observations that are included in the obs_diag metrics calculation."
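A small invented example of the discrepancy (the numbers and names are illustrative only):

```python
# Invented numbers for one observation type in one time bin:
n_available   = 100   # obs present in the obs_sequence file
n_assimilated = 80    # obs that actually passed QC / outlier threshold
# Documented definition: Nused == n_assimilated == 80.
# If the type is trusted, obs_diag instead reports Nused == 100,
# the number of obs entering the metrics calculation.
```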
Where did you find the issue?
CLM-DART users have reported it, and this is consistent with my understanding.
What needs to be fixed?
At a minimum, the definition of `Nused` should be fixed to state: "The number of observations included in the metrics calculation," or something to that effect. This issue might also extend to plotting figures such as plot_rmse_evolution_xxx.m, if `Nused` is the variable used to calculate `*=assimilated`. Not sure yet.

Suggestions for improvement
The bigger question is: what was the intent of the `Nused` metric, and what do users find most useful? I don't find the number of observations considered in the metric calculation particularly useful, and maybe the original intent of the variable was to define `Nassim` instead of `Nused`. It's not clear to me whether this is a coding error or a documentation error.

In general, I do find the use of `N_trusted` observations to be very helpful, because oftentimes observations are rejected not because they are untrustworthy, but because of outlier threshold rejection criteria based on systematic errors between the obs and the model state. As a result, as the model state is adjusted closer to the observations, more observations are considered in the metrics calculation, which can cause unexpected temporal behavior in the rmse and bias statistics. Sometimes setting the observations to trusted is more intuitive, because the temporal statistics of rmse and bias make more sense -- and are based on the same number (spatial region) of observations at all times. In my opinion, the fact that the default behavior is for the metrics to consider a 'moving window' of observations is sub-optimal.
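To make the 'moving window' concrete, here is a schematic Python sketch of outlier-threshold rejection; the exact formula should be checked against the DART filter documentation, so treat this as an assumption:

```python
import math

# Schematic of outlier-threshold rejection; check the DART filter
# documentation for the exact form used in the code.
def is_rejected(obs_value, prior_mean, prior_var, obs_err_var,
                outlier_threshold=3.0):
    # Innovation scaled by the expected spread of (obs - prior).
    ratio = abs(obs_value - prior_mean) / math.sqrt(prior_var + obs_err_var)
    return ratio > outlier_threshold

# As assimilation pulls prior_mean toward the obs, the ratio shrinks,
# previously rejected obs begin to pass, and the set of obs entering the
# (non-trusted) metrics grows: the 'moving window' described above.
```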
Anything else we should know?
Maybe use this thread as an opportunity to suggest new or different DART statistics, especially now that we hope to make a transition to the pyDART code.