cam-fv fails to exclude rttov obs from upper levels when no_obs_assim_above_level > 0 #296
Comments
sounds good, except you want to make a branch with the changes and make a pull request, right? the fixed code should go on the main branch along with release notes/change log. |
Thanks for the git workflow reminder. I lose track of that when I'm in the midst of fixing things. |
Couple of questions on this. What is special about radiance observations, that they are not being excluded? Would an ob with VERTISPRESSURE at 25 hPa be excluded correctly? Can you get a test case from Gio that we can run on Cheyenne? Or share your test case so we have a reproducer for this bug. Gio is not running the latest version of DART, v9.11.11, correct? |
The radiance obs above no_obs_assim_above_level were assimilated because of the way obs_def_rttov_mod.f90 builds its forward operator; this happens only for radiance obs. Conventional obs at 25 hPa were excluded in Gio's tests. When the comparison in obs_too_high is changed from no_assim_above_level to no_obs_assim_above_level, the rttov forward operators fail because model_interpolate then refuses to return values for the top-of-model levels they need. This is what prompted moving obs_too_high out of model_interpolate and into get_close_{state,obs}. I'm highly confident that Gio is not running the latest DART. |
kevin is right about how radiance obs work, but i'll clarify that the forward operator for any obs type that used any of the available vertical types and looped trying to query the model_mod for "all" values would have failed the same way. the most common use is for a forward operator to construct locations with VERTISLEVEL (since there's no interpolation required), and then query the model_mod for values at all levels and compute something from that column of values. but any forward operator trying to call the code with multiple pressure levels or heights could fail in the same way. it just happened that radiance obs were the first obs with a forward operator that used this strategy to run with cam. (something like total precipitable water would do the same.) the lesson is that, as kevin says, checking obs_too_high() in interpolate is a bad strategy; moving that function to the get_close() routines is much better. also - the intent was that the init code would assign no_assim_above_level = no_obs_assim_above_level, to be consistent with the other no_assim_above_... variables, and the check routine would use that. when the no_assim_above_scale_height variable gets added, maybe this can be changed? |
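As a rough illustration of that column-query pattern (hypothetical names, not the actual obs_def_rttov_mod or model_mod interfaces): the forward operator asks the model for a value at every level and then computes its quantity from the column, so a refusal at any single level sinks the whole operator.

```fortran
! Sketch only: names and interfaces are simplified stand-ins for the real
! DART location_mod / model_mod routines.
program column_forward_operator_sketch
   implicit none
   integer, parameter :: nlev = 30
   real    :: column(nlev), expected_obs
   integer :: k, istatus

   ! Build a column of model values by querying one level at a time,
   ! the way a radiance (or total-precipitable-water) operator would.
   do k = 1, nlev
      call fake_model_interpolate_level(k, column(k), istatus)
      if (istatus /= 0) then
         ! If any level is rejected (e.g. by an obs_too_high check inside
         ! the interpolation path), the whole forward operator fails.
         print *, 'forward operator failed at level ', k
         stop
      end if
   end do

   ! Compute something from the whole column (just a mean as a placeholder).
   expected_obs = sum(column) / real(nlev)
   print *, 'expected obs value: ', expected_obs

contains

   ! Stand-in for model_interpolate called with a VERTISLEVEL location.
   subroutine fake_model_interpolate_level(level, val, status)
      integer, intent(in)  :: level
      real,    intent(out) :: val
      integer, intent(out) :: status
      val    = 250.0 + real(level)   ! pretend temperature profile
      status = 0                     ! 0 = success
   end subroutine fake_model_interpolate_level

end program column_forward_operator_sketch
```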
Here are instructions from Gio about setting up a test case two different ways. The first refers to scripts for running the ensemble of hindcasts in sequence (#53)
The second way is outlined here.
These variables, or the derived ones, are computed in the assimilation script with an NCO command:
Since these added variables are not updated by the filter, we need to add the inflation for these fields by hand.
The code above is always in block 6 just after
Actually the mietable_eos_amsua.dat is not necessary for the test we are doing. In order to create the obs_seq_ file with the vertical location, I modified:
[ KR: This is added functionality, which I believe is not needed for setting up a test case ].
And in fo_forward_model:
|
Is it possible to just get a tar file of the restarts, inflation files and obs_seq.out files he is using? |
The obs_seq.out file is included in the zip file attached part way through my previous comment. I could probably get the restarts, but we have some locally that should work: Inflation files can be found in |
ok cool. I shall grab these and reproduce this with DART:main. thanks |
Gio gave me access to a github fork of the code that generated the "curiosity", which appears in the obs_seq.final files. Obs which should be excluded by cam-fv/model_mod.nml:no_obs_assim_above_level end up with a DART QC = 0 (assimilated) and differences between the priors and posteriors. The problem arises because the get_close_xxx calls from filter_assim pass back num_close = 0, but do not pass QC information. The get_close routines don't even pass back a status variable, so there's no conventional way for model_mod to signal that an ob has been excluded because of DARTQC_NOT_IN_NAMELIST. This incorrect obs_seq.final data will show up in obs space diagnostics, which will show high observations as being assimilated, when they weren't.
I've set up a small test case (@hkershaw-brown), which uses a single GPS profile (instead of rttov, following @nancycollins' suggestion) and 3 members (on 9 nodes):
which includes a potential (partial?) fix. For obs higher than no_obs_assim_above_level, the get_close routines pass back a flag value,
which is a hard-coded parameter because trying to use the QC module leads to a circular dependency. Assim_tools_mod:filter_assim is modified to:
Nancy warns that we can't count on this to work because the parallelism prevents(?) the variables from being reliably available. But it worked in my small test case; the obs that are higher than no_obs_assim_above_level have QC = DARTQC_NOT_IN_NAMELIST (unless they have some other non-0 QC). The priors and posteriors are still different, but those obs will be excluded from obs space diagnostics, and for the right reason.
This case has inflation turned off, so that is not the cause of the increments. The differences in (estimated) priors and posteriors probably come from other, accepted observations just below no_obs_assim_above_level causing increments in this ob that's above that level. The no-inflation case has slightly different values than the inflation case. I have not explored whether this problem exists in other models. |
hi kevin, that's good sleuthing so far. here's a suggestion for the next test. make the input file only a single GPS obs, not a full GPS profile. make sure you select an obs that is above the threshold. change both the get_close_state and get_close_obs code to return 0 numbers of close state/obs for obs above the threshold (so don't use the new -1 code). yes, that means this obs "is assimilated" but it should not have any impact on any of the state. if you see differences in the state and/or differences in the prior/posterior then something else is going on and that needs to be tracked down. otherwise, it's working as intended. it would be good to establish that the code is working before trying to attribute changes to something else. |
as a design point, even if you can update QCs in the filter loop and make it work under all data layouts (with transpose, without transpose, etc), you still have to deal with the fact that each task has a different list of obs and state variables it is responsible for. without communication between tasks, which is expensive and usually forces a barrier/sync point (bad for speed), it is possible for some tasks to return an error and some not. you can say if that happens it is a bug, but it is bad design to implement something where an error in user code cannot even be detected. if your intent is to know which observations did not contribute to changing the results of the assimilation - for any reason - that might be worth discussing. perhaps the localization distance is too small, perhaps the obs type is explicitly excluded by the obs_impact tool, perhaps there is model specific code in the get_close routines to exclude them. in that case, it might be possible to add an optional call to the global sum routine to count up the overall number of state items impacted by this obs (at a performance hit). any obs with 0 impact could have a new QC value set which indicates that obs had no impact on the results of the assimilation. that could be interesting for understanding what observations do and don't matter to your results. you would have to update the diagnostic routines to take that new value into account, but it's a more general solution. |
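A minimal sketch of that optional global-sum idea, assuming an MPI layout where each task only knows how many of its own state items the current obs touched (my_impact_count and the rest are made-up names, not DART routines): one collective per obs yields the total impact, and a zero total could then be recorded as a new QC value.

```fortran
! Sketch: counting the total number of state items impacted by one obs
! across all tasks.  Not DART code; it only illustrates the idea
! (and the cost: one collective operation per observation).
program global_impact_count_sketch
   use mpi
   implicit none
   integer :: ierr, rank, my_impact_count, total_impact_count

   call MPI_Init(ierr)
   call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

   ! Pretend each task found some number of close state items for this obs.
   my_impact_count = merge(0, rank, rank == 0)   ! task 0 found none here

   ! Global sum across tasks (this is the sync cost mentioned above).
   call MPI_Allreduce(my_impact_count, total_impact_count, 1, MPI_INTEGER, &
                      MPI_SUM, MPI_COMM_WORLD, ierr)

   if (rank == 0) then
      if (total_impact_count == 0) then
         print *, 'obs impacted nothing; a new QC value could be set here'
      else
         print *, 'obs impacted ', total_impact_count, ' state items in total'
      end if
   end if

   call MPI_Finalize(ierr)
end program global_impact_count_sketch
```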
The single ob test @nancycollins suggested shows no increments in state space or obs space. When I described the changes to filter_assim I neglected to include that the |
kevin, that code in filter does not solve the parallel problem. it only works because you wrote your model-specific get_close routines so that every task returns the same result. selecting any task would give you the same answer. if another non-owner task returned a different result this code would not catch it (and cannot without a global mpi operation). you did make something work for your very specific use case but it is not a general solution for anyone else and shouldn't be added to the main filter code. one thing i've done in design is to think about how i would document a change and critically, how someone else might use a new feature for a different purpose with a different model. if i can't do that, then it is too narrow a solution. i'd propose the cam model_mod simply return 0 from get_close for high obs. the obs then has no impact on the results. yes, it won't be marked with a different QC but other obs lower down also may not have an impact on the results and they aren't called out in the diagnostics. if that is important, then the more general solution is one i outlined above. |
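A minimal sketch of that proposal, with a deliberately simplified argument list (the real get_close_obs/get_close_state take DART get_close_type and location_type arguments, and the namelist level is converted to a vertical threshold at init time): if the base obs sits above the threshold, report nothing close and return before any distance search.

```fortran
! Sketch only: shows just the early return for obs above the exclusion level.
program get_close_high_obs_sketch
   implicit none
   real, parameter :: highest_obs_pressure_pa = 2500.0   ! hypothetical 25 hPa cutoff
   integer :: num_close

   call sketch_get_close(obs_pressure_pa=1000.0, num_close=num_close)
   print *, 'num_close for a 10 hPa obs: ', num_close    ! expect 0: obs has no impact

contains

   subroutine sketch_get_close(obs_pressure_pa, num_close)
      real,    intent(in)  :: obs_pressure_pa
      integer, intent(out) :: num_close

      ! Above the threshold (lower pressure = higher altitude): report nothing
      ! close, so this obs cannot change any state item or any other obs.
      if (obs_pressure_pa < highest_obs_pressure_pa) then
         num_close = 0
         return
      end if

      ! ... the normal distance search would go here and fill num_close ...
      num_close = -999   ! placeholder for "do the real search"
   end subroutine sketch_get_close

end program get_close_high_obs_sketch
```

The design point is that the obs then simply has no impact, without any change to filter_assim or to the get_close argument lists.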
throwing this out there: wrf-dart's solution to this problem is to have an obs preprocessor: DART/models/wrf/WRF_DART_utilities/wrf_dart_obs_preprocess.f90, lines 9 to 29 (commit 477409d)
|
I was coming to the same conclusion about a preprocessor, as unsatisfying as that is. But before that I want to make one more attempt to identify what I'm misunderstanding. In filter_assim the SEQUENTIAL_OBS loop allegedly cycles over all observations:
Each task owns a subset of those (ignoring caching). Then each task calls get_close_obs, ...
Is there, or could there be, a model_mod which knows whether an ob is owned by a given task, ...
I understand that get_close returning num_obs = 0 leads to no impact on the assimilation, ...
I can see how the estimated obs for the too-high ob could be needlessly updated by other obs, ...
I'm wondering whether another solution would be to set the distances very large, ...
(Full disclosure: I didn't write these cam-fv get_close variants or define their argument lists.) |
see my comments below:
if your goal is to replicate results of previous runs, then i think the preprocessing proposal is the only one that will work. i don't see it as unsatisfying. there are 2 existing preprocessing programs (wrf and mpas) but they do so much more than you need. you can start with the obs_loop program and add code similar to what's in the cam model_mod to exclude high obs (see the sketch after these replies). it would run fast, and compared to the overall cycle time of cam/dart, it would be in the noise.
this is not redundant. you're missing that each task has its own copy of the state and obs data. each task takes the current obs and then works independently to apply the increments. the SEQUENTIAL_OBS loop goes through every obs.
yes, because you added code to the part of get_close that is the same on each task. each task is doing exactly the same computation, independently so it works for your specific case. but only the observation is the same in the get_close call. the list of state and obs locations, types, etc are different on each task. so if you wanted to do something different with this code, where some tasks might say 0 and some might say another number, then this strategy doesn't work. so yes, you have a solution to a very narrow problem but it's in no way general.
in filter right now, different tasks do return different num_close values. that's where the speedup of parallelism comes in. each task has a different subset of the state. it gets passed the current ob and independently decides which parts of the subset of the state on its own task are close enough to be impacted by this obs. there is currently no need to know which tasks found close obs and which did not.
this is slightly the wrong way to look at this, in my opinion. would you say "an obs was assimilated" if it had no impact on the results? there are lots of ways that could happen. the cutoff distance could prevent an obs from impacting the state.
that's true but not relevant. other obs may also not contribute to the change in model state and they are marked with a QC of 0.
i'm not sure what you mean by 'identical to the actual ob'? the forward operator computes a prior and posterior estimate, and the obs itself has a value. the prior and posterior may be changed by nearby obs, or not. that's true for any number of other observations in other locations in the model.
the get_close interfaces are called by filter and must have identical argument lists for all models that are part of dart. they cannot be changed unless all models change their model_mod code to match. this only happens when there is a compelling case to add a feature that cannot be accomplished any other way and is of high priority for multiple models. if you want to schedule a video call, i'd be happy to hash out more about how the parallelism works and what can and cannot be done without communicating between tasks. but i think you shouldn't give this solution to gio because it's not standard. if sometime in the future someone installs a standard assim_tools_mod.f90 with a nonstandard cam model_mod.f90, or vice versa, bad things will happen. better to have a solution that's consistent with the main dart branch of the code. |
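Picking up the preprocessor suggestion from the first reply above: a minimal sketch of the filtering logic only, with made-up data structures. A real tool would be built from obs_loop and DART's obs_sequence routines, and would reuse the model's conversion of no_obs_assim_above_level to a vertical threshold.

```fortran
! Sketch of an obs preprocessor that drops obs above a pressure threshold.
! Not DART code: a real tool would read/write obs_seq files with
! obs_sequence_mod (as obs_loop.f90 does); here the "sequence" is an array.
program drop_high_obs_sketch
   implicit none

   type :: simple_obs
      real :: value
      real :: pressure_pa       ! vertical location of the obs
   end type simple_obs

   real, parameter :: threshold_pa = 2500.0   ! hypothetical 25 hPa cutoff
   type(simple_obs) :: in_seq(4), out_seq(4)
   integer :: i, nkept

   ! Fake input sequence: two obs above the threshold, two below.
   in_seq = [ simple_obs(285.0, 90000.0), simple_obs(220.0, 2000.0), &
              simple_obs(250.0, 50000.0), simple_obs(210.0, 1000.0) ]

   ! Keep only obs at or below the threshold altitude (pressure >= threshold).
   nkept = 0
   do i = 1, size(in_seq)
      if (in_seq(i)%pressure_pa >= threshold_pa) then
         nkept = nkept + 1
         out_seq(nkept) = in_seq(i)
      end if
   end do

   print *, 'kept ', nkept, ' of ', size(in_seq), ' obs'
end program drop_high_obs_sketch
```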
The SEQUENTIAL_OBS do loop is around all observations:
This bit:
Everybody (every processor) is doing the calculations (get_close, increments, ...) for the same observation 'i', but on their part of the state and their forward operators. This is an example with 4 processors and 5 obs:
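A rough stand-in for that example (made-up numbers, with the 4 tasks simulated serially in one program): every task executes the same loop over all 5 obs, but the only state it touches is its own portion.

```fortran
! Sketch of the SEQUENTIAL_OBS pattern described above: every task loops over
! every obs, but only applies increments to its own subset of the state.
program sequential_obs_pattern_sketch
   implicit none
   integer, parameter :: num_obs = 5, num_tasks = 4, state_per_task = 3
   real    :: state(num_tasks, state_per_task)
   integer :: iobs, itask

   state = 0.0

   do iobs = 1, num_obs                 ! the SEQUENTIAL_OBS loop: all obs, in order
      do itask = 1, num_tasks           ! in filter these run concurrently, one per task
         ! each task does get_close / increments for obs 'iobs',
         ! but only on the state elements it owns:
         state(itask, :) = state(itask, :) + 0.1 * real(iobs)
      end do
   end do

   print *, 'state after all obs: ', state
end program sequential_obs_pattern_sketch
```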
There is pseudo code for assim_tools_mod.f90 here:
For a non-preprocessor option: DART/models/cam-fv/model_mod.f90, lines 3270 to 3271 (commit cb9cb51)
This gives you an istatus you can return to filter_assim to trigger a 'bad obs' before the SEQUENTIAL_OBS do loop. You could have convert_vertical_obs log the too-high obs.
|
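A minimal sketch of that route, assuming a simplified signature (the real convert_vertical_obs in cam-fv/model_mod.f90 takes ensemble, location, and type arguments): the vertical conversion step also flags too-high obs with a nonzero status, which filter can turn into a bad-obs QC before the SEQUENTIAL_OBS loop.

```fortran
! Sketch only: the real convert_vertical_obs has a different argument list.
! The point is just that a nonzero status from the vertical conversion step
! can mark too-high obs before any assimilation happens.
program convert_vertical_status_sketch
   implicit none
   real    :: pressure_pa
   integer :: my_status

   call sketch_convert_vertical_obs(obs_pressure_pa=1500.0, &
                                    converted_pressure_pa=pressure_pa, &
                                    my_status=my_status)
   print *, 'status for a 15 hPa obs: ', my_status   ! nonzero => filter marks it bad

contains

   subroutine sketch_convert_vertical_obs(obs_pressure_pa, converted_pressure_pa, my_status)
      real,    intent(in)  :: obs_pressure_pa
      real,    intent(out) :: converted_pressure_pa
      integer, intent(out) :: my_status

      real, parameter :: highest_obs_pressure_pa = 2500.0   ! hypothetical 25 hPa cutoff

      ! normally: convert whatever vertical the obs came with to the model vertical
      converted_pressure_pa = obs_pressure_pa

      ! additionally: flag obs above the exclusion threshold
      if (converted_pressure_pa < highest_obs_pressure_pa) then
         my_status = 2          ! any nonzero value; could also be logged here
      else
         my_status = 0
      end if
   end subroutine sketch_convert_vertical_obs

end program convert_vertical_status_sketch
```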
Yes, if convert_all_obs_verticals_first = .true., then convert_vertical_obs already returns a my_status. Comments earlier in assim_tools imply that allowing users this control is important for efficiency reasons: By the way, I finally see why returning a QC code from get_close_obs won't work in the current code. |
it seems like either a preprocessor or telling people they have to convert verts first if they are using rttov observations are your two best choices for fixes here. you'd need to put additional code in the vert convert routine to look for high obs because it doesn't do that now. a preprocessor would cleanly flag high obs and then anything that would fail the vert convert for other reasons would continue to work. but you can't use get_close for this.
good insight. the catch is that as each observation is assimilated, it computes an impact and updates not only the state but also the remaining unassimilated obs. this is because the forward operators were computed on the original state, and as each observation is assimilated it changes the state. there's a paper where jeff proves that if you apply the same update to the forward operator results before you assimilate subsequent obs you get the same answer as computing the forward operators inside the main assimilation loop as the state gets updated. bottom line is the main loop must process each ob in order to maintain the correct results. |
p.s. i like the preprocessor solution better. but if you go the vert convert route you have to be sure you only add code to the exact routine called by filter. other parts of the vert convert calling path can't return errors without putting you back into the original problem of failing to compute values for top-of-model locations needed by rttov. |
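To make the point above about updating the remaining obs concrete, here is a toy sketch with made-up numbers (not assim_tools_mod's actual math): after each obs is assimilated, its obs-space increments are regressed onto the state ensemble and onto the prior ensembles of the obs not yet assimilated, so later obs see consistent priors without re-running their forward operators.

```fortran
! Toy sketch of the sequential update described above (not DART's real update):
! each assimilated obs produces obs-space increments that are regressed onto
! the state ensemble AND onto the priors of obs not yet assimilated.
program sequential_update_sketch
   implicit none
   integer, parameter :: ens = 4, nobs = 2
   real :: state(ens), obs_prior(nobs, ens), obs_value(nobs), dy(ens), b
   integer :: i, k

   ! made-up ensembles and obs values
   state          = [ 1.0, 2.0, 3.0, 4.0 ]
   obs_prior(1,:) = [ 0.9, 2.1, 2.9, 4.1 ]
   obs_prior(2,:) = [ 1.2, 1.8, 3.2, 3.8 ]
   obs_value      = [ 2.0, 3.0 ]

   do i = 1, nobs
      ! crude stand-in for the obs-space update: nudge each member halfway
      ! toward the observed value (real code uses the EAKF update)
      dy = 0.5 * (obs_value(i) - obs_prior(i,:))

      ! regress the obs-space increments onto the state ensemble
      b = covar(state, obs_prior(i,:)) / variance(obs_prior(i,:))
      state = state + b * dy

      ! ...and onto the priors of obs not yet assimilated, so they stay
      ! consistent with the updated state without recomputing their
      ! forward operators
      do k = i + 1, nobs
         b = covar(obs_prior(k,:), obs_prior(i,:)) / variance(obs_prior(i,:))
         obs_prior(k,:) = obs_prior(k,:) + b * dy
      end do
   end do

   print *, 'updated state ensemble: ', state

contains

   real function covar(a, c)
      real, intent(in) :: a(:), c(:)
      covar = sum((a - sum(a)/size(a)) * (c - sum(c)/size(c))) / real(size(a) - 1)
   end function covar

   real function variance(a)
      real, intent(in) :: a(:)
      variance = covar(a, a)
   end function variance

end program sequential_update_sketch
```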
Describe the bug
Offsetting errors in model_mod allowed assimilations using rttov radiances to mostly appear to work,
but obs that should have been excluded by the namelist variable no_obs_assim_above_level were not.
Steps to reproduce: set up and run a CAM6 assimilation with no_obs_assim_above_level > 0.
Expected outcome: obs above no_obs_assim_above_level should not be assimilated.
What actually happened: all obs were assimilated, including those above that level.
Which model(s) are you working with?
cesm2_1:CAM6 in models/cam-fv
Version of DART
Which version of DART are you using? Manhattan (at CMCC)
Have you modified the DART code?
Yes, to fix the bugs, only in cam-fv/model_mod.f90:
If your code changes are available on GitHub, please provide the repository.
I'm waiting for test results of the fixed code from Gio at CMCC.
Then I'll push the changes to NCAR/DART:Manhattan.
Build information
Please describe: