Adding capability to compare geovals #177

Closed
asewnath opened this issue Jan 31, 2024 · 3 comments · Fixed by #184

@asewnath
Collaborator

asewnath commented Jan 31, 2024

We want to add the capability to compare geovals from different systems (JEDI, GSI, GEOS, etc.). This involves adding a new dataset reader and potentially a transform. The reader would require an obs file along with the geoval file to retrieve lat/lon information. The reader would also take templated filenames so that it can read more than one instrument file at a time.
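A minimal sketch of how the templated filenames could be expanded per instrument, assuming `${...}` placeholders are substituted from a dictionary of path variables and `{instrument}` is replaced by each instrument name listed in the config. The helper `expand_instrument_files` is hypothetical and not part of eva.

```python
from string import Template


def expand_instrument_files(templates, instruments, path_vars):
    """Hypothetical helper: expand templated paths for every instrument.

    templates:   e.g. ['${data_experiment_path}/{instrument}_experiment.nc4']
    instruments: e.g. ['amsua_n19', 'avhrr3_metop-b']
    path_vars:   values for ${...} placeholders, e.g. {'data_experiment_path': '/data/exp'}
    Returns a dict mapping instrument name -> list of concrete file paths.
    """
    files = {}
    for name in instruments:
        paths = []
        for template in templates:
            # Template only touches ${...}; the {instrument} token is left intact.
            path = Template(template).substitute(path_vars)
            paths.append(path.replace('{instrument}', name))
        files[name] = paths
    return files
```

Keeping `{instrument}` as a plain-text token (rather than using `str.format`) avoids clashes with the braces in `${...}`.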

The new transform takes the lat/lon information from the experiment and control datasets and, for each experiment location, finds the index of the closest control location. It then updates the experiment dataset with the control variables reordered by those indices, so that they are index-matched to the experiment. The new fields in the experiment dataset would look something like this: experiment_geovals::amsua_n19_from_control_geovals::vegetation_area_fraction
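The matching step described above can be sketched with a brute-force nearest-neighbour search on great-circle (haversine) distance. This is an illustration under assumptions, not the eva implementation: the function names are hypothetical, and the output key format is inferred from the single example field name given above.

```python
import numpy as np


def nearest_indices(lat_ctrl, lon_ctrl, lat_exp, lon_exp):
    """For each experiment location, return the index of the closest control location.

    Brute-force haversine search: builds an (n_exp, n_ctrl) distance matrix,
    so it is only suitable for modest array sizes.
    """
    lat_c = np.radians(np.asarray(lat_ctrl, dtype=float))[np.newaxis, :]
    lon_c = np.radians(np.asarray(lon_ctrl, dtype=float))[np.newaxis, :]
    lat_e = np.radians(np.asarray(lat_exp, dtype=float))[:, np.newaxis]
    lon_e = np.radians(np.asarray(lon_exp, dtype=float))[:, np.newaxis]
    # Haversine formula; the Earth radius is omitted because only the argmin matters.
    a = (np.sin((lat_c - lat_e) / 2.0) ** 2
         + np.cos(lat_e) * np.cos(lat_c) * np.sin((lon_c - lon_e) / 2.0) ** 2)
    dist = 2.0 * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))
    return np.argmin(dist, axis=1)


def add_matched_variables(exp_vars, ctrl_vars, idx, exp_name, ctrl_name, instrument):
    """Copy control variables, reordered by idx, into the experiment dict.

    Keys follow the pattern of the example field above (an assumption):
    <exp_name>::<instrument>_from_<ctrl_name>::<variable>
    Indexing along axis 0 also works for (nlocs, nlevs) geoval arrays.
    """
    for var, values in ctrl_vars.items():
        key = f"{exp_name}::{instrument}_from_{ctrl_name}::{var}"
        exp_vars[key] = np.asarray(values)[idx]
    return exp_vars
```

For large datasets, a KD-tree over 3D Cartesian coordinates (e.g. SciPy's `cKDTree`) would avoid the quadratic memory cost of the distance matrix.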

Potential eva configs for geoval space:

datasets:
  - name: experiment_geovals
    type: GeovalSpace
    obs_file:
      - ${data_experiment_path}/{instrument}_experiment.nc4
    geovals_file:
      - ${data_experiment_path}/{instrument}_experiment_geovals.nc4
    levels: &levels 33
    instruments:
      - name: amsua_n19
        geoval_variables: &geoval_variables ['vegetation_area_fraction', 'leaf_area_index']
      - name: avhrr3_metop-b

  - name: control_geovals
    type: GeovalSpace
    obs_file:
      - ${data_control_path}/{instrument}_control.nc4
    geovals_file:
      - ${data_control_path}/{instrument}_control_geovals.nc4
    levels: *levels
    instruments:
      - name: amsua_n19
        geoval_variables: *geoval_variables
      - name: avhrr3_metop-b

transforms:
  - transform: index_match
    starting_dataset: control_geovals
    match_index_to_this_dataset: experiment_geovals

@CoryMartin-NOAA Please let me know if you have any thoughts or suggestions for this new reader/transform. I had also thought about combining control and experiment into a single dataset read and performing the index matching there, so that there's no need for a new transform.

asewnath self-assigned this Jan 31, 2024
@CoryMartin-NOAA
Collaborator

@asewnath I think the transform is necessary. I know @weihuang-jedi was looking for something like this.

Beyond geovals, I think the new transform could be useful for comparing two IODA obs spaces. Say you have two experiments run with different PE counts: the distributions may be different, but it's the same data, so we could re-index it to plot. This would also be good for independent GSI vs. JEDI h(x) comparisons.
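When both obs spaces hold the same observations in a different order (as with different PE counts), an exact lookup on rounded coordinates can replace the nearest-neighbour search. A minimal sketch, assuming each (lat, lon) pair is unique after rounding; the function name and the `decimals` tolerance are hypothetical.

```python
def reorder_indices(lat_a, lon_a, lat_b, lon_b, decimals=6):
    """Return idx such that dataset B reordered by idx lines up with dataset A.

    Assumes A and B contain the same observations in different orders and
    that each (lat, lon) pair is unique after rounding to `decimals`.
    """
    lookup = {}
    for j, (lat, lon) in enumerate(zip(lat_b, lon_b)):
        key = (round(lat, decimals), round(lon, decimals))
        if key in lookup:
            # Co-located obs (e.g. repeated times) would need an extra key field.
            raise ValueError(f"duplicate location {key} in dataset B")
        lookup[key] = j
    # A KeyError here means an observation in A has no counterpart in B.
    return [lookup[(round(lat, decimals), round(lon, decimals))]
            for lat, lon in zip(lat_a, lon_a)]
```

Exact matching fails loudly on mismatches instead of silently pairing distant points, which is the safer behaviour when the two files are expected to contain identical obs.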

For the new dataset reader, can we make it more generic than geovals? For example, something like a 'data file' and a 'coordinate file'? This is analogous to how the FV3 RESTART files have the data in one file but the lat/lon info in another.
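A generic reader along these lines mainly has to guarantee that the data file and the coordinate file describe the same locations. A minimal sketch of that pairing, assuming both files have already been read into Python sequences; the class name and fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DataWithCoordinates:
    """Hypothetical pairing of a 'data file' with a separate 'coordinate file'.

    variables: name -> per-location values read from the data file
    latitude/longitude: per-location coordinates read from the coordinate file
    """
    variables: dict
    latitude: list
    longitude: list

    def __post_init__(self):
        # The two files are only usable together if they share the location dimension.
        nlocs = len(self.latitude)
        if len(self.longitude) != nlocs:
            raise ValueError("latitude and longitude lengths differ")
        for name, values in self.variables.items():
            if len(values) != nlocs:
                raise ValueError(
                    f"{name} has {len(values)} locations, coordinates have {nlocs}"
                )
```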

@asewnath
Collaborator Author

asewnath commented Feb 5, 2024

Thanks for the guidance @CoryMartin-NOAA. Based on your suggestions, I've modified the proposed config below as an example of reading two sources of geoval files and using the new transform:

datasets:
  - name: experiment_geovals
    type: DataFile
    data_file:
      - ${data_experiment_path}/{instrument}_experiment_geovals.nc4
    levels: &levels 33
    instruments:
      - name: amsua_n19
        geoval_variables: &geoval_variables ['vegetation_area_fraction', 'leaf_area_index']
      - name: avhrr3_metop-b

  - name: control_geovals
    type: DataFile
    data_file:
      - ${data_control_path}/{instrument}_control_geovals.nc4
    levels: *levels
    instruments:
      - name: amsua_n19
        geoval_variables: *geoval_variables
      - name: avhrr3_metop-b

  - name: experiment_lat_lon
    group: state
    type: LatLon
    filename: ${data_input_path}/{instrument}_experiment.nc4
    variables: [lat, lon]

  - name: control_lat_lon
    group: state
    type: LatLon
    filename: ${data_input_path}/{instrument}_control.nc4
    variables: [lat, lon]

transforms:
  - transform: index_match
    dataset_1: control_geovals
    lat_lon_1: control_lat_lon
    dataset_2: experiment_geovals
    lat_lon_2: experiment_lat_lon


I'll iterate on what makes the most sense for the transform config. Also, lat_lon_1 and lat_lon_2 would be optional arguments for the transform (for example, when IodaObsSpace datasets are used).
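The optional `lat_lon_<n>` keys could be resolved with a simple fallback. A minimal sketch, assuming collections are plain dicts of variables keyed by dataset name, that a `LatLon` dataset exposes `lat`/`lon` (as in its `variables: [lat, lon]` entry), and that a dataset without a separate coordinate source carries `latitude`/`longitude` itself; all of these names are assumptions.

```python
def resolve_coordinates(transform_cfg, collections, n):
    """Return (lat, lon) for dataset_<n> of a hypothetical index-match config.

    If the optional key lat_lon_<n> is present, coordinates come from that
    separate dataset; otherwise they are read from dataset_<n> itself.
    """
    lat_lon_key = f"lat_lon_{n}"
    if lat_lon_key in transform_cfg:
        # Separate coordinate dataset, e.g. type: LatLon with variables [lat, lon].
        coords = collections[transform_cfg[lat_lon_key]]
        return coords["lat"], coords["lon"]
    # Fallback: the dataset carries its own coordinates (variable names assumed).
    data = collections[transform_cfg[f"dataset_{n}"]]
    return data["latitude"], data["longitude"]
```

Keeping the fallback inside the transform means existing configs that only name `dataset_1`/`dataset_2` continue to work unchanged.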

@CoryMartin-NOAA
Collaborator

Looks good, thanks @asewnath

danholdaway pushed a commit that referenced this issue Apr 19, 2024
We are adding a new reader to handle geoval files, but with the idea
that it can be generalized to other types of files. A new latlon match
transform was added using some work that @danholdaway had developed.

List of changes:
- Method to `data_collections.py` to retrieve a collection
- Generic `DataFile` reader
- `latlon_match` transform to match lat/lon coordinates from one
collection to another
- `DataFile` test yaml and some files for testing

Resolves #177

2 participants