Improve reading facets from ESGF search results #1920

Merged: 3 commits into main from improve-esgf-facets on Feb 13, 2023

bouweandela
Member

@bouweandela bouweandela commented Feb 8, 2023

Description

Improve the way facets are read from the ESGF search results and use them to create the local directory structure. This results in more cases where the facets are available and fixes an issue with the obs4MIPs directory where a . in the dataset name was interpreted as a / when creating the local file path.
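
The path problem described above can be illustrated with a minimal sketch. This is not the ESMValCore code: the function names, the dataset name `EXAMPLE-1.0`, and the directory layout are hypothetical and chosen only to show how splitting an identifier on every `.` breaks a name that itself contains a dot, while building the path from already separated facet values does not.

```python
from pathlib import PurePosixPath


def path_from_dataset_id(dataset_id: str) -> PurePosixPath:
    """Naive approach: treat every '.' in the identifier as a separator."""
    return PurePosixPath(*dataset_id.split("."))


def path_from_facets(facets: dict[str, str], keys: list[str]) -> PurePosixPath:
    """Build the path from facet values, so dots inside a value survive."""
    return PurePosixPath(*(facets[key] for key in keys))


# Hypothetical identifier and facets for a dataset whose name contains a dot.
dataset_id = "obs4MIPs.EXAMPLE-1.0.tas"
facets = {"project": "obs4MIPs", "dataset": "EXAMPLE-1.0", "short_name": "tas"}

print(path_from_dataset_id(dataset_id))  # obs4MIPs/EXAMPLE-1/0/tas
print(path_from_facets(facets, ["project", "dataset", "short_name"]))  # obs4MIPs/EXAMPLE-1.0/tas
```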


Checklist

It is the responsibility of the author to make sure the pull request is ready to review. The icons indicate whether the item will be subject to the 🛠 Technical or 🧪 Scientific review.

To try out the improved facets, run

from esmvalcore.esgf import find_files

files = find_files(
    short_name='tas',
    mip='Amon',
    project='CMIP5',
    dataset='*',
    institute='*',
    ensemble='r1i1p1',
    exp='historical',
    timerange='2000/2000',
)

for file in files:
    print(file)
    print(file.facets)

and note that with this pull request all datasets have many facets.

Closes #1873


@codecov

codecov bot commented Feb 8, 2023

Codecov Report

Merging #1920 (9874227) into main (d61d200) will increase coverage by 0.01%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main    #1920      +/-   ##
==========================================
+ Coverage   92.11%   92.12%   +0.01%     
==========================================
  Files         234      234              
  Lines       12130    12151      +21     
==========================================
+ Hits        11173    11194      +21     
  Misses        957      957              
Impacted Files                  Coverage Δ
esmvalcore/esgf/facets.py       100.00% <ø> (ø)
esmvalcore/esgf/_download.py    100.00% <100.00%> (ø)

@bouweandela bouweandela marked this pull request as ready for review February 9, 2023 12:48
@bouweandela bouweandela added the bug Something isn't working label Feb 9, 2023
@bouweandela bouweandela added this to the v2.8.0 milestone Feb 9, 2023
Contributor

@valeriupredoi valeriupredoi left a comment

Looking good, bud (I am no specialist on ESGF searches, but I very much appreciate the amount of testing added here!) 🍺

@remi-kazeroni
Contributor

@ESMValGroup/tech-reviewers, would one of you have time to do a final check and merge this, please? It would be great to have this merged soon, as it also helps resolve issues in #1609. Thanks 👍

Contributor

@schlunma schlunma left a comment

Thanks for this @bouweandela, I have two additional minor comments on this.

Could you also please briefly summarize what the issue was and how this PR fixes that? To be honest, I don't really understand what's going on here, and you are adding a significant amount of code here. I think the linked issue (#1873) is only relevant for a small part of the changes.

esmvalcore/esgf/_download.py (review comment resolved)
@bouweandela
Member Author

bouweandela commented Feb 13, 2023

Could you also please briefly summarize what the issue was and how this PR fixes that?

@schlunma The main issue is that the facets on ESGF are often incorrect. I tried to work around this by reading them from the dataset_id (e.g. CMIP6.CMIP.NCAR.CESM2.historical.r4i1p1f1.Amon.tas.gn.v20190308) using the template dataset_id_template_ that defines it (e.g. %(mip_era)s.%(activity_drs)s.%(institution_id)s.%(source_id)s.%(experiment_id)s.%(member_id)s.%(table_id)s.%(variable_id)s.%(grid_label)s). This is the code that is currently in the main branch.
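
The template-based approach described above can be sketched as follows. This is a simplified illustration, not the code on the main branch: `parse_dataset_id` is a hypothetical helper that maps the dot-separated parts of the dataset_id onto the keys of the template, in order, and ignores any trailing parts such as the version.

```python
import re


def parse_dataset_id(dataset_id: str, template: str) -> dict[str, str]:
    """Map the dot-separated parts of a dataset_id onto the template keys."""
    # Extract the key names from placeholders such as %(mip_era)s.
    keys = re.findall(r"%\((\w+)\)s", template)
    parts = dataset_id.split(".")
    if len(parts) < len(keys):
        # Too few parts: the template does not describe this dataset_id.
        return {}
    # zip() stops at the last key, so the trailing version part is dropped.
    return dict(zip(keys, parts))


TEMPLATE = (
    "%(mip_era)s.%(activity_drs)s.%(institution_id)s.%(source_id)s."
    "%(experiment_id)s.%(member_id)s.%(table_id)s.%(variable_id)s.%(grid_label)s"
)
facets = parse_dataset_id(
    "CMIP6.CMIP.NCAR.CESM2.historical.r4i1p1f1.Amon.tas.gn.v20190308",
    TEMPLATE,
)
print(facets["source_id"])  # CESM2
```

Because the mapping is purely positional, any extra dot inside one of the values shifts every facet after it by one position.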

Unfortunately, this does not work in many cases, because either the dataset_id_template_ is incorrect or there is a dot in the dataset name (often the case for obs4MIPs), and the dot is the character that separates the facets in the dataset_id. This results in ESGFFiles with empty or very few facets. The new approach is to read the facets from the JSON response of the ESGF query and then try to correct them by reading them from the dataset_id (assuming that at least the dataset_id is correct). This is the problem that prevented you from using these facets to create datasets in the recipe 589.yml.txt from this comment #1609 (comment).
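
A rough sketch of this two-step idea is given below. It is not the implementation in this pull request: the function name, the shape of the `result` dictionary (facet values as one-element lists, a `|data-node` suffix on the dataset_id), the template, and the rule "only override when the number of parts matches the template plus a version" are all assumptions made for illustration.

```python
import re


def facets_from_result(result: dict, template: str) -> dict[str, str]:
    """Read facets from a search result, then correct them from dataset_id."""
    keys = re.findall(r"%\((\w+)\)s", template)

    # Step 1: start from the facets in the JSON response.
    facets = {}
    for key in keys:
        value = result.get(key)
        if isinstance(value, list) and len(value) == 1:
            value = value[0]
        if isinstance(value, str):
            facets[key] = value

    # Step 2: override with values from the dataset_id, but only when the
    # split is unambiguous (one part per template key plus the version).
    parts = result["dataset_id"].split("|")[0].split(".")
    if len(parts) == len(keys) + 1:
        facets.update(zip(keys, parts))
    return facets


# Hypothetical template and search results.
TEMPLATE = "%(project)s.%(institute)s.%(source_id)s.%(time_frequency)s"
dotted = {
    "dataset_id": "obs4MIPs.EXAMPLE-INST.EXAMPLE-1.0.mon.v20200101|example.org",
    "project": ["obs4MIPs"],
    "institute": ["EXAMPLE-INST"],
    "source_id": ["EXAMPLE-1.0"],
    "time_frequency": ["mon"],
}
print(facets_from_result(dotted, TEMPLATE)["source_id"])  # EXAMPLE-1.0
```

In this sketch the dotted name survives because the ambiguous dataset_id is ignored and the JSON facets are kept as they are.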

If you run the code in the description of the pull request with the current main branch and then with this branch and look at the difference, you can see the improvement.

@schlunma
Contributor

Nice, got it, thanks for the explanation @bouweandela! I tested the code from the PR description, and can confirm that the facets look much better with this PR!

Will merge after the tests have passed. Cheers!!

@schlunma schlunma merged commit dbbdbb3 into main Feb 13, 2023
@schlunma schlunma deleted the improve-esgf-facets branch February 13, 2023 15:48
Labels: bug (Something isn't working), enhancement (New feature or request)
Linked issue that this pull request may close: Download from ESGF saves to wrong path if facet value contains dots
4 participants