Migrate to ADS-beta #46
Conversation
It appears the GFAS NetCDF file has changed somewhat in the new API: where previously it appeared to be NetCDF3, the new download includes a structural change. Original gfas-download.nc (in Panoply):
New gfas-download.nc from ADS-beta:
Note the change in structure.
The change in format is not the only difference. Using the legacy API, fetching data for a period from 2023-01-01 to 2023-01-02:
CAMS request:
Fetched date range is correct, going from 2023-01-01 to 2023-01-02:
Running the same command against the new API generates the same CDS request, but the fetched date range is shifted by one day:
I've raised a support request about the off-by-one bug in the ADS beta: https://jira.ecmwf.int/plugins/servlet/desk/portal/1/CUS-26007
Well that's unsociable of them :-)
Two options I see here:
1) Switch to using the netCDF4 module completely. I think we already include it in the openmethane container, since we use a mixture of xarray and netCDF4 already.
2) Use one of the grib conversion tools and go with the grib version. Probably more reliable long term but more work short term. Discuss at 4?
Force-pushed bdaee18 to 1e40084 (compare)
I've made some fixes that allowed me to re-enable the e2e test. The fact that it passes despite the dates in the fetched GFAS data being wrong might be an issue. In the short term, I believe we should wait to merge this until we hear back from the ADS team about the off-by-one issue. If we must merge this to address the legacy API deprecation on Sept 26, then we may want to shift our ADS API request by one day, and then assert on the dates that we get back so that we can be alerted when/if their bug is fixed.
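A minimal sketch of that date-assertion guard, using stdlib dates only (the helper name and the plain `datetime.date` values are illustrative, not part of the actual PR):

```python
import datetime

def check_returned_dates(requested_dates, returned_dates):
    """Raise if the dates in the fetched GFAS data drift from what was
    requested, so we notice when/if the ADS off-by-one behaviour changes.
    (Illustrative helper only.)"""
    if list(returned_dates) != list(requested_dates):
        raise AssertionError(
            f"GFAS returned {returned_dates}, expected {requested_dates}"
        )

requested = [datetime.date(2023, 1, 1), datetime.date(2023, 1, 2)]
# With the off-by-one behaviour, the returned data is labelled one day late:
returned = [d + datetime.timedelta(days=1) for d in requested]

try:
    check_returned_dates(requested, returned)
    guard_fired = False
except AssertionError:
    guard_fired = True  # the guard alerts us to the mismatch
```

If the upstream behaviour ever changes back, the guard fails loudly instead of silently processing mislabelled emissions.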
downloadPath = pathlib.Path(file_name)
downloadPath.parent.mkdir(parents=True, exist_ok=True)
The cdsapi client is not smart enough to create the containing path before attempting to download. This fixes an issue when running the test_gfas_emis test or running omGFASEmis.py standalone in a path with no intermediates folder.
This folder prep might be better handled in the config but I wanted to touch as little as possible.
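The pattern being added can be seen in isolation below (the paths are hypothetical, for illustration only):

```python
import pathlib
import tempfile

# Hypothetical download target several directories deep.
base = pathlib.Path(tempfile.mkdtemp())
download_path = base / "intermediates" / "gfas-download.nc"

# Create the containing directory before the cdsapi client writes the file.
download_path.parent.mkdir(parents=True, exist_ok=True)
# Repeating the call is harmless thanks to exist_ok=True.
download_path.parent.mkdir(parents=True, exist_ok=True)
```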
That's fine. This is a no-op if the directory already exists.
@@ -84,7 +89,8 @@ def processEmissions(config: PriorConfig, startDate, endDate, forceUpdate: bool
    gfas_file = download_GFAS(
        startDate, endDate, file_name=config.as_intermediate_file("gfas-download.nc")
    )
-   ncin = nc.Dataset(gfas_file, "r", format="NETCDF3")
+   ncin = nc.Dataset(gfas_file, "r", format="NETCDF4")
The new dataset format is NetCDFv4.
- dates.append(startDate + datetime.timedelta(days=i))
+ dates.append(gfasTimes[i])
This change could be problematic. The previous version iterated based on the provided startDate, which would presumably align with the startDate supplied to other layers. However, this didn't appear to work even with the legacy ADS API when running in test scenarios. This change uses the dates of the data actually returned by the cdsapi call, but there's a risk that these dates are out of alignment with the datasets used by every other layer, if there are bugs in the cdsapi (which I think there are).
Response from Kevin Marsh @ ECMWF Support:

> I think these variables are accumulated over 24 hours, so looking at the original (GRIB) data for your request:
>
> ```
> % grib_ls 32e348309ddfe25309b35ad2f8327b13.grib
> edition centre typeOfLevel level dataDate stepRange dataType shortName packingType gridType
> 1 ecmf surface 0 20230101 0-24 ga ch4fire grid_simple regular_ll
> 1 ecmf surface 0 20230102 0-24 ga ch4fire grid_simple regular_ll
> 2 of 2 messages in 32e348309ddfe25309b35ad2f8327b13.grib
> 2 of 2 total messages in 1 files
> ```
>
> So the 'valid time' is set to the end of the 24 hour period (i.e. 0000 on the following day):
>
> ```
> % grib_ls -p validityDate,validityTime 32e348309ddfe25309b35ad2f8327b13.grib
> validityDate validityTime
> 20230102 0
> 20230103 0
> 2 of 2 messages in 32e348309ddfe25309b35ad2f8327b13.grib
> 2 of 2 total messages in 1 files
> ```
>
> And this is how the data are encoded in the netCDF file:
>
> ```
> % cdo info 7146ff2cddb82ae5c09a7705b64cd2cd.nc
> cdo info: Processed 12960000 values from 1 variable over 2 timesteps [0.15s 83MB].
> ```
>
> So the time (GMT) is set to the end of the 24 hour period (rather than the start/middle).
So it appears there isn't a bug per se; we're getting the data we've requested. It's just a change in behaviour from the previous system: the new one encodes the end of the time period in the time variable. Knowing this, do we need to time-shift the returned dates back by one so they match the rest of our data?
Not a bug perhaps, but pretty misleading labelling. Yes, we do need to shift the dates back by a day, since the emissions for Jan 1 (which we divide evenly into the 24 hours for that day) are listed as Jan 2.
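The shift agreed on above can be sketched with stdlib dates (gfasTimes is the variable from the diff earlier in this thread; plain `datetime.date` values stand in for the decoded time coordinate):

```python
import datetime

# Validity times as labelled in the new ADS-beta file: the end of each
# 24-hour accumulation period, i.e. 0000 on the following day.
gfas_times = [datetime.date(2023, 1, 2), datetime.date(2023, 1, 3)]

# Shift each entry back one day so it names the day the emissions cover.
one_day = datetime.timedelta(days=1)
dates = [t - one_day for t in gfas_times]
# dates now covers the originally requested period
```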
I'm not seeing this one; already approved?
I haven't completed the change to shift the dates back by one day so they match the dates requested. Then I think we need to action similar PRs in the other repos that use ADS/CDS data and do some checking of the returned dates on those as well to verify the behaviour. We also need to think about how we'll merge the changes. If we want to update the API keys in GitHub, we'll need to coordinate merges to all the repos at the same time, and then merge.
@prayner and I paired to update this branch and shift the dates from GFAS to match the requested dates. All tests are passing locally; they're only failing in CI because the secret still contains the legacy key.
LGTM. I see you just reran the tests, so if they pass then we should be fine.
Description
Resolves #19.
The Copernicus CDS and ADS systems are being migrated to a new architecture, and all clients must update to use the new "ADS beta" and "CDS beta" before the legacy system is decommissioned on 26/09/2024.
This change in the code is accompanied by a new account being created, and the CDSAPI_ADS_KEY Organisation secret in GitHub Actions being replaced with new credentials.

Checklist
Please confirm that this pull request has done the following:
Notes