Provide more climate model options with NcML #294

richardarsenault · 2020-07-27T13:49:57Z

@huard To my knowledge, we only have 1 NcML dataset for climate model simulations (day_MPI-ESM-LR), which includes hist, RCP4.5 and RCP8.5. This is a good start, but the variables only contain 'pr' and 'tas'. It would be good to also add tasmax and tasmin, or to add climate models that have that data too so we can apply bias corrections on tasmax and tasmin independently in the bias-correction notebooks for Raven.

Any news on the auto-generation of ncml files for climate model data? Thanks!

huard · 2020-08-25T20:43:34Z

Not much raw model data yet, but you'll find bias corrected series (pr, tasmin, tasmax) for multiple CMIP5 models here: https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/catalog/datasets/simulations/bias_adjusted/cmip5/pcic/catalog.html

tlogan2000 · 2020-08-25T21:02:13Z

The issue with the CMIP5 raw data was when trying to combine all runs (r1i1p1, r2, r3) into a single ncml. This worked great for the test model (MPI) then fell apart on me for the other data sets.

We could potentially try altering the way we create the ncmls (i.e. one run per ncml file)? You could then create the 'ensemble' using xclim.ensembles methods... Not quite as user friendly, but it could be simpler for the ncml creation I think.
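
A minimal sketch of the one-run-per-file idea, assuming each run opens as its own xarray dataset on a shared grid and time axis. The `make_run` helper, the synthetic values and the `realization` dimension name are illustrative assumptions. On real data, xclim.ensembles provides helpers for this kind of stacking; only plain xarray is used here so the example runs offline.

```python
import numpy as np
import pandas as pd
import xarray as xr


def make_run(seed: int) -> xr.Dataset:
    """Build a small synthetic stand-in for one model run (hypothetical data)."""
    rng = np.random.default_rng(seed)
    time = pd.date_range("2000-01-01", periods=10, freq="D")
    tas = xr.DataArray(
        280.0 + rng.normal(size=(10, 2, 3)),
        dims=("time", "lat", "lon"),
        coords={"time": time, "lat": [54.0, 54.5], "lon": [229.5, 230.0, 230.5]},
        name="tas",
    )
    return tas.to_dataset()


# One dataset per run, as if each came from its own NcML file.
run_names = ["r1i1p1", "r2i1p1", "r3i1p1"]
runs = [make_run(seed) for seed in range(len(run_names))]

# Stack the runs along a new "realization" dimension to form the ensemble.
ensemble = xr.concat(runs, dim=pd.Index(run_names, name="realization"))

# Ensemble statistics are then simple reductions over that dimension.
ens_mean = ensemble.tas.mean(dim="realization")
ens_spread = ensemble.tas.max(dim="realization") - ensemble.tas.min(dim="realization")
```

The user has to do the stacking step themselves, which is the loss in user-friendliness mentioned above, but each NcML file only has to be internally consistent.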

richardarsenault · 2020-08-27T13:59:49Z

Would it be possible to make at least 1 that has a single run, that way we can start working on notebooks that we can share with new users? Then we could figure out how to fix this bug...

Thanks!

tlogan2000 · 2020-08-27T14:05:33Z

ok. What if I try to make some 'test' ncmls (multiple CMIP5 models but separate runs) that you could explore to gauge the level of user-friendliness before officially deploying them on the pavics thredds?
I would like to avoid having a combined and separate version of the same data if possible.

richardarsenault · 2020-08-27T14:33:38Z

yes, sure, that makes sense!

tlogan2000 · 2020-08-27T14:35:55Z

I'll try to get on it in the next few days

richardarsenault · 2020-09-14T12:38:25Z

@tlogan2000 Hi, any news on this front? thanks!

tlogan2000 · 2020-09-14T12:57:48Z

@richardarsenault This has been a bit frustrating... Raw CMIP5 data is in desperate need of 'cleaning' in order to make functional NcML datasets (e.g. repeated time steps, and various discrepancies that make it difficult to create batches of datasets).
Basically the NcML logic I have been using is fine, I think, but the data itself is the problem... Honestly I think I got lucky with the first test set using MPI data.

A possibility would be to mount a North American subset of NASA's downscaled CMIP5 dataset, NEX-GDDP (0.25 degrees; tasmin, tasmax, pr), instead? I mentioned it to @huard as an option too. Would this be a good in-between?

From memory, the dataset has 22 GCMs, rcp45 & rcp85, r1i1p1 only though.

tlogan2000 · 2020-09-14T13:01:13Z

The data is actually already available on a NASA thredds server; however, I tested access and it seems a bit slow (the chunking on disk is probably not ideal for our needs).
https://www.nccs.nasa.gov/services/data-collections/land-based-products/nex-gddp

richardarsenault · 2020-09-14T13:01:28Z

Thanks for the info! Yes, that would be good for now. The idea is that we would like to be able to prototype a fully functional system, and so just having one example where we can post-process tasmax, tasmin and pr for CC impact studies would be great!

tlogan2000 · 2020-09-14T13:02:51Z

In this case post-processing is already finished though? Nasa dataset already downscaled to 0.25 deg ... problematic for you?

richardarsenault · 2020-09-14T13:04:59Z

ah I see, didn't catch that. The thing is that we want to allow users to interact with the bias-correction capabilities of Xclim. Basically a user selects a hydrology model, does their thing in PAVICS-Hydro, then selects a climate model run (for now there could be just one) where we do bias-correction/downscaling with xclim, and then drive the hydrological model with that to see the impacts on the hydrograph. So if it's already post-processed, that means xclim would not be a required part of the process...

tlogan2000 · 2020-09-14T13:15:15Z

Ok. I suppose it wouldn't be impossible for you to 're-correct' the nasa data (the reference data set is certainly not what we are using) but the raw cmip5 data would likely be best. @huard @richardarsenault let me know if it's a 'go' for the NASA data otherwise I will look into creating a 'cleaned' cmip5 repository on the thredds that we can use for the NcML aggregations

huard · 2020-09-14T13:30:24Z

I suggest we focus our development efforts on CMIP6 rather than CMIP5. In that sense, using NEX-GDDP sounds like a good compromise to get a diverse model ensemble on disk rapidly, while we progressively build expertise in designing analysis-ready NcML virtual datasets.

richardarsenault · 2020-09-14T13:32:14Z

Sounds good to me!

tlogan2000 · 2020-09-14T13:32:18Z

Sounds good... ok for you @richardarsenault ?

tlogan2000 · 2020-09-14T13:34:58Z

I'll get on this then... Should be relatively similar to what we have done with other climate scenario data so I don't anticipate too many issues

Zeitsperre · 2020-09-14T13:45:47Z

@tlogan2000 I'm pretty sure we already have NEX-GDDP housed on our internal server. You can probably work off of that to rechunk if need arises?

tlogan2000 · 2020-09-14T13:48:24Z

@Zeitsperre yes we have them somewhere... I will almost definitely rechunk, as I am pretty sure they are currently chunked spatially, i.e. a single time step for the entire domain.
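
To illustrate why the chunk layout matters here, a rough sketch that counts how many chunks a request has to touch. The grid shape and both chunk layouts are hypothetical numbers picked for the example, not the actual NEX-GDDP files.

```python
import math


def chunks_touched(shape, chunks, selection):
    """Count the chunks that overlap a selection.

    shape, chunks and selection are (time, lat, lon) tuples. selection gives
    the number of contiguous elements requested along each axis, starting at
    index 0 (so the selection is aligned with the chunk grid).
    """
    return math.prod(
        math.ceil(min(sel, size) / chunk)
        for size, chunk, sel in zip(shape, chunks, selection)
    )


# Hypothetical daily dataset: 36500 time steps on a 200 x 400 grid.
shape = (36500, 200, 400)

spatial_layout = (1, 200, 400)   # one full map per chunk
time_layout = (3650, 20, 20)     # long time blocks over small tiles

point_series = (36500, 1, 1)     # full time series at one grid cell
single_map = (1, 200, 400)       # one time step over the whole domain

point_spatial = chunks_touched(shape, spatial_layout, point_series)
point_time = chunks_touched(shape, time_layout, point_series)
map_spatial = chunks_touched(shape, spatial_layout, single_map)
map_time = chunks_touched(shape, time_layout, single_map)
```

With these numbers a point time series touches 36500 chunks in the spatial layout against 10 in the time-oriented one, while a single map touches 1 chunk against 200. Which layout is better depends on the access pattern, and watershed-scale time series favour the time-oriented one.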

tlogan2000 · 2020-10-01T12:24:04Z

nasa nex-gddp ncmls are live on thredds server

tlogan2000 · 2020-10-01T12:24:50Z

@huard @richardarsenault Let me know if there are issues / concerns

richardarsenault · 2020-10-04T20:48:19Z

@tlogan2000 OK so I was able to find the ncmls on the THREDDS server. However I am hitting a wall here to extract time subsets. I have been working on the bias_correct_notebook branch, the notebook is called Bias_correcting_climate_data.ipynb

See for example, this works so I know the dataset is being read:

import xarray as xr

fut_data='https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/dodsC/datasets/simulations/bias_adjusted/cmip5/nasa/nex-gddp-1.0/day_inmcm4_historical+rcp85_nex-gddp.ncml'

ds=xr.open_dataset(fut_data)

Next, let's subset by latitude/longitude:

lat=54.484
lon=230   
ds.sel(lat=slice(lat - 1, lat + 1),lon=slice(lon - 1, lon + 1))

This also works, I can see the data has been subset according to a smaller section around the lat/long point I provided.

Finally, add the time slice (these dates are just for testing, I'm not actually going to use them):

import datetime as dt

ds.sel(lat=slice(lat - 1, lat + 1),lon=slice(lon - 1, lon + 1), time=slice(dt.datetime(2000,1,1), dt.datetime(2040,12,31)))

This generates an error, strangely showing a DatetimeNoLeap with a 2025-07-02 12:00:00 time that does not appear anywhere in the code:

TypeError: cannot compare cftime.DatetimeNoLeap(2025, 7, 2, 12, 0, 0, 0) and datetime.datetime(2000, 1, 1, 0, 0) (different calendars)

I don't know how exactly to deal with this, it worked perfectly with the other CMIP5 datasets as well as the ERA5/NRCan dataset. Could you please look into it and tell me if there's something that I'm missing?

Thanks!

tlogan2000 · 2020-10-05T12:29:23Z

Will look at your example in more detail but quick fix would be to use xclim.subset I think. start and end dates can use just the year string if entire year is desired.

from xclim import subset
lat=54.484
lon=230   

ds_sub =subset.subset_bbox(ds, lon_bnds=[lon-1, lon+1], lat_bnds=[lat-1, lat+1], start_date='2000', end_date='2040')
print(ds_sub.time)

tlogan2000 · 2020-10-05T12:34:41Z

So looking at your code, the dataset's time coordinate holds cftime objects (DatetimeNoLeap, as in the error message), whereas your dt.datetime values are standard Python datetimes on the ordinary calendar, so the two cannot be compared. Again, in this case using a string date in your slice will solve the problem (xarray adapts the string to the type of object in the dataset and is not forced to compare cftime to datetime):

ds.sel(lat=slice(lat - 1, lat + 1),lon=slice(lon - 1, lon + 1), time=slice('2000','2040'))

tlogan2000 · 2020-10-05T12:39:27Z

Note this is pretty much exactly what xclim.subset does. However, the xclim.subset functions can automatically detect negative versus positive longitudes and deal with the difference:
e.g. I can send in negative lon_bnds even though the dataset in question is all positive and it still 'works'

from xclim import subset
lat=54.484
lon=-130
print(lon)
ds_sub =subset.subset_bbox(ds, lon_bnds=[lon-1, lon+1], lat_bnds=[lat-1, lat+1], start_date='2000', end_date='2040')
print(ds_sub.lon)
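
For reference, the arithmetic behind that longitude handling can be sketched as below. This is not xclim's implementation, only an illustration of the conversion; the function names and the rule "no negative longitudes means 0..360" are assumptions made for the example.

```python
def to_0_360(lon: float) -> float:
    """Express a longitude in degrees east on the 0..360 convention."""
    return lon % 360.0


def to_minus180_180(lon: float) -> float:
    """Express a longitude in degrees east on the -180..180 convention."""
    return (lon + 180.0) % 360.0 - 180.0


def match_convention(lon_bnds, dataset_lons):
    """Shift requested bounds to the convention the dataset appears to use.

    Assumption: a dataset with no negative longitudes is treated as 0..360.
    """
    if min(dataset_lons) >= 0:
        return [to_0_360(b) for b in lon_bnds]
    return [to_minus180_180(b) for b in lon_bnds]


# Negative bounds requested against an all-positive longitude axis.
bounds = match_convention([-131, -129], dataset_lons=[229.125, 229.375, 230.875])
```

Here `bounds` comes out as `[229.0, 231.0]`, which brackets the lon=230 used in the earlier example.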

richardarsenault · 2020-10-05T12:47:43Z

Amazing, thanks! It works, and it also solves my longitude reference problem!

I'll continue with this and if anything else pops up I'll let you know!

tlogan2000 · 2020-10-05T12:51:19Z

Note as well that if you are running directly on pavics jupyter, you can avoid a performance slowdown bug (bird-house/twitcher#97) by changing the url(s) to:
'http://pavics.ouranos.ca:8083/twitcher/ows/proxy/thredds/dodsC/datasets/simulations/bias_adjusted/cmip5/nasa/nex-gddp-1.0/day_inmcm4_historical+rcp85_nex-gddp.ncml'

Note the change of 'https' to 'http' and the port id '8083'... If working on your own machine you're stuck with the https url: it will work, but more slowly. I'm currently trying to figure out a better solution but haven't got it yet.

edit: updated to say 'if you are running on pavics'
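
If several urls need the same change, a small helper along these lines could do the rewrite. The function name is hypothetical; the port value is the one given above, and the rewritten url is only expected to resolve from inside pavics.

```python
from urllib.parse import urlsplit, urlunsplit


def internal_thredds_url(url: str, port: int = 8083) -> str:
    """Switch a url to plain http on the given port, keeping path and query."""
    parts = urlsplit(url)
    return urlunsplit(parts._replace(scheme="http", netloc=f"{parts.hostname}:{port}"))


public_url = (
    "https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/dodsC/datasets/"
    "simulations/bias_adjusted/cmip5/nasa/nex-gddp-1.0/"
    "day_inmcm4_historical+rcp85_nex-gddp.ncml"
)
internal_url = internal_thredds_url(public_url)
```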

huard · 2021-01-29T15:49:26Z

@richardarsenault Can this be closed ?

richardarsenault · 2021-01-29T18:00:46Z

Not sure. I think for now we have 1 GCM only to work with? The idea here was to have at least a small set that would allow sampling GCM uncertainty, for example. Maybe they are ready though, I'd have to check. @tlogan2000 do you have any more info?

huard · 2021-01-29T20:05:20Z

I think the solution proposed was to use NASA, with many pre-downscaled models.

richardarsenault · 2021-01-29T20:21:15Z

Yes, but I think for now the only ones we are hosting that use the ncml functionality are for MPI-ESM-LR? Do we want to add any, or should we point to the other thredds server, if they have ncml?

tlogan2000 · 2021-01-29T20:22:05Z

I think the solution proposed was to use NASA, with many pre-downscaled models.

Raw CMIP5 data was a bit of a nightmare for NcML creation (missing or repeated dates). I spent quite a bit of time on it with little success other than the one functional ncml... and I think that one only worked because I got lucky with clean data on the first try.

So yes the proposed solution for now was to use the NASA-nex-gddp instead.

At this point if there is a need for some raw GCM data I almost feel like we should jump to CMIP6. But even then there is already access possible via google or amazon cloud or even ESGF THREDDS server so maybe not required to host directly on pavics...

tlogan2000 · 2021-01-29T21:03:25Z

Much more than one model using the Nasa dataset

NASA_NEX-GDDP CMIP5 runs are here (21 models 2 RCPS, daily tmin, tmax, precip)

https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/catalog/datasets/simulations/bias_adjusted/cmip5/nasa/nex-gddp-1.0/catalog.html

richardarsenault · 2021-01-29T21:39:11Z

Well, I'm convinced! Thanks! Didn't even know this existed. It will do the job indeed.

huard · 2021-04-19T12:36:53Z

Climex now on thredds:
https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/catalog/datasets/simulations/climex/catalog.html

richardarsenault added the blocker label Jul 27, 2020

huard assigned tlogan2000 Aug 3, 2020

richardarsenault added this to the Demo HQ milestone Jan 14, 2021

huard closed this as completed Apr 19, 2021
