Provide more climate model options with NcML #294

Closed
richardarsenault opened this issue Jul 27, 2020 · 35 comments

@richardarsenault
Contributor

@huard To my knowledge, we only have 1 NcML dataset for climate model simulations (day_MPI-ESM-LR), which includes hist, RCP4.5 and RCP8.5. This is a good start, but the variables only contain 'pr' and 'tas'. It would be good to also add tasmax and tasmin, or to add climate models that have that data too so we can apply bias corrections on tasmax and tasmin independently in the bias-correction notebooks for Raven.

Any news on the auto-generation of ncml files for climate model data? Thanks!

@huard
Contributor

huard commented Aug 25, 2020

Not much raw model data yet, but you'll find bias corrected series (pr, tasmin, tasmax) for multiple CMIP5 models here: https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/catalog/datasets/simulations/bias_adjusted/cmip5/pcic/catalog.html

@tlogan2000

The issue with the CMIP5 raw data was when trying to combine all runs (r1i1p1, r2, r3) into a single ncml. This worked great for the test model (MPI) then fell apart on me for the other data sets.

We could potentially try altering the way we create the ncmls (i.e. one run per ncml file)? You could then create the 'ensemble' using xclim.ensembles methods... Not quite as user friendly, but it could be simpler for the ncml creation, I think.
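
As a rough illustration of the "one run per file, then combine" idea above, here is a minimal pure-Python sketch. It is not the xclim.ensembles API: `ensemble_stats` and the run values are made up for the example, and real data would be xarray datasets rather than lists.

```python
from statistics import mean


def ensemble_stats(runs):
    """Per-time-step mean, min and max across runs sharing one time axis.

    `runs` maps a run label (e.g. "r1i1p1") to a list of values.
    Hypothetical helper, for illustration only.
    """
    lengths = {len(values) for values in runs.values()}
    if len(lengths) != 1:
        raise ValueError("all runs must have the same number of time steps")
    # Transpose: one tuple per time step, holding the value from each run.
    steps = list(zip(*runs.values()))
    return {
        "mean": [mean(s) for s in steps],
        "min": [min(s) for s in steps],
        "max": [max(s) for s in steps],
    }


# Toy values standing in for three separately stored runs of one model.
runs = {
    "r1i1p1": [1.0, 2.0, 3.0],
    "r2i1p1": [2.0, 3.0, 4.0],
    "r3i1p1": [3.0, 4.0, 5.0],
}
stats = ensemble_stats(runs)
print(stats["mean"])
```

The length check is the important part: runs with mismatched time axes are exactly what makes a combined aggregation fall apart.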

@richardarsenault
Contributor Author

Would it be possible to make at least 1 that has a single run, that way we can start working on notebooks that we can share with new users? Then we could figure out how to fix this bug...

Thanks!

@tlogan2000

ok. What if I try to make some 'test' ncmls (multiple CMIP5 models but separate runs) that you could explore to gauge the level of user-friendliness before officially deploying them on the pavics thredds?
I would like to avoid having a combined and separate version of the same data if possible.

@richardarsenault
Contributor Author

yes, sure, that makes sense!

@tlogan2000

I'll try to get on it in the next few days

@richardarsenault
Contributor Author

@tlogan2000 Hi, any news on this front? thanks!

@tlogan2000

@richardarsenault This has been a bit frustrating... Raw CMIP5 data is in desperate need of 'cleaning' in order to make functional NcML datasets (e.g. repeated time steps, and various discrepancies that make it difficult to create batches of datasets).
Basically, the NcML logic I have been using is fine, I think, but the data itself is the problem... Honestly, I think I got lucky with the first test set using MPI data.
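
For illustration, the kind of check that exposes repeated or missing time steps can be sketched as follows. This is a minimal example under assumptions: `find_time_axis_problems` is a hypothetical helper, the dates are invented, and it uses the standard calendar from Python's `datetime` (model output on a no-leap calendar would need cftime objects instead).

```python
from datetime import date, timedelta


def find_time_axis_problems(times, step=timedelta(days=1)):
    """Return (repeated, gaps) for a sequence of time stamps.

    repeated: values that occur more than once, in order of first repeat
    gaps:     (previous, next) pairs whose spacing exceeds `step`
    Hypothetical helper, for illustration only.
    """
    seen, repeated = set(), []
    for t in times:
        if t in seen and t not in repeated:
            repeated.append(t)
        seen.add(t)

    ordered = sorted(seen)
    gaps = [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > step]
    return repeated, gaps


# Toy daily axis: 3 January appears twice and 5 January is missing.
times = [date(2000, 1, d) for d in (1, 2, 3, 3, 4, 6)]
repeated, gaps = find_time_axis_problems(times)
print(repeated)
print(gaps)
```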

A possibility would be to mount a North American subset of NASA's downscaled CMIP5 dataset, NEX-GDDP (0.25 degrees, tasmin, tasmax, pr), instead? I mentioned it to @huard as an option too. Would this be a good in-between?

From memory, the dataset has 22 GCMs, rcp45 & 85, r1i1p1 only though.

@tlogan2000

tlogan2000 commented Sep 14, 2020

The data is actually already available on a NASA THREDDS server; however, I tested access and it seems a bit slow (the chunking on disk is probably not ideal for our needs):
https://www.nccs.nasa.gov/services/data-collections/land-based-products/nex-gddp

@richardarsenault
Contributor Author

Thanks for the info! Yes, that would be good for now. The idea is that we would like to be able to prototype a fully functional system, and so just having one example where we can post-process tasmax, tasmin and pr for CC impact studies would be great!

@tlogan2000

In this case the post-processing is already done, though: the NASA dataset is already downscaled to 0.25 deg... Is that problematic for you?

@richardarsenault
Contributor Author

ah I see, didn't catch that. The thing is that we want to allow users to interact with the bias-correction capabilities of Xclim. Basically a user selects a hydrology model, does their thing in PAVICS-Hydro, then selects a climate model run (for now there could be just one) where we do bias-correction/downscaling with xclim, and then drive the hydrological model with that to see the impacts on the hydrograph. So if it's already post-processed, that means xclim would not be a required part of the process...
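
To make the bias-correction step in that workflow concrete, here is about the simplest possible form of it, an additive mean shift, in plain Python. This is only a sketch under assumptions: it is not how xclim implements bias correction, `mean_shift_correction` is a hypothetical name, and the temperature values are invented.

```python
from statistics import mean


def mean_shift_correction(ref, hist, sim):
    """Additive (mean-shift) bias correction.

    ref:  observations over the reference period
    hist: model output over the same reference period
    sim:  model output to correct (e.g. a future period)
    Hypothetical helper, for illustration only.
    """
    # Model bias estimated over the common reference period.
    bias = mean(hist) - mean(ref)
    return [x - bias for x in sim]


# Toy daily tasmax values in degrees C, invented for the example.
ref = [10.0, 12.0, 14.0]
hist = [12.0, 14.0, 16.0]
sim = [15.0, 17.0, 19.0]

corrected = mean_shift_correction(ref, hist, sim)
print(corrected)
```

With data that has already been bias-adjusted, `hist` and `ref` would already agree closely, so a step like this would have little left to do.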

@tlogan2000

Ok. I suppose it wouldn't be impossible for you to 're-correct' the nasa data (the reference data set is certainly not what we are using) but the raw cmip5 data would likely be best. @huard @richardarsenault let me know if it's a 'go' for the NASA data otherwise I will look into creating a 'cleaned' cmip5 repository on the thredds that we can use for the NcML aggregations

@huard
Contributor

huard commented Sep 14, 2020

I suggest we focus our development efforts on CMIP6 rather than CMIP5. In that sense, using NEX-GDDP sounds like a good compromise to get a diverse model ensemble on disk rapidly, while we progressively build expertise in designing analysis-ready NcML virtual datasets.

@richardarsenault
Contributor Author

Sounds good to me!

@tlogan2000

Sounds good... ok for you @richardarsenault ?

@tlogan2000

I'll get on this then... Should be relatively similar to what we have done with other climate scenario data so I don't anticipate too many issues

@Zeitsperre
Contributor

@tlogan2000 I'm pretty sure we already have NEX-GDDP housed on our internal server. You can probably work off of that to rechunk if the need arises?

@tlogan2000

@Zeitsperre yes, we have them somewhere... I will almost definitely rechunk, as I am pretty sure they are currently chunked spatially, i.e. a single time step for the entire domain.
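
A back-of-the-envelope sketch of why that chunking matters for time-series access is given below. All sizes are hypothetical (they are not the actual layout of the files on the server), and `chunks_touched` is a made-up helper that simply counts how many chunks a read intersects.

```python
import math


def chunks_touched(read_shape, chunk_shape):
    """Number of chunks intersected by a read aligned to a chunk boundary.

    Both shapes are (time, lat, lon) tuples. Hypothetical helper.
    """
    return math.prod(math.ceil(r / c) for r, c in zip(read_shape, chunk_shape))


# Hypothetical sizes: 150 no-leap years of daily data on a 0.25 degree global grid.
n_time, n_lat, n_lon = 365 * 150, 720, 1440
point_series = (n_time, 1, 1)        # full time series at a single grid cell

spatial_chunks = (1, n_lat, n_lon)   # one time step, entire domain per chunk
time_chunks = (365, 50, 50)          # one year, small spatial tile per chunk

n_spatial = chunks_touched(point_series, spatial_chunks)
n_time_oriented = chunks_touched(point_series, time_chunks)
print(n_spatial, n_time_oriented)
```

Under these assumed sizes, extracting one grid cell's series from spatially chunked files means opening every single time-step chunk, which is the access pattern a hydrological model driven by a point or small basin tends to need.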

@tlogan2000

The NASA NEX-GDDP ncmls are live on the THREDDS server.

@tlogan2000

@huard @richardarsenault Let me know if there are issues / concerns

@richardarsenault
Contributor Author

@tlogan2000 OK so I was able to find the ncmls on the THREDDS server. However I am hitting a wall here to extract time subsets. I have been working on the bias_correct_notebook branch, the notebook is called Bias_correcting_climate_data.ipynb

See for example, this works so I know the dataset is being read:

import datetime as dt
import xarray as xr

fut_data = 'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/dodsC/datasets/simulations/bias_adjusted/cmip5/nasa/nex-gddp-1.0/day_inmcm4_historical+rcp85_nex-gddp.ncml'

ds = xr.open_dataset(fut_data)

Next, let's subset by latitude/longitude:

lat = 54.484
lon = 230
ds.sel(lat=slice(lat - 1, lat + 1), lon=slice(lon - 1, lon + 1))

This also works, I can see the data has been subset according to a smaller section around the lat/long point I provided.

Finally, add the time slice (these dates are just for testing, I'm not actually going to use them):

ds.sel(lat=slice(lat - 1, lat + 1),lon=slice(lon - 1, lon + 1), time=slice(dt.datetime(2000,1,1), dt.datetime(2040,12,31)))

This generates an error, strangely mentioning a DatetimeNoLeap time of 2025-07-02 12:00:00 that appears nowhere in my code:

TypeError: cannot compare cftime.DatetimeNoLeap(2025, 7, 2, 12, 0, 0, 0) and datetime.datetime(2000, 1, 1, 0, 0) (different calendars)

I don't know how exactly to deal with this, it worked perfectly with the other CMIP5 datasets as well as the ERA5/NRCan dataset. Could you please look into it and tell me if there's something that I'm missing?

Thanks!

@tlogan2000

I will look at your example in more detail, but a quick fix would be to use xclim.subset, I think. The start and end dates can be just the year string if the entire year is desired.

from xclim import subset
lat = 54.484
lon = 230

ds_sub = subset.subset_bbox(ds, lon_bnds=[lon - 1, lon + 1], lat_bnds=[lat - 1, lat + 1], start_date='2000', end_date='2040')
print(ds_sub.time)

@tlogan2000

So looking at your code, the netCDF dataset's time coordinate is made of cftime objects (cftime.DatetimeNoLeap, per the error message), whereas your dt.datetime values are standard-calendar Python datetimes, which xarray handles like numpy datetime64. xarray doesn't know how to compare the two... Again, in this case using a string date in your slice will solve the problem (xarray adapts to the type of object in the dataset and is not forced to try to compare cftime to datetime64):

ds.sel(lat=slice(lat - 1, lat + 1),lon=slice(lon - 1, lon + 1), time=slice('2000','2040'))
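
The "different calendars" part of the error is not just a type mismatch: a no-leap (365-day) calendar and the standard calendar disagree on how far apart two dates are. The plain-Python sketch below illustrates this for the test dates used above; both helper functions are hypothetical and written only for this example (cftime is what actually handles such calendars).

```python
from datetime import date


def days_between_standard(start, end):
    """Days from start to end in the standard (Gregorian) calendar."""
    return (end - start).days


def days_between_noleap(start, end):
    """Days from start to end in a 365-day (no-leap) calendar.

    29 February does not exist in this calendar, so it is rejected.
    """
    for d in (start, end):
        if (d.month, d.day) == (2, 29):
            raise ValueError("29 February does not exist in a no-leap calendar")

    def day_of_year(d):
        # Evaluate month/day in a non-leap reference year (2001).
        return date(2001, d.month, d.day).timetuple().tm_yday

    return (end.year - start.year) * 365 + day_of_year(end) - day_of_year(start)


start, end = date(2000, 1, 1), date(2040, 12, 31)
standard = days_between_standard(start, end)
noleap = days_between_noleap(start, end)
print(standard, noleap, standard - noleap)
```

The difference is the number of 29 Februaries in the interval, which is why a comparison between the two kinds of date has no single correct answer and the string form, interpreted in the dataset's own calendar, is the safer choice.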

@tlogan2000

tlogan2000 commented Oct 5, 2020

Note this is pretty much exactly what xclim.subset does. However, xclim.subset functions can automatically detect negative versus positive longitudes and deal with it:
e.g. I can send in negative lon_bnds even though the dataset in question is all positive and it still 'works'

from xclim import subset
lat = 54.484
lon = -130
print(lon)
ds_sub = subset.subset_bbox(ds, lon_bnds=[lon - 1, lon + 1], lat_bnds=[lat - 1, lat + 1], start_date='2000', end_date='2040')
print(ds_sub.lon)
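
For reference, the conversion between the two longitude conventions is a one-liner in each direction. The sketch below is plain Python and is not xclim's actual implementation; the function names are made up for the example.

```python
def to_0_360(lon):
    """Convert a longitude in degrees east to the [0, 360) convention."""
    return lon % 360


def to_pm_180(lon):
    """Convert a longitude in degrees east to the [-180, 180) convention."""
    return (lon + 180) % 360 - 180


print(to_0_360(-130))   # 230
print(to_pm_180(230))   # -130
```

So lon=-130 and lon=230 in the snippets above refer to the same meridian.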

@richardarsenault
Contributor Author

Amazing, thanks! It works, and it also solves my longitude reference problem!

I'll continue with this and if anything else pops up I'll let you know!

@tlogan2000

tlogan2000 commented Oct 5, 2020

Note as well that if you are running directly on the PAVICS Jupyter, you can avoid a performance slowdown bug (bird-house/twitcher#97) by changing the url(s) to:
'http://pavics.ouranos.ca:8083/twitcher/ows/proxy/thredds/dodsC/datasets/simulations/bias_adjusted/cmip5/nasa/nex-gddp-1.0/day_inmcm4_historical+rcp85_nex-gddp.ncml'

Note the change of 'https' to 'http' and the added port number '8083'... If working on your own machine you're stuck with the original URL: it will work, but more slowly. Currently trying to figure out a better solution but haven't got it yet.

edit: updated to say 'if you are running on pavics'
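
If several URLs need the same change, the rewrite described above can be scripted. The following is a small sketch using only the standard library; `to_internal_url` is a hypothetical helper name, and the port default simply mirrors the value given above.

```python
from urllib.parse import urlsplit, urlunsplit


def to_internal_url(url, port=8083):
    """Rewrite an https URL to plain http on the given port.

    Hypothetical helper; only meant for use from within PAVICS itself.
    """
    parts = urlsplit(url)
    return urlunsplit(
        parts._replace(scheme="http", netloc=f"{parts.hostname}:{port}")
    )


public = (
    "https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/dodsC/datasets/"
    "simulations/bias_adjusted/cmip5/nasa/nex-gddp-1.0/"
    "day_inmcm4_historical+rcp85_nex-gddp.ncml"
)
internal = to_internal_url(public)
print(internal)
```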

@richardarsenault added this to the Demo HQ milestone Jan 14, 2021

@huard
Contributor

huard commented Jan 29, 2021

@richardarsenault Can this be closed ?

@richardarsenault
Contributor Author

Not sure. I think for now we have 1 GCM only to work with? The idea here was to have at least a small set that would allow sampling GCM uncertainty, for example. Maybe they are ready though, I'd have to check. @tlogan2000 do you have any more info?

@huard
Contributor

huard commented Jan 29, 2021

I think the solution proposed was to use NASA, with many pre-downscaled models.

@richardarsenault
Contributor Author

Yes, but I think for now the only ones we are hosting that use the ncml functionality are for MPI-ESM-LR? Do we want to add any, or should we point to the other THREDDS server, if they have ncml?

@tlogan2000

tlogan2000 commented Jan 29, 2021

> I think the solution proposed was to use NASA, with many pre-downscaled models.

Raw CMIP5 data was a bit of a nightmare for NcML creation (missing, repeating dates). Anyway, I spent quite a bit of time on it with little success other than the one functional ncml... which I think I just got lucky with, having clean data on the first one.

So yes the proposed solution for now was to use the NASA-nex-gddp instead.

At this point, if there is a need for some raw GCM data, I almost feel like we should jump to CMIP6. But even then, access is already possible via the Google or Amazon cloud, or even the ESGF THREDDS server, so it may not be required to host it directly on PAVICS...

@tlogan2000

There is much more than one model in the NASA dataset.

The NASA NEX-GDDP CMIP5 runs are here (21 models, 2 RCPs; daily tmin, tmax, precip):

https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/catalog/datasets/simulations/bias_adjusted/cmip5/nasa/nex-gddp-1.0/catalog.html

@richardarsenault
Contributor Author

Well, I'm convinced! Thanks! Didn't even know this existed. It will do the job indeed.

@huard
Contributor

huard commented Apr 19, 2021

@huard huard closed this as completed Apr 19, 2021