Provide more climate model options with NcML #294
Comments
Not much raw model data yet, but you'll find bias corrected series (pr, tasmin, tasmax) for multiple CMIP5 models here: https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/catalog/datasets/simulations/bias_adjusted/cmip5/pcic/catalog.html |
The issue with the raw CMIP5 data arose when trying to combine all runs (r1i1p1, r2, r3) into a single ncml. This worked great for the test model (MPI), then fell apart on me for the other datasets. We could potentially try altering the way we create the ncmls (i.e. one run per ncml file)? You could then create the 'ensemble' using xclim.ensembles methods ... Not quite as user-friendly, but it could be simpler for the ncml creation, I think |
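With one run per NcML file, the ensemble is assembled on the user side. The comment above points to xclim.ensembles for this; the sketch below does not call xclim and only illustrates the underlying idea in plain Python, assuming each run has already been read into a mapping of date string to value. The function name `ensemble_stats`, the run values, and the choice to keep only dates shared by all runs are all assumptions made for illustration.

```python
from statistics import mean


def ensemble_stats(runs):
    """Combine per-run series into per-time-step ensemble statistics.

    `runs` maps a run label (e.g. "r1i1p1") to a dict of {date_string: value}.
    Only dates present in every run are kept, so a run with a missing
    time step does not silently change the ensemble size for that date.
    """
    if not runs:
        raise ValueError("at least one run is required")
    # Dates shared by all runs, in chronological order (ISO strings sort correctly).
    common = sorted(set.intersection(*(set(series) for series in runs.values())))
    stats = {}
    for day in common:
        values = [series[day] for series in runs.values()]
        stats[day] = {"mean": mean(values), "min": min(values), "max": max(values)}
    return stats


# Hypothetical daily precipitation values (mm/day) for three runs of one model.
runs = {
    "r1i1p1": {"2001-01-01": 1.0, "2001-01-02": 0.0, "2001-01-03": 4.0},
    "r2i1p1": {"2001-01-01": 3.0, "2001-01-02": 2.0, "2001-01-03": 5.0},
    "r3i1p1": {"2001-01-01": 2.0, "2001-01-02": 1.0},  # last day missing
}
result = ensemble_stats(runs)
```

Here `result` contains only the two dates that all three runs share. Keeping the runs separate until this step makes it visible when one of them has gaps, which a single aggregated NcML would hide.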
Would it be possible to make at least 1 that has a single run, that way we can start working on notebooks that we can share with new users? Then we could figure out how to fix this bug... Thanks! |
ok. What if I try to make some 'test' ncmls (multiple CMIP5 models but separate runs) that you could explore to gauge the level of user-friendliness before officially deploying them on the pavics thredds? |
yes, sure, that makes sense! |
I'll try to get on it in the next few days |
@tlogan2000 Hi, any news on this front? thanks! |
@richardarsenault This has been a bit frustrating... Raw CMIP5 data is in desperate need of 'cleaning' in order to make functional NcML datasets (e.g. repeated time steps, and various discrepancies making it difficult to create batches of datasets). A possibility would be to mount a North American subset of NASA's downscaled CMIP5 dataset, NEX-GDDP (0.25 degrees, tasmin, tasmax, pr), instead? I mentioned it to @huard as an option too. Would this be a good in-between? From memory, the dataset has 22 GCMs, rcp45 & 85, r1i1p1 only though |
The data is actually already available on a NASA thredds server; however, I tested access and it seems a bit slow (chunking on disk probably not ideal for our needs) |
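The kind of 'cleaning' check mentioned above (repeated time steps, and by extension missing ones) can be sketched with the standard library alone. This is a minimal illustration, not the procedure used on PAVICS: the function name `check_daily_axis` is hypothetical, and it assumes a daily axis on the standard calendar, which is not true of every CMIP5 model.

```python
from datetime import date, timedelta


def check_daily_axis(dates):
    """Report repeated and missing days in a list of datetime.date objects.

    Assumes a standard (Gregorian) calendar; model output on a no-leap or
    360-day calendar would need a calendar-aware library instead.
    """
    seen = set()
    repeated = []
    for d in dates:
        # Record each repeated date once, in order of first repetition.
        if d in seen and d not in repeated:
            repeated.append(d)
        seen.add(d)
    missing = []
    if seen:
        current, last = min(seen), max(seen)
        # Walk day by day between the first and last date looking for gaps.
        while current <= last:
            if current not in seen:
                missing.append(current)
            current += timedelta(days=1)
    return {"repeated": repeated, "missing": missing}


# Hypothetical time axis: 3 Jan appears twice and 5 Jan is absent.
axis = [date(2001, 1, d) for d in (1, 2, 3, 3, 4, 6)]
report = check_daily_axis(axis)
```

Running a check like this on each file before aggregation would flag the files that need repair, since an aggregated time axis generally has to be strictly increasing.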
Thanks for the info! Yes, that would be good for now. The idea is that we would like to be able to prototype a fully functional system, and so just having one example where we can post-process tasmax, tasmin and pr for CC impact studies would be great! |
In this case post-processing is already finished though? Nasa dataset already downscaled to 0.25 deg ... problematic for you? |
ah I see, didn't catch that. The thing is that we want to allow users to interact with the bias-correction capabilities of Xclim. Basically a user selects a hydrology model, does their thing in PAVICS-Hydro, then selects a climate model run (for now there could be just one) where we do bias-correction/downscaling with xclim, and then drive the hydrological model with that to see the impacts on the hydrograph. So if it's already post-processed, that means xclim would not be a required part of the process... |
Ok. I suppose it wouldn't be impossible for you to 're-correct' the nasa data (the reference data set is certainly not what we are using) but the raw cmip5 data would likely be best. @huard @richardarsenault let me know if it's a 'go' for the NASA data otherwise I will look into creating a 'cleaned' cmip5 repository on the thredds that we can use for the NcML aggregations |
I suggest we focus our development efforts on CMIP6 rather than CMIP5. In that sense, using NEX-GDDP sounds like a good compromise to get a diverse model ensemble on disk rapidly, while we progressively build expertise in designing analysis-ready NcML virtual datasets. |
Sounds good to me! |
Sounds good... ok for you @richardarsenault ? |
I'll get on this then... Should be relatively similar to what we have done with other climate scenario data so I don't anticipate too many issues |
@tlogan2000 I'm pretty sure we already have NEX-GDDP housed on our internal server. You can probably work off of that to rechunk if need arises? |
@Zeitsperre yes we have them somewhere... I will almost definitely rechunk as I am pretty sure they are currently chunked spatially ... i.e. a single time step for the entire domain |
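Why the chunk layout matters can be shown with a little arithmetic: count how many chunks must be read for a point time series versus a single map. The grid size, record length, and alternative chunk shape below are hypothetical and are not the actual NEX-GDDP values; only the "one time step for the entire domain" layout comes from the comment above.

```python
from math import ceil


def chunks_for_point_series(n_time, time_chunk):
    """Chunks read to extract the full time series at one grid cell."""
    return ceil(n_time / time_chunk)


def chunks_for_single_map(n_lat, n_lon, lat_chunk, lon_chunk):
    """Chunks read to extract one time step over the whole domain."""
    return ceil(n_lat / lat_chunk) * ceil(n_lon / lon_chunk)


# Hypothetical dataset: 50 years of daily no-leap data on a 200 x 400 grid.
n_time, n_lat, n_lon = 50 * 365, 200, 400

# Layout A: one time step per chunk covering the whole domain.
series_a = chunks_for_point_series(n_time, time_chunk=1)
map_a = chunks_for_single_map(n_lat, n_lon, lat_chunk=200, lon_chunk=400)

# Layout B: one year of time steps per chunk, 50 x 50 spatial tiles.
series_b = chunks_for_point_series(n_time, time_chunk=365)
map_b = chunks_for_single_map(n_lat, n_lon, lat_chunk=50, lon_chunk=50)
```

Under these assumptions, a point time series needs 18250 chunk reads with layout A but only 50 with layout B, while a single map goes from 1 chunk to 32. Since the hydrological use case extracts long series over small areas, a layout closer to B favours it.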
nasa nex-gddp ncmls are live on thredds server |
@huard @richardarsenault Let me know if there are issues / concerns |
@tlogan2000 OK so I was able to find the ncmls on the THREDDS server. However, I am hitting a wall here trying to extract time subsets. I have been working on the bias_correct_notebook branch; the notebook is called Bias_correcting_climate_data.ipynb. See for example, this works so I know the dataset is being read:
Next, let's subset by latitude/longitude:
This also works, I can see the data has been subset according to a smaller section around the lat/long point I provided. Finally, add the time slice (these dates are just for testing, not actually going to use them):
This generates an error, strangely showing a DatetimeNoLeap with 2025-07-2 12:00:00 time that is unused in the code:
I don't know how exactly to deal with this, it worked perfectly with the other CMIP5 datasets as well as the ERA5/NRCan dataset. Could you please look into it and tell me if there's something that I'm missing? Thanks! |
Will look at your example in more detail, but a quick fix would be to use xclim.subset, I think. Start and end dates can use just the year string if the entire year is desired.
|
So looking at your code the netcdf
|
Note this is pretty much exactly what
|
Amazing, thanks! It works, and it also solves my longitude reference problem! I'll continue with this and if anything else pops up I'll let you know! |
Note as well: if you are running directly on pavics jupyter, you can avoid a performance slowdown bug (bird-house/twitcher#97) by changing the url(s) to : Note the change of 'https' to 'http' and the port id '8083' ... If working on your own machine you're stuck (it will work, but more slowly). Currently trying to figure out a better solution but haven't got it yet. edit: updated to say 'if you are running on pavics'
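The two changes named above (scheme 'https' to 'http', port '8083') can be applied programmatically when a notebook holds several URLs. The sketch below is an assumption-laden illustration: the function name `to_internal_url` and the example host are hypothetical, and because the rewritten URL is not shown in the thread, the code leaves the path untouched even though the real internal address may differ in other ways.

```python
from urllib.parse import urlsplit, urlunsplit


def to_internal_url(url, port=8083):
    """Switch a URL to plain http on the given port, leaving the rest untouched."""
    parts = urlsplit(url)
    if parts.hostname is None:
        raise ValueError(f"not an absolute URL: {url!r}")
    # Rebuild the network location with the requested port.
    netloc = f"{parts.hostname}:{port}"
    return urlunsplit(("http", netloc, parts.path, parts.query, parts.fragment))


# Hypothetical address; the real PAVICS URLs are not reproduced in the thread.
external = "https://thredds.example.org/thredds/dodsC/some/dataset.ncml"
internal = to_internal_url(external)
```

Keeping the rewrite in one helper makes it easy to switch it off when the same notebook is run from a machine outside the server.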
@richardarsenault Can this be closed ? |
Not sure. I think for now we have 1 GCM only to work with? The idea here was to have at least a small set that would allow sampling GCM uncertainty, for example. Maybe they are ready though, I'd have to check. @tlogan2000 do you have any more info? |
I think the solution proposed was to use NASA, with many pre-downscaled models. |
Yes, but I think for now the only ones we are hosting that use the ncml functionality are for MPI-ESM-LR? Do we want to add any or should we point to the other thredds server, if they have ncml? |
Raw CMIP5 data was a bit of a nightmare for NcML creation (missing, repeating dates). Anyway, I spent quite a bit of time with little success other than the one functional ncml... which I think I just got lucky with, having clean data on the first one. So yes, the proposed solution for now was to use the NASA NEX-GDDP instead. At this point, if there is a need for some raw GCM data, I almost feel like we should jump to CMIP6. But even then there is already access possible via Google or Amazon cloud or even the ESGF THREDDS server, so maybe it is not required to host directly on pavics... |
There is much more than one model using the NASA dataset. NASA_NEX-GDDP CMIP5 runs are here (21 models, 2 RCPs, daily tmin, tmax, precip)
Well, I'm convinced! Thanks! Didn't even know this existed. It will do the job indeed. |
@huard To my knowledge, we only have 1 NcML dataset for climate model simulations (day_MPI-ESM-LR), which includes hist, RCP4.5 and RCP8.5. This is a good start, but the variables only contain 'pr' and 'tas'. It would be good to also add tasmax and tasmin, or to add climate models that have that data too so we can apply bias corrections on tasmax and tasmin independently in the bias-correction notebooks for Raven.
Any news on the auto-generation of ncml files for climate model data? Thanks!