Collapsing cubes with weights kwarg very inefficient (full realization of data, twice) #47
OK - I found something very useful: here is a little function:
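The original snippet is not preserved in this thread; below is a minimal sketch of what such a test function might look like, assuming a cube collapsed over latitude/longitude with iris area weights. The helper name, file path, and coordinate names are placeholders, not the original code.

```python
# Hypothetical reconstruction - not the original snippet from this comment.
import iris
from iris.analysis import MEAN
from iris.analysis.cartography import area_weights


def collapse_cube(filename, use_weights=True):
    """Collapse a cube over latitude/longitude, with or without weights."""
    cube = iris.load_cube(filename)
    if use_weights:
        # area_weights() needs bounded lat/lon coords and returns a realized
        # numpy array; in iris 2.x, passing it to collapsed() also forces
        # the cube data itself to be realized.
        weights = area_weights(cube)
        result = cube.collapsed(['latitude', 'longitude'], MEAN,
                                weights=weights)
    else:
        # Without weights the collapse stays lazy: fast and tiny memory.
        result = cube.collapsed(['latitude', 'longitude'], MEAN)
    print('result has lazy data:', result.has_lazy_data())
    return result
```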
if you run this on a 500MB file with weights, it will chug along at a max mem of 1.3GB; if you don't apply the weights, it will be blindingly fast with a max mem of 20MB. Note that with weights the result cube will NOT have lazy data - @bjlittle why is it that using `weights` as a keyword arg for `cube.collapsed()` forces full realization of the data?
argh! I see now - in iris, collapsing with weights doesn't go through the lazy aggregation path, so the data gets realized
OK, good resources. @ledm how important are those weights in the area-average calculations? i.e. is it something that alters the results by a significant enough margin to justify their use?
ah, never mind! Lack of area weights may introduce biases of ~10 degrees on average for temperature (just tested with/without weights)
I'm currently trying to resolve this in c3s_511. A code snippet to work around the weighting memory cost, at the expense of extra CPU time, is sketched below:
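The snippet itself did not survive in this thread; what follows is a hedged sketch of the general idea - a weighted mean along one coordinate computed as sum(data * w) / sum(w) with dask arrays, so the data stays lazy. The helper name `lazy_weighted_mean` and the metadata handling are my assumptions, not the original c3s_511 code.

```python
# A sketch only - the original c3s_511 snippet is not preserved here.
import dask.array as da
import iris
from iris.analysis import MEAN


def lazy_weighted_mean(cube, coord_name, weights):
    """Weighted mean along one coordinate, keeping the data lazy.

    Computes sum(data * w) / sum(w) with dask arrays instead of going
    through the (eager) weighted aggregator path.
    """
    # Scalar coordinates map to no data dimension, hence the guard
    # (this mirrors the try/except mentioned below).
    dims = cube.coord_dims(coord_name)
    if len(dims) != 1:
        raise ValueError(f'{coord_name!r} must span exactly one dimension')
    (dim,) = dims
    data = cube.lazy_data()
    # Broadcast the 1-D weights along the collapsed dimension.
    shape = [1] * data.ndim
    shape[dim] = data.shape[dim]
    w = da.asarray(weights).reshape(shape)
    # Use an unweighted (lazy) collapse as a metadata template, then swap
    # in the lazily computed weighted mean. Whether copy() keeps the
    # payload lazy may depend on the iris version.
    template = cube.collapsed(coord_name, MEAN)
    return template.copy(data=(data * w).sum(axis=dim) / w.sum())
```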
You can call the function like:
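The original call example is also missing; a hypothetical usage of the sketch above, calling it twice with latitude first as described below (the 1-D weight arrays are assumed inputs):

```python
# Hypothetical usage; lat_weights/lon_weights are 1-D arrays with one
# value per point along the named coordinate.
result = lazy_weighted_mean(cube, 'latitude', lat_weights)
result = lazy_weighted_mean(result, 'longitude', lon_weights)
print(result.has_lazy_data())  # True - the chain stays lazy
```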
You see, I'm still debugging and trying to catch everything that kills the function. It works this way for one coordinate at a time (it is supposed to work with all the usual dimensions of a cube in ESMValTool) while staying lazy. In my tests you can call it twice if latitude goes first (this is why I need the except: it does not work on scalar coordinates!).
Plus: it does not work for std_dev, and I could not find out why yet.
hi @BenMGeo, good stuff, man! There is an open SciTools/iris GitHub issue, SciTools/iris#3129, that @bjlittle and I talked about last week; you may want to comment on that one so that you and the iris guys can work together on it - we want a solution straight in iris rather than something in ESMValTool 🍺
It works now for STD_DEV, but of course the weighted standard deviation is not the standard deviation of the weighted values (doh). I can provide you with a clean function of the above for a weighted mean, though. I do not see any other solution to this than writing actual iris aggregators yourself, with lazy functions backed - depending on the iris version we use - by xarray or dask arrays. I'll have to deep-dive into writing custom aggregators and into the xarray and dask packages to provide these aggregators for our code. Once I get there, I'll propose them to iris.
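For the record, a small numeric illustration (mine, not from the thread) of why the two quantities differ:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
w = np.array([0.2, 0.3, 0.5])

# Weighted standard deviation: spread around the *weighted* mean.
mean_w = np.average(x, weights=w)
std_weighted = np.sqrt(np.average((x - mean_w) ** 2, weights=w))

# Standard deviation of the weighted values: a different quantity.
std_of_weighted = np.std(x * w)

print(std_weighted, std_of_weighted)  # ~0.781 vs ~0.544
```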
great stuff! 🍺 I suggest getting in touch with the iris guys via the associated SciTools issue and presenting them with the solution; please mention @bjlittle and myself so we can keep track of the implementation. It's best to have it straight in iris rather than in ESMValTool - but yeah, cool stuff so far!
@bouweandela good stuff - but have you tested the implementation of SciTools/iris#3299 within ESMValTool?
we can now pass lazy weights to collapse operations, e.g.:
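Something along these lines (a sketch; the file name is a placeholder, and wrapping the realized area weights in a dask array is my assumption about how the lazy weights were produced):

```python
import dask.array as da
import iris
from iris.analysis import MEAN
from iris.analysis.cartography import area_weights

cube = iris.load_cube('tas.nc')  # placeholder file
# area_weights() returns a realized numpy array; wrap it lazily.
weights = da.from_array(area_weights(cube))
result = cube.collapsed(['latitude', 'longitude'], MEAN, weights=weights)
print(cube.has_lazy_data(), result.has_lazy_data())  # True True
```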
and both the input data and the output data will be LAZY (iris 2.4), but the execution time of the collapse operation with weights still differs from the unweighted case by a factor of ~50, and memory use increases 2-3x
@valeriupredoi Is this still an issue?
I have extensively tested this in SciTools/iris#5341, so this should not be an issue anymore. Please re-open if necessary.
two use cases from running a recipe and profiling it via the debug and resource files:
I am assigning this to myself; if I can't optimize it, I can talk to Bill next week.