Do not realize data and use dask in derivation functions #42
Comments
Introduction to lazy data: https://scitools.org.uk/iris/docs/latest/userguide/real_and_lazy_data.html If you need array functions to do things, use
See here for an example on multiplying a cube with a number without realizing data: And here for an example on using dask arrays:
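As a rough illustration of the point above, here is a minimal sketch of lazy arithmetic. It uses a plain dask array as a stand-in for a cube's lazy payload (with iris this would typically come from `cube.core_data()`); the array shape, chunking and threshold are made up for the example.

```python
import dask.array as da
import numpy as np

# Stand-in for the lazy data of a cube (hypothetical shape and chunks).
lazy = da.ones((4, 3, 2), chunks=(2, 3, 2))

# Arithmetic and dask.array functions only extend the task graph;
# no data is loaded or computed at this point.
scaled = lazy * 2.5
clipped = da.where(scaled > 2.0, scaled, 0.0)
assert isinstance(clipped, da.Array)  # still lazy

# By contrast, converting to a numpy array realizes the data in memory.
realized = np.asarray(lazy)
assert isinstance(realized, np.ndarray)

# The computation only happens when explicitly requested.
result = clipped.compute()
```

A derivation function written this way can hand the lazy array back to the cube, so the data are only realized when the result is saved or computed.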
just want to say, I ❤️ this thread 😁
Great! Then it's yours 👍
yay, more crap for me!
@zklaus would you be too angry with me if I asked you to look at this? I have a metric ton of crap that I need to take care of and I feel I am going to sideline this - plus you are working closely with the iris stuff anyways. Beer from me when we next meet 🍺
Similar to this discussion, is there a way to switch off writing the derived variables to disk? It seems to slow everything down and shouldn't be necessary.
You probably mean the input variables needed to derive a variable? In that case the answer is no.
Is it not possible to load the cubes into dask arrays, instead of saving them? In the case of the derivation of OHC, it loads a 4D variable and saves it exactly as it is. It basically copies 20GB of data into the working directory for each dataset before doing any calculations! All I want is a scalar field, it should be a few kb! Furthermore, we only have 100GB space in our home directories on jasmin, this means that there's only space for a few models using this method. (I will move my working directory somewhere with more space, but this still doesn't seem like a great method!)
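For illustration only, a lazy reduction of the kind asked about here might look like the sketch below. It is not how the tool currently handles derivation inputs; the 4D shape, the chunking and the `thickness` weights are hypothetical, and a constant field is used so the result is easy to check.

```python
import dask.array as da

# Hypothetical 4D field with dimensions (time, depth, lat, lon),
# chunked along time so only one time step is in memory at once.
field = da.ones((12, 10, 18, 36), chunks=(1, 10, 18, 36))

# Hypothetical layer thickness per depth level, used as weights.
thickness = da.full((10,), 5.0, chunks=10)

# Weighted sum over depth, then mean over time; both steps stay lazy.
column = (field * thickness[None, :, None, None]).sum(axis=1)
time_mean = column.mean(axis=0)

# Only the small 2D result is materialized in memory.
small = time_mean.compute()
```

Because the reduction is evaluated chunk by chunk, the full 4D array never has to be held in memory or copied in order to obtain the small result.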
Maybe in the future, but not at the moment. Do you feel like implementing this yourself?
The Jasmin user guide recommends using a group workspace for storing large amounts of data, not your home directory: https://help.jasmin.ac.uk/article/176-storage. I started on pull request #265, which will make it possible to store preprocessor and other temporary data on a special temporary file system, but this is not ready yet.
Just a comment that gtfgco2 may still be needed. I've commented on the merged PR here #418 (comment) but happy to continue the discussion here if needed.
Up-to-date overview and discussion in #2451.
As reported by @bouweandela, many derivation functions still realize the data and use numpy instead of dask. This is detrimental to performance and should be changed.

Affected variables:
- amoc
- gtfgco2 (should not be needed anymore, there is a preprocessor function for this)
- sm
- toz