-
Notifications
You must be signed in to change notification settings - Fork 284
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dask options for Iris processing #2511
Conversation
Iris is running single-threaded using `dask.async.get_sync`. This default | ||
ensures that running Iris under "normal" conditions will not use up all | ||
available computational resource. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you could usefully explain what dask does by default, that we don't want.
E.G. "Otherwise, by default, dask.array will use a threaded scheduler and grab all available CPUs."
set_options = 'dask.set_options' | ||
self.patch_set_options = self.patch(set_options) | ||
get_sync = 'dask.async.get_sync' | ||
self.patch_get_sync = self.patch(get_sync) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see why we need to patch this at all.
It's just a value, defined by dask, we can just use it anyway.
Then instead of assert_called_once_with(get=self.patch_get_sync)
we would have
assert_called_once_with(get=dask.async.get_sync)
|
||
def test_no_user_options(self): | ||
test_dict = {} | ||
with self.patch(self.context, _globals=test_dict): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't need to use "with" with the IrisTest.patch() method.
The point of it is to automatically remove itself after each test, so you avoid lots of extra indentation ...
Actually I think that the tests can be much simplified. |
Thanks @dkillick |
Thanks for all your input @pp-mo! 🎉 |
|
||
|
||
# Run this at import time to set dask options for Iris. | ||
_iris_dask_defaults() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dkillick and @pp-mo I'm assuming that the user can simply override this default behaviour by simply using dask.set_options()
themselves in their own script ? ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup! That's the idea. You could do something like the following...
from multiprocessing.pool import ThreadPool
import dask
import iris # This will set default options for dask processing in Iris.
dask.set_options(pool=ThreadPool(5)) # Now the user can apply case-specific options.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool. As long as the user can override the default.
I love the async
default. No surprises or wiggle room for odd behaviour 👍
Also, and most importantly for me, if the user does override the default, then they are taking explicit accountability for their actions ... 💥
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice job @dkillick and @pp-mo 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By the way, I'll make sure examples like the above are included in the user documentation when we come around to writing it...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dkillick If it helps, add a note in the Dask Documentation project, so that we don't forget 👍
Dask opts controlled by simple function
A much simplified replacement for #2462 and thus also a fix for #2403.
Instead of having a highly complex Iris option class that places an Iris API around the "non-API" of
dask.set_options
and potentially tying ourselves into a very tight coupling todask.set_options
, this PR just callsdask.set_options
from withiniris._lazy_data
at import time. The call todask.set_options
is conditional on the user having not already specified dask options so that importing Iris does not change dask options that are not the default.The only dask option that may be set by Iris is setting the
get
function to the synchronous schedulerdask.async.get_sync
. This scheduler only runs on a single thread, which matches to the requirement suggested in #2403. In testing this scheduler was shown to be faster than using the conceptually equivalent dask options{'get': dask.threaded.get, 'pool': ThreadPool(1)}
, and simpler to construct/maintain.