Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dask scheduler num_workers #2403

Closed
bjlittle opened this issue Feb 22, 2017 · 6 comments
Closed

dask scheduler num_workers #2403

bjlittle opened this issue Feb 22, 2017 · 6 comments
Milestone

Comments

@bjlittle
Copy link
Member

See here for context.

We need to consider how we easily control the number of threads, processes or workers that the dask scheduler uses - particularly for users targeting a shared resource such as a server or cluster.

Ping @marqh

@bjlittle bjlittle added the dask label Feb 22, 2017
@bjlittle bjlittle added this to the dask milestone Feb 22, 2017
@AlexHilson
Copy link
Contributor

Fwiw if a Distributed scheduler is defined then that will be used by default, so the current implementation will work with either the threaded / distributed scheduler depending on what the user has done.

Defining a way to pass arguments into the scheduler is definitely useful. I feel like I would want the default settings to use 100% of available resources and let dask / my cpu worry about the consequences, but maybe that's naive.

@AlexHilson
Copy link
Contributor

I briefly discussed this with @marqh, and he mentioned some of the multi-user systems where Iris is deployed and used.

I agree that we should test how dask behaves on these systems before going 'live', but I still believe that if you're running a multi-user system it's your responsibility to ensure that your users cpu access is managed. Making our code slower by default seems wrong to me, but I appreciate that we may need to be pragmatic...

@DPeterK
Copy link
Member

DPeterK commented Mar 24, 2017

My feeling about this is that Iris should run single thread/process by default, with users needing to opt in if they wish to use the multiprocess goodness that dask offers. This needn't be the default long-term, but I think it is the most appropriate introductory approach.

I have a couple of reasons to back up this assertion:

  • many Iris users may not be used to parallel processing and may be put out if an innocent processing command given to Iris uses up all the available compute resource and physical memory on their machine, causing it to crash.
  • Python multiprocessing seems to disregard the amount of resource offered by slurm, so if we open up unrestricted multiprocess capability (both in terms of being on by default in Iris and in terms of not heeding slurm limits) then we may enter a realm where Iris is regularly taking out scalable compute resource through numerous, concurrent, resource unrestricted uses of Iris. This would look bad.

Of course once the two concerns above are resolved then we can reconsider the default multiprocess behaviour in Iris.

@DPeterK
Copy link
Member

DPeterK commented Mar 24, 2017

In #2457 @bjlittle suggested the pattern of having iris.options. I wonder if we could make use of that here, both for setting the number of workers and setting the scheduler. Consider:

with iris.options.parallel(num_workers=6, scheduler='multiprocessing'):
    iris.load('my_dataset.nc')

Or

iris.options.parallel(num_workers=6, scheduler='multiprocessing')
iris.load('my_dataset.nc')

We can get allowed values for scheduler from the options available in dask. We could even set this to point to the IP address and port of a running distributed scheduler and use that in all following user code:

iris.options.parallel(scheduler='192.168.0.219:8786')

In _lazy_data we could interface with these options as follows:

def as_concrete_data(array):
    if is_lazy_data(array):
        num_workers = iris.options.parallel.get('num_workers')
        scheduler = iris.options.parallel.get('scheduler')
        result = array.compute(num_workers=num_workers, get=scheduler.get)
        ...

Alternatively, we could use dask.set_options to apply these options and call this in iris.options to globally set the state of dask for the lifetime of this session.

Thoughts please people!

@marqh
Copy link
Member

marqh commented Apr 7, 2017

My feeling about this is that Iris should run single thread/process by default, with users needing to opt in if they wish to use the multiprocess goodness that dask offers. This needn't be the default long-term, but I think it is the most appropriate introductory approach.

I think this represents a reasonable approach

I would advocate a '1' being explicitly set somewhere and I would like to make time to investigate the potential implications of selecting a different, suitably small number, such as 'Three' (it's a magic number ;)

as this might provide some neat benefit with limited risk

Alternatively, we could use dask.set_options to apply these options and call this in iris.options to globally set the state of dask for the lifetime of this session.

I don't know how clear this would be

As part of the documentation for the release, I think we must provide a page on dask. As part of this, we should explore scenarios where I want to reconfigure dask in a certain way. A couple come to mind:

  • I want to have dask use all available resources on this host
  • I want to have dask use most of the available resources on this host, but it is my laptop, so leave enough to keep my browser session and chat windows working smoothly

So, all in favour, keen to contribute to the thought process and implementation

@DPeterK
Copy link
Member

DPeterK commented Apr 10, 2017

such as 'Three' (it's a magic number ;)

Unbelievable 🎵

Alternatively, we could use dask.set_options

I don't know how clear this would be

The methodology preferred by dask to set runtime options is dask.set_options, so I think using it is worthwhile. It is the approach I ended up following in #2462, which does hide calling dask.set_options behind an Iris API, but I'm not convinced that's a bad thing, especially if we document what Iris does and how users can also make use of this. I also like the idea of the documented parallel-processing examples 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants