Adding this to #2302 and closing it, per our practice for managing features. Anyone is welcome to contribute this feature! Please leave a comment here if you're interested in contributing it.
Summary
In LightGBM distributed training (documented here and here), each worker needs access to a list of all the other workers' IPs, plus a port it can use to communicate with them.

#3766 updated lightgbm.dask to search for open ports on each worker when creating this list, instead of just assuming a fixed range of ports would be available. This works well, but it's a blocking operation that has to be done sequentially, so it slows down training.
The following pseudocode illustrates the process:
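A rough local sketch of that sequential search (hypothetical helper names; in the real lightgbm.dask code the search is submitted to each worker via the Dask client, and ports already assigned to other workers on the same IP are skipped):

```python
import socket

def find_open_port():
    """Ask the OS for a free TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("", 0))
        return s.getsockname()[1]

def find_ports_for_workers(worker_addresses):
    """Pick one port per worker process, one worker at a time.

    Each iteration blocks until the previous one finishes, so this
    step is O(num_worker_processes) in wall-clock time.
    """
    worker_address_to_port = {}
    for address in worker_addresses:
        # blocking call, executed sequentially for every worker process
        worker_address_to_port[address] = find_open_port()
    return worker_address_to_port

ports = find_ports_for_workers(["tcp://127.0.0.1:8786", "tcp://127.0.0.1:8787"])
```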
This is done sequentially because multiple Dask worker processes can live on the same IP address. If you use a LocalCluster with 3 workers, for example, all 3 of those workers will run on your local machine. Or if you use a multi-machine cluster with nprocs > 1, multiple worker processes will run on each physical machine in the cluster.

As a result of this change, the time complexity of that "find open ports" step is O(num_worker_processes). If instead the search were done only once per IP address, it could safely be parallelized across machines, and the time complexity would be more like O(nprocs).

To close this issue, change lightgbm.dask._find_ports_for_workers() (LightGBM/python-package/lightgbm/dask.py, line 72 in f6d2dce) so that it performs the port search once per IP address and runs those per-machine searches in parallel.
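The proposed change could be sketched as grouping worker addresses by host and finding all of a host's ports in one step (a local illustration with hypothetical names; in lightgbm.dask each per-host search would run remotely on that host and the searches could be submitted concurrently):

```python
import socket
from collections import defaultdict
from urllib.parse import urlparse

def find_n_open_ports(n):
    """Find n distinct open ports on this machine.

    Sockets are held open while binding so the OS cannot hand
    back the same port twice.
    """
    sockets, ports = [], []
    for _ in range(n):
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.bind(("", 0))
        sockets.append(s)
        ports.append(s.getsockname()[1])
    for s in sockets:
        s.close()
    return ports

def find_ports_by_host(worker_addresses):
    """Group worker processes by IP, then search once per host.

    The per-host searches are independent of one another, so in the
    Dask version they could run in parallel across machines; the
    wall-clock cost becomes roughly O(nprocs) instead of
    O(num_worker_processes).
    """
    host_to_workers = defaultdict(list)
    for address in worker_addresses:
        host_to_workers[urlparse(address).hostname].append(address)

    worker_address_to_port = {}
    for host, workers in host_to_workers.items():
        # in lightgbm.dask this body would execute remotely on `host`
        ports = find_n_open_ports(len(workers))
        worker_address_to_port.update(zip(workers, ports))
    return worker_address_to_port
```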
Motivation
This optimization would reduce the overhead introduced by using Dask for distributed training, which should make training faster.
References
This could be done following something like the code @ffineis provided in #3766 (comment).