Benchmark WorkerState._ensure_communicating #50

Merged
merged 3 commits into from
Jun 27, 2022

@crusaderky (Collaborator) commented Jun 24, 2022

Partially closes dask/distributed#6497

This line in https://asv.readthedocs.io/en/stable/writing_benchmarks.html:

> The setup method is run multiple times, for each benchmark and for each repeat.

is false - the setup is not rerun for each repeat. So there's no way to run without the --quick flag, short of implementing pickling/unpickling in the test itself. Upstream issue to follow.
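For reference, a minimal sketch of what "pickling/unpickling in the test itself" could look like. It only follows the asv naming conventions (`params`, `param_names`, `setup`, `time_*`). The class name, the dict-based state and `time_mutate` are hypothetical stand-ins, not the code in this PR.

```python
import pickle


class PickledStateBenchmark:
    """asv-style benchmark whose timed method mutates its input.

    Hypothetical stand-in: a plain dict plays the role of the worker state.
    """

    params = [[5, 50], [100, 1000]]
    param_names = ["n_workers", "n_tasks"]

    def setup(self, n_workers, n_tasks):
        # Build the expensive initial state once and keep only its pickled form,
        # so that it stays pristine even if setup is not rerun between repeats.
        state = {
            "fetch": {f"task-{i}": f"worker-{i % n_workers}" for i in range(n_tasks)},
            "in_flight": {},
        }
        self.blob = pickle.dumps(state)

    def time_unpickle(self, n_workers, n_tasks):
        # Baseline: the cost of restoring the state alone.
        pickle.loads(self.blob)

    def time_mutate(self, n_workers, n_tasks):
        # Restore a fresh copy, then run the mutating operation under test.
        state = pickle.loads(self.blob)
        while state["fetch"]:
            key, worker = state["fetch"].popitem()
            state["in_flight"][key] = worker
```

The cost of the operation alone would then be roughly `time_mutate` minus `time_unpickle` for the same parameters.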

Highlights

Benchmark highlights from my computer (with an additional 1k workers, 10k tasks parameter, which I removed from the PR out of mercy for the CI hosts), comparing 88e1fe08086f41e771aa6317ec4de7a2a356b278 vs. c82bba52070093e4bf3ffe7da36dbfdd18974d81, which adds

Note: the drastic speedup in some benchmarks from 49 to 50 workers is because total_out_connections=50.
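A toy model of why the connection limit produces that cliff. This is an illustration under assumptions, not the distributed code: `ensure_communicating_toy` and its arguments are hypothetical, and only `total_out_connections=50` comes from the note above.

```python
from typing import Dict, List, Set, Tuple


def ensure_communicating_toy(
    data_needed: Dict[str, Set[str]],
    in_flight_workers: Set[str],
    total_out_connections: int = 50,
) -> Tuple[List[Tuple[str, str]], int]:
    """Return (scheduled fetches, number of tasks inspected).

    Hypothetical model: each task maps to the set of workers holding it,
    and a worker can serve only one fetch at a time.
    """
    scheduled: List[Tuple[str, str]] = []
    inspected = 0
    in_flight = set(in_flight_workers)
    for task, holders in data_needed.items():
        # Once the connection limit is reached there is nothing left to do.
        if len(in_flight) >= total_out_connections:
            break
        inspected += 1
        free = sorted(holders - in_flight)
        if free:
            in_flight.add(free[0])
            scheduled.append((task, free[0]))
    return scheduled, inspected
```

In this model, with 49 workers all in flight the loop still walks every queued task and finds nothing to schedule, while with 50 workers in flight it exits before inspecting a single task.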

Create 10k tasks and transition them to fetch

| workers | main | new | notes |
|---|---|---|---|
| 5 | 90ms | 78ms | We halved the calls to HeapSet.add() |
| 49 | 102ms | 89ms | idem |
| 50 | 102ms | 88ms | idem |
| 1000 | 367ms | 341ms | idem |

Call _ensure_communicating from idle, while there are 10k tasks in fetch state

| workers | main | new | notes |
|---|---|---|---|
| 5 | 29ms | 1ms | 😎 |
| 49 | 41ms | 1ms | 😎 |
| 50 | 2ms | 3ms | |
| 1000 | 20ms | 46ms | Here we're paying for the new heapify() on 1k workers |
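To get a feel for how a heapify() cost scales with the number of workers, here is a standalone sketch using only the standard library. It assumes one heap entry per worker, which may not match what the new code actually heapifies. `make_heap` and `heapify_seconds` are hypothetical helpers.

```python
import heapq
import random
import timeit


def make_heap(entries):
    """Return a heapified copy of entries, leaving the input untouched."""
    heap = list(entries)
    heapq.heapify(heap)  # O(n) in the number of entries
    return heap


def heapify_seconds(n_workers, number=200):
    """Average seconds per heapify of n_workers (priority, name) entries."""
    rng = random.Random(0)
    entries = [(rng.random(), f"worker-{i}") for i in range(n_workers)]
    return timeit.timeit(lambda: make_heap(entries), number=number) / number


if __name__ == "__main__":
    for n in (5, 49, 50, 1000):
        print(n, heapify_seconds(n))
```

The absolute numbers depend on the machine and on the entry type, so only the growth with `n_workers` is meaningful here.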

Call _ensure_communicating when all workers are already in flight (10k tasks)

| workers | main | new | notes |
|---|---|---|---|
| 5 | 29ms | >10us | 😎 |
| 49 | 30ms | >10us | 😎 |
| 50 | >10us | >10us | |
| 1000 | >10us | >10us | |

Call _ensure_communicating when all but one workers are in flight (10k tasks)

| workers | main | new | notes |
|---|---|---|---|
| 5 | 29ms | 1ms | 😎 |
| 49 | 30ms | 1ms | 😎 |
| 50 | 1ms | 1ms | |
| 1000 | 4ms | 5ms | |

Note that this last benchmark indicates that further work to avoid the O(n) iteration on the busy workers would be superfluous.

Raw output

(asv) crusaderky@melody:~/github/dask-benchmarks/distributed$ asv run
· Fetching recent changes
· Creating environments
· Discovering benchmarks
· Running 14 total benchmarks (2 commits * 1 environments * 7 benchmarks)
[  0.00%] · For distributed commit b834918a <WSMR/pickle>:
[  0.00%] ·· Benchmarking conda-py3.8-cloudpickle-distributed-lz4-numpy-pandas-partd-s3fs-toolz
[  3.57%] ··· Running (worker_state_machine.EnsureCommunicatingFromIdle.time_from_idle--)..
[ 10.71%] ··· Running (worker_state_machine.EnsureCommunicatingNoop.time_noop--)..
[ 17.86%] ··· Running (worker_state_machine.EnsureCommunicatingOneWorker.time_unpickle--)..
[ 25.00%] ··· Running (worker_state_machine.PopulateDataNeeded.time_unpickle--).
[ 28.57%] ··· worker_state_machine.EnsureCommunicatingFromIdle.time_from_idle                                                                                                                                                   ok
[ 28.57%] ··· =========== ============= ============= ============ ============
              --                                 n_tasks                       
              ----------- -----------------------------------------------------
               n_workers       100           1000         5000        10000    
              =========== ============= ============= ============ ============
                   5         738±1μs     6.16±0.02ms   35.8±0.3ms   77.6±0.3ms 
                   49      1.44±0.02ms   7.66±0.02ms   40.4±0.9ms   87.8±0.8ms 
                   50      1.46±0.02ms    5.90±0.1ms   27.9±0.3ms   58.8±0.9ms 
                  250      1.64±0.01ms   10.1±0.04ms   45.1±0.3ms   88.1±0.7ms 
                  1000      4.94±0.3ms    28.9±0.1ms   112±0.4ms     204±2ms   
              =========== ============= ============= ============ ============

[ 32.14%] ··· worker_state_machine.EnsureCommunicatingFromIdle.time_unpickle                                                                                                                                                    ok
[ 32.14%] ··· =========== ========== ============= ============= ============
              --                                n_tasks                      
              ----------- ---------------------------------------------------
               n_workers     100          1000          5000        10000    
              =========== ========== ============= ============= ============
                   5       364±5μs    3.48±0.03ms    23.3±0.3ms   48.6±0.6ms 
                   49      489±1μs    4.06±0.04ms    26.1±0.3ms   56.4±0.2ms 
                   50      473±5μs    4.00±0.03ms    26.2±0.2ms   56.9±0.2ms 
                  250      735±20μs   6.33±0.04ms   40.2±0.05ms   81.4±0.2ms 
                  1000     2.26±0ms    16.2±0.4ms    92.5±0.1ms   184±0.4ms  
              =========== ========== ============= ============= ============

[ 35.71%] ··· worker_state_machine.EnsureCommunicatingNoop.time_noop                                                                                                                                                            ok
[ 35.71%] ··· =========== ============= ============= ============= =============
              --                                  n_tasks                        
              ----------- -------------------------------------------------------
               n_workers       100           1000          5000         10000    
              =========== ============= ============= ============= =============
                   5         186±3μs     2.37±0.05ms    12.4±0.1ms    27.5±0.5ms 
                   49        20.1±2μs    1.98±0.01ms   12.3±0.06ms    28.6±0.1ms 
                   50        15.6±3μs    4.34±0.06μs   4.35±0.04μs   4.30±0.09μs 
                  250      4.31±0.05μs   4.34±0.04μs   4.33±0.04μs   4.35±0.05μs 
                  1000     4.26±0.02μs   4.33±0.02μs   4.21±0.06μs   4.27±0.06μs 
              =========== ============= ============= ============= =============

[ 39.29%] ··· worker_state_machine.EnsureCommunicatingOneWorker.time_one_worker                                                                                                                                                 ok
[ 39.29%] ··· =========== ============= ============= ============ ============
              --                                 n_tasks                       
              ----------- -----------------------------------------------------
               n_workers       100           1000         5000        10000    
              =========== ============= ============= ============ ============
                   5         598±2μs     5.96±0.04ms   36.1±0.3ms    78.5±1ms  
                   49        456±3μs     6.00±0.02ms   39.1±0.2ms   85.5±0.5ms 
                   50        462±4μs     4.08±0.02ms   26.1±0.4ms   57.7±0.8ms 
                  250        616±5μs     4.62±0.06ms   38.9±0.2ms    81.1±1ms  
                  1000     1.36±0.01ms   7.17±0.07ms   80.2±0.5ms   174±0.5ms  
              =========== ============= ============= ============ ============

[ 42.86%] ··· worker_state_machine.EnsureCommunicatingOneWorker.time_unpickle                                                                                                                                                   ok
[ 42.86%] ··· =========== ========== ============= ============ ============
              --                               n_tasks                      
              ----------- --------------------------------------------------
               n_workers     100          1000         5000        10000    
              =========== ========== ============= ============ ============
                   5       370±2μs    3.57±0.02ms   23.6±0.3ms   49.6±0.3ms 
                   49      408±5μs    3.92±0.01ms   26.5±0.3ms   55.7±0.7ms 
                   50      395±2μs    3.93±0.02ms   26.6±0.2ms   56.8±0.5ms 
                  250      594±5μs    4.58±0.03ms   38.4±0.2ms    81.5±1ms  
                  1000     1.33±0ms   7.04±0.06ms   78.2±0.6ms    170±1ms   
              =========== ========== ============= ============ ============

[ 46.43%] ··· worker_state_machine.PopulateDataNeeded.time_populate_data_needed                                                                                                                                                 ok
[ 46.43%] ··· =========== ============= ============= ============ ============
              --                                 n_tasks                       
              ----------- -----------------------------------------------------
               n_workers       100           1000         5000        10000    
              =========== ============= ============= ============ ============
                   5         754±3μs     7.26±0.05ms   41.0±0.2ms   89.9±0.2ms 
                   49        964±3μs     8.30±0.06ms   46.5±0.4ms   103±0.4ms  
                   50        971±20μs     8.31±0.1ms   46.2±0.4ms   102±0.5ms  
                  250      1.40±0.01ms   12.6±0.03ms   73.2±0.5ms    153±1ms   
                  1000     4.08±0.03ms    30.8±0.1ms    176±1ms      367±1ms   
              =========== ============= ============= ============ ============

[ 50.00%] ··· worker_state_machine.PopulateDataNeeded.time_unpickle                                                                                                                                                             ok
[ 50.00%] ··· =========== ============= ============= ============= =============
              --                                  n_tasks                        
              ----------- -------------------------------------------------------
               n_workers       100           1000          5000         10000    
              =========== ============= ============= ============= =============
                   5        13.9±0.4μs    13.7±0.1μs    13.8±0.1μs   13.6±0.09μs 
                   49       14.2±0.9μs    13.8±0.1μs   13.8±0.03μs    13.5±0.1μs 
                   50       14.0±0.6μs    13.6±0.1μs   13.7±0.03μs    13.7±0.2μs 
                  250      13.7±0.07μs   13.8±0.06μs    13.7±0.2μs   13.9±0.06μs 
                  1000      13.9±0.1μs   13.6±0.08μs    13.6±0.1μs   13.4±0.05μs 
              =========== ============= ============= ============= =============

[ 50.00%] · For distributed commit 65cb5ce0 <staging>:
[ 50.00%] ·· Building for conda-py3.8-cloudpickle-distributed-lz4-numpy-pandas-partd-s3fs-toolz...
[ 50.00%] ·· Benchmarking conda-py3.8-cloudpickle-distributed-lz4-numpy-pandas-partd-s3fs-toolz
[ 53.57%] ··· Running (worker_state_machine.EnsureCommunicatingFromIdle.time_from_idle--)..
[ 60.71%] ··· Running (worker_state_machine.EnsureCommunicatingNoop.time_noop--)..
[ 67.86%] ··· Running (worker_state_machine.EnsureCommunicatingOneWorker.time_unpickle--)..
[ 75.00%] ··· Running (worker_state_machine.PopulateDataNeeded.time_unpickle--).
[ 78.57%] ··· worker_state_machine.EnsureCommunicatingFromIdle.time_from_idle                                                                                                                                                   ok
[ 78.57%] ··· =========== ============= ============= ============ ============
              --                                 n_tasks                       
              ----------- -----------------------------------------------------
               n_workers       100           1000         5000        10000    
              =========== ============= ============= ============ ============
                   5         568±7μs       3.34±0ms    22.1±0.4ms   45.5±0.3ms 
                   49      1.37±0.01ms   5.24±0.04ms   26.3±0.3ms   54.9±0.6ms 
                   50        1.35±0ms    5.29±0.06ms   26.3±0.3ms   56.6±0.3ms 
                  250      2.20±0.01ms    11.8±0.1ms   46.6±0.2ms   87.0±0.9ms 
                  1000     7.29±0.09ms   41.7±0.08ms    131±2ms      226±1ms   
              =========== ============= ============= ============ ============

[ 82.14%] ··· worker_state_machine.EnsureCommunicatingFromIdle.time_unpickle                                                                                                                                                    ok
[ 82.14%] ··· =========== ============= ============= ============ ============
              --                                 n_tasks                       
              ----------- -----------------------------------------------------
               n_workers       100           1000         5000        10000    
              =========== ============= ============= ============ ============
                   5         416±2μs     3.27±0.01ms   22.0±0.3ms   45.7±0.8ms 
                   49        546±2μs       3.81±0ms    25.0±0.2ms   53.6±0.8ms 
                   50        536±9μs     3.77±0.02ms   25.2±0.2ms   53.3±0.2ms 
                  250        796±10μs    6.08±0.04ms   38.9±0.2ms   81.0±0.1ms 
                  1000     2.31±0.03ms    16.3±0.1ms   91.6±0.3ms    180±2ms   
              =========== ============= ============= ============ ============

[ 85.71%] ··· worker_state_machine.EnsureCommunicatingNoop.time_noop                                                                                                                                                            ok
[ 85.71%] ··· =========== ============= ============= ============= =============
              --                                  n_tasks                        
              ----------- -------------------------------------------------------
               n_workers       100           1000          5000         10000    
              =========== ============= ============= ============= =============
                   5        7.68±0.1μs   7.65±0.06μs   7.93±0.07μs   8.10±0.07μs 
                   49       7.76±0.2μs   13.5±0.09μs   13.4±0.07μs    13.7±0.2μs 
                   50       7.64±0.2μs   4.22±0.01μs   4.18±0.01μs   4.19±0.01μs 
                  250      4.17±0.06μs   4.16±0.05μs   4.21±0.02μs    4.05±0.1μs 
                  1000     4.17±0.01μs   4.10±0.04μs   4.16±0.03μs   4.17±0.02μs 
              =========== ============= ============= ============= =============

[ 89.29%] ··· worker_state_machine.EnsureCommunicatingOneWorker.time_one_worker                                                                                                                                                 ok
[ 89.29%] ··· =========== ============= ============= ============ ============
              --                                 n_tasks                       
              ----------- -----------------------------------------------------
               n_workers       100           1000         5000        10000    
              =========== ============= ============= ============ ============
                   5         472±10μs    3.38±0.03ms   21.9±0.4ms   46.9±0.2ms 
                   49        509±8μs     3.83±0.01ms   25.5±0.1ms    55.0±2ms  
                   50        506±4μs      3.75±0.1ms   25.2±0.1ms    53.7±1ms  
                  250        669±1μs     4.98±0.06ms   37.8±0.6ms   79.2±0.7ms 
                  1000     1.06±0.01ms   8.17±0.05ms    79.4±1ms     169±2ms   
              =========== ============= ============= ============ ============

[ 92.86%] ··· worker_state_machine.EnsureCommunicatingOneWorker.time_unpickle                                                                                                                                                   ok
[ 92.86%] ··· =========== ========= ============= ============ ============
              --                               n_tasks                     
              ----------- -------------------------------------------------
               n_workers     100         1000         5000        10000    
              =========== ========= ============= ============ ============
                   5       419±6μs   3.29±0.03ms   21.5±0.4ms   45.8±0.7ms 
                   49      462±4μs   3.76±0.01ms   25.0±0.4ms   54.3±0.1ms 
                   50      466±2μs   3.74±0.02ms   25.0±0.2ms   53.8±0.8ms 
                  250      552±1μs   4.41±0.06ms   37.2±0.4ms   78.5±0.3ms 
                  1000     896±3μs   6.59±0.07ms    76.0±1ms     164±2ms   
              =========== ========= ============= ============ ============

[ 96.43%] ··· worker_state_machine.PopulateDataNeeded.time_populate_data_needed                                                                                                                                                 ok
[ 96.43%] ··· =========== ============= ============= ============ ============
              --                                 n_tasks                       
              ----------- -----------------------------------------------------
               n_workers       100           1000         5000        10000    
              =========== ============= ============= ============ ============
                   5        755±0.9μs     6.42±0.2ms   33.7±0.2ms   78.0±0.7ms 
                   49        963±2μs     7.24±0.04ms   39.4±0.7ms   89.1±0.7ms 
                   50        928±4μs     7.25±0.01ms   39.4±0.7ms   88.4±0.9ms 
                  250      1.35±0.01ms    11.5±0.1ms   67.1±0.8ms   137±0.8ms  
                  1000     3.99±0.06ms    29.3±0.6ms    170±2ms      341±2ms   
              =========== ============= ============= ============ ============

[100.00%] ··· worker_state_machine.PopulateDataNeeded.time_unpickle                                                                                                                                                             ok
[100.00%] ··· =========== ============ ============ ============ ============
              --                                n_tasks                      
              ----------- ---------------------------------------------------
               n_workers      100          1000         5000        10000    
              =========== ============ ============ ============ ============
                   5       85.4±0.3μs    84.1±1μs    85.0±0.7μs   84.8±0.2μs 
                   49      83.3±0.4μs    85.0±1μs    85.4±0.5μs   85.1±0.5μs 
                   50      83.6±0.4μs   83.6±0.4μs   84.1±0.3μs    84.7±1μs  
                  250      83.1±0.4μs    85.0±1μs    84.8±0.3μs   85.3±0.2μs 
                  1000     85.5±0.7μs    85.0±1μs    85.0±0.4μs   86.2±0.5μs 
              =========== ============ ============ ============ ============

Review thread on distributed/asv.conf.json (outdated, resolved)
@crusaderky (Collaborator, Author) commented

@fjetter please review

@fjetter (Member) left a comment

LGTM pending leftover "change before merging"

Thanks @crusaderky for doing this!

@crusaderky crusaderky force-pushed the ensure_communicating branch from c0b2054 to cf56a3c Compare June 26, 2022 11:06
@crusaderky (Collaborator, Author) commented

Ready for merge.
I've removed the pickle hack as I realised that CI runs everything with --quick anyway.
I'll raise an issue upstream about it.

@fjetter fjetter merged commit 9956e67 into dask:main Jun 27, 2022
@crusaderky crusaderky deleted the ensure_communicating branch June 27, 2022 19:21
@crusaderky crusaderky self-assigned this Jun 29, 2022
Successfully merging this pull request may close these issues.

Alternatives for current ensure_communicating