Refactoring of the dispatch logic to improve performance #1809

Merged (26 commits, Jul 12, 2021)

Conversation

@mboutet (Contributor) commented Jul 7, 2021

NOTE: Still a work in progress; not all tests have been updated yet, so they fail.

This PR addresses the performance issues introduced by this other PR when one or more of the following is true:

  • Lots of workers (> 200)
  • Lots of user classes (> 25)
  • High spawn rate (> 100)

I completely refactored the UsersDispatcher. It is now almost entirely different from before and a lot simpler. I also ditched the distribution.py logic: I'm now using this great little library, which allows for an nginx-like weighted round-robin dispatch of the users. Ramp-down (i.e. stopping users at a given rate) is also now supported.
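
For context, here is a minimal sketch of the nginx-style smooth weighted round-robin algorithm that this kind of dispatch is based on. This is only an illustration of the algorithm, not the library's actual API nor the code in this PR:

def smooth_weighted_round_robin(weighted_items):
    """Yield items so that, over a full cycle, each item is selected in
    proportion to its weight and selections are spread out evenly
    (nginx-style smooth weighted round-robin)."""
    weights = dict(weighted_items)
    current = {item: 0 for item in weights}
    total = sum(weights.values())
    while True:
        # Bump every item's current weight by its configured weight, pick
        # the item with the largest current weight, then penalize the
        # winner by the total weight so it doesn't win again immediately.
        for item in current:
            current[item] += weights[item]
        selected = max(current, key=current.get)
        current[selected] -= total
        yield selected

gen = smooth_weighted_round_robin([("UserA", 5), ("UserB", 1)])
print([next(gen) for _ in range(6)])
# ['UserA', 'UserA', 'UserA', 'UserB', 'UserA', 'UserA']

Applied to dispatching, the weighted items are the user classes, and a similar round-robin over the workers spreads those users across the cluster, so each cycle hands out users in proportion to their weights without long runs of the same class.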

I've implemented most of the tests to validate this new implementation in test_dispatch.py, but I've not yet updated the runners module.

I'm still missing the logic to ensure that all workers run the expected users prior to beginning a ramp-up/down. I'll implement it in the next few days.

I implemented a small benchmark with the following config:

  • Ramp-up from 0 to 100 000 users
  • 1000 workers
  • 50 user classes with varying weights
  • Spawn rate of 5000

Each dispatch iteration takes around 130 ms to compute, which is very good: orders of magnitude faster than before. The performance is similar for the inverse scenario, going from 100 000 down to 0 users.
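
To get a feel for the measurement, here is a rough, self-contained timing sketch at the same scale (1000 workers, 50 user classes, 5000 users per iteration). It uses the toy smooth_weighted_round_robin() generator from above rather than the real UsersDispatcher, so the number it prints says nothing about this PR; it only illustrates how a per-iteration timing like the one quoted here can be taken:

import time
from itertools import cycle

workers = [f"worker{i}" for i in range(1000)]
user_classes = [(f"User{i}", i % 5 + 1) for i in range(50)]  # varying weights

users_gen = smooth_weighted_round_robin(user_classes)
workers_gen = cycle(workers)  # workers are equally weighted, plain round-robin

t0 = time.perf_counter()
dispatch = {w: {} for w in workers}
for _ in range(5000):  # one iteration's worth of users at a 5000/s spawn rate
    user = next(users_gen)
    worker = next(workers_gen)
    dispatch[worker][user] = dispatch[worker].get(user, 0) + 1
elapsed_ms = 1000 * (time.perf_counter() - t0)
print(f"one toy dispatch iteration computed in {elapsed_ms:.1f} ms")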

@mboutet marked this pull request as draft July 7, 2021 21:51
locust/dispatch.py: 3 review threads (outdated, resolved)
def remove_worker(self, worker_node_id: str) -> None:
    self._worker_nodes = [w for w in self._worker_nodes if w.id != worker_node_id]
    if len(self._worker_nodes) == 0:
        # TODO: Test this

Collaborator:
There is a test for this now, right? Remove the todo :)


Contributor (Author):
I don't think there's a test for the zero worker case yet. I'll add one.
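
For reference, a minimal sketch of what such a test could look like. The UsersDispatcher and WorkerNode constructor arguments are assumptions and may not match the code in this PR exactly; the remove_worker call follows the signature shown in the snippet above:

import unittest

from locust import User, task
from locust.dispatch import UsersDispatcher
from locust.runners import WorkerNode

class MyUser(User):
    @task
    def noop(self):
        pass

class TestRemoveLastWorker(unittest.TestCase):
    def test_remove_last_worker(self):
        worker = WorkerNode("1")  # assumed constructor: WorkerNode(id)
        dispatcher = UsersDispatcher(worker_nodes=[worker], user_classes=[MyUser])
        dispatcher.remove_worker("1")
        # The TODO is about defining what should happen once the last
        # worker is gone; at minimum, the call should not crash and the
        # worker list should be empty.
        self.assertEqual(len(dispatcher._worker_nodes), 0)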

@@ -816,6 +837,9 @@ def heartbeat_worker(self):
                     logger.info("Worker %s failed to send heartbeat, setting state to missing." % str(client.id))
                     client.state = STATE_MISSING
                     client.user_classes_count = {}
+                    if self._users_dispatcher is not None:
+                        self._users_dispatcher.remove_worker(client.id)
+                    # TODO: If status is `STATE_RUNNING`, call self.start()

@cyberw (Collaborator) Jul 12, 2021:
Let's talk about this after merge.

@cyberw (Collaborator) commented Jul 12, 2021

(deleted)

Edit: never mind, both the issues I saw are in 1.6 as well :)

@cyberw marked this pull request as ready for review July 12, 2021 12:16
@cyberw (Collaborator) commented Jul 12, 2021

Let's discuss my proposed changes after merging.
