Refactoring of the dispatch logic to improve performance #1809
Conversation
I'm committing this just to have a trace of it, even though I will completely delete the current dispatch code in an upcoming commit.
Ditching deepcopy has improved the perf a lot.
- Reduce size of users generator
- Do not copy users_on_workers if not needed
- Optimize _fast_users_on_workers_copy
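For context on why dropping deepcopy makes such a difference: the per-worker user counts are presumably a mapping of plain integers, so a shallow per-worker copy is enough. A minimal sketch, assuming a `{worker_id: {user_class_name: count}}` shape (only the helper name `_fast_users_on_workers_copy` comes from the commit message above):

```python
from typing import Dict


def _fast_users_on_workers_copy(
    users_on_workers: Dict[str, Dict[str, int]],
) -> Dict[str, Dict[str, int]]:
    """Copy a {worker_id: {user_class_name: count}} mapping.

    The inner values are plain ints, so copying each per-worker dict is
    sufficient and avoids the generic object traversal that
    copy.deepcopy(users_on_workers) would perform.
    """
    return {worker_id: dict(counts) for worker_id, counts in users_on_workers.items()}
```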
def remove_worker(self, worker_node_id: str) -> None:
    self._worker_nodes = [w for w in self._worker_nodes if w.id != worker_node_id]
    if len(self._worker_nodes) == 0:
        # TODO: Test this
There is a test for this now, right? Remove the todo :)
I don't think there's a test for the zero worker case yet. I'll add one.
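A sketch of the kind of test being discussed, covering removal of the last worker. The import path, constructor signature, and the fake worker node are assumptions for illustration, not necessarily the PR's exact API:

```python
import unittest

from locust.dispatch import UsersDispatcher  # import path assumed


class FakeWorkerNode:
    """Stand-in for a worker node; only the `id` attribute is needed here."""

    def __init__(self, node_id: str):
        self.id = node_id


class TestRemoveWorker(unittest.TestCase):
    def test_remove_last_worker(self):
        # Constructor arguments are assumed; the behaviour under test is below.
        dispatcher = UsersDispatcher(worker_nodes=[FakeWorkerNode("worker-1")], user_classes=[])
        dispatcher.remove_worker("worker-1")
        # Removing the last worker should leave an empty worker list and not raise.
        self.assertEqual(dispatcher._worker_nodes, [])


if __name__ == "__main__":
    unittest.main()
```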
@@ -816,6 +837,9 @@ def heartbeat_worker(self):
    logger.info("Worker %s failed to send heartbeat, setting state to missing." % str(client.id))
    client.state = STATE_MISSING
    client.user_classes_count = {}
    if self._users_dispatcher is not None:
        self._users_dispatcher.remove_worker(client.id)
        # TODO: If status is `STATE_RUNNING`, call self.start()
Let's talk about this after merge.
(deleted) Edit: never mind, both the issues I saw are in 1.6 as well :)
Let's discuss my proposed changes after merging.
Improve logging messages and clean up code after dispatch refactoring (#1809)
NOTE: Still a work in progress; not all tests have been updated yet, so they fail.
This PR addresses the performance issues introduced by this other PR when one or more of the following is true:
I completely refactored the `UsersDispatcher`. In fact, it is almost completely different from before and a lot simpler. I also ditched the `distribution.py` logic, as I'm now using this great little library which allows for an nginx-like weighted round-robin dispatch of the users. Ramp-down is also now supported (i.e. stopping users at a given rate).

I've implemented most of the tests to validate this new implementation in `test_dispatch.py`, but I've not yet updated the `runners` module.

I'm still missing the logic to ensure that all workers run the expected users prior to beginning a ramp-up/down. I'll implement it in the next few days.
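The round-robin library isn't named in this excerpt, but the nginx-style "smooth weighted round-robin" idea it refers to is simple to illustrate: on every pick, each key's current weight grows by its configured weight, the key with the largest current weight wins, and the winner's current weight is then reduced by the total. A standalone sketch of the algorithm (not the PR's actual code):

```python
from typing import Dict, List


def smooth_weighted_round_robin(weights: Dict[str, int], n: int) -> List[str]:
    """Return n picks using nginx-style smooth weighted round-robin.

    Each pick, every key's current weight increases by its configured
    weight; the key with the highest current weight is selected and its
    current weight is decreased by the sum of all weights.
    """
    current = {key: 0 for key in weights}
    total = sum(weights.values())
    picks = []
    for _ in range(n):
        for key, weight in weights.items():
            current[key] += weight
        chosen = max(current, key=current.get)
        current[chosen] -= total
        picks.append(chosen)
    return picks


# Example: with weights 5/1/1, "UserA" is interleaved with the others
# rather than dispatched in a burst of five.
print(smooth_weighted_round_robin({"UserA": 5, "UserB": 1, "UserC": 1}, 7))
# ['UserA', 'UserA', 'UserB', 'UserA', 'UserC', 'UserA', 'UserA']
```

The point of the smooth variant is that heavier user classes are spread out instead of being dispatched in bursts, which is what you want when distributing users across workers at a given spawn rate.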
I implemented a small benchmark with the following config:
Each dispatch iteration takes around 130 ms to compute, which is very good. It's orders of magnitude faster than before. The performance is similar for the inverse scenario, going from 100 000 to 0 users.
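The benchmark configuration isn't reproduced above, but the timing itself is straightforward if the dispatcher is an iterable that computes one dispatch cycle per iteration (an assumption here, not a confirmed detail of the PR): wrap each `next()` call with a timer and aggregate. A minimal sketch:

```python
import time
from typing import Iterable, List


def time_dispatch_iterations(dispatcher: Iterable) -> List[float]:
    """Return the wall-clock duration, in seconds, of each dispatch iteration.

    Assumes the dispatcher is an iterable whose __next__ computes one
    dispatch cycle; averaging the result gives the per-iteration figure
    (e.g. the ~130 ms quoted above).
    """
    durations: List[float] = []
    iterator = iter(dispatcher)
    while True:
        start = time.perf_counter()
        try:
            next(iterator)  # compute one dispatch cycle
        except StopIteration:
            break
        durations.append(time.perf_counter() - start)
    return durations
```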