Refactoring of the dispatch logic to improve performance #1809

Merged (26 commits, Jul 12, 2021)

Conversation

@mboutet (Contributor) commented Jul 7, 2021

NOTE: Still a work in progress; not all tests have been updated yet, so they fail.

This PR addresses the performance issues introduced by this other PR when one or more of the following is true:

  • Lots of workers (> 200)
  • Lots of user classes (> 25)
  • High spawn rate (> 100)

I completely refactored the UsersDispatcher. It is now almost entirely different from before and a lot simpler. I also ditched the distribution.py logic: I'm now using this great little library, which allows for an nginx-like weighted round-robin dispatch of the users. Ramp-down (i.e. stopping users at a given rate) is also now supported.
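
For context, here is a minimal sketch of the nginx-style smooth weighted round-robin algorithm that this kind of dispatch is based on. This is only an illustration of the algorithm, not the library's actual API nor the code in this PR:

def smooth_weighted_round_robin(weighted_items):
    """Yield items so that, over a full cycle, each item is selected in
    proportion to its weight and selections are spread out evenly
    (nginx-style smooth weighted round-robin)."""
    weights = dict(weighted_items)
    current = {item: 0 for item in weights}
    total = sum(weights.values())
    while True:
        # Bump every item's current weight by its configured weight, pick
        # the item with the largest current weight, then penalize the
        # winner by the total weight so it doesn't win again immediately.
        for item in current:
            current[item] += weights[item]
        selected = max(current, key=current.get)
        current[selected] -= total
        yield selected

gen = smooth_weighted_round_robin([("UserA", 5), ("UserB", 1)])
print([next(gen) for _ in range(6)])
# ['UserA', 'UserA', 'UserA', 'UserB', 'UserA', 'UserA']

Applied to dispatching, the weighted items are the user classes, and a similar round-robin over the workers spreads those users across the cluster, so each cycle hands out users in proportion to their weights without long runs of the same class.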

I've implemented most of the tests to validate this new implementation in test_dispatch.py, but I've not yet updated the runners module.

I'm still missing the logic to ensure that all workers run the expected users prior to beginning a ramp-up/down. I'll implement it in the next few days.

I implemented a small benchmark with the following config:

  • Ramp-up from 0 to 100 000 users
  • 1000 workers
  • 50 user classes with varying weights
  • Spawn rate of 5000

Each dispatch iteration takes around 130 ms to compute, which is very good: orders of magnitude faster than before. The performance is similar for the inverse scenario, going from 100 000 down to 0 users.
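
To get a feel for the measurement, here is a rough, self-contained timing sketch at the same scale (1000 workers, 50 user classes, 5000 users per iteration). It uses the toy smooth_weighted_round_robin() generator from above rather than the real UsersDispatcher, so the number it prints says nothing about this PR; it only illustrates how a per-iteration timing like the one quoted here can be taken:

import time
from itertools import cycle

workers = [f"worker{i}" for i in range(1000)]
user_classes = [(f"User{i}", i % 5 + 1) for i in range(50)]  # varying weights

users_gen = smooth_weighted_round_robin(user_classes)
workers_gen = cycle(workers)  # workers are equally weighted, plain round-robin

t0 = time.perf_counter()
dispatch = {w: {} for w in workers}
for _ in range(5000):  # one iteration's worth of users at a 5000/s spawn rate
    user = next(users_gen)
    worker = next(workers_gen)
    dispatch[worker][user] = dispatch[worker].get(user, 0) + 1
elapsed_ms = 1000 * (time.perf_counter() - t0)
print(f"one toy dispatch iteration computed in {elapsed_ms:.1f} ms")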

@mboutet marked this pull request as draft July 7, 2021 21:51
locust/dispatch.py: 3 review threads (outdated, resolved)
def remove_worker(self, worker_node_id: str) -> None:
    self._worker_nodes = [w for w in self._worker_nodes if w.id != worker_node_id]
    if len(self._worker_nodes) == 0:
        # TODO: Test this

Collaborator:
There is a test for this now, right? Remove the todo :)


Contributor (Author):
I don't think there's a test for the zero worker case yet. I'll add one.
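
For reference, a minimal sketch of what such a test could look like. The UsersDispatcher and WorkerNode constructor arguments are assumptions and may not match the code in this PR exactly; the remove_worker call follows the signature shown in the snippet above:

import unittest

from locust import User, task
from locust.dispatch import UsersDispatcher
from locust.runners import WorkerNode

class MyUser(User):
    @task
    def noop(self):
        pass

class TestRemoveLastWorker(unittest.TestCase):
    def test_remove_last_worker(self):
        worker = WorkerNode("1")  # assumed constructor: WorkerNode(id)
        dispatcher = UsersDispatcher(worker_nodes=[worker], user_classes=[MyUser])
        dispatcher.remove_worker("1")
        # The TODO is about defining what should happen once the last
        # worker is gone; at minimum, the call should not crash and the
        # worker list should be empty.
        self.assertEqual(len(dispatcher._worker_nodes), 0)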

@@ -816,6 +837,9 @@ def heartbeat_worker(self):
                     logger.info("Worker %s failed to send heartbeat, setting state to missing." % str(client.id))
                     client.state = STATE_MISSING
                     client.user_classes_count = {}
+                    if self._users_dispatcher is not None:
+                        self._users_dispatcher.remove_worker(client.id)
+                    # TODO: If status is `STATE_RUNNING`, call self.start()

@cyberw (Collaborator) Jul 12, 2021:
Let's talk about this after merge.

@cyberw (Collaborator) commented Jul 12, 2021

(deleted)

Edit: never mind, both the issues I saw are in 1.6 as well :)

@cyberw marked this pull request as ready for review July 12, 2021 12:16
@cyberw (Collaborator) commented Jul 12, 2021

Let's discuss my proposed changes after merging.
