Enable statistically meaningful constant RPS load generation distributions #1281
Conversation
Isn't that a bit out of the scope of Locust? Running and monitoring ntpd or chronyd should be the responsibility of the system, not Locust.
Hi! Thanks for a well described PR. What happens when some users are killed or spawned during a test (e.g. if a new slave node connects during a test)? Are Locust user instances assigned new IDs? There is an existing issue where new Locust users are spawned in bursts when running Locust with a large number of slave nodes (#896). One proposed solution to this is to introduce a different delay for each slave node when spawning locusts (to spread out the spawning). Would these changes be compatible with that behaviour?
If this feature causes unexpected/weird behaviour when the clocks are out of sync, then I think we need to at least fail the run or log a warning. I would not want to be the one debugging that :) (unless of course the behaviour would only impact the in-second distribution of requests, in which case it doesn't really matter that much)
Oh, and speaking of time syncing, I think maybe we should use time.monotonic() instead of time.time() for this timer and all others as well. |
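For illustration (not code from this PR): `time.monotonic()` is immune to wall-clock steps from NTP or manual adjustment, which is what makes it the safer choice for interval timing:

```python
import time

def measure(fn):
    """Measure elapsed time with a clock that cannot jump backwards.

    time.time() follows the wall clock, which NTP (or an admin) may step
    forwards or backwards mid-measurement; time.monotonic() only ever
    moves forward, so it is the right choice for timers and intervals.
    """
    start = time.monotonic()
    fn()
    return time.monotonic() - start

elapsed = measure(lambda: time.sleep(0.01))
assert elapsed >= 0  # guaranteed for monotonic; not for time.time()
```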
I agree.
Each slave maintains the IDs for its own locusts (starting from zero for every slave). Moreover, each slave gets a "timeslot" from the master (which is updated every time a slave connects/quits). This slave timeslot makes sure that requests from different slaves are interleaved nicely to get the required distribution. I have included a screenshot of what happens when a slave quits (the slave connects at around 150 requests and quits at around 700 requests):
This PR works to decouple the locust hatch time from the request generation pattern by using locust IDs instead. As long as locusts have consecutive IDs at all times, we should get the expected results. On the other hand, I think the
Out-of-sync clocks between slaves should affect the in-second distribution only. A task will still be executed at the same interval; only its position in time will be skewed.
Makes sense. I will update the new wait_time functions to use time.monotonic(). One comment I want to make is that these distributions can be disturbed if a task takes longer than the specified wait_time. I generally specify a timeout for requests which is less than the specified wait_time to avoid this.
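To illustrate that last point, here is a standalone sketch of `constant_pacing`-style behaviour (not Locust's exact code): the wait shrinks as the task takes longer, and bottoms out at zero once the task exceeds the pacing interval, at which point the target pacing can no longer be maintained.

```python
def constant_pacing(wait_time):
    """Sketch of a constant_pacing-style wait_time function. It aims for a
    fixed interval between task *starts*, so the returned wait shrinks as
    the task itself takes longer. If the task runs longer than wait_time,
    the wait drops to zero and the pacing target is blown -- hence the
    advice to keep request timeouts below the pacing interval.
    """
    def wait(last_run_duration):
        return max(wait_time - last_run_duration, 0)
    return wait

pacing = constant_pacing(1.0)
assert pacing(0.25) == 0.75  # fast task: wait out the remainder
assert pacing(1.5) == 0      # slow task: no wait can restore the pacing
```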
Additionally, I just pulled the latest changes from master, and it looks like #1266 removes the global The information required in wait_time functions for this PR is:
What is the recommended way to bring in this information now? If I add arguments to the wait_time function, it will break existing wait_time functions.
The runner instance is now accessible through
@timran1 can you have a look at the conflicts? Having this in 1.0 would be nice!
That's already the case for Locust though, due to how response time stats aggregation works. There is some commented-out code from back in 2012 that seemingly is supposed to check for this, but those lines seem to have almost been committed by mistake, judging from the commit message from yours truly... Lines 461 to 463 in 2ac0a84
Codecov Report
```diff
@@            Coverage Diff             @@
##           master    #1281      +/-   ##
==========================================
- Coverage   80.21%   79.25%   -0.97%
==========================================
  Files          23       23
  Lines        2138     2198      +60
  Branches      322      332      +10
==========================================
+ Hits         1715     1742      +27
- Misses        344      373      +29
- Partials       79       83       +4
```
Continue to review full report at Codecov.
I have fixed the conflicts and updated the test cases to account for additional messages sent by
LGTM. OK to merge @heyman ?
Is it possible to write tests for the wait_time functions as well? |
We may need to run the tests for a while, since the wait functions actually wait, but this should be doable. I will work on these tests in the next couple of days.
That would be great! I can totally see that it might be far from trivial to write good tests for this, but I think it would be really good to have them since it might be hard to catch regressions in the future otherwise. |
Would freezegun (with tick) or mocking time/monotonic help with tests that would otherwise require long runs? Or am I mistaken, and the code would actually have to run for a certain amount of time for the test to be effective?
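For example, a wait-time helper that reads the clock can be tested instantly by patching `time.monotonic` with canned values (the `next_wait` helper below is hypothetical, just to show the mocking pattern):

```python
import time
from unittest import mock

def next_wait(interval):
    """Hypothetical wait helper: seconds until the next tick on a fixed
    time grid of the given interval. Used here only to demonstrate
    testing clock-dependent code without real sleeps."""
    now = time.monotonic()
    return interval - (now % interval)

# Feed the function fake clock readings instead of sleeping for real:
with mock.patch("time.monotonic", side_effect=[2.25, 7.9]):
    assert next_wait(1.0) == 0.75               # "now" is 2.25 -> tick at 3.0
    assert abs(next_wait(1.0) - 0.1) < 1e-9     # "now" is 7.9  -> tick at 8.0
```

freezegun works similarly for code built on `time.time()`/`datetime`, though (as of its docs) it does not patch `time.monotonic`, so a plain `unittest.mock.patch` may be the simpler route here.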
@timran1 Will you have time to look at adding the tests & resolving the conflicts? Sorry for the slow response. Personally I'm not very invested in this change (so I haven't really put the time in to determine whether it is good or not). If there are no further updates I will decline this PR in a week or so (we can always open a new one or reopen it).
Closing due to inactivity. Feel free to reopen if someone (@timran1 ?) has the time to fix the conflicts & add tests. |
Locust is very useful for stress testing web services at constant RPS. Prior work has added support for a `constant_pacing`-based `wait_time`, which ensures that RPS is independent of how long the web service takes to respond. However, how the individual requests are spread out within a single second of, say, a 10 RPS load is currently not considered at all in Locust. This can impose undue bursty pressure on the system under test, and the situation worsens if a slave is added or removed during load generation.

To measure and understand the existing behaviour, I developed a small web service which simply records and plots the request arrival times, the inter-arrival times between consecutive requests, and the frequency distribution of inter-arrival times. Refer to the screenshots below for particular examples.
The following commands are used to invoke Locust with the different `wait_time` schemes implemented in this PR. Similar commands are used for distributed master/slave configurations.
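The concrete commands were not captured above; as an illustration, the `wait_time` scheme is selected in the locustfile (Locust 0.x-era API; `constant_pacing` ships with Locust, while `constant_uniform` and `poisson` are the names proposed in this PR and should be treated as hypothetical here):

```python
# Illustrative locustfile sketch (Locust 0.x-era API, as used by this PR).
from locust import HttpLocust, TaskSet, task
from locust.wait_time import constant_pacing  # PR adds constant_uniform, poisson

class UserBehavior(TaskSet):
    @task
    def index(self):
        self.client.get("/")

class WebsiteUser(HttpLocust):
    task_set = UserBehavior
    # Aim for one task start per second per user; swapping in the PR's
    # constant_uniform(1) or poisson(1) changes only this line.
    wait_time = constant_pacing(1)
```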
Existing Support

Using `constant_pacing` on a single slave. Notice the bursts followed by periods of no requests. The exact pattern depends on when the locusts/users are hatched:

![locust-constant_pacing-single](https://user-images.githubusercontent.com/14994891/76275557-f7f71d00-6259-11ea-82bd-05569e9d28ff.png)

Using `constant_pacing` on multiple slaves. Notice the inter-arrival time is affected when a new slave joins after around 100 requests:

![locust-constant_pacing-multiple](https://user-images.githubusercontent.com/14994891/76275556-f7f71d00-6259-11ea-8738-de6fc6019b7a.png)

Added Support
An open-loop load generation behavior has been added which considers the clock time, user ID, and slave ID to decide when to trigger tasks, such that a statistical distribution of task/request arrival times is achieved.
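A minimal sketch of the open-loop idea, assuming globally consecutive user IDs `0..total_users-1` (illustrative only; the PR's actual implementation also folds in the per-slave timeslots and master-assigned offsets described below):

```python
import time

def constant_uniform_next_fire(interval, user_id, total_users, now=None):
    """Sketch of open-loop scheduling: slot each user onto a shared time
    grid so that, with consecutive user IDs across all slaves, task starts
    are spaced interval/total_users apart -- regardless of when each user
    was hatched. Not the PR's exact code.
    """
    if now is None:
        now = time.time()  # wall clock, so all slaves share the same grid
    offset = (user_id / total_users) * interval  # this user's timeslot
    # Next grid point strictly after `now` that lands on this user's offset:
    periods_elapsed = (now - offset) // interval
    return offset + (periods_elapsed + 1) * interval

# With interval=1s and 4 users, fire times interleave 0.25s apart:
fires = sorted(constant_uniform_next_fire(1.0, i, 4, now=10.0) for i in range(4))
assert fires == [10.25, 10.5, 10.75, 11.0]
```

Because each fire time depends only on the clock and the ID, a user hatched late simply joins the grid at its own slot, which is why the distribution recovers after a slave joins or quits.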
Constant Inter-Arrival Time

Using `constant_uniform` on a single slave:

![locust-constant_uniform-single](https://user-images.githubusercontent.com/14994891/76275558-f7f71d00-6259-11ea-9ba5-c1dcc5572214.png)

Using `constant_uniform` on multiple slaves. Notice the system adjusts itself after a slave is added at around 100 requests:

![locust-constant_uniform-multiple](https://user-images.githubusercontent.com/14994891/76275555-f7f71d00-6259-11ea-9e7f-f5cdab5bd433.png)

Poisson Distribution Inter-Arrival Time
Using `poisson` on a single slave:

![locust-poisson-single](https://user-images.githubusercontent.com/14994891/76275559-f88fb380-6259-11ea-9a6d-7818aa027b3b.png)

Using `poisson` on multiple slaves:

![locust-poisson-multiple](https://user-images.githubusercontent.com/14994891/76275553-f75e8680-6259-11ea-80ba-a841d210c7ce.png)

Implementation Details
Most of the edits in the core Locust files are related to maintaining an ID for each locust such that IDs are consecutive at all times (even after random locust kills). Secondly, a similar scheme is used to track and communicate slave/client IDs, and hence the timeslots at which each client is supposed to issue requests.
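As a side note on the `poisson` scheme above: a Poisson arrival process is equivalent to exponentially distributed inter-arrival times, so a wait time can be drawn with `random.expovariate`. A minimal sketch of that standard construction (illustrative, not necessarily the PR's exact code):

```python
import random

def poisson_wait(target_rps):
    """Exponentially distributed inter-arrival times yield a Poisson
    arrival process with mean rate target_rps -- the standard way to
    generate 'random but statistically constant' RPS. Sketch only.
    """
    def wait():
        return random.expovariate(target_rps)
    return wait

random.seed(42)
w = poisson_wait(10.0)  # aim for 10 requests/second on average
samples = [w() for _ in range(10_000)]
mean = sum(samples) / len(samples)
assert abs(mean - 0.1) < 0.01  # mean inter-arrival time ~ 1/10 s
```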
Currently the implementation does not account for differences in the wall-clock times of different nodes. However, if this support gets through, I can work on using a clock synchronization protocol to account for this effect as well.
Comments and suggestions are welcome.