Load Testing Framework for internal and external usage #412
Comments
I would vote for Locust, and we could build some pre-built plans working against the Kubernetes API.
Maybe we start with a simple example using something like xonotic/simple-udp, and then we can look at how to make it a bit more customisable? I've not got strong opinions personally on load testing systems. Side thought - we may also want to think about how we can automate some of this. Wondering if we should have nightly jobs for autoscale testing and load testing? (But that can be a phase two.)
Definitely +1 - having a standard "small" load that can be assigned a port, generates logs, simulates and emits metrics, and can defer its response to SIGTERM (up to and beyond the termination grace period) is really useful. We use sample PSU curves from real data that we scale and/or offset to simulate daily patterns, as well as compressing the timeline to accelerate growth.
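A minimal sketch of what such a "small" load target could look like (purely illustrative; the `PORT` and `TERM_DELAY` names are assumptions, and this is not the simple-udp image or anything using the Agones SDK):

```python
# Hypothetical sketch of a minimal UDP load target: binds a configurable port,
# logs traffic, and defers its exit after SIGTERM so termination-grace-period
# behaviour can be exercised.
import os
import signal
import socket
import sys
import time

PORT = int(os.environ.get("PORT", "7654"))            # port assigned to the game server
TERM_DELAY = int(os.environ.get("TERM_DELAY", "30"))  # seconds to keep serving after SIGTERM

terminating_at = None

def on_sigterm(signum, frame):
    # Record the signal but keep serving until TERM_DELAY has elapsed.
    global terminating_at
    terminating_at = time.monotonic()
    print(f"SIGTERM received, draining for {TERM_DELAY}s", flush=True)

signal.signal(signal.SIGTERM, on_sigterm)

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("0.0.0.0", PORT))
sock.settimeout(1.0)
print(f"listening on UDP {PORT}", flush=True)

while True:
    if terminating_at and time.monotonic() - terminating_at > TERM_DELAY:
        print("drain period over, exiting", flush=True)
        sys.exit(0)
    try:
        data, addr = sock.recvfrom(1024)
    except socket.timeout:
        continue
    print(f"received {data!r} from {addr}", flush=True)  # simple log emission point
    sock.sendto(b"ACK: " + data, addr)
```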
I'd be interested in using this. It'd be good to be able to show stats from a cluster at scale.
I have started looking into using Locust for this. From a quick scan, it seems to be straightforward. We can start by writing a client to start Agones game servers. We will also need to define the test scenarios. In the first step, we can run the test from a single machine. We can then extend it to spin up master and slave nodes: https://docs.locust.io/en/stable/running-locust-distributed.html. A single Docker image can run as standalone, master or slave: https://docs.locust.io/en/latest/running-locust-docker.html
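As a hedged sketch of what such a Locust client could look like, the locustfile below creates GameServer objects through the Kubernetes API. It assumes `kubectl proxy` is running locally (so no auth headers are needed), the `agones.dev/v1` API group (older Agones releases used different group/version names), the `default` namespace, and a placeholder container image; none of these details are taken from the framework discussed in this issue.

```python
# Sketch: each simulated user creates an Agones GameServer via the Kubernetes API.
from locust import HttpUser, task, between

GAMESERVER_BODY = {
    "apiVersion": "agones.dev/v1",
    "kind": "GameServer",
    "metadata": {"generateName": "load-test-"},
    "spec": {
        "ports": [{"name": "default", "containerPort": 7654}],
        "template": {
            "spec": {
                "containers": [{
                    "name": "simple-game-server",
                    # placeholder; point this at your game server container image
                    "image": "REPLACE_WITH_GAME_SERVER_IMAGE",
                }]
            }
        },
    },
}

class GameServerCreator(HttpUser):
    # Point --host (or this attribute) at the kubectl proxy endpoint.
    host = "http://localhost:8001"
    wait_time = between(1, 2)

    @task
    def create_gameserver(self):
        self.client.post(
            "/apis/agones.dev/v1/namespaces/default/gameservers",
            json=GAMESERVER_BODY,
            name="create gameserver",  # group all requests under one stats entry
        )
```

The same file can be run standalone from a single machine or in master/worker mode, per the distributed and Docker docs linked above.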
Should we make a plan for what types of load tests we should have in our system?
There are likely more. Then there is likely also load testing with real CPU & network metrics for determining limits etc. for the game server itself - which I'm not sure how to tackle.
Makes sense. So, two categories of tests:
Let's start with 1 since it seems to be more straightforward using Locust. I will think about what approach we can take for 2.
Adding my notes on the design:

Design

We will focus on two categories of tests: performance tests and load tests. These two categories have different requirements and goals, which implies different test approaches.

Performance Tests

The goal of performance tests is to provide metrics on various operations, such as fleet scaling up/down. The existing Agones e2e test framework can be used for performance tests.

Test Cases

Fleet scaling up. Create a fleet of size 1, increase the size to 100/1000/100000, and measure the time it takes to fully scale up the fleet. In addition to the time to fully scale up, the test should also emit continuous metrics on game servers, including how many game servers are in each state (PortAllocation, Creating, Starting, Scheduled, RequestReady, Ready). If tested with GKE, this test should be repeated with GKE Cluster Autoscaling enabled and disabled. When GKE Cluster Autoscaling is disabled, we should test two scenarios: one where the cluster has sufficient capacity and one where it does not.

Fleet scaling down. Create a fleet of size 100/1000/100000, scale down to 1, and measure the time it takes to fully scale down the fleet. In addition to the time to fully scale down, the test should also emit continuous metrics on game servers, including how many game servers are in each state (PortAllocation, Creating, Starting, Scheduled, RequestReady, Ready). If tested with GKE, this test should be repeated with GKE Cluster Autoscaling enabled and disabled. When GKE Cluster Autoscaling is disabled, we should test two scenarios: one where the cluster has sufficient capacity and one where it does not.

Load Tests

Load tests aim to test the performance of the system under heavy load. Game server allocation is an example where multiple parallel operations should be tested. Locust is a good option for load tests. Unfortunately, Locust integration with Go is not stable, so the only options are raw HTTP requests or the Python client library. Locust can be easily integrated with other open source tools for storage and visualization. I have tested integration with Graphite and Grafana. Prometheus is more powerful than Graphite and is therefore a better option. The final Locust tasks used for running the tests, and the server being tested, should be containerized for easy adoption.

Test Cases

GameServerAllocation. Create a fleet of size 10/100/1000, and allocate multiple game servers in parallel. Measure the time it takes to allocate a game server. This test includes two scenarios: one in which the number of allocations exceeds the fleet size and one in which it doesn't. The tests should evaluate whether allocation time depends on the number of ready GameServers.
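As an illustration of the "Fleet scaling up" performance measurement (not the design's actual implementation), the sketch below uses the official Kubernetes Python client to patch a Fleet's replica count and then polls until the target is reached, emitting a per-state breakdown along the way. The `agones.dev/v1` API group, the `fleet-load` Fleet name, and the `default` namespace are all assumptions.

```python
# Sketch: time how long a Fleet takes to fully scale up, reporting GameServer
# states (PortAllocation, Creating, Starting, Scheduled, RequestReady, Ready, ...)
# while it happens.
import collections
import time

from kubernetes import client, config

GROUP, VERSION, NAMESPACE = "agones.dev", "v1", "default"
FLEET_NAME, TARGET = "fleet-load", 1000

config.load_kube_config()
api = client.CustomObjectsApi()

# Scale the Fleet up and start the clock.
api.patch_namespaced_custom_object(
    GROUP, VERSION, NAMESPACE, "fleets", FLEET_NAME,
    {"spec": {"replicas": TARGET}},
)
start = time.monotonic()

while True:
    fleet = api.get_namespaced_custom_object(
        GROUP, VERSION, NAMESPACE, "fleets", FLEET_NAME)
    ready = fleet.get("status", {}).get("readyReplicas", 0)

    # Emit a coarse per-state count alongside the ready replica count.
    gameservers = api.list_namespaced_custom_object(
        GROUP, VERSION, NAMESPACE, "gameservers")
    states = collections.Counter(
        gs.get("status", {}).get("state", "Unknown")
        for gs in gameservers.get("items", []))
    print(f"t={time.monotonic() - start:.1f}s ready={ready} states={dict(states)}")

    if ready >= TARGET:
        print(f"fully scaled up in {time.monotonic() - start:.1f}s")
        break
    time.sleep(5)
```

The same loop could be rerun with GKE Cluster Autoscaling enabled and disabled, as the test cases above describe.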
Observations

Testing Fleet scaling up/down with GKE Cluster Autoscaling enabled

Test Environment

GKE cluster with the following configurations:
Results - Fleet Scaling

I have observed that with GKE Cluster Autoscaling enabled, scaling up the Fleet gets stuck at some point.
Testing Fleet scaling up/down with GKE Cluster Autoscaling disabled
On Fleet scaling down, I have observed that there are cases where the Fleet scales down (all game servers are deleted), but the Fleet is not updated and still shows 1000 ready GameServers.

Test Environment

GKE cluster with the following configurations:
Results - Fleet Autoscaling

Test Scenario. Spin up a fleet, scale it up to 100 replicas, and then scale down to 0. Repeat multiple times.

Results - Fleet Allocation

Test Scenario. Spin up a fleet, scale it up to 100 replicas, and then start a Locust test where 100 users try to do a game server allocation in parallel.
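A hedged sketch of the kind of Locust test described in that allocation scenario: each simulated user posts a GameServerAllocation against the Kubernetes API. It assumes `kubectl proxy` on localhost:8001, the `allocation.agones.dev/v1` API group (older releases used a different group/version and `required`/`preferred` selectors), and a fleet named `fleet-load`; these are assumptions, not details taken from the tests referenced in this issue.

```python
# Sketch: parallel GameServerAllocation requests through the Kubernetes API.
from locust import HttpUser, task, constant

ALLOCATION_BODY = {
    "apiVersion": "allocation.agones.dev/v1",
    "kind": "GameServerAllocation",
    "spec": {
        "selectors": [
            {"matchLabels": {"agones.dev/fleet": "fleet-load"}}
        ]
    },
}

class Allocator(HttpUser):
    host = "http://localhost:8001"
    wait_time = constant(1)

    @task
    def allocate(self):
        with self.client.post(
            "/apis/allocation.agones.dev/v1/namespaces/default/gameserverallocations",
            json=ALLOCATION_BODY,
            name="allocate",
            catch_response=True,
        ) as resp:
            # The create call succeeds whether or not a GameServer was found;
            # status.state distinguishes Allocated from UnAllocated.
            state = resp.json().get("status", {}).get("state")
            if state != "Allocated":
                resp.failure(f"allocation state: {state}")
```

Running this with 100 concurrent users mirrors the 100-users-in-parallel scenario described above, and the response-time stats Locust collects give the allocation latency measurement the design calls for.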
@markmandel - I see that this is marked as part of the 0.12.0 milestone (but it was also in 0.11.0, 0.10.0, 0.9.0, and 0.8.0). Is it part of the milestone optimistically (hoping for someone to finish it)? Also, for @markmandel or @pm7h - can you summarize what we think remains for this task? I know that @ilkercelikyilmaz has another test harness that does some load testing that may fall under this area as well.
Reading through the other issues in the 0.12.0 milestone, I see that this is referenced from the top level plan for the 1.0 release, which at least partially answers my questions here.
I think the main remaining item is providing automation and dashboards for running these tests.
@roberthbailey @ilkercelikyilmaz do we feel we can close this, now that we have the scenario load tests?
I think so. We now have the Locust load tests, the allocation load tests (gRPC and k8s API), and the scenario tests as well. We have been using the allocation load tests to verify that new k8s versions don't introduce memory leaks, and the scenario tests can be used to verify performance over a long period of time.
CLOSING!
Problem
We need some way to (a) be able to load test Agones at scale, and (b) help users of Agones load test it for their own workloads.
Notes
I feel like we should be able to work both of these out at the same time - if we can create a framework for load testing we can use internally, such that it can also be used externally, that would be ideal.
Thoughts, feelings and opinions are appreciated 😄
Research