In-place Agones Upgrades: Testing #3795
Comments
Some guidance would be helpful, but I've started coding for the Wanderer component. I'm not sure if I'm heading in the right direction.
// Wanderer component
import ( ... )
type Config struct { ... }
func main() { ... }
func generateRandomConfig() Config { ... }

// Producer component
import ( ... )
func main() { ... }
func scaleFleets(client *agones.Client) error { ... }
func allocateGameServers(client *agones.Client) error { ... }

// Monitor component
import ( ... )
func main() { ... }
This issue is marked as Stale due to inactivity for more than 30 days. To avoid being marked as 'stale', please add the 'awaiting-maintainer' label or add a comment. Thank you for your contributions.
Note
Milestone of #3766, which we are seeking feedback on. We will move forward with pieces that seem non-contentious, though.
Testing
To vet In-place Agones Upgrades, we need a combination of unit tests (which are assumed in all PRs) and end-to-end (e2e) tests. The problem with our current e2e tests is that they assume a particular configuration and test against it. That's good for what they're testing, but we need a different style of test for upgrades.
From past experience, the best way I have seen to test upgrades is to keep the system doing real work and upgrade it in place. I propose a system where we keep a cluster under fairly active load and mutate its configuration continuously. As a starting point, imagine three subsystems:
Wanderer: Every ${PERIOD}, changes configuration randomly within a defined space, e.g. upgrade/downgrade within the upgrade horizon, change feature flags, etc. I'm thinking every half hour, which gives us a reasonable soak period between changes and may help with soak testing as well (at least for detecting bigger leaks).
Producer: Generates load continuously: scales Fleets up/down at random, interacts with GameServers via allocation, etc. Some way to ensure there's load on the system. The Producer should keep a continuous metric for the operations it performs (e.g. allocations) that we can monitor against an SLO, so if an upgrade causes dips, we notice.
Monitor: Either configured as alerts, or as an active process monitoring the state of the system, we need to verify that e.g. Agones is healthy, Fleets are healthy, etc.
If we do this right, we could even set this up in a couple of different modes: one with a fast wanderer (e.g. 30 minutes) to make sure we cover as much of the configuration space as possible, and one with a slow wanderer (e.g. a day, a week) to soak test more. The Producer/Monitor for each will look rather similar; only the rate of change will be different.
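As a rough illustration of how the two modes could share the same Producer and Monitor logic, the sketch below models a mode as nothing more than a Wanderer period, plus a simple success-rate counter that the Producer could update and the Monitor could compare against an SLO. The names (`Mode`, `OpStats`, `MeetsSLO`) and the 99% target are assumptions made for this example, not part of the proposal.

```go
package main

import (
	"fmt"
	"time"
)

// Mode is a hypothetical test setup; only the Wanderer period differs.
type Mode struct {
	Name         string
	WanderPeriod time.Duration
}

// OpStats is a hypothetical counter the Producer updates per operation
// (e.g. per allocation attempt).
type OpStats struct {
	Attempts  int
	Successes int
}

// Record counts one operation and whether it succeeded.
func (s *OpStats) Record(ok bool) {
	s.Attempts++
	if ok {
		s.Successes++
	}
}

// SuccessRate returns the fraction of successful operations,
// treating "no operations yet" as fully healthy.
func (s OpStats) SuccessRate() float64 {
	if s.Attempts == 0 {
		return 1
	}
	return float64(s.Successes) / float64(s.Attempts)
}

// MeetsSLO is the kind of check a Monitor might run after each change.
func (s OpStats) MeetsSLO(target float64) bool {
	return s.SuccessRate() >= target
}

func main() {
	modes := []Mode{
		{Name: "fast", WanderPeriod: 30 * time.Minute},
		{Name: "slow", WanderPeriod: 24 * time.Hour},
	}
	const target = 0.99 // assumed SLO target, for illustration only

	for _, m := range modes {
		var stats OpStats
		// Simulated operations: every 200th one fails.
		for i := 1; i <= 1000; i++ {
			stats.Record(i%200 != 0)
		}
		fmt.Printf("mode=%s period=%s rate=%.3f meetsSLO=%t\n",
			m.Name, m.WanderPeriod, stats.SuccessRate(), stats.MeetsSLO(target))
	}
}
```

In practice the counter would more likely be exported as a metric and evaluated by alerting rules, but the comparison is the same in either mode.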