In-place Agones Upgrades: Testing #3795

zmerlynn · 2024-04-19T20:14:47Z

Note

Milestone of #3766, which we are seeking feedback on. We will move forward with pieces that seem non-contentious, though.

Testing

To vet In-place Agones Upgrades, we need a combination of unit testing (which is assumed in all PRs) and end-to-end (e2e) testing. The problem with our current e2e tests is that they assume a particular configuration and test against it. That's good for what they're testing, but we need a different style of test for upgrades.

From past experience, the best way I have seen to test upgrades is to keep the system busy and upgrade it in place while it is working. I propose a system where we keep a cluster under fairly active load and mutate its configuration continuously. As a starting point, imagine three subsystems:

Wanderer: Every ${PERIOD}, changes configuration randomly within a defined space, e.g. upgrade/downgrade within the upgrade horizon, change feature flags, etc. I'm thinking every half hour, which gives us a reasonable soak period in between and may also help with soak testing (at least for detecting bigger leaks).
Producer: Generates load continuously: scales Fleets up/down at random, interacts with GameServers via allocation, etc., so that there is always load on the system. The Producer should keep a continuous metric for the operations it performs (e.g. allocations) that we can monitor against an SLO, so if an upgrade causes dips, we notice.
Monitor: Configured either as alerts or as an active process monitoring the state of the system, this verifies that e.g. Agones is healthy, Fleets are healthy, etc.

If we do this right, we could even set this up in a couple of different modes: one with a fast wanderer (e.g. 30 minutes) to make sure we cover as much of the configuration space as possible, and one with a slow wanderer (e.g. a day or a week) to soak test more. The Producer and Monitor for each will look much the same; only the rate of change will differ.

1804devs · 2024-05-09T18:40:54Z

Some guidance would be helpful, but I've started coding for the Wanderer component. I'm not sure if I'm heading in the right direction.

1804devs · 2024-05-09T18:42:50Z

// Wanderer component
package main

import (
	"bytes"
	"encoding/json"
	"log"
	"net/http"
	"time"
)

type Config struct {
	// Define configuration struct
}

func main() {
	// Define the Agones API endpoint (placeholder URL)
	agonesEndpoint := "http://agones-api.example.com/config"

	// Trigger a configuration change every 30 minutes. The sleep is the loop's
	// post statement so that it also runs after `continue` on an error;
	// otherwise a failing request would retry in a busy loop.
	for ; ; time.Sleep(30 * time.Minute) {
		// Generate random configuration changes
		config := generateRandomConfig()

		// Convert configuration to JSON
		configJSON, err := json.Marshal(config)
		if err != nil {
			log.Println("Error marshalling configuration:", err)
			continue
		}

		// Perform configuration change request to Agones API
		resp, err := http.Post(agonesEndpoint, "application/json", bytes.NewReader(configJSON))
		if err != nil {
			log.Println("Error sending configuration request:", err)
			continue
		}
		// Close the body so the underlying connection can be reused
		resp.Body.Close()

		// Treat any non-2xx response as a failed change
		if resp.StatusCode < 200 || resp.StatusCode > 299 {
			log.Println("Configuration change rejected:", resp.Status)
			continue
		}

		log.Println("Configuration change successful:", config)
	}
}

func generateRandomConfig() Config {
	// Implement logic to generate random configuration changes
	return Config{}
}

// Producer component
package main

import (
	"log"
	"time"

	"agones.dev/agones/pkg/client/clientset/versioned"
	"k8s.io/client-go/rest"
)

func main() {
	// Load Kubernetes client configuration.
	// Assumption: the Producer runs inside the cluster under test.
	restConfig, err := rest.InClusterConfig()
	if err != nil {
		log.Fatalf("Error loading in-cluster config: %v", err)
	}

	// Initialize the Agones clientset
	client, err := versioned.NewForConfig(restConfig)
	if err != nil {
		log.Fatalf("Error initializing Agones client: %v", err)
	}

	// Continuously scale Fleets and allocate GameServers
	for {
		// Scale Fleets
		if err := scaleFleets(client); err != nil {
			log.Println("Error scaling Fleets:", err)
		}

		// Allocate GameServers
		if err := allocateGameServers(client); err != nil {
			log.Println("Error allocating GameServers:", err)
		}

		// Sleep for a defined interval before next action
		time.Sleep(1 * time.Minute)
	}
}

func scaleFleets(client versioned.Interface) error {
	// Implement logic to scale Fleets
	return nil
}

func allocateGameServers(client versioned.Interface) error {
	// Implement logic to allocate GameServers
	return nil
}

// Monitor component
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
	// Register Prometheus metrics
	requestDuration := prometheus.NewSummaryVec(
		prometheus.SummaryOpts{
			Name: "http_request_duration_seconds",
			Help: "HTTP request duration in seconds.",
		},
		[]string{"handler", "method"},
	)
	prometheus.MustRegister(requestDuration)

	// Define HTTP handler to expose metrics
	http.Handle("/metrics", promhttp.Handler())

	// Start HTTP server to expose metrics
	log.Fatal(http.ListenAndServe(":8080", nil))
}

github-actions · 2024-10-15T10:00:29Z

This issue is marked as Stale due to inactivity for more than 30 days. To avoid being marked as 'stale', please add the 'awaiting-maintainer' label or add a comment. Thank you for your contributions.

zmerlynn added the kind/feature New features for Agones label Apr 19, 2024

zmerlynn mentioned this issue Apr 19, 2024

RFC: In-place Agones Upgrades #3766

Open

igooch self-assigned this May 24, 2024

igooch mentioned this issue Aug 19, 2024

Adds basic framework for the in place Agones upgrades test controller #3956

Merged

This was referenced Sep 10, 2024

Updates upgrade test to install multiple versions of Agones on a cluster in succession #3982

Merged

Adds game server template with containerized sdk-client-test #3987

Merged

This was referenced Sep 20, 2024

Adds clusters for the in place upgrades tests #3990

Merged

Test in place upgrades run tests #3991

Merged

github-actions bot added the stale Pending closure unless there is a strong objection. label Oct 15, 2024

igooch mentioned this issue Nov 5, 2024

Add Shutdown Delay Seconds to the sdk-client-test containers #4030

Merged
