
Commit

Address many of Robby's comments
zmerlynn committed Jan 27, 2023
1 parent 8bc2a55 commit 6c078ce
Showing 4 changed files with 14 additions and 17 deletions.
14 changes: 5 additions & 9 deletions site/content/en/docs/Advanced/controlling-disruption.md
@@ -10,7 +10,7 @@ description: >

[A `Pod` in Kubernetes may be disrupted](https://kubernetes.io/docs/concepts/workloads/pods/disruptions/#voluntary-and-involuntary-disruptions) for involuntary reasons, e.g. hardware failure, or voluntary reasons, such as when nodes are drained for upgrades.

-By default, Agones assumes your game server should never be disrupted and configures the `Pod` appropriately - but this isn't always the ideal setting. Here we discuss how Agones allows you to control the two most significant sources of voluntary `Pod` disruptions (`Pod` evictions), node upgrades and Cluster Autoscaler, using the `eviction` API on the `GameServer` object.
+By default, Agones assumes your game server should never be disrupted voluntarily and configures the `Pod` appropriately, but this isn't always the ideal setting. Here we discuss how Agones lets you control the two most significant sources of voluntary `Pod` evictions (node upgrades and Cluster Autoscaler) using the `eviction` API on the `GameServer` object.

{{< alpha title="`eviction` API" gate="SafeToEvict" >}}

@@ -19,11 +19,7 @@ By default, Agones assumes your game server should never be disrupted and config
When discussing game server pod disruption, it's important to keep two factors in mind:

* **Session Length:** What is your game server's session length (the time from when the `GameServer` is allocated to when it's shut down)? In general, we bucket the session lengths into "less than 10 minutes", "10 minutes to an hour", and "greater than an hour". (See [below](#whats-special-about-ten-minutes-and-one-hour) if you are curious about session length considerations.)
-* **Disruption Tolerance:** Is your game server tolerant of disruption in a timely fashion? If your game server can handle the `TERM` signal and has [terminationGracePeriodSeconds](https://kubernetes.io/docs/concepts/containers/container-lifecycle-hooks/#hook-handler-execution) configured for a period less than 10 minutes, meaning that the game server can either complete or checkpoint state within 10 minutes, we consider the game server tolerant of disruption in a timely fashion.

-## Agones `eviction` API

-Agones offers a simple way to control disruption of your game servers that should be applicable to most cloud products with default configurations: [`GameServerSpec.eviction`]({{< ref "/docs/Reference/agones_crd_api_reference.html#agones.dev/v1.GameServerSpec" >}})
+* **Disruption Tolerance:** Is your game server tolerant of disruption in a timely fashion? If your game server can handle the `TERM` signal and has [terminationGracePeriodSeconds](https://kubernetes.io/docs/concepts/containers/container-lifecycle-hooks/#hook-handler-execution) configured for a period less than 10 minutes, meaning that the game way to control disruption of your game servers that should be applicable to most cloud products with default configurations: [`GameServerSpec.eviction`]({{< ref "/docs/Reference/agones_crd_api_reference.html#agones.dev/v1.GameServerSpec" >}})

### `safe: Never` (default)

@@ -34,7 +30,7 @@ eviction:
safe: Never
```

-Note: If your session length is greater than an hour, Agones may not be able to protect your game server from disruption. See [below](#considerations-for-long-sessions).
+Note: If your session length is greater than an hour, Agones may not be able to fully protect your game server from disruption. See [below](#considerations-for-long-sessions).

### `safe: Always`

@@ -98,8 +94,8 @@ We welcome contributions to this section for other products!
## Implementation / Under the hood

Each option uses a slightly different permutation of:
-* the `safe-to-evict` annotation to block [Cluster Autoscaler based eviction](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#what-types-of-pods-can-prevent-ca-from-removing-a-node)
-* and the `agones.dev/safe-to-evict` label selector to select the `agones-gameserver-safe-to-evict-false` `PodDisruptionBudget`. This blocks [Cluster Autoscaler](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#what-types-of-pods-can-prevent-ca-from-removing-a-node) and (for a limited time) [disruption from node upgrades](https://kubernetes.io/docs/concepts/workloads/pods/disruptions/#pod-disruption-budgets).
+* the `safe-to-evict` annotation to block [Cluster Autoscaler based eviction](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#what-types-of-pods-can-prevent-ca-from-removing-a-node), and
+* the `agones.dev/safe-to-evict` label selector to select the `agones-gameserver-safe-to-evict-false` `PodDisruptionBudget`. This blocks [Cluster Autoscaler](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#what-types-of-pods-can-prevent-ca-from-removing-a-node) and (for a limited time) [disruption from node upgrades](https://kubernetes.io/docs/concepts/workloads/pods/disruptions/#pod-disruption-budgets).
* Note that PDBs also influence pod preemption, but only on a best-effort basis.

As a quick reference:
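A minimal sketch of a `GameServer` that opts in to the `eviction` API, assuming the Alpha `SafeToEvict` feature gate is enabled. The name, image, port, and the 300-second `terminationGracePeriodSeconds` are hypothetical values; the grace period is only chosen to stay under the 10-minute threshold for disruption tolerance, and it only helps if the game server actually handles `TERM` by completing or checkpointing its session.

```yaml
apiVersion: agones.dev/v1
kind: GameServer
metadata:
  name: my-game-server              # hypothetical name
spec:
  # Alpha field; requires the SafeToEvict feature gate.
  eviction:
    safe: Always                    # the default is Never
  ports:
  - name: default
    containerPort: 7654             # hypothetical port
  template:
    spec:
      # Under 10 minutes, so (assuming TERM is handled) the server
      # counts as tolerant of disruption in a timely fashion.
      terminationGracePeriodSeconds: 300
      containers:
      - name: my-game-server
        image: example.com/my-game-server:latest   # hypothetical image
```

With the default `safe: Never`, the `eviction` block can simply be left out.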
13 changes: 7 additions & 6 deletions site/content/en/docs/Advanced/scheduling-and-autoscaling.md
@@ -97,16 +97,17 @@ This affects the Cluster autoscaler, Allocation Scheduling, Pod Scheduling and F
#### Cluster Autoscaler
-{{< alert title="SafeToEvict Feature Gate" color="info" >}}
-The [Alpha]({{< ref "/docs/Guides/feature-stages.md#alpha" >}}) `SafeToEvict` feature allows
-[controlling disruption]({{< relref "controlling-disruption.md" >}}) on the `GameServerSpec`. The sample
-configuration below will soon use the `eviction` API instead.
-{{< /alert >}}

When using the “Packed” strategy, Agones will ensure that the Cluster Autoscaler doesn't attempt to evict and move `GameServer` `Pods` onto new Nodes during
gameplay by adding the annotation [`"cluster-autoscaler.kubernetes.io/safe-to-evict": "false"`](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#what-types-of-pods-can-prevent-ca-from-removing-a-node)
to the backing Pod.
+{{< alert title="SafeToEvict Feature Gate" color="info" >}}
+The [Alpha]({{< ref "/docs/Guides/feature-stages.md#alpha" >}}) `SafeToEvict` feature allows
+[controlling disruption]({{< relref "controlling-disruption.md" >}}) in a more holistic way.
+Please consider enabling `SafeToEvict` and using the new `eviction` API - we welcome your
+early feedback!
+{{< /alert >}}

However, if a gameserver can tolerate [being evicted](https://kubernetes.io/docs/concepts/scheduling-eviction/api-eviction/#how-api-initiated-eviction-works)
(generally in combination with setting an appropriate graceful termination period on the gameserver pod) and you
want the Cluster Autoscaler to compact your cluster by evicting game servers when it would allow the Cluster
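To make the annotation-plus-`PodDisruptionBudget` mechanism concrete, the default `safe: Never` case can be pictured as the backing `Pod` carrying both markers, with the budget selecting it by label. This is an illustrative sketch, not the manifest Agones installs: the label value `"false"` is inferred from the budget's name, the Pod name is hypothetical, and `maxUnavailable: 0` is an assumption standing in for "allow no voluntary evictions".

```yaml
# Backing Pod, metadata only (spec omitted for brevity).
apiVersion: v1
kind: Pod
metadata:
  name: my-game-server              # hypothetical name
  annotations:
    # Tells Cluster Autoscaler not to evict this Pod when scaling down.
    cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
  labels:
    # Matched by the PodDisruptionBudget below.
    agones.dev/safe-to-evict: "false"
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: agones-gameserver-safe-to-evict-false
spec:
  maxUnavailable: 0                 # assumption: permit no voluntary evictions
  selector:
    matchLabels:
      agones.dev/safe-to-evict: "false"
```

Because node drains go through the Eviction API, which honours PDBs, an upgrade is held back while this budget allows no disruptions; as noted above, managed node upgrades typically respect that only for a limited time.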
2 changes: 1 addition & 1 deletion site/layouts/shortcodes/alpha.html
@@ -1,6 +1,6 @@
{{- $gate := split (.Get "gate") "," }}
{{- $len_gate := len $gate }}
-{{- $title := .Get "title" }}
+{{- $title := .Get "title" | markdownify }}
<div class="alert alert-warning" role="alert">
<h4 class="alert-heading">Warning</h4>
<p>The {{ $title }} {{- if gt $len_gate 1 }} features are{{ else }} feature is{{ end }} currently <strong><a href="{{ ref . "/docs/Guides/feature-stages.md#alpha" }}">Alpha</a></strong>,
2 changes: 1 addition & 1 deletion site/layouts/shortcodes/beta.html
@@ -1,5 +1,5 @@
{{- $gate := .Get "gate" }}
-{{- $title := .Get "title" }}
+{{- $title := .Get "title" | markdownify }}
<div class="alert alert-warning" role="alert">
<h4 class="alert-heading">Warning</h4>
<p>The {{ $title }} feature is currently <strong><a href="{{ ref . "/docs/Guides/feature-stages.md#beta" }}">Beta</a></strong>,
