Skip to content

Commit

Permalink
feat!: add gpu node autoscaling support (#807) (#944)
Browse files Browse the repository at this point in the history
* feature: add gpu node autoscaling support (#807)

* add gpu node autoscaling support for top level module

* add gpu node autoscaling support for beta-private-cluster module

* feature: add gpu node autoscaling support for all modules (#807)

* add gpu node autoscaling support for all modules

* feature: add gpu node autoscaling support for all modules (#807)

* updater example/node_pool cluster_autoscaling var to work with
  gpu_resources

* feature: add gpu node autoscaling support for all modules (#807)

* fix example/node_pool formatting error

* feature: add gpu node autoscaling support for all modules (#807)

* Format gpu_resource to meet linter requirements
* Update examples/node_pool/

* feature: add gpu node autoscaling support for all modules (#807)

* Add v16.0 upgrade guide
* Update node_pool test to specify `gpu_resources`

* feature: add gpu node autoscaling support for all modules (#807)

* updates upgrade guide

Co-authored-by: Bharath KKB <bharathkrishnakb@gmail.com>

Co-authored-by: Bharath KKB <bharathkrishnakb@gmail.com>
  • Loading branch information
JamesTimms and bharathkkb committed Jul 9, 2021
1 parent 433ab2f commit e53a949
Show file tree
Hide file tree
Showing 27 changed files with 71 additions and 24 deletions.
2 changes: 1 addition & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -127,7 +127,7 @@ Then perform the following commands on the root folder:
| add\_shadow\_firewall\_rules | Create GKE shadow firewall (the same as default firewall rules with firewall logs enabled). | `bool` | `false` | no |
| basic\_auth\_password | The password to be used with Basic Authentication. | `string` | `""` | no |
| basic\_auth\_username | The username to be used with Basic Authentication. An empty value will disable Basic Authentication, which is the recommended configuration. | `string` | `""` | no |
| cluster\_autoscaling | Cluster autoscaling configuration. See [more details](https://cloud.google.com/kubernetes-engine/docs/reference/rest/v1beta1/projects.locations.clusters#clusterautoscaling) | <pre>object({<br> enabled = bool<br> min_cpu_cores = number<br> max_cpu_cores = number<br> min_memory_gb = number<br> max_memory_gb = number<br> })</pre> | <pre>{<br> "enabled": false,<br> "max_cpu_cores": 0,<br> "max_memory_gb": 0,<br> "min_cpu_cores": 0,<br> "min_memory_gb": 0<br>}</pre> | no |
| cluster\_autoscaling | Cluster autoscaling configuration. See [more details](https://cloud.google.com/kubernetes-engine/docs/reference/rest/v1beta1/projects.locations.clusters#clusterautoscaling) | <pre>object({<br> enabled = bool<br> min_cpu_cores = number<br> max_cpu_cores = number<br> min_memory_gb = number<br> max_memory_gb = number<br> gpu_resources = list(object({ resource_type = string, minimum = number, maximum = number }))<br> })</pre> | <pre>{<br> "enabled": false,<br> "gpu_resources": [],<br> "max_cpu_cores": 0,<br> "max_memory_gb": 0,<br> "min_cpu_cores": 0,<br> "min_memory_gb": 0<br>}</pre> | no |
| cluster\_ipv4\_cidr | The IP address range of the kubernetes pods in this cluster. Default is an automatically assigned CIDR. | `any` | `null` | no |
| cluster\_resource\_labels | The GCE resource labels (a map of key/value pairs) to be applied to the cluster | `map(string)` | `{}` | no |
| configure\_ip\_masq | Enables the installation of ip masquerading, which is usually no longer required when using aliasied IP addresses. IP masquerading uses a kubectl call, so when you have a private cluster, you will need access to the API server. | `bool` | `false` | no |
Expand Down
4 changes: 2 additions & 2 deletions autogen/main/main.tf.tmpl
Original file line number Diff line number Diff line change
Expand Up @@ -55,15 +55,15 @@ locals {

release_channel = var.release_channel != null ? [{ channel : var.release_channel }] : []

autoscaling_resource_limits = var.cluster_autoscaling.enabled ? [{
autoscaling_resource_limits = var.cluster_autoscaling.enabled ? concat([{
resource_type = "cpu"
minimum = var.cluster_autoscaling.min_cpu_cores
maximum = var.cluster_autoscaling.max_cpu_cores
}, {
resource_type = "memory"
minimum = var.cluster_autoscaling.min_memory_gb
maximum = var.cluster_autoscaling.max_memory_gb
}] : []
}], var.cluster_autoscaling.gpu_resources) : []


custom_kube_dns_config = length(keys(var.stub_domains)) > 0
Expand Down
2 changes: 2 additions & 0 deletions autogen/main/variables.tf.tmpl
Original file line number Diff line number Diff line change
Expand Up @@ -251,6 +251,7 @@ variable "cluster_autoscaling" {
max_cpu_cores = number
min_memory_gb = number
max_memory_gb = number
gpu_resources = list(object({ resource_type = string, minimum = number, maximum = number }))
})
default = {
enabled = false
Expand All @@ -261,6 +262,7 @@ variable "cluster_autoscaling" {
min_cpu_cores = 0
max_memory_gb = 0
min_memory_gb = 0
gpu_resources = []
}
description = "Cluster autoscaling configuration. See [more details](https://cloud.google.com/kubernetes-engine/docs/reference/rest/v1beta1/projects.locations.clusters#clusterautoscaling)"
}
Expand Down
24 changes: 24 additions & 0 deletions docs/upgrading_to_v16.0.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,24 @@
# Upgrading to v16.0

The v16.0 release of *kubernetes-engine* is a backwards incompatible release.

### cluster_autoscaling modified
The `cluster_autoscaling` variable has been modified to require a `gpu_resources` value. If you have enabled `cluster_autoscaling` and do not require `gpu_resources`, you can set it to an empty list as shown below.

```diff
module "gke" {
source = "terraform-google-modules/kubernetes-engine/google//modules/private-cluster"
- version = "~> 15.0"
+ version = "~> 16.0"

cluster_autoscaling = {
enabled = true
autoscaling_profile = "BALANCED"
min_cpu_cores = 1
max_cpu_cores = 100
min_memory_gb = 1
max_memory_gb = 1000
+ gpu_resources = []
}
}
```
2 changes: 1 addition & 1 deletion examples/node_pool/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@ This example illustrates how to create a cluster with multiple custom node-pool

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|:--------:|
| cluster\_autoscaling | Cluster autoscaling configuration. See [more details](https://cloud.google.com/kubernetes-engine/docs/reference/rest/v1beta1/projects.locations.clusters#clusterautoscaling) | <pre>object({<br> enabled = bool<br> autoscaling_profile = string<br> min_cpu_cores = number<br> max_cpu_cores = number<br> min_memory_gb = number<br> max_memory_gb = number<br> })</pre> | <pre>{<br> "autoscaling_profile": "BALANCED",<br> "enabled": false,<br> "max_cpu_cores": 0,<br> "max_memory_gb": 0,<br> "min_cpu_cores": 0,<br> "min_memory_gb": 0<br>}</pre> | no |
| cluster\_autoscaling | Cluster autoscaling configuration. See [more details](https://cloud.google.com/kubernetes-engine/docs/reference/rest/v1beta1/projects.locations.clusters#clusterautoscaling) | <pre>object({<br> enabled = bool<br> autoscaling_profile = string<br> min_cpu_cores = number<br> max_cpu_cores = number<br> min_memory_gb = number<br> max_memory_gb = number<br> gpu_resources = list(object({<br> resource_type = string<br> minimum = number<br> maximum = number<br> }))<br> })</pre> | <pre>{<br> "autoscaling_profile": "BALANCED",<br> "enabled": false,<br> "gpu_resources": [],<br> "max_cpu_cores": 0,<br> "max_memory_gb": 0,<br> "min_cpu_cores": 0,<br> "min_memory_gb": 0<br>}</pre> | no |
| cluster\_name\_suffix | A suffix to append to the default cluster name | `string` | `""` | no |
| compute\_engine\_service\_account | Service account to associate to the nodes in the cluster | `any` | n/a | yes |
| ip\_range\_pods | The secondary ip range to use for pods | `any` | n/a | yes |
Expand Down
6 changes: 6 additions & 0 deletions examples/node_pool/variables.tf
Original file line number Diff line number Diff line change
Expand Up @@ -60,6 +60,11 @@ variable "cluster_autoscaling" {
max_cpu_cores = number
min_memory_gb = number
max_memory_gb = number
gpu_resources = list(object({
resource_type = string
minimum = number
maximum = number
}))
})
default = {
enabled = false
Expand All @@ -68,6 +73,7 @@ variable "cluster_autoscaling" {
min_cpu_cores = 0
max_memory_gb = 0
min_memory_gb = 0
gpu_resources = []
}
description = "Cluster autoscaling configuration. See [more details](https://cloud.google.com/kubernetes-engine/docs/reference/rest/v1beta1/projects.locations.clusters#clusterautoscaling)"
}
4 changes: 2 additions & 2 deletions main.tf
Original file line number Diff line number Diff line change
Expand Up @@ -51,15 +51,15 @@ locals {

release_channel = var.release_channel != null ? [{ channel : var.release_channel }] : []

autoscaling_resource_limits = var.cluster_autoscaling.enabled ? [{
autoscaling_resource_limits = var.cluster_autoscaling.enabled ? concat([{
resource_type = "cpu"
minimum = var.cluster_autoscaling.min_cpu_cores
maximum = var.cluster_autoscaling.max_cpu_cores
}, {
resource_type = "memory"
minimum = var.cluster_autoscaling.min_memory_gb
maximum = var.cluster_autoscaling.max_memory_gb
}] : []
}], var.cluster_autoscaling.gpu_resources) : []


custom_kube_dns_config = length(keys(var.stub_domains)) > 0
Expand Down
2 changes: 1 addition & 1 deletion modules/beta-private-cluster-update-variant/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -162,7 +162,7 @@ Then perform the following commands on the root folder:
| basic\_auth\_username | The username to be used with Basic Authentication. An empty value will disable Basic Authentication, which is the recommended configuration. | `string` | `""` | no |
| cloudrun | (Beta) Enable CloudRun addon | `bool` | `false` | no |
| cloudrun\_load\_balancer\_type | (Beta) Configure the Cloud Run load balancer type. External by default. Set to `LOAD_BALANCER_TYPE_INTERNAL` to configure as an internal load balancer. | `string` | `""` | no |
| cluster\_autoscaling | Cluster autoscaling configuration. See [more details](https://cloud.google.com/kubernetes-engine/docs/reference/rest/v1beta1/projects.locations.clusters#clusterautoscaling) | <pre>object({<br> enabled = bool<br> autoscaling_profile = string<br> min_cpu_cores = number<br> max_cpu_cores = number<br> min_memory_gb = number<br> max_memory_gb = number<br> })</pre> | <pre>{<br> "autoscaling_profile": "BALANCED",<br> "enabled": false,<br> "max_cpu_cores": 0,<br> "max_memory_gb": 0,<br> "min_cpu_cores": 0,<br> "min_memory_gb": 0<br>}</pre> | no |
| cluster\_autoscaling | Cluster autoscaling configuration. See [more details](https://cloud.google.com/kubernetes-engine/docs/reference/rest/v1beta1/projects.locations.clusters#clusterautoscaling) | <pre>object({<br> enabled = bool<br> autoscaling_profile = string<br> min_cpu_cores = number<br> max_cpu_cores = number<br> min_memory_gb = number<br> max_memory_gb = number<br> gpu_resources = list(object({ resource_type = string, minimum = number, maximum = number }))<br> })</pre> | <pre>{<br> "autoscaling_profile": "BALANCED",<br> "enabled": false,<br> "gpu_resources": [],<br> "max_cpu_cores": 0,<br> "max_memory_gb": 0,<br> "min_cpu_cores": 0,<br> "min_memory_gb": 0<br>}</pre> | no |
| cluster\_ipv4\_cidr | The IP address range of the kubernetes pods in this cluster. Default is an automatically assigned CIDR. | `any` | `null` | no |
| cluster\_resource\_labels | The GCE resource labels (a map of key/value pairs) to be applied to the cluster | `map(string)` | `{}` | no |
| cluster\_telemetry\_type | Available options include ENABLED, DISABLED, and SYSTEM\_ONLY | `string` | `null` | no |
Expand Down
4 changes: 2 additions & 2 deletions modules/beta-private-cluster-update-variant/main.tf
Original file line number Diff line number Diff line change
Expand Up @@ -51,15 +51,15 @@ locals {

release_channel = var.release_channel != null ? [{ channel : var.release_channel }] : []

autoscaling_resource_limits = var.cluster_autoscaling.enabled ? [{
autoscaling_resource_limits = var.cluster_autoscaling.enabled ? concat([{
resource_type = "cpu"
minimum = var.cluster_autoscaling.min_cpu_cores
maximum = var.cluster_autoscaling.max_cpu_cores
}, {
resource_type = "memory"
minimum = var.cluster_autoscaling.min_memory_gb
maximum = var.cluster_autoscaling.max_memory_gb
}] : []
}], var.cluster_autoscaling.gpu_resources) : []


custom_kube_dns_config = length(keys(var.stub_domains)) > 0
Expand Down
2 changes: 2 additions & 0 deletions modules/beta-private-cluster-update-variant/variables.tf
Original file line number Diff line number Diff line change
Expand Up @@ -241,6 +241,7 @@ variable "cluster_autoscaling" {
max_cpu_cores = number
min_memory_gb = number
max_memory_gb = number
gpu_resources = list(object({ resource_type = string, minimum = number, maximum = number }))
})
default = {
enabled = false
Expand All @@ -249,6 +250,7 @@ variable "cluster_autoscaling" {
min_cpu_cores = 0
max_memory_gb = 0
min_memory_gb = 0
gpu_resources = []
}
description = "Cluster autoscaling configuration. See [more details](https://cloud.google.com/kubernetes-engine/docs/reference/rest/v1beta1/projects.locations.clusters#clusterautoscaling)"
}
Expand Down
2 changes: 1 addition & 1 deletion modules/beta-private-cluster/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -140,7 +140,7 @@ Then perform the following commands on the root folder:
| basic\_auth\_username | The username to be used with Basic Authentication. An empty value will disable Basic Authentication, which is the recommended configuration. | `string` | `""` | no |
| cloudrun | (Beta) Enable CloudRun addon | `bool` | `false` | no |
| cloudrun\_load\_balancer\_type | (Beta) Configure the Cloud Run load balancer type. External by default. Set to `LOAD_BALANCER_TYPE_INTERNAL` to configure as an internal load balancer. | `string` | `""` | no |
| cluster\_autoscaling | Cluster autoscaling configuration. See [more details](https://cloud.google.com/kubernetes-engine/docs/reference/rest/v1beta1/projects.locations.clusters#clusterautoscaling) | <pre>object({<br> enabled = bool<br> autoscaling_profile = string<br> min_cpu_cores = number<br> max_cpu_cores = number<br> min_memory_gb = number<br> max_memory_gb = number<br> })</pre> | <pre>{<br> "autoscaling_profile": "BALANCED",<br> "enabled": false,<br> "max_cpu_cores": 0,<br> "max_memory_gb": 0,<br> "min_cpu_cores": 0,<br> "min_memory_gb": 0<br>}</pre> | no |
| cluster\_autoscaling | Cluster autoscaling configuration. See [more details](https://cloud.google.com/kubernetes-engine/docs/reference/rest/v1beta1/projects.locations.clusters#clusterautoscaling) | <pre>object({<br> enabled = bool<br> autoscaling_profile = string<br> min_cpu_cores = number<br> max_cpu_cores = number<br> min_memory_gb = number<br> max_memory_gb = number<br> gpu_resources = list(object({ resource_type = string, minimum = number, maximum = number }))<br> })</pre> | <pre>{<br> "autoscaling_profile": "BALANCED",<br> "enabled": false,<br> "gpu_resources": [],<br> "max_cpu_cores": 0,<br> "max_memory_gb": 0,<br> "min_cpu_cores": 0,<br> "min_memory_gb": 0<br>}</pre> | no |
| cluster\_ipv4\_cidr | The IP address range of the kubernetes pods in this cluster. Default is an automatically assigned CIDR. | `any` | `null` | no |
| cluster\_resource\_labels | The GCE resource labels (a map of key/value pairs) to be applied to the cluster | `map(string)` | `{}` | no |
| cluster\_telemetry\_type | Available options include ENABLED, DISABLED, and SYSTEM\_ONLY | `string` | `null` | no |
Expand Down
4 changes: 2 additions & 2 deletions modules/beta-private-cluster/main.tf
Original file line number Diff line number Diff line change
Expand Up @@ -51,15 +51,15 @@ locals {

release_channel = var.release_channel != null ? [{ channel : var.release_channel }] : []

autoscaling_resource_limits = var.cluster_autoscaling.enabled ? [{
autoscaling_resource_limits = var.cluster_autoscaling.enabled ? concat([{
resource_type = "cpu"
minimum = var.cluster_autoscaling.min_cpu_cores
maximum = var.cluster_autoscaling.max_cpu_cores
}, {
resource_type = "memory"
minimum = var.cluster_autoscaling.min_memory_gb
maximum = var.cluster_autoscaling.max_memory_gb
}] : []
}], var.cluster_autoscaling.gpu_resources) : []


custom_kube_dns_config = length(keys(var.stub_domains)) > 0
Expand Down
2 changes: 2 additions & 0 deletions modules/beta-private-cluster/variables.tf
Original file line number Diff line number Diff line change
Expand Up @@ -241,6 +241,7 @@ variable "cluster_autoscaling" {
max_cpu_cores = number
min_memory_gb = number
max_memory_gb = number
gpu_resources = list(object({ resource_type = string, minimum = number, maximum = number }))
})
default = {
enabled = false
Expand All @@ -249,6 +250,7 @@ variable "cluster_autoscaling" {
min_cpu_cores = 0
max_memory_gb = 0
min_memory_gb = 0
gpu_resources = []
}
description = "Cluster autoscaling configuration. See [more details](https://cloud.google.com/kubernetes-engine/docs/reference/rest/v1beta1/projects.locations.clusters#clusterautoscaling)"
}
Expand Down
2 changes: 1 addition & 1 deletion modules/beta-public-cluster-update-variant/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -156,7 +156,7 @@ Then perform the following commands on the root folder:
| basic\_auth\_username | The username to be used with Basic Authentication. An empty value will disable Basic Authentication, which is the recommended configuration. | `string` | `""` | no |
| cloudrun | (Beta) Enable CloudRun addon | `bool` | `false` | no |
| cloudrun\_load\_balancer\_type | (Beta) Configure the Cloud Run load balancer type. External by default. Set to `LOAD_BALANCER_TYPE_INTERNAL` to configure as an internal load balancer. | `string` | `""` | no |
| cluster\_autoscaling | Cluster autoscaling configuration. See [more details](https://cloud.google.com/kubernetes-engine/docs/reference/rest/v1beta1/projects.locations.clusters#clusterautoscaling) | <pre>object({<br> enabled = bool<br> autoscaling_profile = string<br> min_cpu_cores = number<br> max_cpu_cores = number<br> min_memory_gb = number<br> max_memory_gb = number<br> })</pre> | <pre>{<br> "autoscaling_profile": "BALANCED",<br> "enabled": false,<br> "max_cpu_cores": 0,<br> "max_memory_gb": 0,<br> "min_cpu_cores": 0,<br> "min_memory_gb": 0<br>}</pre> | no |
| cluster\_autoscaling | Cluster autoscaling configuration. See [more details](https://cloud.google.com/kubernetes-engine/docs/reference/rest/v1beta1/projects.locations.clusters#clusterautoscaling) | <pre>object({<br> enabled = bool<br> autoscaling_profile = string<br> min_cpu_cores = number<br> max_cpu_cores = number<br> min_memory_gb = number<br> max_memory_gb = number<br> gpu_resources = list(object({ resource_type = string, minimum = number, maximum = number }))<br> })</pre> | <pre>{<br> "autoscaling_profile": "BALANCED",<br> "enabled": false,<br> "gpu_resources": [],<br> "max_cpu_cores": 0,<br> "max_memory_gb": 0,<br> "min_cpu_cores": 0,<br> "min_memory_gb": 0<br>}</pre> | no |
| cluster\_ipv4\_cidr | The IP address range of the kubernetes pods in this cluster. Default is an automatically assigned CIDR. | `any` | `null` | no |
| cluster\_resource\_labels | The GCE resource labels (a map of key/value pairs) to be applied to the cluster | `map(string)` | `{}` | no |
| cluster\_telemetry\_type | Available options include ENABLED, DISABLED, and SYSTEM\_ONLY | `string` | `null` | no |
Expand Down
4 changes: 2 additions & 2 deletions modules/beta-public-cluster-update-variant/main.tf
Original file line number Diff line number Diff line change
Expand Up @@ -51,15 +51,15 @@ locals {

release_channel = var.release_channel != null ? [{ channel : var.release_channel }] : []

autoscaling_resource_limits = var.cluster_autoscaling.enabled ? [{
autoscaling_resource_limits = var.cluster_autoscaling.enabled ? concat([{
resource_type = "cpu"
minimum = var.cluster_autoscaling.min_cpu_cores
maximum = var.cluster_autoscaling.max_cpu_cores
}, {
resource_type = "memory"
minimum = var.cluster_autoscaling.min_memory_gb
maximum = var.cluster_autoscaling.max_memory_gb
}] : []
}], var.cluster_autoscaling.gpu_resources) : []


custom_kube_dns_config = length(keys(var.stub_domains)) > 0
Expand Down
2 changes: 2 additions & 0 deletions modules/beta-public-cluster-update-variant/variables.tf
Original file line number Diff line number Diff line change
Expand Up @@ -241,6 +241,7 @@ variable "cluster_autoscaling" {
max_cpu_cores = number
min_memory_gb = number
max_memory_gb = number
gpu_resources = list(object({ resource_type = string, minimum = number, maximum = number }))
})
default = {
enabled = false
Expand All @@ -249,6 +250,7 @@ variable "cluster_autoscaling" {
min_cpu_cores = 0
max_memory_gb = 0
min_memory_gb = 0
gpu_resources = []
}
description = "Cluster autoscaling configuration. See [more details](https://cloud.google.com/kubernetes-engine/docs/reference/rest/v1beta1/projects.locations.clusters#clusterautoscaling)"
}
Expand Down
Loading

0 comments on commit e53a949

Please sign in to comment.