Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add support for gpu_partition_size #1072

Merged
merged 3 commits into from
Nov 23, 2021
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -236,6 +236,7 @@ The node_pools variable takes the following parameters:
| effect | Effect for the taint | | Required |
| enable_integrity_monitoring | Enables monitoring and attestation of the boot integrity of the instance. The attestation is performed against the integrity policy baseline. This baseline is initially derived from the implicitly trusted boot image when the instance is created. | true | Optional |
| enable_secure_boot | Secure Boot helps ensure that the system only runs authentic software by verifying the digital signature of all boot components, and halting the boot process if signature verification fails. | false | Optional |
| gpu_partition_size | Size of partitions to create on the GPU | null | Optional |
| image_type | The image type to use for this node. Note that changing the image type will delete and recreate all nodes in the node pool | COS | Optional |
| initial_node_count | The initial number of nodes for the pool. In regional or multi-zonal clusters, this is the number of nodes per zone. Changing this will force recreation of the resource. Defaults to the value of min_count | " " | Optional |
| key | The key required for the taint | | Required |
Expand Down
1 change: 1 addition & 0 deletions autogen/main/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -186,6 +186,7 @@ The node_pools variable takes the following parameters:
| effect | Effect for the taint | | Required |
| enable_integrity_monitoring | Enables monitoring and attestation of the boot integrity of the instance. The attestation is performed against the integrity policy baseline. This baseline is initially derived from the implicitly trusted boot image when the instance is created. | true | Optional |
| enable_secure_boot | Secure Boot helps ensure that the system only runs authentic software by verifying the digital signature of all boot components, and halting the boot process if signature verification fails. | false | Optional |
| gpu_partition_size | Size of partitions to create on the GPU | null | Optional |
| image_type | The image type to use for this node. Note that changing the image type will delete and recreate all nodes in the node pool | COS | Optional |
| initial_node_count | The initial number of nodes for the pool. In regional or multi-zonal clusters, this is the number of nodes per zone. Changing this will force recreation of the resource. Defaults to the value of min_count | " " | Optional |
| key | The key required for the taint | | Required |
Expand Down
2 changes: 2 additions & 0 deletions autogen/main/cluster.tf.tmpl
Original file line number Diff line number Diff line change
Expand Up @@ -598,9 +598,11 @@ resource "google_container_node_pool" "pools" {
for guest_accelerator in lookup(each.value, "accelerator_count", 0) > 0 ? [{
type = lookup(each.value, "accelerator_type", "")
count = lookup(each.value, "accelerator_count", 0)
gpu_partition_size = lookup(each.value, "gpu_partition_size", null)
}] : [] : {
type = guest_accelerator["type"]
count = guest_accelerator["count"]
gpu_partition_size = guest_accelerator["gpu_partition_size"]
}
]

Expand Down
10 changes: 6 additions & 4 deletions cluster.tf
Original file line number Diff line number Diff line change
Expand Up @@ -308,11 +308,13 @@ resource "google_container_node_pool" "pools" {

guest_accelerator = [
for guest_accelerator in lookup(each.value, "accelerator_count", 0) > 0 ? [{
type = lookup(each.value, "accelerator_type", "")
count = lookup(each.value, "accelerator_count", 0)
type = lookup(each.value, "accelerator_type", "")
count = lookup(each.value, "accelerator_count", 0)
gpu_partition_size = lookup(each.value, "gpu_partition_size", null)
}] : [] : {
type = guest_accelerator["type"]
count = guest_accelerator["count"]
type = guest_accelerator["type"]
count = guest_accelerator["count"]
gpu_partition_size = guest_accelerator["gpu_partition_size"]
}
]

Expand Down
27 changes: 14 additions & 13 deletions examples/node_pool/main.tf
Original file line number Diff line number Diff line change
Expand Up @@ -19,7 +19,7 @@ locals {
}

provider "google-beta" {
version = "~> 3.79.0"
version = "~> 3.90.0"
region = var.region
}

Expand Down Expand Up @@ -55,18 +55,19 @@ module "gke" {
auto_upgrade = true
},
{
name = "pool-02"
machine_type = "n1-standard-2"
min_count = 1
max_count = 2
local_ssd_count = 0
disk_size_gb = 30
disk_type = "pd-standard"
accelerator_count = 1
accelerator_type = "nvidia-tesla-p4"
image_type = "COS"
auto_repair = false
service_account = var.compute_engine_service_account
name = "pool-02"
machine_type = "a2-highgpu-1g"
min_count = 1
max_count = 2
local_ssd_count = 0
disk_size_gb = 30
disk_type = "pd-standard"
accelerator_count = 1
accelerator_type = "nvidia-tesla-a100"
gpu_partition_size = "1g.5gb"
image_type = "COS"
auto_repair = false
service_account = var.compute_engine_service_account
},
{
name = "pool-03"
Expand Down
1 change: 1 addition & 0 deletions modules/beta-private-cluster-update-variant/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -308,6 +308,7 @@ The node_pools variable takes the following parameters:
| effect | Effect for the taint | | Required |
| enable_integrity_monitoring | Enables monitoring and attestation of the boot integrity of the instance. The attestation is performed against the integrity policy baseline. This baseline is initially derived from the implicitly trusted boot image when the instance is created. | true | Optional |
| enable_secure_boot | Secure Boot helps ensure that the system only runs authentic software by verifying the digital signature of all boot components, and halting the boot process if signature verification fails. | false | Optional |
| gpu_partition_size | Size of partitions to create on the GPU | null | Optional |
| image_type | The image type to use for this node. Note that changing the image type will delete and recreate all nodes in the node pool | COS | Optional |
| initial_node_count | The initial number of nodes for the pool. In regional or multi-zonal clusters, this is the number of nodes per zone. Changing this will force recreation of the resource. Defaults to the value of min_count | " " | Optional |
| key | The key required for the taint | | Required |
Expand Down
10 changes: 6 additions & 4 deletions modules/beta-private-cluster-update-variant/cluster.tf
Original file line number Diff line number Diff line change
Expand Up @@ -535,11 +535,13 @@ resource "google_container_node_pool" "pools" {

guest_accelerator = [
for guest_accelerator in lookup(each.value, "accelerator_count", 0) > 0 ? [{
type = lookup(each.value, "accelerator_type", "")
count = lookup(each.value, "accelerator_count", 0)
type = lookup(each.value, "accelerator_type", "")
count = lookup(each.value, "accelerator_count", 0)
gpu_partition_size = lookup(each.value, "gpu_partition_size", null)
}] : [] : {
type = guest_accelerator["type"]
count = guest_accelerator["count"]
type = guest_accelerator["type"]
count = guest_accelerator["count"]
gpu_partition_size = guest_accelerator["gpu_partition_size"]
}
]

Expand Down
1 change: 1 addition & 0 deletions modules/beta-private-cluster/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -286,6 +286,7 @@ The node_pools variable takes the following parameters:
| effect | Effect for the taint | | Required |
| enable_integrity_monitoring | Enables monitoring and attestation of the boot integrity of the instance. The attestation is performed against the integrity policy baseline. This baseline is initially derived from the implicitly trusted boot image when the instance is created. | true | Optional |
| enable_secure_boot | Secure Boot helps ensure that the system only runs authentic software by verifying the digital signature of all boot components, and halting the boot process if signature verification fails. | false | Optional |
| gpu_partition_size | Size of partitions to create on the GPU | null | Optional |
| image_type | The image type to use for this node. Note that changing the image type will delete and recreate all nodes in the node pool | COS | Optional |
| initial_node_count | The initial number of nodes for the pool. In regional or multi-zonal clusters, this is the number of nodes per zone. Changing this will force recreation of the resource. Defaults to the value of min_count | " " | Optional |
| key | The key required for the taint | | Required |
Expand Down
10 changes: 6 additions & 4 deletions modules/beta-private-cluster/cluster.tf
Original file line number Diff line number Diff line change
Expand Up @@ -450,11 +450,13 @@ resource "google_container_node_pool" "pools" {

guest_accelerator = [
for guest_accelerator in lookup(each.value, "accelerator_count", 0) > 0 ? [{
type = lookup(each.value, "accelerator_type", "")
count = lookup(each.value, "accelerator_count", 0)
type = lookup(each.value, "accelerator_type", "")
count = lookup(each.value, "accelerator_count", 0)
gpu_partition_size = lookup(each.value, "gpu_partition_size", null)
}] : [] : {
type = guest_accelerator["type"]
count = guest_accelerator["count"]
type = guest_accelerator["type"]
count = guest_accelerator["count"]
gpu_partition_size = guest_accelerator["gpu_partition_size"]
}
]

Expand Down
1 change: 1 addition & 0 deletions modules/beta-public-cluster-update-variant/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -295,6 +295,7 @@ The node_pools variable takes the following parameters:
| effect | Effect for the taint | | Required |
| enable_integrity_monitoring | Enables monitoring and attestation of the boot integrity of the instance. The attestation is performed against the integrity policy baseline. This baseline is initially derived from the implicitly trusted boot image when the instance is created. | true | Optional |
| enable_secure_boot | Secure Boot helps ensure that the system only runs authentic software by verifying the digital signature of all boot components, and halting the boot process if signature verification fails. | false | Optional |
| gpu_partition_size | Size of partitions to create on the GPU | null | Optional |
| image_type | The image type to use for this node. Note that changing the image type will delete and recreate all nodes in the node pool | COS | Optional |
| initial_node_count | The initial number of nodes for the pool. In regional or multi-zonal clusters, this is the number of nodes per zone. Changing this will force recreation of the resource. Defaults to the value of min_count | " " | Optional |
| key | The key required for the taint | | Required |
Expand Down
10 changes: 6 additions & 4 deletions modules/beta-public-cluster-update-variant/cluster.tf
Original file line number Diff line number Diff line change
Expand Up @@ -516,11 +516,13 @@ resource "google_container_node_pool" "pools" {

guest_accelerator = [
for guest_accelerator in lookup(each.value, "accelerator_count", 0) > 0 ? [{
type = lookup(each.value, "accelerator_type", "")
count = lookup(each.value, "accelerator_count", 0)
type = lookup(each.value, "accelerator_type", "")
count = lookup(each.value, "accelerator_count", 0)
gpu_partition_size = lookup(each.value, "gpu_partition_size", null)
}] : [] : {
type = guest_accelerator["type"]
count = guest_accelerator["count"]
type = guest_accelerator["type"]
count = guest_accelerator["count"]
gpu_partition_size = guest_accelerator["gpu_partition_size"]
}
]

Expand Down
1 change: 1 addition & 0 deletions modules/beta-public-cluster/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -273,6 +273,7 @@ The node_pools variable takes the following parameters:
| effect | Effect for the taint | | Required |
| enable_integrity_monitoring | Enables monitoring and attestation of the boot integrity of the instance. The attestation is performed against the integrity policy baseline. This baseline is initially derived from the implicitly trusted boot image when the instance is created. | true | Optional |
| enable_secure_boot | Secure Boot helps ensure that the system only runs authentic software by verifying the digital signature of all boot components, and halting the boot process if signature verification fails. | false | Optional |
| gpu_partition_size | Size of partitions to create on the GPU | null | Optional |
| image_type | The image type to use for this node. Note that changing the image type will delete and recreate all nodes in the node pool | COS | Optional |
| initial_node_count | The initial number of nodes for the pool. In regional or multi-zonal clusters, this is the number of nodes per zone. Changing this will force recreation of the resource. Defaults to the value of min_count | " " | Optional |
| key | The key required for the taint | | Required |
Expand Down
10 changes: 6 additions & 4 deletions modules/beta-public-cluster/cluster.tf
Original file line number Diff line number Diff line change
Expand Up @@ -431,11 +431,13 @@ resource "google_container_node_pool" "pools" {

guest_accelerator = [
for guest_accelerator in lookup(each.value, "accelerator_count", 0) > 0 ? [{
type = lookup(each.value, "accelerator_type", "")
count = lookup(each.value, "accelerator_count", 0)
type = lookup(each.value, "accelerator_type", "")
count = lookup(each.value, "accelerator_count", 0)
gpu_partition_size = lookup(each.value, "gpu_partition_size", null)
}] : [] : {
type = guest_accelerator["type"]
count = guest_accelerator["count"]
type = guest_accelerator["type"]
count = guest_accelerator["count"]
gpu_partition_size = guest_accelerator["gpu_partition_size"]
}
]

Expand Down
1 change: 1 addition & 0 deletions modules/private-cluster-update-variant/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -270,6 +270,7 @@ The node_pools variable takes the following parameters:
| effect | Effect for the taint | | Required |
| enable_integrity_monitoring | Enables monitoring and attestation of the boot integrity of the instance. The attestation is performed against the integrity policy baseline. This baseline is initially derived from the implicitly trusted boot image when the instance is created. | true | Optional |
| enable_secure_boot | Secure Boot helps ensure that the system only runs authentic software by verifying the digital signature of all boot components, and halting the boot process if signature verification fails. | false | Optional |
| gpu_partition_size | Size of partitions to create on the GPU | null | Optional |
| image_type | The image type to use for this node. Note that changing the image type will delete and recreate all nodes in the node pool | COS | Optional |
| initial_node_count | The initial number of nodes for the pool. In regional or multi-zonal clusters, this is the number of nodes per zone. Changing this will force recreation of the resource. Defaults to the value of min_count | " " | Optional |
| key | The key required for the taint | | Required |
Expand Down
10 changes: 6 additions & 4 deletions modules/private-cluster-update-variant/cluster.tf
Original file line number Diff line number Diff line change
Expand Up @@ -406,11 +406,13 @@ resource "google_container_node_pool" "pools" {

guest_accelerator = [
for guest_accelerator in lookup(each.value, "accelerator_count", 0) > 0 ? [{
type = lookup(each.value, "accelerator_type", "")
count = lookup(each.value, "accelerator_count", 0)
type = lookup(each.value, "accelerator_type", "")
count = lookup(each.value, "accelerator_count", 0)
gpu_partition_size = lookup(each.value, "gpu_partition_size", null)
}] : [] : {
type = guest_accelerator["type"]
count = guest_accelerator["count"]
type = guest_accelerator["type"]
count = guest_accelerator["count"]
gpu_partition_size = guest_accelerator["gpu_partition_size"]
}
]

Expand Down
1 change: 1 addition & 0 deletions modules/private-cluster/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -248,6 +248,7 @@ The node_pools variable takes the following parameters:
| effect | Effect for the taint | | Required |
| enable_integrity_monitoring | Enables monitoring and attestation of the boot integrity of the instance. The attestation is performed against the integrity policy baseline. This baseline is initially derived from the implicitly trusted boot image when the instance is created. | true | Optional |
| enable_secure_boot | Secure Boot helps ensure that the system only runs authentic software by verifying the digital signature of all boot components, and halting the boot process if signature verification fails. | false | Optional |
| gpu_partition_size | Size of partitions to create on the GPU | null | Optional |
| image_type | The image type to use for this node. Note that changing the image type will delete and recreate all nodes in the node pool | COS | Optional |
| initial_node_count | The initial number of nodes for the pool. In regional or multi-zonal clusters, this is the number of nodes per zone. Changing this will force recreation of the resource. Defaults to the value of min_count | " " | Optional |
| key | The key required for the taint | | Required |
Expand Down
10 changes: 6 additions & 4 deletions modules/private-cluster/cluster.tf
Original file line number Diff line number Diff line change
Expand Up @@ -321,11 +321,13 @@ resource "google_container_node_pool" "pools" {

guest_accelerator = [
for guest_accelerator in lookup(each.value, "accelerator_count", 0) > 0 ? [{
type = lookup(each.value, "accelerator_type", "")
count = lookup(each.value, "accelerator_count", 0)
type = lookup(each.value, "accelerator_type", "")
count = lookup(each.value, "accelerator_count", 0)
gpu_partition_size = lookup(each.value, "gpu_partition_size", null)
}] : [] : {
type = guest_accelerator["type"]
count = guest_accelerator["count"]
type = guest_accelerator["type"]
count = guest_accelerator["count"]
gpu_partition_size = guest_accelerator["gpu_partition_size"]
}
]

Expand Down
7 changes: 4 additions & 3 deletions test/integration/node_pool/controls/gcloud.rb
Original file line number Diff line number Diff line change
Expand Up @@ -17,7 +17,7 @@
cluster_name = attribute('cluster_name')

expected_accelerators_count = "1"
expected_accelerators_type = "nvidia-tesla-p4"
expected_accelerators_type = "nvidia-tesla-a100"

control "gcloud" do
title "Google Compute Engine GKE configuration"
Expand Down Expand Up @@ -207,7 +207,7 @@
including(
"name" => "pool-02",
"config" => including(
"machineType" => "n1-standard-2",
"machineType" => "a2-highgpu-1g",
),
)
)
Expand Down Expand Up @@ -252,7 +252,8 @@
"name" => "pool-02",
"config" => including(
"accelerators" => [{"acceleratorCount" => expected_accelerators_count,
"acceleratorType" => expected_accelerators_type}],
"acceleratorType" => expected_accelerators_type,
"gpuPartitionSize" => "1g.5gb"}],
),
)
)
Expand Down