Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GPU-VPC module #3391

Merged
merged 4 commits into from
Dec 12, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -31,8 +31,8 @@ fi

# Exit if GPU isn't H100
GPU_MODEL=$(nvidia-smi --query-gpu=name --format=csv,noheader)
if [[ "$GPU_MODEL" != *"H100"* ]]; then
echo "Non-H100 GPU detected" >&2
if ! [[ "$GPU_MODEL" =~ H[1-2]00 ]]; then
echo "No H100 or H200 GPU detected" >&2
exit 0
fi

Expand Down Expand Up @@ -80,7 +80,7 @@ if [ $NUMGPUS -gt 0 ]; then
if [ $DCGM_FAILED -eq 0 ] || \
[ $ECC_ERRORS -gt 0 ] || \
[ $NVLINK_ERRORS -gt 0 ]; then
REASON="H100 GPU issues detected: "
REASON="GPU issues detected: "
[ $DCGM_FAILED -eq 0 ] && REASON+="DCGM test failed, "
[ $ECC_ERRORS -gt 0 ] && REASON+="ECC errors found ($ECC_ERRORS double-bit errors), "
[ $NVLINK_ERRORS -gt 0 ] && REASON+="NVLink errors detected ($NVLINK_ERRORS errors), "
Expand Down
143 changes: 143 additions & 0 deletions modules/network/gpu-rdma-vpc/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,143 @@
## Description

This module accomplishes the following:

* Creates one [VPC network][cft-network]
* Each VPC contains a variable number of subnetworks as specified in the
`subnetworks_template` variable
* Each subnetwork contains distinct IP address ranges
* Outputs the following unique parameters
* `subnetwork_interfaces` which is compatible with Slurm and vm-instance
modules
* `subnetwork_interfaces_gke` which is compatible with GKE modules

This module is a simplified version of the VPC module and its main difference
is the variable `subnetwork_template` which is the template for all subnetworks
created within the network. This template contains the following values:

1. `count`: The number of subnetworks to be created
1. `name_prefix`: The prefix for the subnetwork names
1. `ip_range`: [CIDR-formatted IP range][cidr]
1. `region`: The region where the subnetwork will be deployed

> [!WARNING]
> The `ip_range` should be always be large enough to split into `count`
> subnetworks and the number of required connections within.

[cft-network]: https://github.com/terraform-google-modules/terraform-google-network/tree/v10.0.0
[cidr]: https://en.wikipedia.org/wiki/Classless_Inter-Domain_Routing#CIDR_notation

### Example

This snippet uses the gpu-vpc module to create a new VPC network named
`test-rdma-net` with 8 subnetworks named `test-mrdma-sub-#` where # ranges from
0 to 7. The subnetworks will split the `ip_range` evenly, starting from bit 16
(0 indexed). The networks are ingested by the Slurm nodeset within the
`additional_networks` setting.

```yaml
- id: rdma-net
source: modules/network/gpu-rdma-vpc
settings:
network_name: test-rdma-net
network_profile: https://www.googleapis.com/compute/beta/projects/$(vars.project_id)/global/networkProfiles/$(vars.zone)-vpc-roce
network_routing_mode: REGIONAL
subnetworks_template:
name_prefix: test-mrdma-sub
count: 8
ip_range: 192.168.0.0/16
region: $(vars.region)

- id: a3_nodeset
source: community/modules/compute/schedmd-slurm-gcp-v6-nodeset
use: [network0]
settings:
machine_type: a3-ultragpu-8g
additional_networks:
$(concat(
[{
network=null,
subnetwork=network1.subnetwork_self_link,
subnetwork_project=vars.project_id,
nic_type="GVNIC",
queue_count=null,
network_ip="",
stack_type=null,
access_config=[],
ipv6_access_config=[],
alias_ip_range=[]
}],
rdma-net.subnetwork_interfaces
))
...
```

## License

<!-- BEGINNING OF PRE-COMMIT-TERRAFORM DOCS HOOK -->
Copyright 2022 Google LLC

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

## Requirements

| Name | Version |
|------|---------|
| <a name="requirement_terraform"></a> [terraform](#requirement\_terraform) | >= 0.15.0 |

## Providers

No providers.

## Modules

| Name | Source | Version |
|------|--------|---------|
| <a name="module_vpc"></a> [vpc](#module\_vpc) | terraform-google-modules/network/google | ~> 10.0 |

## Resources

No resources.

## Inputs

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|:--------:|
| <a name="input_delete_default_internet_gateway_routes"></a> [delete\_default\_internet\_gateway\_routes](#input\_delete\_default\_internet\_gateway\_routes) | If set, ensure that all routes within the network specified whose names begin with 'default-route' and with a next hop of 'default-internet-gateway' are deleted | `bool` | `false` | no |
| <a name="input_deployment_name"></a> [deployment\_name](#input\_deployment\_name) | The name of the current deployment | `string` | n/a | yes |
| <a name="input_enable_internal_traffic"></a> [enable\_internal\_traffic](#input\_enable\_internal\_traffic) | Enable a firewall rule to allow all internal TCP, UDP, and ICMP traffic within the network | `bool` | `true` | no |
| <a name="input_firewall_log_config"></a> [firewall\_log\_config](#input\_firewall\_log\_config) | Firewall log configuration for Toolkit firewall rules (var.enable\_iap\_ssh\_ingress and others) | `string` | `"DISABLE_LOGGING"` | no |
| <a name="input_firewall_rules"></a> [firewall\_rules](#input\_firewall\_rules) | List of firewall rules | `any` | `[]` | no |
| <a name="input_mtu"></a> [mtu](#input\_mtu) | The network MTU (default: 8896). Recommended values: 0 (use Compute Engine default), 1460 (default outside HPC environments), 1500 (Internet default), or 8896 (for Jumbo packets). Allowed are all values in the range 1300 to 8896, inclusively. | `number` | `8896` | no |
| <a name="input_network_description"></a> [network\_description](#input\_network\_description) | An optional description of this resource (changes will trigger resource destroy/create) | `string` | `""` | no |
| <a name="input_network_name"></a> [network\_name](#input\_network\_name) | The name of the network to be created (if unsupplied, will default to "{deployment\_name}-net") | `string` | `null` | no |
| <a name="input_network_profile"></a> [network\_profile](#input\_network\_profile) | A full or partial URL of the network profile to apply to this network.<br/>This field can be set only at resource creation time. For example, the<br/>following are valid URLs:<br/>- https://www.googleapis.com/compute/beta/projects/{projectId}/global/networkProfiles/{network_profile_name}<br/>- projects/{projectId}/global/networkProfiles/{network\_profile\_name}} | `string` | n/a | yes |
| <a name="input_network_routing_mode"></a> [network\_routing\_mode](#input\_network\_routing\_mode) | The network routing mode (default "REGIONAL") | `string` | `"REGIONAL"` | no |
| <a name="input_nic_type"></a> [nic\_type](#input\_nic\_type) | NIC type for use in modules that use the output | `string` | `"MRDMA"` | no |
| <a name="input_project_id"></a> [project\_id](#input\_project\_id) | Project in which the HPC deployment will be created | `string` | n/a | yes |
| <a name="input_region"></a> [region](#input\_region) | The default region for Cloud resources | `string` | n/a | yes |
| <a name="input_shared_vpc_host"></a> [shared\_vpc\_host](#input\_shared\_vpc\_host) | Makes this project a Shared VPC host if 'true' (default 'false') | `bool` | `false` | no |
| <a name="input_subnetworks_template"></a> [subnetworks\_template](#input\_subnetworks\_template) | Specifications for the subnetworks that will be created within this VPC.<br/><br/>count (number, required, number of subnets to create, default is 8)<br/>name\_prefix (string, required, subnet name prefix, default is deployment name)<br/>ip\_range (string, required, range of IPs for all subnets to share (CIDR format), default is 192.168.0.0/16)<br/>region (string, optional, region to deploy subnets to, defaults to vars.region) | <pre>object({<br/> count = number<br/> name_prefix = string<br/> ip_range = string<br/> region = optional(string)<br/> })</pre> | <pre>{<br/> "count": 8,<br/> "ip_range": "192.168.0.0/16",<br/> "name_prefix": null,<br/> "region": null<br/>}</pre> | no |

## Outputs

| Name | Description |
|------|-------------|
| <a name="output_network_id"></a> [network\_id](#output\_network\_id) | ID of the new VPC network |
| <a name="output_network_name"></a> [network\_name](#output\_network\_name) | Name of the new VPC network |
| <a name="output_network_self_link"></a> [network\_self\_link](#output\_network\_self\_link) | Self link of the new VPC network |
| <a name="output_subnetwork_interfaces"></a> [subnetwork\_interfaces](#output\_subnetwork\_interfaces) | Full list of subnetwork objects belonging to the new VPC network (compatible with vm-instance and Slurm modules) |
| <a name="output_subnetwork_interfaces_gke"></a> [subnetwork\_interfaces\_gke](#output\_subnetwork\_interfaces\_gke) | Full list of subnetwork objects belonging to the new VPC network (compatible with gke-node-pool) |
| <a name="output_subnetwork_name_prefix"></a> [subnetwork\_name\_prefix](#output\_subnetwork\_name\_prefix) | Prefix of the RDMA subnetwork names |
| <a name="output_subnetworks"></a> [subnetworks](#output\_subnetworks) | Full list of subnetwork objects belonging to the new VPC network |
<!-- END OF PRE-COMMIT-TERRAFORM DOCS HOOK -->
114 changes: 114 additions & 0 deletions modules/network/gpu-rdma-vpc/main.tf
Original file line number Diff line number Diff line change
@@ -0,0 +1,114 @@
/**
* Copyright 2022 Google LLC
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

locals {
autoname = replace(var.deployment_name, "_", "-")
network_name = var.network_name == null ? "${local.autoname}-net" : var.network_name
subnet_prefix = var.subnetworks_template.name_prefix == null ? "${local.autoname}-subnet" : var.subnetworks_template.name_prefix

new_bits = ceil(log(var.subnetworks_template.count, 2))
template_subnetworks = [for i in range(var.subnetworks_template.count) :
{
subnet_name = "${local.subnet_prefix}-${i}"
subnet_region = try(var.subnetworks_template.region, var.region)
subnet_ip = cidrsubnet(var.subnetworks_template.ip_range, local.new_bits, i)
}
]

firewall_log_api_values = {
"DISABLE_LOGGING" = null
"INCLUDE_ALL_METADATA" = { metadata = "INCLUDE_ALL_METADATA" },
"EXCLUDE_ALL_METADATA" = { metadata = "EXCLUDE_ALL_METADATA" },
}
firewall_log_config = lookup(local.firewall_log_api_values, var.firewall_log_config, null)

allow_internal_traffic = {
name = "${local.network_name}-fw-allow-internal-traffic"
priority = null
description = "allow traffic between nodes of this VPC"
direction = "INGRESS"
ranges = [var.subnetworks_template.ip_range]
source_tags = null
source_service_accounts = null
target_tags = null
target_service_accounts = null
allow = [{
protocol = "tcp"
ports = ["0-65535"]
}, {
protocol = "udp"
ports = ["0-65535"]
}, {
protocol = "icmp"
ports = null
},
]
deny = []
log_config = local.firewall_log_config
}

firewall_rules = concat(
var.firewall_rules,
var.enable_internal_traffic ? [local.allow_internal_traffic] : [],
)

output_subnets = [
for subnet in module.vpc.subnets : {
network = null
subnetwork = subnet.self_link
subnetwork_project = null # will populate from subnetwork_self_link
network_ip = null
nic_type = var.nic_type
stack_type = null
queue_count = null
access_config = []
ipv6_access_config = []
alias_ip_range = []
}
]

output_subnets_gke = [
for i in range(length(module.vpc.subnets)) : {
network = local.network_name
subnetwork = local.template_subnetworks[i].subnet_name
subnetwork_project = var.project_id
network_ip = null
nic_type = var.nic_type
stack_type = null
queue_count = null
access_config = []
ipv6_access_config = []
alias_ip_range = []
}
]
}

module "vpc" {
source = "terraform-google-modules/network/google"
version = "~> 10.0"

network_name = local.network_name
project_id = var.project_id
auto_create_subnetworks = false
subnets = local.template_subnetworks
routing_mode = var.network_routing_mode
mtu = var.mtu
description = var.network_description
shared_vpc_host = var.shared_vpc_host
delete_default_internet_gateway_routes = var.delete_default_internet_gateway_routes
firewall_rules = local.firewall_rules
network_profile = var.network_profile
}
19 changes: 19 additions & 0 deletions modules/network/gpu-rdma-vpc/metadata.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,19 @@
# Copyright 2023 "Google LLC"
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
---

spec:
requirements:
services:
- compute.googleapis.com
59 changes: 59 additions & 0 deletions modules/network/gpu-rdma-vpc/outputs.tf
Original file line number Diff line number Diff line change
@@ -0,0 +1,59 @@
/**
* Copyright 2022 Google LLC
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

output "network_name" {
description = "Name of the new VPC network"
value = module.vpc.network_name
depends_on = [module.vpc]
}

output "network_id" {
description = "ID of the new VPC network"
value = module.vpc.network_id
depends_on = [module.vpc]
}

output "network_self_link" {
description = "Self link of the new VPC network"
value = module.vpc.network_self_link
depends_on = [module.vpc]
}

output "subnetworks" {
description = "Full list of subnetwork objects belonging to the new VPC network"
value = module.vpc.subnets
depends_on = [module.vpc]
}

output "subnetwork_interfaces" {
description = "Full list of subnetwork objects belonging to the new VPC network (compatible with vm-instance and Slurm modules)"
value = local.output_subnets
depends_on = [module.vpc]
}

# The output subnetwork_interfaces is compatible with vm-instance module but not with gke-node-pool
# See https://github.com/GoogleCloudPlatform/cluster-toolkit/blob/99493df21cecf6a092c45298bf7a45e0343cf622/modules/compute/vm-instance/variables.tf#L220
# So, we need a separate output that makes the network and subnetwork names available
output "subnetwork_interfaces_gke" {
description = "Full list of subnetwork objects belonging to the new VPC network (compatible with gke-node-pool)"
value = local.output_subnets_gke
depends_on = [module.vpc]
}

output "subnetwork_name_prefix" {
description = "Prefix of the RDMA subnetwork names"
value = var.subnetworks_template.name_prefix
}
Loading
Loading