Add parallelstore module skeleton code for HPC Rocky 8 image
harshthakkar01 committed Jun 24, 2024
1 parent 5629f4b commit bfe86aa
Showing 7 changed files with 432 additions and 0 deletions.
35 changes: 35 additions & 0 deletions examples/README.md
@@ -1863,3 +1863,38 @@ To avoid these issues, the `ghpc_stage` function can be used to copy a file (or

The `ghpc_stage` function will always look first in the path specified in the blueprint. If the file is not found at this path then `ghpc_stage` will look for the staged file in the deployment folder, if a deployment folder exists.
This means that you can redeploy a blueprint (`ghpc deploy <blueprint> -w`) so long as you have the deployment folder from the original deployment, even if locally referenced files are not available.
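
The lookup order described above can be sketched in shell. This is only an illustration of the fallback logic, not the Toolkit's implementation: the function name `resolve_staged_file`, its arguments, and the assumption that the staged copy keeps the original file name are all hypothetical.

```shell
# Hypothetical helper illustrating the lookup order: the path given in the
# blueprint is tried first, then a copy staged in the deployment folder.
resolve_staged_file() {
  local blueprint_path
  local staged_dir
  local staged_copy
  blueprint_path="$1" # path as written in the blueprint
  staged_dir="$2"     # assumed folder holding staged copies; may be empty

  # 1) Prefer the path specified in the blueprint.
  if [ -e "$blueprint_path" ]; then
    printf '%s\n' "$blueprint_path"
    return 0
  fi

  # 2) Fall back to the staged copy, if a deployment folder was given.
  if [ -n "$staged_dir" ]; then
    staged_copy="$staged_dir/$(basename "$blueprint_path")"
    if [ -e "$staged_copy" ]; then
      printf '%s\n' "$staged_copy"
      return 0
    fi
  fi

  echo "not found: $blueprint_path" >&2
  return 1
}
```

Under these assumptions, a redeploy keeps working after the locally referenced file is deleted, because the second branch still finds the staged copy.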

<!-- BEGINNING OF PRE-COMMIT-TERRAFORM DOCS HOOK -->
## Requirements

No requirements.

## Providers

| Name | Version |
|------|---------|
| <a name="provider_google-beta"></a> [google-beta](#provider\_google-beta) | n/a |

## Modules

No modules.

## Resources

| Name | Type |
|------|------|
| [google-beta_google_compute_global_address.private_ip_alloc](https://registry.terraform.io/providers/hashicorp/google-beta/latest/docs/resources/google_compute_global_address) | resource |
| [google-beta_google_compute_network.network](https://registry.terraform.io/providers/hashicorp/google-beta/latest/docs/resources/google_compute_network) | resource |
| [google-beta_google_parallelstore_instance.instance](https://registry.terraform.io/providers/hashicorp/google-beta/latest/docs/resources/google_parallelstore_instance) | resource |
| [google-beta_google_service_networking_connection.default](https://registry.terraform.io/providers/hashicorp/google-beta/latest/docs/resources/google_service_networking_connection) | resource |

## Inputs

No inputs.

## Outputs

| Name | Description |
|------|-------------|
| <a name="output_access_points"></a> [access\_points](#output\_access\_points) | Output access points |
<!-- END OF PRE-COMMIT-TERRAFORM DOCS HOOK -->
74 changes: 74 additions & 0 deletions modules/file-system/parallelstore/README.md
@@ -0,0 +1,74 @@
<!-- BEGINNING OF PRE-COMMIT-TERRAFORM DOCS HOOK -->
Copyright 2024 Google LLC

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

## Requirements

| Name | Version |
|------|---------|
| <a name="requirement_terraform"></a> [terraform](#requirement\_terraform) | >= 0.13 |
| <a name="requirement_google-beta"></a> [google-beta](#requirement\_google-beta) | >= 5.25.0 |
| <a name="requirement_null"></a> [null](#requirement\_null) | ~> 3.0 |
| <a name="requirement_random"></a> [random](#requirement\_random) | ~> 3.0 |

## Providers

| Name | Version |
|------|---------|
| <a name="provider_google-beta"></a> [google-beta](#provider\_google-beta) | >= 5.25.0 |
| <a name="provider_null"></a> [null](#provider\_null) | ~> 3.0 |
| <a name="provider_random"></a> [random](#provider\_random) | ~> 3.0 |

## Modules

| Name | Source | Version |
|------|--------|---------|
| <a name="module_startup_script"></a> [startup\_script](#module\_startup\_script) | github.com/GoogleCloudPlatform/hpc-toolkit//modules/scripts/startup-script | v1.34.0&depth=1 |

## Resources

| Name | Type |
|------|------|
| [google-beta_google_parallelstore_instance.instance](https://registry.terraform.io/providers/hashicorp/google-beta/latest/docs/resources/google_parallelstore_instance) | resource |
| [null_resource.hydration](https://registry.terraform.io/providers/hashicorp/null/latest/docs/resources/resource) | resource |
| [random_id.resource_name_suffix](https://registry.terraform.io/providers/hashicorp/random/latest/docs/resources/id) | resource |

## Inputs

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|:--------:|
| <a name="input_deployment_name"></a> [deployment\_name](#input\_deployment\_name) | Name of the HPC deployment. | `string` | n/a | yes |
| <a name="input_destination_hydration_parallelstore"></a> [destination\_hydration\_parallelstore](#input\_destination\_hydration\_parallelstore) | The path on the parallelstore instance into which data from the GCS bucket is imported. | `string` | `"/"` | no |
| <a name="input_labels"></a> [labels](#input\_labels) | Labels to add to the parallelstore instance. | `map(string)` | n/a | yes |
| <a name="input_local_mount"></a> [local\_mount](#input\_local\_mount) | The mount point where the contents of the device may be accessed after mounting. | `string` | `"/parallelstore"` | no |
| <a name="input_mount_options"></a> [mount\_options](#input\_mount\_options) | Options describing various aspects of the parallelstore instance. | `string` | `"disable-wb-cache,thread-count=16,eq-count=8"` | no |
| <a name="input_name"></a> [name](#input\_name) | Name of parallelstore instance. | `string` | `null` | no |
| <a name="input_network_id"></a> [network\_id](#input\_network\_id) | The ID of the GCE VPC network to which the instance is connected, given in the format:<br>`projects/<project_id>/global/networks/<network_name>` | `string` | n/a | yes |
| <a name="input_private_vpc_connection_peering"></a> [private\_vpc\_connection\_peering](#input\_private\_vpc\_connection\_peering) | The name of the VPC Network peering connection. | `string` | n/a | yes |
| <a name="input_project_id"></a> [project\_id](#input\_project\_id) | Project in which the HPC deployment will be created. | `string` | n/a | yes |
| <a name="input_region"></a> [region](#input\_region) | The region in which HPC deployment will be created. | `string` | n/a | yes |
| <a name="input_size_gb"></a> [size\_gb](#input\_size\_gb) | Storage size of the parallelstore instance in GB. | `number` | `12000` | no |
| <a name="input_source_gcs_bucket_uri"></a> [source\_gcs\_bucket\_uri](#input\_source\_gcs\_bucket\_uri) | The URI of the GCS bucket from which to import data into parallelstore. | `string` | `""` | no |
| <a name="input_user_mode"></a> [user\_mode](#input\_user\_mode) | User mode used to connect to the DAOS container. Can be `single-user` or `multi-user`. | `string` | `"multi-user"` | no |
| <a name="input_zone"></a> [zone](#input\_zone) | Location for parallelstore instance. | `string` | n/a | yes |
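
The `mount_options` input is a single comma-separated string, while the install script in this commit describes converting it into dfuse-style flags (`"disable-wb-cache,eq-count=8"` becomes `--disable-wb-cache --eq-count=8`). A minimal sketch of that conversion follows; the helper name `format_mount_options` is hypothetical and is not part of the module.

```shell
# Hypothetical helper: turn "a,b=1" into one "--flag" per line.
format_mount_options() {
  # Split on commas, drop empty entries, and prefix each option with "--".
  printf '%s\n' "$1" | tr ',' '\n' | sed -e '/^$/d' -e 's/^/--/'
}

# Example with the module's default value:
format_mount_options "disable-wb-cache,thread-count=16,eq-count=8"
```

Emitting one flag per line keeps each option a separate word, so the caller can pass them to a command as distinct arguments rather than as one long string.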

## Outputs

| Name | Description |
|------|-------------|
| <a name="output_compute_startup_script"></a> [compute\_startup\_script](#output\_compute\_startup\_script) | Script for installing and mounting parallelstore instance to compute node. |
| <a name="output_controller_startup_script"></a> [controller\_startup\_script](#output\_controller\_startup\_script) | Script for installing and mounting parallelstore instance to controller. |
| <a name="output_login_startup_script"></a> [login\_startup\_script](#output\_login\_startup\_script) | Script for installing and mounting parallelstore instance to login node. |
| <a name="output_startup_script"></a> [startup\_script](#output\_startup\_script) | Script for installing and mounting parallelstore instance. |
<!-- END OF PRE-COMMIT-TERRAFORM DOCS HOOK -->
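
When `source_gcs_bucket_uri` is set, the module issues an `importData` request against the Parallelstore API. The sketch below only shows how the request body and URL are assembled from the inputs; the field names and URL shape are copied from the `local-exec` command in this commit's `main.tf`, while the helper names `import_data_body` and `import_data_url` are hypothetical. It performs no JSON escaping, so it assumes values without quotes or backslashes.

```shell
# Hypothetical helpers mirroring the request built in main.tf.
import_data_body() {
  # $1: GCS bucket URI; $2: destination path on the instance (defaults to "/")
  printf '{"source_gcs_bucket": {"uri":"%s"}, "destination_parallelstore": {"path":"%s"}}' \
    "$1" "${2:-/}"
}

import_data_url() {
  # $1: project ID; $2: zone; $3: parallelstore instance ID
  printf 'https://parallelstore.googleapis.com/v1beta/projects/%s/locations/%s/instances/%s:importData' \
    "$1" "$2" "$3"
}

# Possible usage (placeholder values, not executed here):
#   curl -X POST -H "Content-Type: application/json" \
#     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
#     -d "$(import_data_body "gs://my-bucket" "/")" \
#     "$(import_data_url "my-project" "us-central1-a" "my-instance")"
```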
@@ -0,0 +1,90 @@
#!/bin/bash
# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

set -e

# Parse access_points, local_mount, mount_options, user_mode from argument.
# Format mount-options string to be compatible to dfuse mount command.
# e.g. "disable-wb-cache,eq-count=8" --> --disable-wb-cache --eq-count=8.
for arg in "$@"; do
  if [[ $arg == --access_points=* ]]; then
    access_points="${arg#*=}"
  fi
  if [[ $arg == --local_mount=* ]]; then
    local_mount="${arg#*=}"
  fi
  if [[ $arg == --mount_options=* ]]; then
    mount_options="${arg#*=}"
    mount_options="--${mount_options//,/ --}"
  fi
  if [[ $arg == --user_mode=* ]]; then
    user_mode="${arg#*=}"
  fi
done

## Install the DAOS client library
## The following commands should be executed on each client vm.
# For Rocky linux 8.
if grep -q "ID=\"rocky\"" /etc/os-release && lsb_release -rs | grep -q "8\.[0-9]"; then

  # 1) Add the Parallelstore package repository
  tee /etc/yum.repos.d/parallelstore-v2-4-el8.repo <<EOF
[parallelstore-v2-4-el8]
name=Parallelstore EL8 v2.4
baseurl=https://us-central1-yum.pkg.dev/projects/parallelstore-packages/v2-4-el8
enabled=1
repo_gpgcheck=0
gpgcheck=0
EOF
  dnf makecache

  # 2) Install daos-client
  dnf install -y epel-release # needed for capstone
  dnf install -y daos-client

  # 3) Upgrade libfabric
  dnf upgrade -y libfabric

else
  echo "Unsupported operating system. This script only supports Rocky Linux 8."
  exit 1
fi

# Edit agent config
daos_config=/etc/daos/daos_agent.yml
sed -i "s/#.*transport_config/transport_config/g" $daos_config
sed -i "s/#.*allow_insecure:.*false/ allow_insecure: true/g" $daos_config
sed -i "s/.*access_points.*/access_points: $access_points/g" $daos_config

# Start service
systemctl start daos_agent.service

# Mount parallelstore instance to client vm.
mkdir -p "$local_mount"
chmod 777 "$local_mount"

## For single-user mode, mount container with slurm user access.
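if [[ $user_mode == single-user ]]; then
  # mount_options holds several space-separated flags, so it is left unquoted
  # on purpose to let the shell split it into separate dfuse arguments.
  # shellcheck disable=SC2086
  sudo -u slurm dfuse -m "$local_mount" --pool default-pool --container default-container $mount_options
fi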

## Mount container for multi-user.
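if [[ $user_mode == multi-user ]]; then
  fuse_config=/etc/fuse.conf
  sed -i "s/#.*user_allow_other/user_allow_other/g" $fuse_config
  # mount_options holds several space-separated flags, so it is left unquoted
  # on purpose to let the shell split it into separate dfuse arguments.
  # shellcheck disable=SC2086
  dfuse -m "$local_mount" --pool default-pool --container default-container --"$user_mode" $mount_options
fi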

exit 0
63 changes: 63 additions & 0 deletions modules/file-system/parallelstore/main.tf
@@ -0,0 +1,63 @@
/**
* Copyright 2024 Google LLC
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

locals {
  labels = merge(var.labels, { ghpc_module = "parallelstore", ghpc_role = "file-system" })
}

locals {
  access_points = google_parallelstore_instance.instance.access_points

  mount_runner = [{
    "type"        = "shell"
    "source"      = "${path.module}/install_and_mount_parallelstore.sh"
    "args"        = format("--access_points=%s --local_mount=%s --mount_options=%s --user_mode=%s", jsonencode(local.access_points), var.local_mount, var.mount_options, var.user_mode)
    "destination" = "mount.sh"
  }]
}

resource "random_id" "resource_name_suffix" {
  byte_length = 4
}

resource "google_parallelstore_instance" "instance" {
  instance_id  = var.name != null ? var.name : "${var.deployment_name}-${random_id.resource_name_suffix.hex}"
  location     = var.zone
  capacity_gib = var.size_gb
  network      = var.network_id

  provider   = google-beta
  depends_on = [var.private_vpc_connection_peering]
}

module "startup_script" {
  source = "github.com/GoogleCloudPlatform/hpc-toolkit//modules/scripts/startup-script?ref=v1.34.0&depth=1"

  labels          = local.labels
  project_id      = var.project_id
  deployment_name = var.deployment_name
  region          = var.region
  runners         = local.mount_runner
}

resource "null_resource" "hydration" {
  count = var.source_gcs_bucket_uri != "" ? 1 : 0

  depends_on = [resource.google_parallelstore_instance.instance]
  provisioner "local-exec" {
    # Reference the created instance's ID rather than var.name, which defaults
    # to null when the instance name is generated from the deployment name.
    command = "curl -X POST -H \"Content-Type: application/json\" -H \"Authorization: Bearer $(gcloud auth print-access-token)\" -d '{\"source_gcs_bucket\": {\"uri\":\"${var.source_gcs_bucket_uri}\"}, \"destination_parallelstore\": {\"path\":\"${var.destination_hydration_parallelstore}\"}}' https://parallelstore.googleapis.com/v1beta/projects/${var.project_id}/locations/${var.zone}/instances/${google_parallelstore_instance.instance.instance_id}:importData"
  }
}
35 changes: 35 additions & 0 deletions modules/file-system/parallelstore/outputs.tf
@@ -0,0 +1,35 @@
/**
* Copyright 2024 Google LLC
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

output "startup_script" {
  description = "Script for installing and mounting parallelstore instance."
  value       = module.startup_script.startup_script
}

output "controller_startup_script" {
  description = "Script for installing and mounting parallelstore instance to controller."
  value       = module.startup_script.startup_script
}

output "login_startup_script" {
  description = "Script for installing and mounting parallelstore instance to login node."
  value       = module.startup_script.startup_script
}

output "compute_startup_script" {
  description = "Script for installing and mounting parallelstore instance to compute node."
  value       = module.startup_script.startup_script
}