diff --git a/README.md b/README.md index 333f140..c389d06 100644 --- a/README.md +++ b/README.md @@ -12,16 +12,16 @@ This repository contains: ``` . -├── docs Miscellaneousc documentation and Cloud Shell tutorials +├── docs Documentation and Cloud Shell tutorials ├── images Cloud Build config files and Packer templates -│ └── scripts Scripts that Packer runs to configure images +│ └── scripts Scripts that Packer runs to build images ├── terraform Terraform content │ ├── examples Examples that demonstrate how to use the DAOS Terraform modules │ └── modules Terraform modules for deploying DAOS server and client instances └── tools Tools used by pre-commit ``` -### Prerequisites +## Prerequisites In order to deploy DAOS on GCP you will need @@ -37,61 +37,84 @@ In order to deploy DAOS on GCP you will need The documentation in this repository assumes that you will use [Cloud Shell](https://cloud.google.com/shell). - If you use [Cloud Shell](https://cloud.google.com/shell), you do not need to install any software on your system. + With [Cloud Shell](https://cloud.google.com/shell), there is no need to install any software on your system. If you do not want to use Cloud Shell, you will need to install - [Git](https://git-scm.com/) - [Google Cloud CLI](https://cloud.google.com/sdk/docs/install) - [Terraform](https://learn.hashicorp.com/tutorials/terraform/install-cli) -### Deploy DAOS on GCP +## Deploying DAOS on GCP -Steps to deploy DAOS on GCP +### Pre-Deployment Steps + +The following pre-deployment steps are required 1. **Set defaults for Google Cloud CLI (```gcloud```)** Only needs to be done once in your shell (Cloud Shell or local shell). -2. **Create a Packer image in your GCP project** +2. **Enable service APIs and grant permissions** + + Enabling APIs and granting service account permissions only needs to be done once for a GCP project. + +3. 
**Create a Packer image in your GCP project** In order to build DAOS images with Cloud Build your GCP project must contain a Packer image. - Building the Packer images only needs to be done once for a GCP project. + Building the Packer image only needs to be done once for a GCP project. -3. **Build DAOS Server and Client images** +4. **Build DAOS Server and Client images** DAOS Server and Client instances are deployed using images that have DAOS pre-installed. - Therefore, the images need to be built prior to running Terraform. + Therefore, the images need to be built prior to running Terraform to deploy a DAOS cluster. + +Click the button below to open a Cloud Shell tutorial which will guide you through the pre-deployment steps listed above. If you lose your Cloud Shell session you can always come back to this README and click the button again. + +[![DAOS on GCP Pre-Deployment](http://gstatic.com/cloudssh/images/open-btn.png)](https://console.cloud.google.com/cloudshell/open?git_repo=https://github.com/daos-stack/google-cloud-daos&cloudshell_git_branch=main&shellonly=true&tutorial=docs/tutorials/pre-deployment.md) + +### Deploy a DAOS Cluster with Terraform - > Click the button to open an interactive walk-through in Cloud Shell which will guide you through the steps listed above. - > - > [![DAOS on GCP Setup](http://gstatic.com/cloudssh/images/open-btn.png)](https://console.cloud.google.com/cloudshell/open?git_repo=https://github.com/daos-stack/google-cloud-daos&cloudshell_git_branch=main&shellonly=true&tutorial=docs/tutorials/daosgcp_setup.md) +After completing the pre-deployment steps listed above, you will need to write your own Terraform configuration for your particular use case. -4. **Use DAOS Terraform modules in your Terraform code** +The [terraform/modules](terraform/modules) in this repo can be used in your Terraform configuration to deploy DAOS server and client instances. - You will need to write your own Terraform code for your particular use case. 
+The [terraform/examples/daos_cluster](terraform/examples/daos_cluster/README.md) example serves as both a reference and a quick way to deploy a DAOS cluster. - Your Terraform code can use the modules in ```terraform/modules``` to deploy DAOS server and client instances. +Click the button below to open a Cloud Shell tutorial that will walk you through using the [terraform/examples/daos_cluster](terraform/examples/daos_cluster/README.md) example to deploy a DAOS cluster. - The example Terraform configurations provided in ```terraform/examples``` can be used as a reference. +[![DAOS Cluster Example](http://gstatic.com/cloudssh/images/open-btn.png)](https://console.cloud.google.com/cloudshell/open?git_repo=https://github.com/daos-stack/google-cloud-daos&cloudshell_git_branch=main&shellonly=true&tutorial=docs/tutorials/example_daos_cluster.md) - See the [DAOS Cluster](terraform/examples/daos_cluster/README.md) example to learn more about how to use the ```terraform/modules```. +### Deploy a DAOS Cluster with the Google HPC Toolkit + +The [HPC Toolkit](https://github.com/GoogleCloudPlatform/hpc-toolkit) is open-source software offered by Google Cloud that makes it easy for customers to deploy HPC environments on Google Cloud. + +The HPC Toolkit allows customers to deploy turnkey HPC environments (compute, networking, storage, etc.) following Google Cloud best practices, in a repeatable manner. It is designed to be highly customizable and extensible, and intends to address the HPC deployment needs of a broad range of customers. + +The HPC Toolkit includes the following community examples which use the Terraform modules in this repository. 
+ +| HPC Toolkit Community Example | Description | +| ----------------------------- | ----------- | +| [DAOS Cluster](https://github.com/GoogleCloudPlatform/hpc-toolkit/tree/main/community/examples/intel#daos-cluster) | Use the HPC Toolkit to deploy a standalone DAOS cluster | +| [DAOS Server with Slurm cluster](https://github.com/GoogleCloudPlatform/hpc-toolkit/tree/main/community/examples/intel#daos-server-with-slurm-cluster) | Use the HPC Toolkit to deploy a set of DAOS servers for storage and a Slurm cluster in which the compute nodes are DAOS clients. The example demonstrates how to use DAOS storage in a Slurm job. | + +If you are just getting started with deploying DAOS on GCP, it is highly recommended to use the HPC Toolkit as it can save you a lot of time as opposed to developing your own Terraform configuration. ## Links - [Distributed Asynchronous Object Storage (DAOS)](https://docs.daos.io/) - [Google Cloud Platform (GCP)](https://cloud.google.com/) +- [Google HPC Toolkit](https://github.com/GoogleCloudPlatform/hpc-toolkit) - [Google Cloud CLI (gcloud)](https://cloud.google.com/cli) - [Google Cloud Build](https://cloud.google.com/build) - [Cloud Shell](https://cloud.google.com/shell) - [Packer](https://www.packer.io/) - [Terraform](https://www.terraform.io/) - ## Development -If you are contributing to the code in this repo, see [Development](docs/development.md) +If you are contributing to this repo, see [Development](docs/development.md) ## License diff --git a/docs/tutorials/example_daos_cluster.md b/docs/tutorials/example_daos_cluster.md index 748823d..4f8863c 100644 --- a/docs/tutorials/example_daos_cluster.md +++ b/docs/tutorials/example_daos_cluster.md @@ -1,4 +1,4 @@ -# DAOS GCP Full Cluster Example +# DAOS Cluster Example ## Overview @@ -6,26 +6,19 @@ In this tutorial you will 1. Use Terraform to deploy a DAOS cluster using the example configuration in ```terraform/examples/daos_cluster``` 2. 
Perform the following DAOS administration tasks - - Format Storage - - Create a Pool - - Create a Container - - Mount the storage on the clients + - Create a pool + - Create a container + - Mount the container 3. Copy files to DAOS mounted storage 4. Tear down the DAOS cluster deployment Click on **Start** -## Setup and Requirements +## Pre-Deployment Steps -Before continuing, it is assumed that you have completed the following steps in your GCP project. +Before continuing, it is assumed that you have completed the pre-deployment steps in your GCP project. -1. Set defaults for Google Cloud CLI (```gcloud```) in Cloud Shell -2. Create a Packer image in your GCP project -3. Build DAOS Server and Client images with Packer in Cloud Build - -If you have not yet completed these steps, you can open a tutorial in Cloud Shell that will guide you through each step. - -If you are not sure if you have completed these steps run +If you are not sure if you have completed the pre-deployment steps run ```bash gcloud compute images list --filter="name:daos" --format="value(name)" @@ -36,14 +29,14 @@ If you see `daos-server-*` and `daos-client-*` images, click **Next** to continu Otherwise, run another tutorial that walks you though the steps listed above. ```bash -teachme docs/tutorials/daosgcp_setup.md +teachme docs/tutorials/pre-deployment.md ``` ## The daos_cluster example -The example Terraform configuration in [terraform/examples/daos_cluster](https://github.com/daos-stack/google-cloud-daos/tree/main/terraform/examples/daos_cluster) demonstrates how the [DAOS Terraform Modules](https://github.com/daos-stack/google-cloud-daos/tree/main/terraform/modules) can be used in your own Terraform code. 
+The [terraform/examples/daos_cluster](https://github.com/daos-stack/google-cloud-daos/tree/main/terraform/examples/daos_cluster) Terraform configuration demonstrates how the DAOS Terraform Modules in [terraform/modules](https://github.com/daos-stack/google-cloud-daos/tree/main/terraform/modules) can be used in your own Terraform configurations. -Change into the example directory now +Change into the `daos_cluster` example directory now ```bash cd terraform/examples/daos_cluster @@ -55,19 +48,18 @@ Click **Next** to continue ## Create a `terraform.tfvars` file -You need to create a `terraform.tfvars` file that contains variable values for Terraform. - -There are many variables to configure DAOS server and client configurations. Changes to certain variable values often require corresponding changes in other variable values. Depending on your use case this can become a complex topic. +Create a `terraform.tfvars` file that contains variable values for Terraform. To simplify the task of setting the proper variable values for a working DAOS cluster, there are example tfvars files that can be copied to create a `terraform.tfvars` file. -Select one of the example files to copy to `terraform.tfvars`. +You will need to select one of the example files to copy to `terraform.tfvars`. -The example tfvars files are: +The example tfvars files are 1. `terraform.tfvars.tco.example` - 16 DAOS Clients, 4 DAOS Servers with 16 375GB NVMe SSDs per server. + - 16 DAOS Clients + - 4 DAOS Servers with 16x375GB NVMe SSDs per server To use this configuration run ```bash @@ -76,7 +68,8 @@ The example tfvars files are: 2. `terraform.tfvars.perf.example` - 16 DAOS Clients, 4 DAOS Servers with 4 375GB NVMe SSDs per server. 
+ - 16 DAOS Clients + - 4 DAOS Servers with 4x375GB NVMe SSDs per server To use this configuration run ```bash @@ -87,11 +80,7 @@ Click **Next** to continue ## Modify `terraform.tfvars` -Now that you have created a `terraform.tfvars` file, there is one change that needs to be made in the file. - -You need to replace the `` placeholder with your project id. - -To replace the `` placeholder run +Replace the `` placeholder in the `terraform.tfvars` file by running ```bash PROJECT_ID=$(gcloud config list --format 'value(core.project)') @@ -130,7 +119,7 @@ The variable values are set in `terrafrom.tfvars`. Click **Next** to continue -## Run Terraform to Deploy the DAOS cluster +## Deploy the DAOS cluster You can now deploy a DAOS cluster using the `terraform/examples/daos_cluster` example configuration. @@ -152,7 +141,7 @@ Execute the actions in the plan. terraform apply -input=false tfplan ``` -List the instances that were created. +**List the instances** Terraform will create 2 [managed instance groups (MIGs)](https://cloud.google.com/compute/docs/instance-groups) that will create the DAOS server and client instances. @@ -161,20 +150,14 @@ It may take some time for the instances to become available. To see the list of instances run ```bash -gcloud compute instances list --filter="name ~ daos.*" --format="value(name,INTERNAL_IP)" +gcloud compute instances list \ + --filter="name ~ daos.*" \ + --format="value(name,INTERNAL_IP)" ``` Click **Next** to continue -## Prepare storage - -When the DAOS server and client instances are deployed the DAOS services are started but the DAOS storage is not ready to use yet. - -There are a few administrative tasks that must be performed before the DAOS storage can be used. - -The DAOS Management Tool (`dmg`) is installed on all DAOS client instances and can be used to perform administrative tasks. - -You can use `dmg` on any of the DAOS client instances. 
+## Log Into First Client Log into the first DAOS client instance @@ -186,35 +169,28 @@ If you are prompted to create an SSH key pair for gcloud, follow the prompts. Click **Next** to continue -## Storage Format +## Create a Pool -When the DAOS server instances are created the `daos_server` service will be started but will be in "maintenance mode". +The DAOS Management Tool `dmg` is meant to be used by administrators to manage the DAOS storage system and pools. -In order to begin using the storage you must issue a *format* command. +You will need to run `dmg` with `sudo`. -To format the storage run +Check to make sure that the DAOS storage system is ready ```bash -sudo dmg storage format sudo dmg system query -v ``` -To learn more see [Storage Formatting](https://docs.daos.io/latest/admin/deployment/#storage-formatting) - -Click **Next** to continue - -## Create pool +You should see 4 servers with a state of *Joined* -Now that the system has been formatted you can create a Pool. +
-First check to see how much free NVMe storage you have. +View free NVMe storage ```bash sudo dmg storage query usage ``` -This will return storage information for the servers. - The output looks similar to ``` @@ -226,60 +202,101 @@ daos-server-0003 48 GB 48 GB 0 % 1.6 TB 1.6 TB 0 % daos-server-0004 48 GB 48 GB 0 % 1.6 TB 1.6 TB 0 % ``` - In the example output above there are 4 servers with a total of 6.4TB of free space. -With that information you know you can safely create a 6TB pool. +Create a pool named `pool1` that uses all available space. + +```bash +sudo dmg pool create -z 6.4TB -t 3 --label=pool1 +``` -Create the pool. +View the ACLs on *pool1* ```bash -sudo dmg pool create -z 6TB -t 3 -u ${USER} --label=daos_pool +sudo dmg pool get-acl pool1 ``` -For more information about pools see +```text +# Owner: root@ +# Owner Group: root@ +# Entries: +A::OWNER@:rw +A:G:GROUP@:rw +``` + +Here we see that root owns the pool. + +Add an [ACE](https://docs.daos.io/v2.0/admin/pool_operations/#adding-and-updating-aces) that will allow any user to create a container in the pool -- [https://docs.daos.io/latest/overview/storage/#daos-pool](https://docs.daos.io/latest/overview/storage/#daos-pool) -- [https://docs.daos.io/latest/admin/pool_operations/](https://docs.daos.io/latest/admin/pool_operations/) +```bash +sudo dmg pool update-acl -e A::EVERYONE@:rcta pool1 +``` Click **Next** to continue -### Create container +### Create a Container -Now that a pool has been created, create a container in that pool +Create a container named `cont1` in the `pool1` pool. 
```bash -daos container create --type=POSIX --properties=rf:0 --label=daos_cont daos_pool +daos container create \ + --type=POSIX \ + --properties=rf:0 \ + --label=cont1 pool1 ``` -For more information about containers see [https://docs.daos.io/latest/overview/storage/#daos-container](https://docs.daos.io/latest/overview/storage/#daos-container) - Click **Next** to continue -## Mount container +## Mount the Container -The container now needs to be mounted. - -To mount the container run +Create a mount point and mount the `cont1` container ```bash -MOUNT_DIR="/tmp/daos_test1" +MOUNT_DIR="${HOME}/daos/cont1" mkdir -p "${MOUNT_DIR}" -dfuse --singlethread --pool=daos_pool --container=daos_cont --mountpoint="${MOUNT_DIR}" +dfuse --singlethread \ + --pool=pool1 \ + --container=cont1 \ + --mountpoint="${MOUNT_DIR}" +``` + +Verify that the container is mounted + +```bash df -h -t fuse.daos ``` -Your DAOS storage is now ready! +Click **Next** to continue + +## Use DAOS Storage -You can now store files in `/tmp/daos_test1` +The `cont1` container is now mounted on `${HOME}/daos/cont1` + +Create a 20GiB file which will be stored in the DAOS filesystem. + +```bash +pushd ${HOME}/daos/cont1 +time LD_PRELOAD=/usr/lib64/libioil.so \ +dd if=/dev/zero of=./test20GiB.img iflag=fullblock bs=1G count=20 +``` Click **Next** to continue -## Shutting it down +## Unmount the Container + +Unmount the container before logging out of the daos-client-0001 instance. + +```bash +popd +fusermount -u ${HOME}/daos/cont1 +logout +``` + +Click **Next** to continue -If you are still logged into the first DAOS client instance, log out now. +## Shut Down the DAOS Cluster -To shut down the DAOS cluster run +Destroy all resources created by Terraform ```bash terraform destroy @@ -293,14 +310,15 @@ Click **Next** to continue You have completed a DAOS cluster deployment on GCP! -In this tutorial you used the Terraform example configuration in `terraform/examples/daos_cluster` to deploy a DAOS cluster. 
+The following steps were performed in this tutorial: -You then performed the following administration tasks: +1. Used the Terraform example configuration in `terraform/examples/daos_cluster` to deploy a DAOS cluster. +2. Created a container +3. Mounted the container +4. Stored a large file in the container +5. Unmounted the container +6. Used Terraform to destroy all resources that were created -1. Formatted storage -2. Created a pool -3. Created a container -4. Mounted the container What's next? diff --git a/docs/tutorials/daosgcp_setup.md b/docs/tutorials/pre-deployment.md similarity index 62% rename from docs/tutorials/daosgcp_setup.md rename to docs/tutorials/pre-deployment.md index c932b60..e153549 100644 --- a/docs/tutorials/daosgcp_setup.md +++ b/docs/tutorials/pre-deployment.md @@ -1,10 +1,12 @@ -# DAOS GCP Setup +# DAOS GCP Pre-Deployment Steps -In this walkthrough you will +In this tutorial you will complete the following pre-deployment steps, which only need to be done once for your GCP project. 1. Set defaults for Google Cloud CLI (```gcloud```) -2. Create a Packer image in your GCP project -3. Build DAOS Server and Client images with Packer in Cloud Build +2. Enable APIs +3. Create a Cloud NAT +4. Create a Packer image in your GCP project +5. Build DAOS Server and Client images with Packer in Cloud Build After completing this walkthrough you will be able to run Terraform to deploy DAOS Server and Client instances. @@ -31,13 +33,12 @@ The default settings are 2. region 3. zone - ### Set Default Project To set the default project run ```bash -gcloud config set project {{project-id}} +gcloud config set project ``` ### Set Default Region @@ -66,50 +67,80 @@ You have now set the necessary defaults required for the scripts and examples in Click **Next** to continue -## Create Packer Image - -DAOS images are built using [Packer](https://www.packer.io/) in [Cloud Build](https://cloud.google.com/build). 
- -In order to run Packer in Cloud Build you need to provision an instance from an image that has Packer installed. - -Therfore, in order to build DAOS images with Packer in Cloud Build, your GCP project must contain a Packer image. - -Creating the Packer image only needs to be done once in the GCP project. - -### Enable APIs +## Enable APIs -To enable the necessary APIs for Cloud Build run: +Enable the APIs that are used for building images and deploying DAOS instances ```bash -gcloud services enable sourcerepo.googleapis.com +gcloud services enable cloudbuild.googleapis.com gcloud services enable compute.googleapis.com +gcloud services enable networkmanagement.googleapis.com +gcloud services enable secretmanager.googleapis.com gcloud services enable servicemanagement.googleapis.com +gcloud services enable sourcerepo.googleapis.com gcloud services enable storage-api.googleapis.com ``` -### Required IAM permissions +Click **Next** to continue + +## Create a Cloud NAT + +When deploying DAOS server and client instances external IPs are not added to the instances. The instances need to use services that are not accessible on the internal VPC default network as well as the https://packages.daos.io site for installs from DAOS repos. + +Therefore, it is necessary to create a [Cloud NAT using Cloud Router](https://cloud.google.com/architecture/building-internet-connectivity-for-private-vms#create_a_nat_configuration_using_cloud_router). + +1. Create a Cloud Router instance + + ```bash + gcloud compute routers create nat-router-us-central1 \ + --network default \ + --region us-central1 + ``` + +2. Configure the router for Cloud NAT + + ```bash + gcloud compute routers nats create nat-config \ + --router-region us-central1 \ + --router nat-router-us-central1 \ + --nat-all-subnet-ip-ranges \ + --auto-allocate-nat-external-ips + ``` + +Click **Next** to continue + +## Create Packer Image + +### IAM permissions The Cloud Build service account requires the editor role. 
To grant the editor role to the service account run: ```bash -CLOUD_BUILD_ACCOUNT=$(gcloud projects get-iam-policy {{project-id}} --filter="(bindings.role:roles/cloudbuild.builds.builder)" --flatten="bindings[].members" --format="value(bindings.members[])") +CLOUD_BUILD_ACCOUNT=$(gcloud projects get-iam-policy --filter="(bindings.role:roles/cloudbuild.builds.builder)" --flatten="bindings[].members" --format="value(bindings.members[])") -gcloud projects add-iam-policy-binding {{project-id}} \ +gcloud projects add-iam-policy-binding \ --member "${CLOUD_BUILD_ACCOUNT}" \ --role roles/compute.instanceAdmin ``` -### Create the Packer Image +### Build Packer Image + +DAOS images are built using [Packer](https://www.packer.io/) in [Cloud Build](https://cloud.google.com/build). + +In order to run Packer in Cloud Build you need to provision an instance from an image that has Packer installed. + +Therefore, in order to build DAOS images with Packer in Cloud Build, your GCP project must contain a Packer image. + +Creating the Packer image only needs to be done once in the GCP project. Cloud Build provides the [Packer community builder image](https://github.com/GoogleCloudPlatform/cloud-builders-community/tree/master/packer). To build the Packer image run: ```bash -pushd . -cd ~/ +pushd ~/ git clone https://github.com/GoogleCloudPlatform/cloud-builders-community.git cd cloud-builders-community/packer gcloud builds submit . @@ -117,43 +148,37 @@ rm -rf ~/cloud-builders-community popd ``` -
 - -You have completed the necessary steps to create your Packer image which will be used to build DAOS images with Packer in Cloud Build. Click **Next** to continue -## Build DAOS Server and Client images +## Build DAOS images In order to use Terraform to provision DAOS Server and Client instances you need to build images that have DAOS pre-installed. -To build the DAOS Server and Client instances run: +Build the DAOS Server and Client images ```bash -pushd . -cd images +pushd images ./build_images.sh --type all popd ``` It will take a few minutes for the images to build. -Wait for the image build to complete. - Click **Next** to continue ## DAOS GCP Setup Complete -You can now begin using Terraform to provision DAOS Server and Client instances in the **{{project-id}}** project! +You can now begin using Terraform to provision DAOS Server and Client instances in the **** project! **Next Steps** - Read the terraform/modules/daos_client/README.md file - Read the terraform/modules/daos_server/README.md file - View the files in the ```terraform/examples/daos_cluster``` directory -- Open a tutorial that walks you through the process of deploying a DAOS cluster using the ```terraform/examples/daos_cluster``` example. +- Open a tutorial that walks you through the steps to deploy a DAOS cluster using the ```terraform/examples/daos_cluster``` example. ```bash - cloudshell launch-tutorial ./docs/tutorials/example_daos_cluster.md + teachme ./docs/tutorials/example_daos_cluster.md ``` diff --git a/terraform/examples/README.md b/terraform/examples/README.md index d93da8c..07a2a24 100644 --- a/terraform/examples/README.md +++ b/terraform/examples/README.md @@ -2,9 +2,7 @@ This directory includes examples of Terraform configurations for different types of [DAOS](https://docs.daos.io/) deployments in GCP. 
-| Subdirectory | Description | -| --------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| [daos_cluster](daos_cluster) | Example Terraform configuration for a DAOS cluster consisting of servers and clients | -| [io500](io500) | Example that uses custom client images that have [IO500](https://github.com/IO500/io500) installed. Uses the daos_cluster example to deploy a DAOS cluster with the IO500 client images. | -| [only_daos_client](only_daos_client) | Example Terraform configuration for DAOS clients only | -| [only_daos_server](./only_daos_server/) | Example Terraform configuration for DAOS servers only | +| Subdirectory | Description | +| ---------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| [daos_cluster](daos_cluster) | Example Terraform configuration for a DAOS cluster consisting of servers and clients | +| [io500](io500) | Example that uses custom client images that have [IO500](https://github.com/IO500/io500) pre-installed. Uses the daos_cluster example to deploy a DAOS cluster with the IO500 images. | diff --git a/terraform/examples/daos_cluster/README.md b/terraform/examples/daos_cluster/README.md index f69249b..ca24731 100644 --- a/terraform/examples/daos_cluster/README.md +++ b/terraform/examples/daos_cluster/README.md @@ -2,35 +2,13 @@ This example Terraform configuration demonstrates how to use the [DAOS Terraform Modules](../../modules) to deploy a DAOS cluster consisting of servers and clients. 
-> -> The current version of the [daos_server](../../modules/daos_server) Terraform module does not yet support automation of the following administration tasks -> -> - storage format -> - pool creation -> - container creation -> - mounting container -> -> These steps must be performed manually by an administrator after the DAOS Server and Client instances have been deployed with Terraform. -> -> Instructions for performing the manual steps will be provided in the documentation for this example. - -## Setup - -The following steps must be performed prior to running this example. - -1. Set defaults for Google Cloud CLI (```gcloud```) -2. Create a Packer image in your GCP project -3. Build DAOS Server and Client images - -If you have not completed these steps yet, click the button below to open an interactive walkthrough in [Cloud Shell](https://cloud.google.com/shell). After completing the walkthrough your GCP project will contain the images required to run this Terraform example. +## Pre-deployment -[![DAOS on GCP Setup](http://gstatic.com/cloudssh/images/open-btn.png)](https://console.cloud.google.com/cloudshell/open?git_repo=https://github.com/daos-stack/google-cloud-daos&cloudshell_git_branch=main&shellonly=true&tutorial=docs/tutorials/daosgcp_setup.md) +If you have not completed the [pre-deployment steps](../../../README.md#pre-deployment-steps), please complete those steps before continuing to run this Terraform example. -## Deploy a DAOS cluster with this example +## Quickstart in Cloud Shell -Click the button below to open a [Cloud Shell](https://cloud.google.com/shell) tutorial that uses this example to deploy a DAOS Cluster in GCP. - -After completing the tutorial you will have a basic understanding of how to use the [DAOS Terraform Modules](../../modules) in your own Terraform configurations as well as how to perform basic administration steps on the DAOS instances after they are deployed. +Click the button below to run this example in a Cloud Shell tutorial. 
The tutorial will walk through each of the steps described in this README.md file. [![DAOS on GCP Setup](http://gstatic.com/cloudssh/images/open-btn.png)](https://console.cloud.google.com/cloudshell/open?git_repo=https://github.com/daos-stack/google-cloud-daos&cloudshell_git_branch=main&shellonly=true&tutorial=docs/tutorials/example_daos_cluster.md) @@ -46,29 +24,39 @@ List of Terraform files in this example | terraform.tfvars.perf.example | Pre-Configured set of set of variables focused on performance | | terraform.tfvars.tco.example | Pre-Configured set of set of variables focused on lower total cost of ownership | -## Create a `terraform.tfvars` file +## Deploy a DAOS Cluster With This Example + +The following sections describe how to deploy a DAOS cluster with this example Terraform configuration. -Before you run `terraform apply` to deploy a DAOS cluster with this example you need to create a `terraform.tfvars` file in the `terraform/examples/daos_cluster` directory. +### Create a `terraform.tfvars` file -The `terraform.tfvars` file will contain the variable values that are used by the `main.tf` configuration file. +Before you run any `terraform` commands you need to create a `terraform.tfvars` file in the `terraform/examples/daos_cluster` directory. + +The `terraform.tfvars` file will contain the variable values for the configuration. To ensure a successful deployment of a DAOS cluster there are two `terraform.tfvars.*.example` files that you can choose from. -You will need to decide which of these files you will copy to `terraform.tfvars`. +You will need to decide which of these files to copy to `terraform.tfvars`. -### The terraform.tfvars.tco.example file +#### The terraform.tfvars.tco.example file -The `terraform.tfvars.tco.example` contains variables for a cluster deployment with 16 DAOS Clients, 4 DAOS Servers with 16 375GB NVMe SSDs per server. 
+The `terraform.tfvars.tco.example` contains variables for a DAOS cluster deployment with +- 16 DAOS Client instances +- 4 DAOS Server instances + Each server instance has sixteen 375GB NVMe SSDs -To use the `terraform.tfvars.tco.example` file run +To use the `terraform.tfvars.tco.example` file ```bash cp terraform.tfvars.tco.example terraform.tfvars ``` -### The terraform.tfvars.perf.example file +#### The terraform.tfvars.perf.example file -The `terraform.tfvars.perf.example` contains variables for a cluster deployment with 16 DAOS Clients, 4 DAOS Servers with 4 375GB NVMe SSDs per server. +The `terraform.tfvars.perf.example` contains variables for a DAOS cluster deployment with +- 16 DAOS Client instances +- 4 DAOS Server instances + Each server instance has four 375GB NVMe SSDs To use the ```terraform.tfvars.perf.example``` file run @@ -78,7 +66,7 @@ cp terraform.tfvars.perf.example terraform.tfvars ### Update `terraform.tfvars` with your project id -Now that you have a `terraform.tfvars` file you need to replace the `` placeholder in the file with your project id. +Now that you have a `terraform.tfvars` file you need to replace the `` placeholder in the file with your GCP project id. 
To update the project id in `terraform.tfvars` run @@ -87,7 +75,7 @@ PROJECT_ID=$(gcloud config list --format 'value(core.project)') sed -i "s//${PROJECT_ID}/g" terraform.tfvars ``` -## Deploy the DAOS cluster with the example Terraform configuration +### Deploy the DAOS cluster > **Billing Notification!** > @@ -100,24 +88,24 @@ sed -i "s//${PROJECT_ID}/g" terraform.tfvars To deploy the DAOS cluster ```bash -cd terraform/examples/daos_cluster -terraform init -input=false -terraform plan -out=tfplan -input=false -terraform apply -input=false tfplan +terraform init +terraform plan -out=tfplan +terraform apply tfplan ``` -## Perform DAOS administration tasks +### Perform DAOS administration tasks After your DAOS cluster has been deployed you can log into the first DAOS client instance to perform administrative tasks. -### Log into the first DAOS client instance +#### Log into the first DAOS client instance -Find the name and IP of the first client instance +Verify that the daos-client and daos-server instances are running ```bash -gcloud compute instances list --filter="name ~ daos-client.*-0001" --format="value(name,INTERNAL_IP)" +gcloud compute instances list \ + --filter="name ~ daos" \ + --format="value(name,INTERNAL_IP)" ``` -Let's assume the name of the first client is `daos-client-0001` Log into the first client instance @@ -125,20 +113,24 @@ Log into the first client instance gcloud compute ssh daos-client-0001 ``` -### Format Storage - -Format the storage system. +#### Verify that all daos-server instances have joined ```bash -sudo dmg storage format sudo dmg system query -v ``` -Upon successful format, DAOS Control Servers will start DAOS I/O engines that have been specified in the server config file. +The *State* column should display "Joined" for all servers. 
-For more information see the [Storage Formatting section in the Administration Guide](https://docs.daos.io/latest/admin/deployment/#storage-formatting)
+```
+Rank UUID                                 Control Address   Fault Domain      State  Reason
+---- ----                                 ---------------   ------------      -----  ------
+0    0796c576-5651-4e37-aa15-09f333d2d2b8 10.128.0.35:10001 /daos-server-0001 Joined
+1    f29f7058-8abb-429f-9fd3-8b13272d7de0 10.128.0.77:10001 /daos-server-0003 Joined
+2    09fc0dab-c238-4090-b3f8-da2bd4dce108 10.128.0.81:10001 /daos-server-0002 Joined
+3    2cc9140b-fb12-4777-892e-7d190f6dfb0f 10.128.0.30:10001 /daos-server-0004 Joined
+```
-### Create a Pool
+#### Create a Pool
 Check free NVMe storage.
@@ -146,9 +138,7 @@ Check free NVMe storage.
 sudo dmg storage query usage
 ```
-This will return storage information for the servers.
-
-The output looks similar to
+From the output you can see there are 4 servers each with 1.6TB free. That means there is a total of 6.4TB free.
 ```
 Hosts SCM-Total SCM-Free SCM-Used NVMe-Total NVMe-Free NVMe-Used
@@ -159,14 +149,10 @@ daos-server-0003 48 GB 48 GB 0 % 1.6 TB 1.6 TB 0 %
 daos-server-0004 48 GB 48 GB 0 % 1.6 TB 1.6 TB 0 %
 ```
-In the example output above there are 4 servers with a total of 6.4TB of free space.
-
-With that information you know you can safely create a 6TB pool.
-
-Create the pool.
+Create one pool that uses the entire 6.4TB.
 ```bash
-sudo dmg pool create -z 6TB -t 3 -u ${USER} --label=daos_pool
+sudo dmg pool create -z 6.4TB -t 3 --label=pool1
 ```
 For more information about pools see
@@ -180,25 +166,45 @@ For more information about pools see
 Create a container in the pool
 ```bash
-daos container create --type=POSIX --properties=rf:0 --label=daos_cont daos_pool
+daos container create --type=POSIX --properties=rf:0 --label=cont1 pool1
 ```
 For more information about containers see https://docs.daos.io/latest/overview/storage/#daos-container
-### Mount
+### Mount the container
-Mount the storage with `dfuse`
+Mount the container with `dfuse`
 ```bash
-MOUNT_DIR="/tmp/daos_test1"
+MOUNT_DIR="${HOME}/daos/cont1"
 mkdir -p "${MOUNT_DIR}"
-dfuse --singlethread --pool=daos_pool --container=daos_cont --mountpoint="${MOUNT_DIR}"
+dfuse --singlethread --pool=pool1 --container=cont1 --mountpoint="${MOUNT_DIR}"
 df -h -t fuse.daos
 ```
-You can now store files in the DAOS container mounted on `/tmp/daos_test1`.
+You can now store files in the DAOS container mounted on `${HOME}/daos/cont1`.
+
+For more information about DFuse see the [DAOS FUSE section of the User Guide](https://docs.daos.io/v2.0/user/filesystem/?h=dfuse#dfuse-daos-fuse).
+
+### Use the Storage
+
+The `cont1` container is now mounted on `${HOME}/daos/cont1`
+
+Create a 20GiB file which will be stored in the DAOS filesystem.
+
+```bash
+cd ${HOME}/daos/cont1
+time LD_PRELOAD=/usr/lib64/libioil.so \
+ dd if=/dev/zero of=./test21G.img bs=1G count=20
+```
+
+### Unmount the container
-## Remove DAOS cluster deployment
+```bash
+fusermount -u ${HOME}/daos/cont1
+```
+
+### Remove DAOS cluster deployment
 To destroy the DAOS cluster run
@@ -208,7 +214,9 @@ terraform destroy
 This will shut down all DAOS server and client instances.
-# Terraform Documentation for this Example
+# Terraform Documentation
+
+Documentation for the `terraform/examples/daos_cluster` Terraform configuration.
 Copyright 2022 Intel Corporation
@@ -282,7 +290,7 @@ No resources.
 | [server\_os\_disk\_type](#input\_server\_os\_disk\_type) | OS disk type ie. pd-ssd, pd-standard | `string` | `"pd-ssd"` | no |
 | [server\_os\_family](#input\_server\_os\_family) | OS GCP image family | `string` | `"daos-server-centos-7"` | no |
 | [server\_os\_project](#input\_server\_os\_project) | OS GCP image project name. Defaults to project\_id if null. | `string` | `null` | no |
-| [server\_pools](#input\_server\_pools) | List of pools and containers to be created |
list(object({
name = string
size = string
tier_ratio = number
acls = list(string)
properties = map(string)
containers = list(object({
name = string
type = string
acls = list(string)
properties = map(string)
user_attributes = map(any)
}))
}))
| `[]` | no | +| [server\_pools](#input\_server\_pools) | List of pools and containers to be created |
list(object({
name = string
size = string
tier_ratio = number
user = string
group = string
acls = list(string)
properties = map(any)
containers = list(object({
name = string
type = string
user = string
group = string
acls = list(string)
properties = map(any)
user_attributes = map(any)
}))
}))
| `[]` | no | | [server\_preemptible](#input\_server\_preemptible) | If preemptible instances | `string` | `false` | no | | [server\_service\_account](#input\_server\_service\_account) | Service account to attach to the instance. See https://www.terraform.io/docs/providers/google/r/compute_instance_template.html#service_account. |
object({
email = string,
scopes = set(string)
})
|
{
"email": null,
"scopes": [
"https://www.googleapis.com/auth/devstorage.read_only",
"https://www.googleapis.com/auth/logging.write",
"https://www.googleapis.com/auth/monitoring.write",
"https://www.googleapis.com/auth/servicecontrol",
"https://www.googleapis.com/auth/service.management.readonly",
"https://www.googleapis.com/auth/trace.append",
"https://www.googleapis.com/auth/cloud-platform"
]
}
| no | | [server\_template\_name](#input\_server\_template\_name) | MIG template name | `string` | `"daos-server"` | no | @@ -292,7 +300,5 @@ No resources. ## Outputs -| Name | Description | -|------|-------------| -| [daos\_pools](#output\_daos\_pools) | Specification of pools and containers to create | +No outputs. diff --git a/terraform/examples/daos_cluster/module.json b/terraform/examples/daos_cluster/module.json index 409d532..98a0da9 100644 --- a/terraform/examples/daos_cluster/module.json +++ b/terraform/examples/daos_cluster/module.json @@ -232,7 +232,7 @@ }, { "name": "server_pools", - "type": "list(object({\n name = string\n size = string\n tier_ratio = number\n acls = list(string)\n properties = map(string)\n containers = list(object({\n name = string\n type = string\n acls = list(string)\n properties = map(string)\n user_attributes = map(any)\n }))\n }))", + "type": "list(object({\n name = string\n size = string\n tier_ratio = number\n user = string\n group = string\n acls = list(string)\n properties = map(any)\n containers = list(object({\n name = string\n type = string\n user = string\n group = string\n acls = list(string)\n properties = map(any)\n user_attributes = map(any)\n }))\n }))", "description": "List of pools and containers to be created", "default": [], "required": false @@ -305,12 +305,7 @@ "description": null } ], - "outputs": [ - { - "name": "daos_pools", - "description": "Specification of pools and containers to create" - } - ], + "outputs": [], "providers": [], "requirements": [ { diff --git a/terraform/examples/daos_cluster/variables.tf b/terraform/examples/daos_cluster/variables.tf index a6daa3e..0d789d6 100644 --- a/terraform/examples/daos_cluster/variables.tf +++ b/terraform/examples/daos_cluster/variables.tf @@ -184,8 +184,8 @@ variable "server_pools" { containers = list(object({ name = string type = string - user = string - group = string + user = string + group = string acls = list(string) properties = map(any) user_attributes = 
map(any) diff --git a/terraform/modules/daos_server/README.md b/terraform/modules/daos_server/README.md index 322e9bb..0319a83 100644 --- a/terraform/modules/daos_server/README.md +++ b/terraform/modules/daos_server/README.md @@ -75,7 +75,7 @@ No modules. | [os\_disk\_type](#input\_os\_disk\_type) | OS disk type ie. pd-ssd, pd-standard | `string` | `"pd-ssd"` | no | | [os\_family](#input\_os\_family) | OS GCP image family | `string` | `"daos-server-centos-7"` | no | | [os\_project](#input\_os\_project) | OS GCP image project name. Defaults to project\_id if null. | `string` | `null` | no | -| [pools](#input\_pools) | List of pools and containers to be created |
list(object({
name = string
size = string
tier_ratio = number
acls = list(string)
properties = map(string)
containers = list(object({
name = string
type = string
acls = list(string)
properties = map(string)
user_attributes = map(any)
}))
}))
| `[]` | no | +| [pools](#input\_pools) | List of pools and containers to be created |
list(object({
name = string
size = string
tier_ratio = number
user = string
group = string
acls = list(string)
properties = map(any)
containers = list(object({
name = string
type = string
user = string
group = string
acls = list(string)
properties = map(any)
user_attributes = map(any)
}))
}))
| `[]` | no | | [preemptible](#input\_preemptible) | If preemptible instances | `string` | `false` | no | | [project\_id](#input\_project\_id) | The GCP project to use | `string` | n/a | yes | | [region](#input\_region) | The GCP region to create and test resources in | `string` | n/a | yes | diff --git a/terraform/modules/daos_server/module.json b/terraform/modules/daos_server/module.json index 438a221..ca501ff 100644 --- a/terraform/modules/daos_server/module.json +++ b/terraform/modules/daos_server/module.json @@ -116,7 +116,7 @@ }, { "name": "pools", - "type": "list(object({\n name = string\n size = string\n tier_ratio = number\n acls = list(string)\n properties = map(string)\n containers = list(object({\n name = string\n type = string\n acls = list(string)\n properties = map(string)\n user_attributes = map(any)\n }))\n }))", + "type": "list(object({\n name = string\n size = string\n tier_ratio = number\n user = string\n group = string\n acls = list(string)\n properties = map(any)\n containers = list(object({\n name = string\n type = string\n user = string\n group = string\n acls = list(string)\n properties = map(any)\n user_attributes = map(any)\n }))\n }))", "description": "List of pools and containers to be created", "default": [], "required": false diff --git a/terraform/modules/daos_server/variables.tf b/terraform/modules/daos_server/variables.tf index 9c3a748..60a5497 100644 --- a/terraform/modules/daos_server/variables.tf +++ b/terraform/modules/daos_server/variables.tf @@ -180,8 +180,8 @@ variable "pools" { containers = list(object({ name = string type = string - user = string - group = string + user = string + group = string acls = list(string) properties = map(any) user_attributes = map(any) diff --git a/tools/autodoc/cloudshell_urls.sh b/tools/autodoc/cloudshell_urls.sh index 304812a..a6e6819 100755 --- a/tools/autodoc/cloudshell_urls.sh +++ b/tools/autodoc/cloudshell_urls.sh @@ -20,7 +20,7 @@ # This script will update "Open in Google Cloud Shell" in 
all *.md files.
 # Before merging from the develop branch to main run
 #
-# ./cloudshell_urls.sh main
+# ./cloudshell_urls.sh -b main -r https://github.com/daos-stack/google-cloud-daos
 #
 set -e
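
The updated usage comment in `cloudshell_urls.sh` passes the branch and the repository URL as `-b` and `-r` options instead of a single positional argument. The script body is not part of this diff, so the following is only a minimal sketch of how a Bash script might parse such options with `getopts`. The function name `parse_args` and the variables `BRANCH` and `REPO_URL` are hypothetical and are not taken from `cloudshell_urls.sh`.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: parse_args, BRANCH and REPO_URL are illustrative names,
# not taken from tools/autodoc/cloudshell_urls.sh.
set -e

parse_args() {
  local OPTIND=1 opt
  BRANCH=""
  REPO_URL=""
  # The leading ":" enables silent error handling so the cases below
  # can report missing values and unknown options themselves.
  while getopts ":b:r:" opt; do
    case "${opt}" in
      b) BRANCH="${OPTARG}" ;;
      r) REPO_URL="${OPTARG}" ;;
      :) echo "Option -${OPTARG} requires a value" >&2; return 1 ;;
      *) echo "Unknown option: -${OPTARG}" >&2; return 1 ;;
    esac
  done
  # Both options are treated as required in this sketch.
  if [[ -z "${BRANCH}" || -z "${REPO_URL}" ]]; then
    echo "Usage: cloudshell_urls.sh -b <branch> -r <repo_url>" >&2
    return 1
  fi
}

parse_args -b main -r https://github.com/daos-stack/google-cloud-daos
echo "branch=${BRANCH} repo=${REPO_URL}"
```

Named options make the call order-independent and let the script reject a missing branch or repository URL before it rewrites any links.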