DAOS Cluster Example

This example Terraform configuration demonstrates how to use the DAOS Terraform Modules to deploy a DAOS cluster consisting of servers and clients.

The current version of the daos_server Terraform module does not yet automate the following administration tasks:

  • storage format
  • pool creation
  • container creation
  • mounting container

These steps must be performed manually by an administrator after the DAOS Server and Client instances have been deployed with Terraform.

Instructions for performing these manual steps are provided in the "Perform DAOS administration tasks" section of this document.
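As a rough sketch, the four manual steps can be chained in a small shell helper that uses the same commands shown later in this document. The `run` and `daos_post_deploy` function names and the `DRY_RUN` switch are illustrative assumptions, not part of the DAOS tooling. The sketch defaults to printing the commands rather than executing them, because each step may need time to complete before the next one succeeds.

```shell
# Sketch only: chain the manual administration steps from this README.
# The labels, pool size and mount directory mirror the values used later on.
# DRY_RUN (hypothetical switch) defaults to 1 so commands are printed, not run.
: "${DRY_RUN:=1}"
POOL_LABEL="daos_pool"
CONT_LABEL="daos_cont"
POOL_SIZE="3TB"
MOUNT_DIR="/tmp/daos_test1"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "${DRY_RUN}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Format storage, create a pool and a container, then mount with dfuse.
# Stops at the first failing step.
daos_post_deploy() {
  run dmg storage format &&
  run dmg pool create -z "${POOL_SIZE}" -t 3 -u "${USER:-$(id -un)}" --label="${POOL_LABEL}" &&
  run daos container create --type=POSIX --properties=rf:0 --label="${CONT_LABEL}" "${POOL_LABEL}" &&
  run mkdir -p "${MOUNT_DIR}" &&
  run dfuse --singlethread --pool="${POOL_LABEL}" --container="${CONT_LABEL}" --mountpoint="${MOUNT_DIR}"
}
```

Calling `daos_post_deploy` with the default settings only lists the commands. Setting `DRY_RUN=0` on the first client instance would execute them, although running the steps one at a time, as described below, makes failures easier to diagnose.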

Setup

The following steps must be performed prior to running this example.

  1. Set defaults for Google Cloud CLI (gcloud)
  2. Create a Packer image in your GCP project
  3. Build DAOS Server and Client images

If you have not completed these steps yet, click the button below to open an interactive walkthrough in Cloud Shell. After completing the walkthrough your GCP project will contain the images required to run this Terraform example.

DAOS on GCP Setup

Deploy a DAOS cluster with this example

Click the button below to open a Cloud Shell tutorial that uses this example to deploy a DAOS Cluster in GCP.

After completing the tutorial you will have a basic understanding of how to use the DAOS Terraform Modules in your own Terraform configurations as well as how to perform basic administration steps on the DAOS instances after they are deployed.

DAOS on GCP Setup

Terraform Files

List of Terraform files in this example

| Filename | Description |
| --- | --- |
| main.tf | Main Terraform configuration file containing resource definitions |
| variables.tf | Variable definitions for variables used in main.tf |
| versions.tf | Provider definitions |
| terraform.tfvars.perf.example | Pre-configured set of variables focused on performance |
| terraform.tfvars.tco.example | Pre-configured set of variables focused on lower total cost of ownership |

Create a terraform.tfvars file

Before you run terraform apply to deploy a DAOS cluster with this example you need to create a terraform.tfvars file in the terraform/examples/daos_cluster directory.

The terraform.tfvars file will contain the variable values that are used by the main.tf configuration file.

To help ensure a successful deployment, this example provides two terraform.tfvars.*.example files.

Decide which of these files to copy to terraform.tfvars.

The terraform.tfvars.tco.example file

The terraform.tfvars.tco.example file contains variables for a cluster deployment with 16 DAOS clients and 4 DAOS servers, each server with 16 375GB NVMe SSDs.

To use the terraform.tfvars.tco.example file, run:

cp terraform.tfvars.tco.example terraform.tfvars

The terraform.tfvars.perf.example file

The terraform.tfvars.perf.example file contains variables for a cluster deployment with 16 DAOS clients and 4 DAOS servers, each server with 4 375GB NVMe SSDs.

To use the terraform.tfvars.perf.example file, run:

cp terraform.tfvars.perf.example terraform.tfvars
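Because either `cp` command silently replaces an existing terraform.tfvars, a guarded copy can be useful. The following is a minimal sketch; the function name `use_tfvars_example` and its arguments are hypothetical and not part of this repository.

```shell
# use_tfvars_example PROFILE [DIR]
# PROFILE is "tco" or "perf"; DIR defaults to the current directory.
# Copies the chosen example to terraform.tfvars without overwriting an existing file.
use_tfvars_example() {
  local profile="$1"
  local dir="${2:-.}"
  local src="${dir}/terraform.tfvars.${profile}.example"
  local dst="${dir}/terraform.tfvars"

  case "${profile}" in
    tco|perf) ;;
    *) echo "unknown profile: ${profile} (expected tco or perf)" >&2; return 2 ;;
  esac
  if [ ! -f "${src}" ]; then
    echo "missing example file: ${src}" >&2
    return 1
  fi
  if [ -e "${dst}" ]; then
    echo "refusing to overwrite existing ${dst}" >&2
    return 1
  fi
  cp "${src}" "${dst}"
}
```

For example, `use_tfvars_example perf` run from the terraform/examples/daos_cluster directory would copy the performance-focused example, and fail if a terraform.tfvars file is already present.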

Update terraform.tfvars with your project id

Now that you have a terraform.tfvars file, replace the <project_id> placeholder in the file with your project ID.

To update the project ID in terraform.tfvars, run:

PROJECT_ID=$(gcloud config list --format 'value(core.project)')
sed -i "s/<project_id>/${PROJECT_ID}/g" terraform.tfvars
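If no default project is configured, `PROJECT_ID` is empty and the `sed` command above replaces the placeholder with an empty string. A more defensive variant might look like the sketch below. The function name `set_project_id` is an assumption for illustration. It writes through a temporary file instead of using `sed -i`, so it does not depend on GNU sed.

```shell
# set_project_id FILE PROJECT_ID
# Replaces every <project_id> placeholder in FILE and verifies that none remain.
set_project_id() {
  local file="$1"
  local project_id="$2"
  local tmp

  if [ -z "${project_id}" ]; then
    echo "project id is empty; set a default project with gcloud first" >&2
    return 1
  fi
  tmp="$(mktemp)" || return 1
  # Write to a temporary file so the original is untouched if sed fails.
  if sed "s/<project_id>/${project_id}/g" "${file}" > "${tmp}"; then
    cat "${tmp}" > "${file}"
    rm -f "${tmp}"
  else
    rm -f "${tmp}"
    return 1
  fi
  # Succeed only if no placeholder survived the substitution.
  ! grep -q '<project_id>' "${file}"
}
```

It could be called as `set_project_id terraform.tfvars "${PROJECT_ID}"` after `PROJECT_ID` has been read from `gcloud config list` as shown above.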

Deploy the DAOS cluster with the example Terraform configuration

Billing Notification!

Running this example will incur charges in your project.

To avoid surprises, be sure to monitor your costs associated with running this example.

Don't forget to shut down the DAOS cluster with terraform destroy when you are finished.

To deploy the DAOS cluster

cd terraform/examples/daos_cluster
terraform init -input=false
terraform plan -out=tfplan -input=false
terraform apply -input=false tfplan
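Since `-input=false` stops Terraform from prompting for missing values, it can help to check terraform.tfvars before planning. The sketch below is one possible pre-flight check. The function name `tfvars_ready` is hypothetical, and it assumes that placeholders always have the form `<name>`, as `<project_id>` does.

```shell
# tfvars_ready [DIR]
# Returns 0 if DIR/terraform.tfvars exists and contains no "<name>" placeholders.
tfvars_ready() {
  local file="${1:-.}/terraform.tfvars"

  if [ ! -f "${file}" ]; then
    echo "${file} not found; copy one of the terraform.tfvars.*.example files first" >&2
    return 1
  fi
  # A leftover "<name>" token means a placeholder was never replaced.
  if grep -n '<[A-Za-z_][A-Za-z0-9_]*>' "${file}" >&2; then
    echo "unreplaced placeholder(s) in ${file}" >&2
    return 1
  fi
}
```

One way to use it is `tfvars_ready . && terraform plan -out=tfplan -input=false`, so that planning only starts when the variables file looks complete.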

Perform DAOS administration tasks

After your DAOS cluster has been deployed, log into the first DAOS client instance to perform administrative tasks.

Log into the first DAOS client instance

Find the name and IP of the first client instance

gcloud compute instances list --filter="name ~ daos-client.*-0001" --format="value(name,INTERNAL_IP)"

The following steps assume that the name of the first client is daos-client-0001.

Log into the first client instance

gcloud compute ssh daos-client-0001
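Rather than copying the instance name by hand, the name can be extracted from the listing output. This is a minimal sketch under the assumption that the `value(name,INTERNAL_IP)` format prints the name as the first whitespace-separated field of each line. The helper name `first_client_name` is made up for this example.

```shell
# first_client_name
# Reads "name<TAB>ip" lines on stdin and prints the name from the first non-empty line.
first_client_name() {
  awk 'NF { print $1; exit }'
}

# Possible usage (requires gcloud and a deployed cluster, so it is left commented out):
# CLIENT="$(gcloud compute instances list \
#   --filter="name ~ daos-client.*-0001" \
#   --format="value(name,INTERNAL_IP)" | first_client_name)"
# [ -n "${CLIENT}" ] && gcloud compute ssh "${CLIENT}"
```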

Format Storage

Format the storage system.

dmg storage format

Upon successful format, DAOS Control Servers will start DAOS I/O engines that have been specified in the server config file.

For more information, see the Storage Formatting section in the Administration Guide.

Create a Pool

Now that the system has been formatted, a pool can be created.

Check free NVMe storage.

dmg storage query usage

This returns output similar to the following:

Hosts            SCM-Total SCM-Free SCM-Used NVMe-Total NVMe-Free NVMe-Used
-----            --------- -------- -------- ---------- --------- ---------
daos-server-0001 107 GB    107 GB   0 %      3.2 TB     3.2 TB    0 %

The example output above shows one server with 3.2 TB of free NVMe storage.

With that information, you know you can create a 3TB pool.
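The same reasoning can be sketched in shell: sum the NVMe-Free column and round down to whole terabytes. This assumes the output layout matches the sample above exactly (host name in the first field, NVMe-Free value and unit in fields 10 and 11, units of GB or TB). Rounding down is a rule of thumb chosen for this sketch, not a DAOS recommendation, and `suggest_pool_size` is a hypothetical helper name.

```shell
# suggest_pool_size
# Reads `dmg storage query usage` output on stdin and prints the summed
# NVMe-Free capacity, rounded down to whole terabytes (for example "3TB").
suggest_pool_size() {
  awk '
    # Skip the header row and the dashed separator row.
    $1 == "Hosts" || $1 ~ /^-+$/ { next }
    NF >= 11 {
      value = $10
      unit = $11
      if (unit == "GB") value = value / 1000
      else if (unit != "TB") next   # ignore units this sketch does not handle
      total += value
    }
    END { printf "%dTB\n", int(total) }
  '
}
```

On a client instance this could be used as `dmg storage query usage | suggest_pool_size`. With the sample output above it prints `3TB`.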

Create the pool.

dmg pool create -z 3TB -t 3 -u ${USER} --label=daos_pool

For more information about pools, see the DAOS documentation.

Create a Container

Create a container in the pool

daos container create --type=POSIX --properties=rf:0 --label=daos_cont daos_pool

For more information about containers see https://docs.daos.io/latest/overview/storage/#daos-container

Mount

Mount the storage with dfuse

MOUNT_DIR="/tmp/daos_test1"
mkdir -p "${MOUNT_DIR}"
dfuse --singlethread --pool=daos_pool --container=daos_cont --mountpoint="${MOUNT_DIR}"
df -h -t fuse.daos

You can now store files in the DAOS container mounted on /tmp/daos_test1.
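To confirm that the mount works, a small write-and-read-back check can help. The following sketch is not DAOS-specific: it copies 1 MiB of random data into a directory and compares the copy with the original. The function name `smoke_test` is an assumption for illustration.

```shell
# smoke_test DIR
# Writes 1 MiB of random data into DIR, reads it back and compares it.
# Returns 0 if the copy matches the original.
smoke_test() {
  local dir="$1"
  local src dst rc

  src="$(mktemp)" || return 1
  dst="${dir}/smoke_test.$$"
  head -c 1048576 /dev/urandom > "${src}"
  cp "${src}" "${dst}" && cmp -s "${src}" "${dst}"
  rc=$?
  # Clean up both the local source file and the copy in DIR.
  rm -f "${src}" "${dst}"
  return "${rc}"
}
```

For the mount created above, the call would be `smoke_test "${MOUNT_DIR}"`.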

Remove DAOS cluster deployment

To destroy the DAOS cluster, run:

terraform destroy

This will delete all DAOS server and client instances that were created by this example.

Terraform Documentation for this Example

Requirements

| Name | Version |
| --- | --- |
| terraform | >= 0.14.5 |
| google | >= 3.54.0 |

Providers

No providers.

Modules

| Name | Source | Version |
| --- | --- | --- |
| daos_client | ../../modules/daos_client | n/a |
| daos_server | ../../modules/daos_server | n/a |

Resources

No resources.

Inputs

| Name | Description | Type | Default | Required |
| --- | --- | --- | --- | --- |
| client_instance_base_name | MIG instance base names to use | string | null | no |
| client_labels | Set of key/value label pairs to assign to daos-client instances | any | {} | no |
| client_machine_type | GCP machine type, e.g. e2-medium | string | null | no |
| client_mig_name | MIG name | string | null | no |
| client_number_of_instances | Number of DAOS clients to bring up | number | null | no |
| client_os_disk_size_gb | OS disk size in GB | number | 20 | no |
| client_os_disk_type | OS disk type, e.g. pd-ssd, pd-standard | string | "pd-ssd" | no |
| client_os_family | OS GCP image family | string | null | no |
| client_os_project | OS GCP image project name | string | null | no |
| client_preemptible | Whether to use preemptible client instances | string | true | no |
| client_template_name | MIG template name | string | null | no |
| network | GCP network to use | string | "default" | no |
| project_id | The GCP project to use | string | null | no |
| region | The GCP region to create and test resources in | string | null | no |
| server_daos_crt_timeout | crt_timeout | number | null | no |
| server_daos_disk_count | Number of local SSDs to use | number | null | no |
| server_daos_scm_size | scm_size | number | null | no |
| server_instance_base_name | MIG instance base names to use | string | null | no |
| server_labels | Set of key/value label pairs to assign to daos-server instances | any | {} | no |
| server_machine_type | GCP machine type, e.g. e2-medium | string | null | no |
| server_mig_name | MIG name | string | null | no |
| server_number_of_instances | Number of DAOS servers to bring up | number | null | no |
| server_os_disk_size_gb | OS disk size in GB | number | 20 | no |
| server_os_disk_type | OS disk type, e.g. pd-ssd, pd-standard | string | "pd-ssd" | no |
| server_os_family | OS GCP image family | string | null | no |
| server_os_project | OS GCP image project name | string | null | no |
| server_preemptible | Whether to use preemptible server instances | string | true | no |
| server_template_name | MIG template name | string | null | no |
| subnetwork | GCP sub-network to use | string | "default" | no |
| subnetwork_project | The GCP project where the subnetwork is defined | string | null | no |
| zone | The GCP zone to create and test resources in | string | null | no |

Outputs

No outputs.