Move GKE Modules to Core #2758

Merged: 9 commits, Jul 15, 2024
8 changes: 4 additions & 4 deletions examples/README.md
@@ -1371,7 +1371,7 @@ secondary IP ranges defined.
The `gke-job-template` module is used to create a job file that can be submitted
to the cluster using `kubectl` and will run on the specified node pool.

-[hpc-gke.yaml]: ../community/examples/hpc-gke.yaml
+[hpc-gke.yaml]: ../examples/hpc-gke.yaml

### [ml-gke.yaml] ![community-badge] ![experimental-badge]

@@ -1390,7 +1390,7 @@ Toolkit. It includes:
Example settings for a2 look like:

```yaml
-source: community/modules/compute/gke-node-pool
+source: modules/compute/gke-node-pool
use: [gke_cluster]
settings:
disk_type: pd-balanced
@@ -1438,7 +1438,7 @@ guest_accelerator:
Once you have deployed the blueprint, follow output instructions to _fetch
credentials for the created cluster_ and _submit a job calling `nvidia_smi`_.

-[ml-gke.yaml]: ../community/examples/ml-gke.yaml
+[ml-gke.yaml]: ../examples/ml-gke.yaml
[`kubernetes-operations`]: ../community/modules/scripts/kubernetes-operations/README.md

### [storage-gke.yaml] ![community-badge] ![experimental-badge]
@@ -1470,7 +1470,7 @@ cleaned up when the job is deleted.
> `--vars authorized_cidr=<your-ip-address>/32`.** You can use a service like
> [whatismyip.com](https://whatismyip.com) to determine your IP address.

-[storage-gke.yaml]: ../community/examples/storage-gke.yaml
+[storage-gke.yaml]: ../examples/storage-gke.yaml

### [htc-htcondor.yaml] ![community-badge] ![experimental-badge]

6 changes: 3 additions & 3 deletions community/examples/hpc-gke.yaml → examples/hpc-gke.yaml
@@ -36,18 +36,18 @@ deployment_groups:
ip_cidr_range: 10.0.32.0/20

- id: gke_cluster
-source: community/modules/scheduler/gke-cluster
+source: modules/scheduler/gke-cluster
use: [network1]
settings:
enable_private_endpoint: false # Allows for access from authorized public IPs
outputs: [instructions]

- id: compute_pool
-source: community/modules/compute/gke-node-pool
+source: modules/compute/gke-node-pool
use: [gke_cluster]

- id: job-template
-source: community/modules/compute/gke-job-template
+source: modules/compute/gke-job-template
use: [compute_pool]
settings:
image: busybox
6 changes: 3 additions & 3 deletions community/examples/ml-gke.yaml → examples/ml-gke.yaml
@@ -41,7 +41,7 @@ deployment_groups:
ip_cidr_range: 10.0.32.0/20

- id: gke_cluster
-source: community/modules/scheduler/gke-cluster
+source: modules/scheduler/gke-cluster
use: [network1]
settings:
enable_private_endpoint: false # Allows for access from authorized public IPs
@@ -51,14 +51,14 @@ deployment_groups:
outputs: [instructions]

- id: g2_pool
-source: community/modules/compute/gke-node-pool
+source: modules/compute/gke-node-pool
use: [gke_cluster]
settings:
disk_type: pd-balanced
machine_type: g2-standard-4

- id: job_template
-source: community/modules/compute/gke-job-template
+source: modules/compute/gke-job-template
use: [g2_pool]
settings:
image: nvidia/cuda:11.0.3-runtime-ubuntu20.04
@@ -39,7 +39,7 @@ deployment_groups:
ip_cidr_range: 10.0.32.0/20

- id: gke_cluster
-source: community/modules/scheduler/gke-cluster
+source: modules/scheduler/gke-cluster
use: [network1]
settings:
enable_filestore_csi: true
@@ -52,7 +52,7 @@ deployment_groups:
outputs: [instructions]

- id: debug_pool
-source: community/modules/compute/gke-node-pool
+source: modules/compute/gke-node-pool
use: [gke_cluster]
settings:
name: debug
@@ -69,7 +69,7 @@ deployment_groups:
force_destroy: true

- id: data-bucket-pv
-source: community/modules/file-system/gke-persistent-volume
+source: modules/file-system/gke-persistent-volume
use: [gke_cluster, data-bucket]
settings: {capacity_gb: 5000}

@@ -81,13 +81,13 @@ deployment_groups:
settings: {local_mount: /shared}

- id: shared-filestore-pv
-source: community/modules/file-system/gke-persistent-volume
+source: modules/file-system/gke-persistent-volume
use: [gke_cluster, filestore]

### Shared Storage Job ###

- id: shared-fs-job
-source: community/modules/compute/gke-job-template
+source: modules/compute/gke-job-template
use:
- gke_cluster
- debug_pool
@@ -117,15 +117,15 @@ deployment_groups:
### Ephemeral Storage ###

- id: local-ssd-pool
-source: community/modules/compute/gke-node-pool
+source: modules/compute/gke-node-pool
use: [gke_cluster]
settings:
name: local-ssd
machine_type: n2d-standard-2
local_ssd_count_ephemeral_storage: 1

- id: ephemeral-storage-job
-source: community/modules/compute/gke-job-template
+source: modules/compute/gke-job-template
use: [local-ssd-pool]
settings:
name: ephemeral-storage-job
20 changes: 10 additions & 10 deletions modules/README.md
@@ -47,9 +47,9 @@ Modules that are still in development and less stable are labeled with the
Creates a TPU nodeset to be used by the [schedmd-slurm-gcp-v6-partition] module.
* **[schedmd-slurm-gcp-v6-nodeset-dynamic]** ![community-badge] ![experimental-badge]:
Creates a dynamic nodeset to be used by the [schedmd-slurm-gcp-v6-partition] module and instance template.
-* **[gke-node-pool]** ![community-badge] ![experimental-badge] : Creates a
+* **[gke-node-pool]** ![core-badge] ![experimental-badge] : Creates a
Kubernetes node pool using GKE.
-* **[gke-job-template]** ![community-badge] ![experimental-badge] : Creates a
+* **[gke-job-template]** ![core-badge] ![experimental-badge] : Creates a
Kubernetes job file to be used with a [gke-node-pool].
* **[htcondor-execute-point]** ![community-badge] ![experimental-badge] :
Manages a group of execute points for use in an [HTCondor
@@ -61,8 +61,8 @@ Modules that are still in development and less stable are labeled with the
Notebook. Primarily used for [FSI - MonteCarlo Tutorial][fsi-montecarlo-on-batch-tutorial].

[vm-instance]: compute/vm-instance/README.md
-[gke-node-pool]: ../community/modules/compute/gke-node-pool/README.md
-[gke-job-template]: ../community/modules/compute/gke-job-template/README.md
+[gke-node-pool]: ../modules/compute/gke-node-pool/README.md
+[gke-job-template]: ../modules/compute/gke-job-template/README.md
[schedmd-slurm-gcp-v5-partition]: ../community/modules/compute/schedmd-slurm-gcp-v5-partition/README.md
[schedmd-slurm-gcp-v5-node-group]: ../community/modules/compute/schedmd-slurm-gcp-v5-node-group/README.md
[schedmd-slurm-gcp-v6-partition]: ../community/modules/compute/schedmd-slurm-gcp-v6-partition/README.md
@@ -104,7 +104,7 @@ Modules that are still in development and less stable are labeled with the
* **[Intel-DAOS]** ![community-badge] : Creates
a [DAOS](https://docs.daos.io/) file system.
* **[cloud-storage-bucket]** ![community-badge] ![experimental-badge] : Creates a Google Cloud Storage (GCS) bucket.
-* **[gke-persistent-volume]** ![community-badge] ![experimental-badge] : Creates persistent volumes and persistent volume claims for shared storage.
+* **[gke-persistent-volume]** ![core-badge] ![experimental-badge] : Creates persistent volumes and persistent volume claims for shared storage.
* **[nfs-server]** ![community-badge] ![experimental-badge] : Creates a VM and
configures an NFS server that can be mounted by other VM.

@@ -115,7 +115,7 @@ Modules that are still in development and less stable are labeled with the
[intel-daos]: ../community/modules/file-system/Intel-DAOS/README.md
[nfs-server]: ../community/modules/file-system/nfs-server/README.md
[cloud-storage-bucket]: ../community/modules/file-system/cloud-storage-bucket/README.md
-[gke-persistent-volume]: ../community/modules/file-system/gke-persistent-volume/README.md
+[gke-persistent-volume]: ../modules/file-system/gke-persistent-volume/README.md

### Monitoring

@@ -189,9 +189,9 @@ Pub/Sub subscription. Primarily used for [FSI - MonteCarlo Tutorial][fsi-monteca
template that works with other Toolkit modules.
* **[batch-login-node]** ![core-badge] : Creates a VM that can be used for
submission of Google Cloud Batch jobs.
-* **[gke-cluster]** ![community-badge] ![experimental-badge] : Creates a
+* **[gke-cluster]** ![core-badge] ![experimental-badge] : Creates a
Kubernetes cluster using GKE.
-* **[pre-existing-gke-cluster]** ![community-badge] ![experimental-badge] : Retrieves an existing GKE cluster. Substitute for ([gke-cluster]) module.
+* **[pre-existing-gke-cluster]** ![core-badge] ![experimental-badge] : Retrieves an existing GKE cluster. Substitute for ([gke-cluster]) module.
* **[schedmd-slurm-gcp-v5-controller]** ![community-badge] :
Creates a Slurm controller node using [slurm-gcp-version-5].
* **[schedmd-slurm-gcp-v5-login]** ![community-badge] :
@@ -217,8 +217,8 @@ Pub/Sub subscription. Primarily used for [FSI - MonteCarlo Tutorial][fsi-monteca

[batch-job-template]: ../modules/scheduler/batch-job-template/README.md
[batch-login-node]: ../modules/scheduler/batch-login-node/README.md
-[gke-cluster]: ../community/modules/scheduler/gke-cluster/README.md
-[pre-existing-gke-cluster]: ../community/modules/scheduler/pre-existing-gke-cluster/README.md
+[gke-cluster]: ../modules/scheduler/gke-cluster/README.md
+[pre-existing-gke-cluster]: ../modules/scheduler/pre-existing-gke-cluster/README.md
[htcondor-setup]: ../community/modules/scheduler/htcondor-setup/README.md
[htcondor-pool-secrets]: ../community/modules/scheduler/htcondor-pool-secrets/README.md
[htcondor-access-point]: ../community/modules/scheduler/htcondor-access-point/README.md
@@ -19,7 +19,7 @@ The following example creates a GKE job template file.

```yaml
- id: job-template
-source: community/modules/compute/gke-job-template
+source: modules/compute/gke-job-template
use: [compute_pool]
settings:
node_count: 3
@@ -13,7 +13,7 @@ The following example creates a GKE node group.

```yaml
- id: compute_pool
-source: community/modules/compute/gke-node-pool
+source: modules/compute/gke-node-pool
use: [gke_cluster]
```

@@ -83,7 +83,7 @@ fixed number of attached GPUs, let's call these machine types as "pre-defined gp

```yaml
- id: simple-a2-pool
-source: community/modules/compute/gke-node-pool
+source: modules/compute/gke-node-pool
use: [gke_cluster]
settings:
machine_type: a2-highgpu-1g
@@ -109,7 +109,7 @@ an A100 GPU:

```yaml
- id: multi-instance-gpu-pool
-source: community/modules/compute/gke-node-pool
+source: modules/compute/gke-node-pool
use: [gke_cluster]
settings:
machine_type: a2-highgpu-1g
@@ -125,7 +125,7 @@ The following is an example of

```yaml
- id: time-sharing-gpu-pool
-source: community/modules/compute/gke-node-pool
+source: modules/compute/gke-node-pool
use: [gke_cluster]
settings:
machine_type: a2-highgpu-1g
@@ -140,7 +140,7 @@ Following is an example of using a GPU attached to an `n1` machine:

```yaml
- id: t4-pool
-source: community/modules/compute/gke-node-pool
+source: modules/compute/gke-node-pool
use: [gke_cluster]
settings:
machine_type: n1-standard-16
@@ -20,7 +20,7 @@ The following example creates a Filestore and then uses the

```yaml
- id: gke_cluster
-source: community/modules/scheduler/gke-cluster
+source: modules/scheduler/gke-cluster
use: [network1]
settings:
master_authorized_networks:
@@ -34,11 +34,11 @@ The following example creates a Filestore and then uses the
local_mount: /data

- id: datafs-pv
-source: community/modules/file-system/gke-persistent-volume
+source: modules/file-system/gke-persistent-volume
use: [datafs, gke_cluster]

- id: job-template
-source: community/modules/compute/gke-job-template
+source: modules/compute/gke-job-template
use: [datafs-pv, compute_pool]
```

@@ -48,7 +48,7 @@ The following example creates a GCS bucket and then uses the

```yaml
- id: gke_cluster
-source: community/modules/scheduler/gke-cluster
+source: modules/scheduler/gke-cluster
use: [network1]
settings:
master_authorized_networks:
@@ -61,11 +61,11 @@ The following example creates a GCS bucket and then uses the
local_mount: /data

- id: datafs-pv
-source: community/modules/file-system/gke-persistent-volume
+source: modules/file-system/gke-persistent-volume
use: [data-bucket, gke_cluster]

- id: job-template
-source: community/modules/compute/gke-job-template
+source: modules/compute/gke-job-template
use: [datafs-pv, compute_pool, gke_cluster]
```

@@ -26,7 +26,7 @@ requirements.
ip_cidr_range: 10.0.32.0/20

- id: gke_cluster
-source: community/modules/scheduler/gke-cluster
+source: modules/scheduler/gke-cluster
use: [network1]
```

@@ -17,14 +17,14 @@ GKE node pool will be created.

```yaml
- id: existing-gke-cluster
-source: community/modules/scheduler/pre-existing-gke-cluster
+source: modules/scheduler/pre-existing-gke-cluster
settings:
project_id: $(vars.project_id)
cluster_name: my-gke-cluster
region: us-central1

- id: compute_pool
-source: community/modules/compute/gke-node-pool
+source: modules/compute/gke-node-pool
use: [existing-gke-cluster]
```

4 changes: 2 additions & 2 deletions pkg/modulereader/metadata_legacy.go
@@ -87,10 +87,10 @@ func defaultAPIList(source string) []string {
"iam.googleapis.com",
"storage.googleapis.com",
},
"community/modules/compute/gke-node-pool": {
"modules/compute/gke-node-pool": {
"container.googleapis.com",
},
"community/modules/scheduler/gke-cluster": {
"modules/scheduler/gke-cluster": {
"container.googleapis.com",
},
"modules/scheduler/batch-job-template": {
2 changes: 1 addition & 1 deletion tools/cloud-build/daily-tests/builds/gke-storage.yaml
@@ -40,7 +40,7 @@ steps:
cd /workspace && make
BUILD_ID_FULL=$BUILD_ID
BUILD_ID_SHORT=$${BUILD_ID_FULL:0:6}
-SG_EXAMPLE=community/examples/storage-gke.yaml
+SG_EXAMPLE=examples/storage-gke.yaml

# adding vm to act as remote node
echo ' - id: remote-node' >> $${SG_EXAMPLE}
4 changes: 2 additions & 2 deletions tools/cloud-build/daily-tests/builds/gke.yaml
@@ -36,7 +36,7 @@ steps:
cd /workspace && make
BUILD_ID_FULL=$BUILD_ID
BUILD_ID_SHORT=$${BUILD_ID_FULL:0:6}
-SG_EXAMPLE=community/examples/hpc-gke.yaml
+SG_EXAMPLE=examples/hpc-gke.yaml

# adding vm to act as remote node
echo ' - id: remote-node' >> $${SG_EXAMPLE}
@@ -47,7 +47,7 @@ steps:
echo ' zone: us-central1-a' >> $${SG_EXAMPLE}

echo ' - id: ubuntu_pool' >> $${SG_EXAMPLE}
-echo ' source: community/modules/compute/gke-node-pool' >> $${SG_EXAMPLE}
+echo ' source: modules/compute/gke-node-pool' >> $${SG_EXAMPLE}
echo ' use: [gke_cluster]' >> $${SG_EXAMPLE}
echo ' settings: {name: ubuntu, image_type: UBUNTU_CONTAINERD}' >> $${SG_EXAMPLE}

4 changes: 2 additions & 2 deletions tools/cloud-build/daily-tests/builds/ml-gke.yaml
@@ -36,7 +36,7 @@ steps:
cd /workspace && make
BUILD_ID_FULL=$BUILD_ID
BUILD_ID_SHORT=$${BUILD_ID_FULL:0:6}
-SG_EXAMPLE=community/examples/ml-gke.yaml
+SG_EXAMPLE=examples/ml-gke.yaml

# adding vm to act as remote node
echo ' - id: remote-node' >> $${SG_EXAMPLE}
@@ -47,7 +47,7 @@ steps:
echo ' zone: asia-southeast1-b' >> $${SG_EXAMPLE}

echo ' - id: ubuntu_pool' >> $${SG_EXAMPLE}
-echo ' source: community/modules/compute/gke-node-pool' >> $${SG_EXAMPLE}
+echo ' source: modules/compute/gke-node-pool' >> $${SG_EXAMPLE}
echo ' use: [gke_cluster]' >> $${SG_EXAMPLE}
echo ' settings: {name: ubuntu, image_type: UBUNTU_CONTAINERD}' >> $${SG_EXAMPLE}
