Can't Add GPU Accelerator to Node Pool part of Regional Cluster #263

Closed
DanielWozniak opened this issue Sep 18, 2019 · 12 comments
Labels
enhancement New feature or request


@DanielWozniak

I have a regional cluster and would like to add a node pool with GPU accelerators to it. The problem is that the Google documentation states:

For regional clusters that run GPUs, there is currently no region which has any GPU type in three zones. If you want to run GPUs in a regional cluster, you need to specify zones using the --node-locations flag

When I specify the location in the module like so:

node_pools = [
    {
      name              = "node-pool-name"
      machine_type      = "n1-highmem-32"
      min_count         = 0
      max_count         = 1
      disk_size_gb      = 100
      disk_type         = "pd-standard"
      image_type        = "COS"
      accelerator_count = 2
      accelerator_type  = "nvidia-tesla-p100"
      auto_repair       = true
      auto_upgrade      = true
      preemptible       = false
      location          = "us-east1-b"
    },
  ]

For some reason the location is not picked up, and the plan shows:

      + location            = "us-east1"
      + max_pods_per_node   = (known after apply)
      + name                = "node-pool-name"
      + name_prefix         = (known after apply)
      + node_count          = (known after apply)
      + project             = "project-name"
      + region              = (known after apply)
      + version             = (known after apply)
      + zone                = (known after apply)

Running the apply does not pick it up either, and the following error is thrown:

module.gke.google_container_node_pool.pools[1]: Creating...

Error: error creating NodePool: googleapi: Error 400: Accelerator type "nvidia-tesla-p100" does not exist in zone us-east1-d., badRequest

Would this be a bug in the module? Or is there something that I might be missing?

@aaron-lane aaron-lane added the question Further information is requested label Sep 19, 2019
@DanielWozniak
Author

I went over the code, and it looks like the problem is that the node pool doesn't accept a location: whatever location is specified for the cluster is used directly in the node pool. I've created a branch with some changes, but while testing I can't even get the cluster to create using the master branch. For some reason the cluster creation fails with error code 13 and the message INTERNAL (not very helpful), and then the cluster is deleted.

@morgante
Contributor

Can you share the docs you're referring to? If you're able to get this working even via gcloud that would help.

@rileykarson
Contributor

rileykarson commented Sep 19, 2019

FYI, node pool node locations (that are different than the cluster's default node locations) will be supported as part of the next provider release: GoogleCloudPlatform/magic-modules#2320
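For illustration, a minimal sketch of what a per-pool zone override might look like on the raw resource once that release lands. It assumes the attribute is exposed as node_locations on google_container_node_pool; the resource name gpu_pool and the reference to google_container_cluster.primary are hypothetical, and the remaining values are copied from the report above.

```hcl
# Sketch only: resource name and cluster reference are hypothetical.
resource "google_container_node_pool" "gpu_pool" {
  name     = "node-pool-name"
  project  = "project-name"
  location = "us-east1" # the regional cluster's location
  cluster  = google_container_cluster.primary.name

  # Assumed attribute: restrict this pool to the zone(s) expected to
  # offer the requested accelerator type, instead of inheriting the
  # cluster's default node locations.
  node_locations = ["us-east1-b"]

  autoscaling {
    min_node_count = 0
    max_node_count = 1
  }

  management {
    auto_repair  = true
    auto_upgrade = true
  }

  node_config {
    machine_type = "n1-highmem-32"
    disk_size_gb = 100
    disk_type    = "pd-standard"
    image_type   = "COS"

    guest_accelerator {
      type  = "nvidia-tesla-p100"
      count = 2
    }
  }
}
```

With the zone list set on the pool itself, the pool should no longer be scheduled into zones such as us-east1-d, which is where the 400 error above came from.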

@morgante
Contributor

Interesting, I didn't know we supported the ability to run different node pools in different locations. Looks like we should be able to add this with the next provider release, thanks for the heads up!

@DanielWozniak
Author

Ah, makes sense. When I read the description of location in the docs, I assumed you could specify something other than the cluster's location, even though it clearly says it is the location in which the cluster resides.
Thanks for clearing things up, guys. Any idea when this might be part of this module?

@morgante
Contributor

@DanielWozniak Likely 2-3 weeks, depending on when the provider release lands.

@DanielWozniak
Author

Sounds good. In the meantime I suppose we can close this issue.

@DanielWozniak
Author

Hi All,

Since the feature has been released in provider version 2.16.0, could I request that it be added to the module?
More specifically, I would like to be able to specify which zone is used for a node pool.
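As a sketch of the request, the module input might accept a per-pool zone setting along these lines. The node_locations key shown here is hypothetical and was not supported by the module at the time of this comment; the other keys are taken from the original report.

```hcl
node_pools = [
  {
    name              = "node-pool-name"
    machine_type      = "n1-highmem-32"
    min_count         = 0
    max_count         = 1
    accelerator_count = 2
    accelerator_type  = "nvidia-tesla-p100"

    # Hypothetical key: the zone(s) this pool should run in, overriding
    # the cluster's default node locations.
    node_locations = "us-east1-b"
  },
]
```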

I've tried working on this myself, but for some reason I'm not even able to create a simple cluster. Before adding any changes, I get:

Error: Error waiting for creating GKE cluster: Failed to create cluster

  on .terraform/modules/gke/cluster.tf line 22, in resource "google_container_cluster" "primary":
  22: resource "google_container_cluster" "primary" {

After about 20 seconds of trying to create the cluster, it just starts deleting itself without giving any helpful message about why.

@morgante
Contributor

@DanielWozniak Is this no longer needed?

@DanielWozniak
Author

DanielWozniak commented Oct 21, 2019

It's still needed. I just thought that since this issue is tagged as a question, it might not get noticed. I was thinking of creating a new one with all the required info to avoid any confusion. Or, if you are able to change the tag to Enhancement, we can keep this one.

@morgante morgante reopened this Oct 21, 2019
@morgante morgante added enhancement New feature or request and removed question Further information is requested labels Oct 21, 2019
@morgante
Contributor

Ok I've marked it as an Enhancement. We'll put it in our backlog.

@morgante
Contributor

I've actually moved this into #290.
