Allow Google accelerators (i.e. GPUs) on workers #161

Merged 1 commit into master from google-accelerators on Mar 12, 2018

@dghubble commented Mar 11, 2018

Warning: This does not magically make GPUs work on Container Linux or Kubernetes. It simply allows advanced users to begin experimenting with them. As the comments imply, this feature is unofficial, undocumented, and unsupported, and it may be changed or removed at any time.

Caveats:

  • Requires changes to Google Cloud default quotas
  • Requires terraform-provider-google 1.6.0 or higher for a count of "0" GPUs to work properly
  • Requires compiling your own kernel modules on Container Linux. (It's possible, I've done it. There are just lots of rough edges.)
  • Some instances may never be created, because no GPU model is available in every zone and Typhoon automatically randomizes workers across the zones within a region. For now, we just have to fiddle with the worker count until GCP learns to only try creating an instance in a zone where it can actually be created.
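
To get a rough feel for how much to fiddle with the count in the last caveat, here is a minimal Python sketch. It assumes each worker lands independently in a uniformly random zone of the region, which is a simplification of whatever GCP actually does. The zone names and the helper functions (`creation_probability`, `expected_created`, `count_for_target`) are hypothetical and exist only for illustration; they are not part of Typhoon or any Google API.

```python
import math
from fractions import Fraction


def creation_probability(region_zones, gpu_zones):
    """Fraction of the region's zones that offer the chosen GPU model.

    Assumption: a worker is equally likely to land in any zone.
    """
    if not region_zones:
        raise ValueError("region has no zones")
    usable = sum(1 for zone in region_zones if zone in gpu_zones)
    return Fraction(usable, len(region_zones))


def expected_created(worker_count, region_zones, gpu_zones):
    """Expected number of workers that actually get created."""
    return worker_count * creation_probability(region_zones, gpu_zones)


def count_for_target(target, region_zones, gpu_zones):
    """Smallest worker count whose expected created workers reaches target."""
    p = creation_probability(region_zones, gpu_zones)
    if p == 0:
        raise ValueError("no zone in this region offers the GPU model")
    return math.ceil(Fraction(target) / p)


if __name__ == "__main__":
    # Hypothetical region with four zones, two of which offer the GPU model.
    zones = ["zone-a", "zone-b", "zone-c", "zone-d"]
    gpu_zones = {"zone-a", "zone-c"}
    print(expected_created(4, zones, gpu_zones))
    print(count_for_target(3, zones, gpu_zones))
```

This only gives an expectation. Any single apply can still leave more or fewer workers un-created, so the count may need another adjustment afterwards.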

@dghubble force-pushed the google-accelerators branch from a1653b5 to 2592a0a on March 12, 2018 00:21
@dghubble

Mon Mar 12 01:00:04 2018       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.25                 Driver Version: 390.25                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   33C    P0    29W / 250W |      0MiB / 16280MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

@dghubble merged commit 2592a0a into master on Mar 12, 2018
@dghubble deleted the google-accelerators branch on March 12, 2018 06:59