[shim] Rework resource management #2093

Merged on Dec 18, 2024 (2 commits)

un-def
Collaborator

un-def commented Dec 13, 2024

  • Collect GPU resource identifiers: UUIDs for NVIDIA, DRI render nodes for AMD
  • Request GPUs by identifiers when creating a container
  • Keep track of busy/idle GPUs
  • Move host-related code to a separate package
  • Don't panic()

Part-of: #1780

un-def requested a review from r4victor December 13, 2024 15:29
@r4victor
Collaborator

@un-def, how does requesting GPUs work with TPUs? Will a task have gpu_count: 0? No need to support GPU blocks for TPUs since it doesn't seem TPU Pods can be sliced further. Just want to ensure nothing breaks.

@un-def
Collaborator Author

un-def commented Dec 16, 2024

> how does requesting GPUs work with TPUs?

shim knows nothing about TPUs. To shim, TPU instances are CPU instances; that is, `gpuVendor` is "none".

> Will a task have gpu_count: 0

Yes.

> it doesn't seem TPU Pods can be sliced further

Yes. As of now, we have no plans to allow TPU slices.
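
To illustrate why nothing should break for TPU instances, here is a minimal sketch of the check implied by the answers above. It assumes a hypothetical `Host` type and `CheckGpuCount` method; only the `"none"` vendor value and `gpu_count: 0` come from this discussion, and the other vendor constants are illustrative guesses.

```go
package main

import "fmt"

// GpuVendor values other than "none" are illustrative assumptions.
type GpuVendor string

const (
	GpuVendorNone   GpuVendor = "none"
	GpuVendorNvidia GpuVendor = "nvidia"
	GpuVendorAmd    GpuVendor = "amd"
)

// Host is a hypothetical view of what was detected on the instance.
type Host struct {
	Vendor GpuVendor
	GpuIDs []string // empty when Vendor is "none"
}

// CheckGpuCount returns an error (rather than panicking) if a task asks
// for more GPUs than the host exposes. A request for 0 GPUs always passes,
// which is the case for TPU instances seen as CPU instances.
func (h Host) CheckGpuCount(gpuCount int) error {
	if gpuCount < 0 {
		return fmt.Errorf("invalid gpu_count: %d", gpuCount)
	}
	if gpuCount == 0 {
		return nil
	}
	if h.Vendor == GpuVendorNone {
		return fmt.Errorf("task requests %d GPU(s), but the host has none", gpuCount)
	}
	if gpuCount > len(h.GpuIDs) {
		return fmt.Errorf("task requests %d GPU(s), but the host has %d",
			gpuCount, len(h.GpuIDs))
	}
	return nil
}

func main() {
	// A TPU instance as described above: vendor "none", no GPU identifiers.
	tpuHost := Host{Vendor: GpuVendorNone}

	fmt.Println(tpuHost.CheckGpuCount(0)) // task with gpu_count: 0
	fmt.Println(tpuHost.CheckGpuCount(1)) // would be rejected with an error
}
```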

un-def merged commit 558f381 into master Dec 18, 2024
23 checks passed
un-def deleted the issue_1780_shim_collect_gpu_ids branch December 18, 2024 15:36