-
Notifications
You must be signed in to change notification settings - Fork 292
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Proposal for GPU and PCI passthrough support #1237
Conversation
Hi @geetikabatra. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/ok-to-test |
|
||
#### Story 3 - PCI passthrough for single node Customer | ||
|
||
Alex is an Engineer at retail organization that requires single GPU node. They use one node with GPU attached and want to keep things simple. Alex can simply add this GPU connected machine to the cluster and that should do the job. While selecting nodes, Alex can use appropriate labels to run his AI/ML workload on this particular node. PCI passthrough will provide direct GPU support. Challenges that Alex can face is that using passthrough Alex wouln't be able to migrate nodes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose this also applies to non-GPU devices, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MaxRink I don't get your question here. Non GPU will be counted among regular nodes in my opinion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean passthrough for PCI-E devices that are not GPUs, as PCI-E passthrough shouldnt really care if its a GPU, a NIC or even an HBA
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, If we put it that way, yes. It also applies for non GPU devices.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, precisely this. This proposal should be retitled to cover both GPU and PCI Passthrough support as that's what's being implemented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yastij Does it make sense to separate out the PCI Passthrough support to a separate proposal? This could also enable the workflow of provisioning VMs physical NICs attached to the VMs created by CAPV.
And if it is too much work to separate the proposal out, I +1 Naadir's comments on retitling the proposal since PCI passthrough support is not strictly GPU related.
4f2db4b
to
3795e85
Compare
6c69290
to
a486a10
Compare
a486a10
to
649de59
Compare
649de59
to
edcaac5
Compare
/unassign @gab-satchi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did an API review, a few comments to fix before merging. I also Agree that splitting vgpu and pci made things clearer.
Thanks !
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One of the missing things here is, changes to the VSphereVMStatus
object to bubble the Device info to the user.
Another thing to take into consideration is if any new Conditions are needed to surface error scenarios or maybe just new reasons which can be set for existing conditions if any error happens during VM creation.
1928528
to
74fb06c
Compare
@geetikabatra: The following test failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
c52bcc7
to
a49e0d7
Compare
@srm09 PTAL |
|
||
```text | ||
--- | ||
title: GPU support in CAPV |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is missing the updates to the VSphereVMStatus
to provide the Device info to the user which was called out in this comment.
I do not see the validation webhook changes in the final proposal, was this change overridden?
Could you please add the two things mentioned above and also take a look at the open conversations and resolve them if appropriate?
Thanks for all the work on the proposal, it is almost there! 👍🏾
I had pushed an older version of the proposal hence many changes disappeared. Updating the latest one. |
1ce8a90
to
5894c2a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One final set of changes. @yastij can you take a look at it too?
Signed-off-by: geetikab@vmware.com <geetikab@vmware.com> Co-authored-by: Sagar Muchhal <muchhals@vmware.com>
This is addressed in https://github.com/kubernetes-sigs/cluster-api-provider-vsphere/pull/1237/files#diff-da5c64bcd81497eebce388a7c1e0250d75fd759fb2131341546691e3d7475e41R147 |
/approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: yastij The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
This commit adds a proposal which enables GPU support
in CAPV using.
Signed-off-by: Geetika Batra geetikab@vmware.com
What this PR does / why we need it:
Which issue(s) this PR fixes (optional, in
fixes #<issue number>(, fixes #<issue_number>, ...)
format, will close the issue(s) when PR gets merged):Fixes #
Special notes for your reviewer:
Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.
Release note: