Extension to support new compute resources #368
Comments
/sig node |
Progress Tracker
|
@RenaudWasTaken should the resource class design also be part of this tracker? Just asking because I'm missing it in the progress tracker. |
@fabiand the resource class design is a beta (1.9 and up) feature :) My progress tracker is designed to track all the alpha (1.8) features for now. |
I'm shepherding this feature. So assigning to myself for now. |
Thanks!
|
Automatic merge from submit-queue (batch tested with PRs 49342, 50581, 50777) Device Plugin Protobuf API **What this PR does / why we need it:** This implements the Device Plugin API - Design document: kubernetes/community#695 - PR tracking: [kubernetes/enhancements#368](kubernetes/enhancements#368 (comment)) Special notes for your reviewer: First proposal submitted to the community repo, please advise if something's not right with the format or procedure, etc. @vishh @derekwaynecarr **Release note:** ``` NONE ```
Automatic merge from submit-queue (batch tested with PRs 50531, 50853, 49976, 50939, 50607) Updated gRPC vendoring to support Keep Alive **What this PR does / why we need it**: This PR bumps the version of the vendored version of gRPC from v1.0.4 to v1.5.1 This is needed as part of the Device Plugin API where we expect client and server to use the Keep alive feature in order to detect an error. Unfortunately I had to also bump the version of `golang.org/x/text` and `golang.org/x/net`. - Design document: kubernetes/community#695 - PR tracking: [kubernetes/enhancements#368](kubernetes/enhancements#368 (comment)) **Special notes for your reviewer**: @vishh @jiayingz **Release note**: ``` Bumped gRPC from v1.0.4 to v1.5.1 ```
Automatic merge from submit-queue (batch tested with PRs 51193, 51154, 42689, 51189, 51200) Bumped gRPC version to 1.3.0 **What this PR does / why we need it**: This PR bumps down the version of the vendored version of gRPC from v1.5.1 to v1.3.0 This is needed as part of the Device Plugin API where we expect client and server to use the Keep alive feature in order to detect an error. Unfortunately I had to also bump the version of `golang.org/x/text` and `golang.org/x/net`. - Design document: kubernetes/community#695 - PR tracking: [kubernetes/enhancements#368](kubernetes/enhancements#368 (comment)) **Which issue this PR fixes**: fixes #51099 Which was caused by my previous PR updating to 1.5.1 **Special notes for your reviewer**: @vishh @jiayingz @shyamjvs **Release note**: ``` Bumped gRPC to v1.3.0 ```
Automatic merge from submit-queue (batch tested with PRs 51590, 48217, 51209, 51575, 48627) Deviceplugin jiayingz **What this PR does / why we need it**: This PR implements the kubelet Device Plugin Manager. It includes four commits implemented by @RenaudWasTaken and a commit that supports allocation. **Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes # Design document: kubernetes/community#695 PR tracking: kubernetes/enhancements#368 **Special notes for your reviewer**: **Release note**: Extending Kubelet to support device plugin ```release-note ```
Pretty sure this should not be closed :D |
Automatic merge from submit-queue (batch tested with PRs 54826, 53576, 55591, 54946, 54825). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>. Run nvidia-gpu device-plugin daemonset as an addon on GCE nodes that have nvidia GPUs attached - Instead of the old `Accelerators` feature that added `alpha.kubernetes.io/nvidia-gpu` resource, use the new `DevicePlugins` feature that adds vendor specific resources. (In case of nvidia GPUs it will add `nvidia.com/gpu` resource.) - Add node label to GCE nodes with accelerators attached. This node label is the same as what GKE attaches to node pools with accelerators attached. (For example, for nvidia-tesla-p100 GPU, the label would be `cloud.google.com/gke-accelerator=nvidia-tesla-p100`) This will help us target accelerator specific daemonsets etc. to these nodes. - Run nvidia-gpu device-plugin daemonset as an addon on GCE nodes that have nvidia GPUs attached. - Some minor documentation improvements in addon manager. **Release note**: ```release-note GCE nodes with NVIDIA GPUs attached now expose `nvidia.com/gpu` as a resource instead of `alpha.kubernetes.io/nvidia-gpu`. ``` /sig cluster-lifecycle /sig scheduling /area hw-accelerators kubernetes/enhancements#368
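To illustrate how a workload might target the resource and node label described in the change above, here is a minimal sketch that builds a Pod manifest requesting `nvidia.com/gpu` and selecting nodes labelled `cloud.google.com/gke-accelerator=nvidia-tesla-p100`. The resource name and label come from the PR description; the helper name `gpuPodManifest`, the pod name, and the image are hypothetical, and the manifest is built from plain maps so that the example needs no Kubernetes client libraries.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// gpuPodManifest is a hypothetical helper (not part of Kubernetes) that
// returns a minimal Pod manifest as nested maps. The pod asks for `gpus`
// units of the vendor-specific `nvidia.com/gpu` resource and is restricted
// to nodes carrying the given accelerator label value.
func gpuPodManifest(name, image, acceleratorType string, gpus int) map[string]interface{} {
	return map[string]interface{}{
		"apiVersion": "v1",
		"kind":       "Pod",
		"metadata":   map[string]interface{}{"name": name},
		"spec": map[string]interface{}{
			// Node label added to GCE nodes with accelerators attached.
			"nodeSelector": map[string]interface{}{
				"cloud.google.com/gke-accelerator": acceleratorType,
			},
			"containers": []interface{}{
				map[string]interface{}{
					"name":  name,
					"image": image,
					"resources": map[string]interface{}{
						// Resource exposed through the DevicePlugins feature.
						"limits": map[string]interface{}{
							"nvidia.com/gpu": fmt.Sprint(gpus),
						},
					},
				},
			},
		},
	}
}

func main() {
	// The pod name and image below are placeholders for illustration only.
	manifest := gpuPodManifest("cuda-test", "example.com/cuda-app:latest", "nvidia-tesla-p100", 1)
	out, err := json.MarshalIndent(manifest, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

The printed JSON could be submitted with `kubectl apply -f -`, assuming the cluster runs the device plugin addon on its GPU nodes.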
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>. Extend test/e2e/scheduling/nvidia-gpus.go to track resource usage of installer and device plugin containers. To support this, export certain functions and fields in framework/resource_usage_gatherer.go so that they can be used in any e2e test to track the resource usage of any specified pod with the specified probe interval and duration. **What this PR does / why we need it**: We need to quantify the resource usage of the device plugin DaemonSet to make sure it can run reliably on nodes with GPUs. We also want to measure GPU driver installer resource usage to track any unexpected resource consumption during driver installation. For the latter part, see the related issue kubernetes/enhancements#368. Example resource summary output: Oct 6 12:35:07.289: INFO: Printing summary: ResourceUsageSummary Oct 6 12:35:07.289: INFO: ResourceUsageSummary JSON { "100": [ { "Name": "nvidia-device-plugin-6kqxp/nvidia-device-plugin", "Cpu": 0.000507167, "Mem": 2134016 }, { "Name": "nvidia-device-plugin-6kqxp/nvidia-driver-installer", "Cpu": 1.915508718, "Mem": 663330816 }, { "Name": "nvidia-device-plugin-l28zc/nvidia-device-plugin", "Cpu": 0.000836256, "Mem": 2211840 }, { "Name": "nvidia-device-plugin-l28zc/nvidia-driver-installer", "Cpu": 1.916886293, "Mem": 691449856 }, { "Name": "nvidia-device-plugin-xb4vh/nvidia-device-plugin", "Cpu": 0.000515103, "Mem": 2265088 }, { "Name": "nvidia-device-plugin-xb4vh/nvidia-driver-installer", "Cpu": 1.909435982, "Mem": 832430080 } ], "50": [ { ... **Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes # **Special notes for your reviewer**: **Release note**: ```release-note ```
@RenaudWasTaken @vishh Hello, I would like to pick up the e2e test if no one has already done so. |
Automatic merge from submit-queue (batch tested with PRs 56681, 57384). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>. Deprecate the alpha Accelerators feature gate. Encourage people to use DevicePlugins instead. /kind cleanup Related to kubernetes/enhancements#192 and kubernetes/enhancements#368 **Release note**: ```release-note The alpha Accelerators feature gate is deprecated and will be removed in v1.11. Please use device plugins instead. They can be enabled using the DevicePlugins feature gate. ``` /sig node /sig scheduling /area hw-accelerators
Issues go stale after 90d of inactivity. If this issue is safe to close now, please close it. Send feedback to sig-testing, kubernetes/test-infra and/or fejta. |
Device plugins graduated to beta in 1.10. |
@vishh @jiayingz @derekwaynecarr @kubernetes/sig-node-feature-requests If so, can you please ensure the feature is up-to-date with the appropriate:
cc @idvoretskyi |
We now have device plugins in Beta, quota supports extensible resources, and the scheduler supports it as well. Scheduler extensions have been proven to work with extended resources. |
Stale issues rot after 30d of inactivity. If this issue is safe to close now, please close it. Send feedback to sig-testing, kubernetes/test-infra and/or fejta. |
Rotten issues close after 30d of inactivity. Send feedback to sig-testing, kubernetes/test-infra and/or fejta. |
Automatic merge from submit-queue (batch tested with PRs 50531, 50853, 49976, 50939, 50607) Updated gRPC vendoring to support Keep Alive **What this PR does / why we need it**: This PR bumps the version of the vendored version of gRPC from v1.0.4 to v1.5.1 This is needed as part of the Device Plugin API where we expect client and server to use the Keep alive feature in order to detect an error. Unfortunately I had to also bump the version of `golang.org/x/text` and `golang.org/x/net`. - Design document: kubernetes/community#695 - PR tracking: [kubernetes/enhancements#368](kubernetes/enhancements#368 (comment)) **Special notes for your reviewer**: @vishh @jiayingz **Release note**: ``` Bumped gRPC from v1.0.4 to v1.5.1 ``` Kubernetes-commit: 967c19df4916160d4d4fbd9a65bad41a53992de8
Automatic merge from submit-queue (batch tested with PRs 51193, 51154, 42689, 51189, 51200) Bumped gRPC version to 1.3.0 **What this PR does / why we need it**: This PR bumps down the version of the vendored version of gRPC from v1.5.1 to v1.3.0 This is needed as part of the Device Plugin API where we expect client and server to use the Keep alive feature in order to detect an error. Unfortunately I had to also bump the version of `golang.org/x/text` and `golang.org/x/net`. - Design document: kubernetes/community#695 - PR tracking: [kubernetes/enhancements#368](kubernetes/enhancements#368 (comment)) **Which issue this PR fixes**: fixes #51099 Which was caused by my previous PR updating to 1.5.1 **Special notes for your reviewer**: @vishh @jiayingz @shyamjvs **Release note**: ``` Bumped gRPC to v1.3.0 ``` Kubernetes-commit: 5fb38a325efb343c2a0467a12732829bd5ed3c3c
Feature Description
New Extension at the node level to surface, schedule and manage lifecycle of new compute resources.