Topology management doc updates #17451
/milestone 1.17
@lmdaly: You must be a member of the kubernetes/website-milestone-maintainers GitHub team to set the milestone. If you believe you should be able to issue the /milestone command, please contact your Website milestone maintainers and have them propose you as an additional delegate for this responsibility. In response to this:
Deploy preview for kubernetes-io-vnext-staging processing. Building with commit 23981fc https://app.netlify.com/sites/kubernetes-io-vnext-staging/deploys/5dd7ac7ca413f50009f6fab9
/milestone 1.17
@lmdaly is this related to a code change PR and / or KEP?
c0ef48b to 8e3095e
@sftim these are updates for the Topology Manager, which is tracked in kubernetes/enhancements#693. This documentation mostly updates the existing knowledge base to give users more information on the feature.
Could this PR target the master branch? SIG Docs / this repo works with a continuous release process and has PRs targeting master by default.
Topology Manager would consider this Pod. The Topology Manager consults the Device Manager to discover the topology of the available devices for example.com/deviceA and example.com/deviceB.

As above Topology Manager will use this information to store the best Topology for this container. Device Manager will then use this when assigning devices to the Pod.

{{% /capture %}}
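The hint flow described in this hunk can be pictured with a toy model. The following is a minimal sketch in Python, assuming each hint is simply the set of NUMA node IDs on which a resource still has free devices. The function name `merge_hints` and the sample device placements are hypothetical, and the kubelet's real merge logic is more involved than a plain set intersection.

```python
def merge_hints(hints_per_resource):
    """Return the NUMA nodes acceptable to every resource, or an empty set.

    hints_per_resource maps a resource name to the set of NUMA node IDs
    on which that resource currently has free devices (an assumed shape,
    not the kubelet's data structure).
    """
    merged = None
    for nodes in hints_per_resource.values():
        merged = set(nodes) if merged is None else merged & set(nodes)
    return merged or set()


# Hypothetical device topology for the two resources named in the doc text.
hints = {
    "example.com/deviceA": {0, 1},  # free deviceA instances on NUMA nodes 0 and 1
    "example.com/deviceB": {1},     # free deviceB instances only on NUMA node 1
}

# The merged result stands in for the "best Topology" that is stored and
# later used when devices are assigned to the Pod.
best = merge_hints(hints)
print(best)
```

An empty result in this model corresponds to the case where no single alignment satisfies every resource.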
We should probably add a section about known issues / limitations.
Added the ones I could remember off hand, let me know what others I have missed
content/en/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins.md
8e3095e to 57be489
@@ -95,6 +99,8 @@ If it is, Topology Manager will store this and the *Hint Providers* can then use
resource allocation decision.
If, however, this is not possible then the Topology Manager will reject the pod from the node. This will result in a pod in a `Terminated` state with a pod admission failure.

Once the pod is in a `Terminated` state, the Kubernetes scheduler will **not** attempt to reschedule the pod. It is recommended a Deployment with Replicas to trigger a redeploy of the pod.
An external control loop could be also implemented to trigger a redeployment of pods that have the `Topology Affiniy` error. |
Suggested change:
- An external control loop could be also implemented to trigger a redeployment of pods that have the `Topology Affiniy` error.
+ An external control loop could be also implemented to trigger a redeployment of pods that have the `Topology Affinity` error.
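The external control loop mentioned in this hunk could start from a selection step like the one below. This is a minimal sketch, assuming rejected pods are reported with phase `Failed` and a status reason; the default reason string `TopologyAffinityError` and the helper name `pods_needing_redeploy` are assumptions, not something this PR defines. The input is a list of plain dicts shaped like Pod objects, so no client library is needed.

```python
def pods_needing_redeploy(pods, reason="TopologyAffinityError"):
    """Return the names of pods that failed admission with the given reason.

    `pods` is a list of dicts shaped like Pod objects from the API
    (metadata.name, status.phase, status.reason). The default reason
    string is an assumption; check what your kubelet version reports.
    """
    selected = []
    for pod in pods:
        status = pod.get("status", {})
        if status.get("phase") == "Failed" and status.get("reason") == reason:
            selected.append(pod["metadata"]["name"])
    return selected


# Hypothetical sample data standing in for a pod listing.
sample = [
    {"metadata": {"name": "aligned-ok"}, "status": {"phase": "Running"}},
    {"metadata": {"name": "rejected-1"},
     "status": {"phase": "Failed", "reason": "TopologyAffinityError"}},
    {"metadata": {"name": "evicted-1"},
     "status": {"phase": "Failed", "reason": "Evicted"}},
]

to_redeploy = pods_needing_redeploy(sample)
print(to_redeploy)
```

A real loop would then delete or re-create the selected pods; that part is left out because it depends on how the workload is managed.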

### Known Limitations
1. As of K8s 1.16 the Topology Manager is currently only guaranteed to work if a *single* container in the pod spec requires aligned resources. This is due to the hint generation being based on current resource allocations, and all containers in a pod generate hints before any resource allocation has been made. This results in unreliable hints for all but the first container in a pod.

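The limitation above can be illustrated with a toy simulation. This is a minimal sketch, assuming a node with two NUMA nodes that each have two free CPUs and two containers that each request two aligned CPUs; the function `generate_hints` and the data shapes are hypothetical and do not mirror kubelet code.

```python
def generate_hints(free_cpus, request):
    """List the NUMA nodes that could satisfy `request` given current free CPUs."""
    return [node for node, free in sorted(free_cpus.items()) if free >= request]


free = {0: 2, 1: 2}             # free CPUs per NUMA node (assumed values)
requests = {"c1": 2, "c2": 2}   # aligned CPUs requested per container

# Hints for every container are generated up front, before any allocation.
hints = {name: generate_hints(free, req) for name, req in requests.items()}

# Allocation then proceeds container by container using those early hints.
stale = []
for name, req in requests.items():
    node = hints[name][0]       # take the first hinted NUMA node
    if free[node] >= req:
        free[node] -= req
    else:
        stale.append(name)      # the hint no longer matches reality

print(hints)
print(stale)
```

In this model the second container's hint still points at NUMA node 0 even though the first container has already consumed it, which is the sense in which hints are unreliable for all but the first container.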
The scheduler is not topology-aware, so it's possible to pass the scheduler and fail the Admit() check in kubelet. If a higher-level controller is used (replicaset, for example) it can repeatedly re-create the pod and have it schedule and fail the same way.
If multiple pods/containers are considered by kubelet in close succession, they can result in the topology manager policy being effectively ignored. See kubernetes/kubernetes#84749
@lmdaly Just a reminder about the last Docs deadline: 22nd Nov, by which date this PR needs to be merged. You have some review comments to address.
4464606 to 3462262
3462262 to 8e30906
@daminisatya I have addressed the review comments. Do I need an lgtm from anyone in particular?
* Added information on how device plugins can take advantage of Topology Manager
* Updated the Topology Manager documentation to include additional information and update some out of date sections
/lgtm
/assign @kbarnard10
/cc @kubernetes/sig-docs-en-owners
@@ -205,5 +231,7 @@ Here are some examples of device plugin implementations:
* Learn about [scheduling GPU resources](/docs/tasks/manage-gpus/scheduling-gpus/) using device plugins
* Learn about [advertising extended resources](/docs/tasks/administer-cluster/extended-resource-node/) on a node
* Read about using [hardware acceleration for TLS ingress](https://kubernetes.io/blog/2019/04/24/hardware-accelerated-ssl/tls-termination-in-ingress-controllers-using-kubernetes-device-plugins-and-runtimeclass/) with Kubernetes
* Learn about [The Topology Manager] (/docs/tasks/adminster-cluster/topology-manager.md)
>>>>>>> Update Topology Manager docs
It looks like there is a line from a merge conflict resolution here?
Good spot, thanks!
Removed line.
8e30906 to 23981fc
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: daminisatya
The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
@lmdaly does a separate PR need to be opened to make sure these changes make it into master as well?
These changes will land in master once v1.17 is released, a few weeks from now. As part of the release, SIG Docs's nominated lead merges the website dev-1.17 branch into master.
Perfect. Thanks.
* feat: graduate TaintNodesByCondition to GA (#17073)
* Promote StartupProbe to beta (enabled by default). (#17164)
* Watch bookmarks to GA (#17026)
* feat: graduate ScheduleDaemonSetPods to GA (#17350)
* Update Docker installation instructions (#17405)
* Use exact version numbers for installing Docker in Ubuntu (#17428)
* Move CSIMigration and CSIMigrationGCE to Beta in Kubernetes v1.17 (#17478)
* Promote NodeLease feature to GA (#17189)
* Update docs for csi topology ga (#17408)
* Update RunAsUsername to beta (#17460)
* doc:Update RunAsUsername to beta
* doc: update samples - kubernetes.io/os is no longer beta
* Updating based on review feedback
* Promote Node-specific volume limits to GA (#17432)
* Promote PodShareProcessNamespace to stable (#17192)
* Promote PodShareProcessNamespace to stable
* Add for_k8s_version to feature-state label Co-Authored-By: Tim Bannister <tim@scalefactory.com>
* Readd version-check to shareProcessNamespace task
* Update service load balancer finalizer doc for GA (#17438)
* Update Topology Manager docs (#17451)
* Added information on how device plugins can take advantage of Topology Manager
* Updated the Topology Manager documentation to include additionalinformation and update some out of date sections
* Fix broken Topology Manager link (#17746) Part of What's Next Device Plugin section
* Update CRD defaulting docs for GA (#17450)
* Add documentation for VolumeSnapshot Beta (#17233)
* Updating EndpointSlice documentation for beta release in 1.17 (#17411)
* (docs/dualstack): v1.17 updates (#17457)
* Add placehold doc updates for dualstack in 1.17 Signed-off-by: Lachlan Evenson <lachlan.evenson@microsoft.com>
* Add Downward API and /etc/hosts Pod IP validation Signed-off-by: Lachlan Evenson <lachlan.evenson@microsoft.com>
* remove addressed known issue via k/k pr 85246 Signed-off-by: Lachlan Evenson <lachlan.evenson@microsoft.com>
* Remove known issue and add flag as part of k/k 79993 Signed-off-by: Lachlan Evenson <lachlan.evenson@microsoft.com>
* remove follow up placeholders Signed-off-by: Lachlan Evenson <lachlan.evenson@microsoft.com>
* Update verbiage Signed-off-by: Lachlan Evenson <lachlan.evenson@microsoft.com>
* Make IP addressing consistent throughout the task Signed-off-by: Lachlan Evenson <lachlan.evenson@microsoft.com>
* Update to status.podIPs Signed-off-by: Lachlan Evenson <lachlan.evenson@microsoft.com>
* Update content/en/docs/tasks/network/validate-dual-stack.md Use set instead of env Co-Authored-By: Khaled Henidak (Kal) <khnidk@outlook.com>
* add topology.kubernetes.io/zone, topology.kubernetes.io/region and node.kubernetes.io/instance-type labels to docs (#17498) Signed-off-by: Andrew Sy Kim <kiman@vmware.com>
* Service topology alpha documentation (#17459)
* Update list of feature flags for in-tree plugins migrated to CSI (#17533) Signed-off-by: Deep Debroy <ddebroy@docker.com>
* Update Node concept for TaintNodesByCondition going GA (#17577)
* feat: graduate ResourceQuotaScopeSelectors to GA in 1.17 (#17554)
* kubeadm: update the upgrade documentation for 1.17 (#17587)
* doc: Simplify Windows deployments with RuntimeClass (#16697)
* doc: Simplify Windows deployments with RuntimeClass
* Updating on review feedback
* doc: Adding windows-build label from enhancement 1301
* update doc for kubelet option --reserved-cpus (#17648)
* feat: update TaintNodesByCondition in feature gates table (#17377)
* Update docs for v1 resource quota configuration (#17547)
* AdmissionConfiguration v1 (#17548)
* Update WebhookAdmissionConfiguration examples (#17549)
* Update AWS EBS Migration Feature state (#16126)
* Add resource version section to api-concepts documentation (#16910)
* Add Resource Version semantics section to api concepts
* Clarify risks of going back in time, add details about compaction and watch cache sizes
* Apply suggestions from liggitt Co-Authored-By: Jordan Liggitt <jordan@liggitt.net>
* remove pesudocode, apply feedback
* Fix typo
* Clarify equality rules
* Cleanup kubectl generators docs (#17609)
* Write ReplicationController without a space
* Drop mentioning unsupported cluster versions
* Fix capitalization for “API group”
* Tweak wording
* Avoid using deprecated generator in example
* add Antrea description in dev-1.17 (#17919)
* Promote VolumeSubpathEnvExpansion to GA
* Reference Documentation for the Kubernetes API for 1.17 (#18019)
* Update feature-gates.md (#18033)
* Reference Documentation for kubectl Commands for 1.17 (#18017)
* Update for v1.17 (#18034)
* Update config.toml(release-1.17) for 1.17 (#18031)
The current docs will be updated to clarify usage.
Guidance on extending a device plugin will be added so that device plugin authors can leverage the Topology Manager.