Updating Windows KEP for GA #729
/hold
- Some kubeadm work was done in the past to add Windows nodes to Kubernetes, but that effort has been dormant since. We will need to revisit that work and complete it in the future.
- Calico CNI for Pod networking
- Hyper-V isolation (Currently this is limited to 1 container per Pod and is an alpha feature)
- It is unclear if the RuntimeClass proposal from sig-node will simplify scheduling Windows containers. We will work with sig-node on this.
If this is still not well understood, I don't think it needs to be included here.
Folks from sig-architecture will likely ask about this, which is why I included it here, indicating that we will do more work on this.
sounds good
My meta-point here is that Windows stable shouldn't require supporting an alpha or beta feature. We should continue working on a plan for this alongside SIG-Node. I think this is ok as-is
I think it might be clearer if the section was renamed to "Windows Node Roadmap" to make it explicit that "eventually" is beyond the scope of GA.
I asked about RuntimeClass back in November. :-)
@craiglpeters has a good point. I assume "eventually" is post-GA for all of these?
nodeSelector:
  "beta.kubernetes.io/os": windows
tolerations:
- key: "Os"
I think you actually want a lower-case `os` here to match the example.
  "beta.kubernetes.io/os": windows
tolerations:
- key: "Os"
  operator: "Equals"
The operator is `Equal`, not `Equals`.
good catch. updated.
/lgtm
- Horizontal Pod Autoscaling
- Windows Server 2019 is the only Windows operating system we will support at GA timeframe. Note above that the host operating system version and the container base image need to match. This is a Windows limitation we cannot overcome.
- Customers can deploy a heterogeneous cluster, with Windows and Linux compute nodes side-by-side and schedule Docker containers on both operating systems. Of course, Windows Server containers have to be scheduled on Windows and Linux containers on Linux
- Out-of-tree Pod networking with [Azure-CNI](https://github.com/Azure/azure-container-networking/blob/master/docs/cni.md), [OVN-Kubernetes](https://github.com/openvswitch/ovn-kubernetes), [two CNI meta-plugins](https://github.com/containernetworking/plugins), [Flannel (VXLAN and Host-Gateway)](https://github.com/coreos/flannel)
Isn't VXLAN support only in 1903 currently?
@astrieanna By the time we GA, it will be supported for Server 2019.
  operator: "Equal"
  value: "Windows"
  effect: "NoSchedule"
```
Because Windows containers are specific to the OS version, does it make sense to have the taint/toleration include the Windows version? While only 2019 is supported at GA, eventually more versions of Windows will be supported (as new Windows versions are released). A version-specific taint could help containers land on the right nodes.
We were going to add that in the docs, but I made the change here as well for additional clarity.
- All features and functionality under `What works today` are fully tested and vetted to be working by SIG-Windows
- SIG-Windows has high confidence in the stability and reliability of Windows Server containers on Kubernetes
- 100% green/passing conformance tests that are applicable to Windows (see the Testing Plan section for details on these tests)
- Comprehensive documentation that includes but is not limited to the following sections
Is there a plan for where these docs will live?
Updated to include the location.
@craiglpeters - have you talked to SIG-Docs on this yet?
@PatrickLang I have not. I just started a conversation with the internal team about docs. I'll reach out to sig-docs via Slack today.
We need to have a plan for where the docs will live, and how they will be written. But finalizing those plans shouldn't be a prerequisite for calling this KEP /implementable
Co-Authored-By: michmike <michmike@users.noreply.github.com>
/lgtm More PRs still coming. The final one will include a change to status:implementable.
[APPROVALNOTIFIER] This PR is APPROVED.
This pull-request has been approved by: benmoss, michmike, PatrickLang
/remove hold
/hold cancel
Thanks for the updates
## Proposal

As of 29-11-2018, much of the work for enabling Windows nodes has already been completed. Both `kubelet` and `kube-proxy` have been adapted to work on Windows Server, and so the first goal of this KEP is largely already complete.

### What works today
- Windows-based containers can be created by kubelet, [provided the host OS version matches the container base image](https://docs.microsoft.com/en-us/virtualization/windowscontainers/deploy-containers/version-compatibility)
- ConfigMap, Secrets: as environment variables or volumes
- Pod (single or multiple containers per Pod with process isolation), Deployment, ReplicaSet
It's confusing to mention Deployment and ReplicaSet here, and DaemonSet and StatefulSet below. Please discuss all the workload controllers adjacent to one another.
Do Job and CronJob have any issues? If not, please list them with ReplicaSet and Deployment.
- Service types NodePort, ClusterIP, LoadBalancer, and ExternalName
Headless services?
Are there any DNS differences?
- Dockershim CRI
- Many<sup id="a1">[1]</sup> of the e2e conformance tests when run with [alternate Windows-based images](https://hub.docker.com/r/e2eteam/) which are being moved to [kubernetes-sigs/windows-testing](https://www.github.com/kubernetes-sigs/windows-testing)
- Persistent storage: FlexVolume with [SMB + iSCSI](https://github.com/Microsoft/K8s-Storage-Plugins/tree/master/flexvolume/windows), and in-tree AzureFile and AzureDisk providers
- Windows Server containers can take advantage of StatefulSet functionality for stateful applications and distributed systems
- Windows Pods can take advantage of DaemonSet, with the exception that privileged containers are not supported on Windows (more on that below)
Above you mentioned "Windows server containers" and here "Windows pods". Is there any difference in meaning between the two?
No difference. I will update the naming to be consistent.
- Resource limits
- Pod & container metrics
- Pod networking with [Azure-CNI](https://github.com/Azure/azure-container-networking/blob/master/docs/cni.md), [OVN-Kubernetes](https://github.com/openvswitch/ovn-kubernetes), [two CNI meta-plugins](https://github.com/containernetworking/plugins), [Flannel](https://github.com/coreos/flannel) and [Calico](https://github.com/projectcalico/calico)
- Horizontal Pod Autoscaling
Are system OOMs reported?
Are there notable differences in Pod Status fields?
### What will never work (without underlying OS changes)
- Certain Pod functionality
- Privileged containers
- Privileged containers and other Pod security context privilege and access control settings
I asked a bunch of other questions on the original KEP PR:
#676 (comment)
#676 (comment)
#676 (comment)
#676 (comment)
#676 (comment)
@bgrant0607, which Linux capabilities specifically do you mean? These ones? https://kubernetes.io/docs/tasks/configure-pod-container/security-context/
## Graduation Criteria
#### Ensuring OS-specific workloads land on the appropriate container host
As you can see below, we plan to document how Windows containers can be scheduled on the appropriate host using Taints and Tolerations. All nodes today have the following default labels:
- beta.kubernetes.io/os = [windows|linux]
It's worth noting the promotion of these to stable:
kubernetes/kubernetes#72929
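As a rough sketch of what that promotion would mean for a Pod spec: the fragment below assumes the stable label key is `kubernetes.io/os`. That key name is not stated in this thread, so treat it as an assumption to check against the linked PR.

```yaml
# Fragment of a Pod spec, not a complete manifest.
# The stable label key below is an assumption; the KEP text itself
# still uses "beta.kubernetes.io/os".
nodeSelector:
  "kubernetes.io/os": windows
```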
## Implementation History
However, we understand that in certain cases customers have a large number of pre-existing deployments for Linux containers. Since they will not want to change all deployments to add nodeSelectors, the alternative is to use Taints. Because the kubelet can set Taints during registration, it could easily be modified to automatically add a taint when running on Windows only (`--register-with-taints='os=Win1809:NoSchedule'`). By adding a taint to all Windows nodes, nothing will be scheduled on them (that includes existing Linux Pods). In order for a Windows Pod to be scheduled on a Windows node, it would need both the nodeSelector to choose Windows, and a toleration.
It's not just deployments, but also ecosystem off-the-shelf configurations, such as community Helm charts, and programmatic pod generation cases, such as with Operators. I think taints are going to be needed in most cases.
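For illustration, a Windows Pod that carries both the nodeSelector and the toleration described in the KEP text might look like the minimal sketch below. The Pod name, container name, and image are hypothetical placeholders. The taint key and value follow the `os=Win1809:NoSchedule` example from the `--register-with-taints` flag above, and the label is the default `beta.kubernetes.io/os` node label.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: windows-example            # hypothetical name
spec:
  containers:
  - name: app                      # hypothetical container name
    # Assumed image; the base image version must match the host OS version.
    image: mcr.microsoft.com/windows/servercore:ltsc2019
  nodeSelector:
    "beta.kubernetes.io/os": windows   # choose Windows nodes
  tolerations:
  - key: "os"                      # lower-case key, matching the taint
    operator: "Equal"              # "Equal", not "Equals"
    value: "Win1809"
    effect: "NoSchedule"           # tolerate the taint set at kubelet registration
```

With the taint in place, Linux Pods that lack this toleration stay off Windows nodes, so off-the-shelf Helm charts and Operator-generated Pods would not need to be modified.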
## Graduation Criteria
- All features and functionality under `What works today` are fully tested and vetted to be working by SIG-Windows
Is this section complete, or is @craiglpeters still working on it?
My previous comment:
#676 (comment)
I made some more edits just now that I will be pushing through.
11. Advanced: How to use Hyper-V isolation (not a stable feature yet)
12. Advanced: How to build Kubernetes for Windows from source
13. Supported functionality (with examples where appropriate)
14. Known Limitations
Do any node addons work, such as node problem detector?
- Horizontal Pod Autoscaling
- Windows Server 2019 is the only Windows operating system we will support at GA timeframe. Note above that the host operating system version and the container base image need to match. This is a Windows limitation we cannot overcome.
- Customers can deploy a heterogeneous cluster, with Windows and Linux compute nodes side-by-side and schedule Docker containers on both operating systems. Of course, Windows Server containers have to be scheduled on Windows and Linux containers on Linux
- Out-of-tree Pod networking with [Azure-CNI](https://github.com/Azure/azure-container-networking/blob/master/docs/cni.md), [OVN-Kubernetes](https://github.com/openvswitch/ovn-kubernetes), [two CNI meta-plugins](https://github.com/containernetworking/plugins), [Flannel (VXLAN and Host-Gateway)](https://github.com/coreos/flannel)
- Dockershim CRI
- Many<sup id="a1">[1]</sup> of the e2e conformance tests when run with [alternate Windows-based images](https://hub.docker.com/r/e2eteam/) which are being moved to [kubernetes-sigs/windows-testing](https://www.github.com/kubernetes-sigs/windows-testing)
- Persistent storage: FlexVolume with [SMB + iSCSI](https://github.com/Microsoft/K8s-Storage-Plugins/tree/master/flexvolume/windows), and in-tree AzureFile and AzureDisk providers
Some questions/notes from the storage perspective:
1. Is there some in-tree code that specifically allows AzureDisk to work with Windows that is not present for other similar existing in-tree block/disk backed storage plugins like GCE PD/AWS EBS/etc?
2. If GCE PD/AWS EBS and others are known to work with Windows workers, can they also be added here (along with Azure Disk) please for clarity?
3. In the context of the CSI Migration initiative (the effort to have in-tree plugins shim out to CSI versions of the in-tree plugins over a couple of releases so that eventually the in-tree plugin code can be removed), lack of support for CSI node plugins for Windows 2019 may have an impact if EBS/GCE-PD in-tree works with Windows workers today but their CSI counterparts will not in the future (until Windows OS enhancements to support CSI node plugins like mount propagation, privileged containers, etc. are in).
4. While SMB based storage will be available (through the Flexvolume plugin and AzureFile), can the support for NFS based storage be clarified? For example, are there any plans for a NFS Flexvolume plugin for Windows?
For NFS, just came across kubernetes/kubernetes#56188 (comment). So sounds like NFS [#4 above] is beyond scope.
We will try to get answers to your questions.
### Non-Goals

- Adding Windows support to all projects in the Kubernetes ecosystem (Cluster Lifecycle, etc)
- Enable the Kubernetes master components to run on Windows
Is supporting LCOW a non goal?
For now, yes. I will clarify.
@ddebroy and @bgrant0607, you asked some really good questions. We will find the answers and make the necessary updates.
Updating the windows KEP as we move the KEP towards implementable stage