Windows containers behind a Kubernetes loadbalancer become unreachable #78
Comments
Using AKS Engine with Kubernetes.
There was a change we detected in aks-engine: Azure/aks-engine#3956 (comment) that caused service calls to fail. The fix was to re-order the calls: Azure/aks-engine#3956 (comment), done in Azure/aks-engine#4002. Not sure if this is related, but it seems fairly suspect.
This issue has been open for 30 days with no updates.
This is still under active investigation.
This is still occurring on OCP 4.6 (Kubernetes 1.19), but on OCP 4.7 (Kubernetes 1.20) we are not able to reproduce it, because of a new issue: #103.
This was fixed by kubernetes/kubernetes#96499.
Until people move to Kubernetes 1.20 (assuming the problem is fixed in that version), as a workaround you just need to bypass the Azure Load Balancer on your HTTP calls. Example: calls from any container translate the name "my.app.org" to 10.0.16.10, and the request is routed through the Azure Load Balancer and back inside the AKS cluster. To bypass this, you just need to add an override to the "coredns-custom" ConfigMap:
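The snippet that originally followed is not preserved in this copy. A minimal sketch of such an override, assuming the AKS `coredns-custom` ConfigMap mechanism and using a placeholder internal service IP (replace `10.0.16.100` with your service's cluster IP so "my.app.org" resolves inside the cluster instead of through the load balancer):

```yaml
# Sketch only: the original snippet was not preserved.
# 10.0.16.100 is a placeholder for the service's internal cluster IP.
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns-custom
  namespace: kube-system
data:
  myapp.override: |
    hosts {
      10.0.16.100 my.app.org
      fallthrough
    }
```

After applying the ConfigMap, restart the CoreDNS pods (e.g. `kubectl -n kube-system rollout restart deployment coredns`) so the override takes effect.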
|
Removed 2019-Datacenter-with-Containers SKU fixed version. Removed verbiage regarding the issue with Windows containers behind a Kubernetes load balancer becoming unreachable, since no longer applicable. See microsoft/Windows-Containers#78. Added sections for sample machineSet parameters and object.
Added sections for sample machineSet parameters and object. Re-arranged parameters and added a command to get the latest compatible image for a given region. Removed verbiage regarding the issue with Windows containers behind a Kubernetes load balancer becoming unreachable, since no longer applicable. See microsoft/Windows-Containers#78.
There is a regression when using Windows worker nodes with newer Windows kernel versions on an OpenShift 4.6.8 cluster with multiple Windows worker nodes. The cluster is configured with hybrid OVN networking.
This issue is present at least in Windows Server 2019 OS Builds 17763.1579 and 17763.1637.
This issue was not present in Windows Server 2019 OS Build 17763.1457.
The issue is that HTTP requests made through a load balancer, backed by a webserver deployment with 3 pods, do not always reach the webservers. The issue occurs only when the pods are running on separate Windows nodes. We are seeing this on both Azure and AWS. The logs in this issue are from an Azure cluster with Windows Server 2019 OS Build 17763.1637 worker nodes.
Here is a deployment yaml which can be used to exercise this issue.
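The original manifest is not preserved in this copy. A minimal sketch of such a deployment, assuming a Windows IIS webserver image and placeholder names and ports; all identifiers here are hypothetical:

```yaml
# Sketch only: the original manifest was not preserved.
# Names, image, and ports are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: win-webserver
spec:
  replicas: 1
  selector:
    matchLabels:
      app: win-webserver
  template:
    metadata:
      labels:
        app: win-webserver
    spec:
      nodeSelector:
        kubernetes.io/os: windows    # schedule only on Windows nodes
      containers:
      - name: web
        image: mcr.microsoft.com/windows/servercore/iis:windowsservercore-ltsc2019
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: win-webserver
spec:
  type: LoadBalancer     # provisions an external load balancer on Azure/AWS
  selector:
    app: win-webserver
  ports:
  - port: 80
    targetPort: 80
```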
Once the above yaml was applied to the cluster, the external ip of the load balancer was repeatedly curled using this script:
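The original script is not preserved in this copy. A minimal sketch of such a probe loop, assuming a bash environment with `curl`; the IP (from the TEST-NET-3 documentation range) and iteration count are placeholders to be replaced with the load balancer's external IP and a larger count:

```shell
#!/usr/bin/env bash
# Sketch only: the original script was not preserved.
# Replace EXTERNAL_IP with the service's external IP.
EXTERNAL_IP="${EXTERNAL_IP:-203.0.113.10}"
ok=0
fail=0
for i in $(seq 1 5); do
  # Count each request as a success or failure.
  if curl -s --connect-timeout 1 -o /dev/null "http://${EXTERNAL_IP}/"; then
    ok=$((ok+1))
  else
    fail=$((fail+1))
  fi
done
echo "successes=${ok} failures=${fail}"
```

When the bug manifests, the failure count climbs once the deployment is scaled beyond one replica across separate Windows nodes.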
With one replica, everything worked fine. When the deployment was scaled to three replicas, the webservers became unreachable after a few successes. Scaling back down to one replica resolved the issue.