Internal Kubernetes API Calls Blocked by Istio #8696
Comments
I can confirm this issue exists and is also the root cause of Knative not working on AKS: their autoscaler, which is a controller, is unable to sync with the Kubernetes apiserver. Disabling the Istio sidecar injection works around it. It's perplexing to me that this occurs in AKS but not elsewhere. Can someone help me troubleshoot this? (I work for Azure.)
We have this problem. We made a ServiceEntry and VirtualService to account for the fact that the apiserver is now accessed over a public URL. I made a ServiceEntry and VirtualService along the following lines:
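For reference, a minimal sketch of what such a ServiceEntry and VirtualService can look like; this is my own reconstruction rather than the exact manifests, and the FQDN is a placeholder for your AKS apiserver address:

```yaml
# Sketch only: allow sidecar-injected pods to reach the AKS apiserver over its
# public FQDN. Replace the placeholder hostname with your cluster's address.
apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
  name: aks-apiserver
spec:
  hosts:
    - my-cluster.hcp.eastus.azmk8s.io   # placeholder FQDN
  location: MESH_EXTERNAL
  resolution: DNS
  ports:
    - number: 443
      name: tls
      protocol: TLS
---
# Sketch only: route TLS traffic for that hostname straight through to it.
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: aks-apiserver
spec:
  hosts:
    - my-cluster.hcp.eastus.azmk8s.io   # placeholder FQDN
  tls:
    - match:
        - port: 443
          sniHosts:
            - my-cluster.hcp.eastus.azmk8s.io
      route:
        - destination:
            host: my-cluster.hcp.eastus.azmk8s.io
            port:
              number: 443
```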
This gets it to the point where I can access the apiserver, but after 5 minutes it stops working and calls to the apiserver hang. Calls to Go's net.LookupIP(host) also hang during this period when given the FQDN of the AKS apiserver. I found that if I wait 10-15 minutes the problem seems to resolve itself, but it starts failing again after another 5 minutes. I also found that making a request to the apiserver while it's working seems to delay the point where it stops: I made a request, then another 1 minute later, and it started failing 5 minutes after the second request, not the first. I should mention that curl requests made directly to the API server when I
I am having the same issue with the RabbitMQ Kubernetes peer discovery plugin, which wants to list the other pods but can't connect to the AKS API server. The external ServiceEntry didn't work; the only way to fix this was to set the IP range to exclude from the sidecar. I only experienced this issue on AKS.
@mkjoerg, does the plugin use client-go? If so, there's a strange issue that seems to only happen when combining Istio, AKS, and client-go (any two of the three are fine): kubernetes/client-go#527
@adinunzio84, no, the RabbitMQ plugin is Erlang-based.
While kubernetes/client-go#527 may technically still be an issue, I want to point out that there is now, at least, a workaround in place on the AKS end. A mutating webhook now overwrites apiserver-related environment variables in pods so that the apiserver is reached via an alternative route. The load balancer(s) that are involved in this alternative route to the apiserver are not subject to the difficulties explained in kubernetes/client-go#527. While this may be more of a workaround than a strategic solution, it's fair to say that this issue is effectively remediated. Should we consider closing it? EDIT: Because the apiserver address will appear to be external, you do have to add an appropriate ServiceEntry.
@krancour, are you sure about "The load balancer(s) that are involved in this alternative route to the apiserver are not subject to the difficulties explained in kubernetes/client-go#527."? I added more info here about the issue I describe in kubernetes/client-go#527, and it seems it's actually related more to the load balancers involved with AKS than to client-go. If applications with Istio sidecars are able to access the API server after the 5-minute window (where the LB closes the connection), then I think this can be closed. Otherwise, in my opinion, this should remain open and depends on envoyproxy/envoy#3634.
@adinunzio84, the cluster-internal load balancers and the externally facing load balancers are different. The externally facing load balancers have had TCP reset as an opt-in "preview" feature since mid-September, while the cluster-internal load balancers, if I understand correctly, still lack this feature. https://azure.microsoft.com/en-us/updates/load-balancer-outbound-rules/ Oddly, when I dig down into the LB details, I cannot see any evidence that AKS actually enabled the feature in question when it deployed the cluster; however, I am currently observing correct/desired behavior. I'll follow up with my colleagues on the AKS team to figure out what's going on here. If you want to try this yourself, perhaps you can independently verify or refute that this works as I claim.
Sure, I'll test it out when I have a chance. If I understand correctly, that TCP reset preview feature is for a Standard Load Balancer. One of the people I spoke with on the AKS team said that AKS does not have Standard Load Balancer support enabled yet, but it should happen soon.
That is correct. The post I linked to does reference standard load balancers, whilst AKS is currently using basic load balancers only, which deepens the mystery of why this is now working.
Internal Kubernetes API calls seem to fail for the first few seconds. You might find this repro useful: see the update in #12187.
This issue has been automatically marked as stale because it has not had activity in the last 90 days. It will be closed in the next 30 days unless it is tagged "help wanted" or other activity occurs. Thank you for your contributions.
This issue has been automatically closed because it has not had activity in the last month and a half. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted". Thank you for your contributions.
I know it is an old issue, but I solved it with:

```sh
K8S_INTERNAL_API_IP=$(kubectl get svc kubernetes -o jsonpath='{.spec.clusterIP}')
```

```yaml
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  values:
    global:
      proxy:
        autoInject: enabled
        clusterDomain: cluster.local
        componentLogLevel: misc:error
        enableCoreDump: false
        envoyStatsd:
          enabled: false
        excludeIPRanges: "${K8S_INTERNAL_API_IP}/32"
```

The path is .spec.values.global.proxy.excludeIPRanges
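One way to render and apply a spec like this, assuming it is saved as istio-operator.yaml (the filename and tooling are my assumptions, not from the comment above):

```sh
# Look up the in-cluster apiserver ClusterIP, substitute it into the operator
# spec, and install it with istioctl (requires envsubst and istioctl).
export K8S_INTERNAL_API_IP=$(kubectl get svc kubernetes -n default -o jsonpath='{.spec.clusterIP}')
envsubst < istio-operator.yaml > /tmp/istio-operator-rendered.yaml
istioctl install -f /tmp/istio-operator-rendered.yaml -y
```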
How about excluding the apiserver IP range mesh-wide, as in the previous comment? Or, if it's only one pod which is affected, simply inject a pod annotation (see the sketch below). That last approach is the thing that works best for me.
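For the per-pod route, a minimal sketch of what that annotation can look like; this is my illustration using Istio's documented traffic.sidecar.istio.io/excludeOutboundIPRanges annotation, not the snippet from the comment above, and the name, image, and 10.0.0.1/32 range are placeholders:

```yaml
# Sketch only: exclude the apiserver ClusterIP from sidecar interception for a
# single workload. Use the ClusterIP of the "kubernetes" Service in your cluster.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                  # placeholder name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
      annotations:
        traffic.sidecar.istio.io/excludeOutboundIPRanges: "10.0.0.1/32"
    spec:
      containers:
        - name: my-app
          image: nginx:1.25     # placeholder image
```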
Thank you so much, this worked perfectly. I was running into an issue because I'm using kubectl in init containers to check job statuses before deploying a ReplicaSet, and it couldn't connect to the API at all.
We just switched from Contour to Istio (1.9.4) on our dev environments and are running into this issue a lot. We've modified our IstioOperator settings with the excludeIPRanges mentioned in this issue, but we're still seeing the problem. It tends to happen regularly when our nightly builds get deployed to our dev clusters (~2 AM PT). After enough restarts, the problem seems to go away, but we've yet to find a surefire workaround/remedy. Any other things we should be looking at?
We managed to get around this issue with the following kind of DestinationRule in our services' namespace:
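A minimal sketch of what such a DestinationRule can look like, assuming the common approach of disabling Istio TLS settings toward the in-cluster apiserver; the namespace is a placeholder and this is my reconstruction rather than the commenter's exact resource:

```yaml
# Sketch only: tell the sidecar not to apply Istio (m)TLS settings when talking
# to the Kubernetes apiserver service, so the apiserver's own TLS passes through.
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: kube-apiserver
  namespace: my-namespace       # placeholder: the services' namespace
spec:
  host: kubernetes.default.svc.cluster.local
  trafficPolicy:
    tls:
      mode: DISABLE
```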
Be aware that something has changed in terms of egress rules precedence in 1.8.5+.
Even with the DestinationRule, we're still seeing intermittent issues talking to kubernetes.default.svc and to Elasticsearch instances in different namespaces. We've added retries in our code as well as a VirtualService to kubernetes.default.svc with retries, yet we still see intermittent issues. The issues existed before, but have not been nearly as common as they are since the switch to Istio.
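For what it's worth, a VirtualService with retries of the kind described might look roughly like the sketch below; this is a guess on my part, and if the sidecar treats apiserver traffic as opaque TLS these HTTP-level retries may not actually take effect:

```yaml
# Sketch only: HTTP-level retries toward the in-cluster apiserver. Whether the
# retry policy applies depends on how the sidecar classifies this traffic.
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: kube-apiserver-retries
  namespace: default
spec:
  hosts:
    - kubernetes.default.svc.cluster.local
  http:
    - retries:
        attempts: 3
        perTryTimeout: 2s
        retryOn: connect-failure,refused-stream,503
      route:
        - destination:
            host: kubernetes.default.svc.cluster.local
            port:
              number: 443
```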
OK, so I have another solution: try a master Sidecar config (sketched below). Credit goes not to me but to @WilliamNewshutz & @gregoryhanson.
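As a rough sketch of what a mesh-wide ("master") Sidecar resource can look like; this is an assumption on my part rather than the credited config, and it assumes istio-system is the root namespace (a Sidecar named default there applies to every workload):

```yaml
# Sketch only: a mesh-wide Sidecar that keeps egress open so destinations the
# sidecar doesn't know about (such as the apiserver) are passed through.
apiVersion: networking.istio.io/v1beta1
kind: Sidecar
metadata:
  name: default
  namespace: istio-system        # assumed root namespace
spec:
  egress:
    - hosts:
        - "./*"
        - "istio-system/*"
  outboundTrafficPolicy:
    mode: ALLOW_ANY
```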
* fix: Don't let Istio instrument the webhook. Istio and the api-server do not play well together, like pizza and pineapple: istio/istio#8696, istio/istio#12187. The chart is open enough for users to set the podAnnotation with the api-server instead of not deploying the sidecar in the webhook, but by default we offer an option that works in most scenarios.
* chore: Bump version
My team and I recently observed a somewhat similar issue with Istio and the Kubernetes API server (kube-apiserver) on Microsoft Azure Kubernetes Service (AKS), and found the suggestions in this thread to be helpful. For our issue, we saw connection resets when attempting to contact kube-apiserver from pods that had the Istio sidecar injected.
We tried @taitelman's solution as well.

Istio version: 1.15.3
Kubernetes versions: AKS
Hi @taitelman and @pmalmsten, I have installed Istio v1.18.0 and I am trying to install Kiali v1.67.0 in two different namespaces, but I am facing the issue below. I have tried adding the Sidecar, ServiceEntry, and DestinationRule resources mentioned above by @taitelman and @jdelgadillo, but I am still not able to resolve the issue with Kiali. Below is the error log; could you please check and let me know what restriction might be in place on the AKS cluster and how it can get resolved?
@Chaitan1991 were you able to fix your issue?
Yes. Check that the new versions of Istio and Kiali you are installing come from the correct repo with the correct image tag; we were referencing an old testing repo with the latest tag, and that was not updated code at all!
ty @taitelman, your suggestion worked for me.
To add to this, if/when anyone googles this and finds it: the EKS endpoint seems to be 172.20.0.1 instead.
I'm assuming there's some default behaviour here, as I'm still experiencing this in 2024. I have tried many of the above solutions.

I have a vanilla AKS cluster with Istio enabled via the menu. I have made a new namespace called bendev which has Istio injection enabled via the label istio.io/rev: asm-1-20. The sidecars are injecting OK, but no matter which combination of the above solutions I try, my internal services can't communicate.

I notice that by default there is no "catch-all" listener on 0.0.0.0:443 for TLS traffic. This may be why I can't communicate with the API service without excludeIPRanges (the one solution that works). I don't really want to start moving traffic out of the mesh, however; I would rather figure out why this is happening.

I created some basic services, such as an nginx deployment consisting of 3 replicas behind a simple ClusterIP service. When I exec into the terminal, I can curl everything quite happily. However, when relying on the script, all of the curls fail (unless using excludeIPRanges), including the curl to the internal svc, nginx-deploy.bendev.svc.cluster.local:80. I can see a route for this does exist in the proxy config for the netshoot pod out of the box, due to this deployment having been created "on the watch" of Istio.

I have tried making a ServiceEntry to cover the Kubernetes API, using location EXTERNAL/INTERNAL and resolution DNS/STATIC/NONE, whilst switching the address to the local address for the kubernetes.default.svc server within the cluster (for me it is 10.0.0.1). I have tried to make a custom Sidecar but I couldn't get it to inject. I have also tried a VirtualService in addition to the ServiceEntry above, to create a listener on 0.0.0.0:443 and a route. None of these things worked, and the script itself always fails.

There are no AuthorizationPolicies or PeerAuthentication requirements across the cluster. I am also not getting any logs inside Envoy in the istio-proxy sidecar; it seems like my requests aren't even getting that far. Please advise - I am quite stuck with this.
Describe the bug
I'm installing a monitoring service into my pod which is trying to make a call to the Kubernetes API server. This request is being blocked by the Istio sidecar. If I disable istio-injection and redeploy, everything works as planned. Do I need to enable anything to make this work?

Expected behavior
My pods can access the internal Kubernetes API.

Steps to reproduce the bug
Calling the Kubernetes API server from inside my pod does not respond.

Version
Istio:
Kubernetes:

Installation

Environment
Microsoft Azure AKS
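A minimal way to exercise the failing call from inside a sidecar-injected pod (my own repro sketch, assuming the standard in-cluster service account token path):

```sh
# Sketch only: call the in-cluster apiserver using the pod's service account.
# With the Istio sidecar intercepting outbound traffic on AKS, this hangs or is
# reset; with istio-injection disabled it returns the version JSON.
TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
curl -sSk -H "Authorization: Bearer ${TOKEN}" https://kubernetes.default.svc/version
```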