streamwatcher.go:109] Unable to decode an event from the watch stream: stream error: stream ID 1; INTERNAL_ERROR #676
Comments
Is this not the same root cause as the performance degradation issue, which is also still open? #620 |
I saw this tagged from the TLS handshake error thread (#14). Would the workaround above with environment variables be a potential workaround for the timeout issue? |
@necevil I don't think this is related to #14. This issue is about calling kube-apiserver from inside of the cluster vs. outside. |
Any ETA for a permanent fix? |
I don't have an ETA yet, but the fix is actively being worked on. The permanent fix will be to apply the same mitigation automatically for customer pods. I recommend applying the mitigation if you're running into this issue until we can have the fix automatically applied. |
Wait, are you saying the workaround from above is the proper fix for this issue, and you are not working on a solution to reliably access the master plane without being routed over the public internet? |
That's correct @DenisBiondic, we will apply the same change automatically. Just to clarify, the traffic will be routed over the azure backplane similar to how it works now except we will bypass the hop through azureproxy (kube-svc-redirect). |
@juan-lee Will tunnelfront be removed then? |
This is affecting cert-manager too. Errors continue after deleting & recreating the pod. |
Yeah, it affects each container that has a connection to the master plane (different operators, external dns, ingress controller, cert manager etc.) |
@holmesb if you're installing cert-manager via Helm you can add the following to your custom values.yaml (passed in with --values / -f):
extraEnv:
- name: KUBERNETES_PORT_443_TCP_ADDR
  value: <your-fqdn-prefix>.hcp.<region>.azmk8s.io
- name: KUBERNETES_PORT
  value: tcp://<your-fqdn-prefix>.hcp.<region>.azmk8s.io:443
- name: KUBERNETES_PORT_443_TCP
  value: tcp://<your-fqdn-prefix>.hcp.<region>.azmk8s.io:443
- name: KUBERNETES_SERVICE_HOST
  value: <your-fqdn-prefix>.hcp.<region>.azmk8s.io
You can get your specific URL from the portal. Note, tiller uses client-go so it inherits the bugs client-go has, as was previously pointed out. |
Also running into this problem using https://github.com/argoproj/argo-events. Currently using Azure AKS 1.11.3. Does anyone know if there is a fix planned? |
Is there a possibility to inject this into a tiller-deploy pod already running on my cluster, or does this have to be done at initialisation time? EDIT: Never mind, I found out that you have to add these lines to the deployment by editing it. |
I'm running into the same issue. I've tried the workaround described by @dkipping to no avail. Setting the environment variables on the deployment didn't help. Edit: … |
@evandervalk could you apply the values to the deployment? If they got applied, could you check that they got applied to a (newly spawned) pod as well? Edit: … |
Editing the deployment went as you described; I can edit the values in the deployment. After editing the configuration it created a new pod with the earlier mentioned environment variables defined, so that also seems to work. I have the variables defined in the deployment spec. However, after editing the configuration and making sure everything propagated, I am still seeing the same errors. If I describe the affected pod, I would assume these environment variables should be defined on it as well? |
Yes, sounds like it. |
I will try and report back.
Edit: I've removed the environment variables from …
Edit 2: Setting the environment variables on the … |
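To make the preceding exchange concrete, here is a minimal sketch of where those variables sit when added directly to a Deployment rather than through Helm values. The example-operator name, labels, and image are placeholders (in this thread's case the target was the tiller-deploy Deployment in kube-system, which you would edit in place rather than recreate):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-operator            # placeholder; e.g. tiller-deploy
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: example-operator
  template:
    metadata:
      labels:
        app: example-operator
    spec:
      containers:
      - name: example-operator          # placeholder container name
        image: example/operator:latest  # placeholder image
        env:
        - name: KUBERNETES_PORT_443_TCP_ADDR
          value: <your-fqdn-prefix>.hcp.<region>.azmk8s.io
        - name: KUBERNETES_PORT
          value: tcp://<your-fqdn-prefix>.hcp.<region>.azmk8s.io:443
        - name: KUBERNETES_PORT_443_TCP
          value: tcp://<your-fqdn-prefix>.hcp.<region>.azmk8s.io:443
        - name: KUBERNETES_SERVICE_HOST
          value: <your-fqdn-prefix>.hcp.<region>.azmk8s.io
Because the pod template changes, the Deployment rolls out a replacement pod, which is why the variables should show up on a newly spawned pod.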
Workaround works for me actually. |
@weinong : Good to know. Is this being rolled out region by region, and if so, is there a set sequence? I am particularly interested in when this will be rolled out in region westeurope. |
West Europe will be later this week. It will be applied to all new clusters at first. Then we will roll it out to all existing clusters shortly after. |
Hi, when will the fix be released for …? Thanks in advance. |
fix has reached westeurope. |
Still getting the issue (same errors as above). Do I need to reinstall Istio for this fix to work? |
Hi, do we need to do anything for our cluster, or will it be applied automatically? Br, |
I understood the fix will be applied to all new clusters at first. So you'll have to recreate the cluster. |
@adinunzio84 You mentioned that with this fix there may need to be an additional ServiceEntry for Istio. We are having the issue described and also have Istio installed, so I am curious whether you have the fix and what ServiceEntry you had to add? |
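The ServiceEntry itself isn't shown anywhere in this thread; a rough sketch of one that allows mesh traffic out to the AKS kube-apiserver FQDN, assuming the networking.istio.io/v1alpha3 API and a restrictive egress policy, might look like the following (the name and host are placeholders, and whether it is needed at all depends on your Istio egress configuration):
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: aks-apiserver            # placeholder name
spec:
  hosts:
  - <your-fqdn-prefix>.hcp.<region>.azmk8s.io
  location: MESH_EXTERNAL
  ports:
  - number: 443
    name: https
    protocol: HTTPS
  resolution: DNS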
The fix is no longer behind a feature flag. All new clusters will get it automatically. Existing clusters will need to do a scale or upgrade to get the fix. |
@juan-lee Thanks for your reply. We rebuilt our cluster yesterday; however, I can still see this problem this morning. Br, |
Hi, I redeployed with Terraform in the westeurope and centralus regions, with the nginx ingress controller. I can confirm from my side that the messages are gone from the log file. Thanks for the fix. |
Can you provide some more details? Does the pod in question have the appropriate KUBERNETES_ environment variables set? |
@mmosttler Not fully tested, but this kinda works:
I did … and this is the closest I got to it working. The JavaScript Kubernetes client is happy with this, but the Go client is not. Please let me know if you come up with something better, though. |
I can confirm that the problem is gone from our clusters. (west EU) |
Thanks for reporting back. I'm closing the issue for now. |
I am still seeing this issue even this week in east us. |
Can you elaborate on your scenario? Also, keep in mind that pods will need to be restarted in order to get the fix. You can check whether a pod has the fix by seeing if the KUBERNETES_PORT, etc. environment variables are set for each container. |
Symptoms
Pods using the in-cluster config to perform a watch on a resource will see intermittent timeouts and the following error in the pod log.
streamwatcher.go:109] Unable to decode an event from the watch stream: stream error: stream ID 1; INTERNAL_ERROR
If the client performing the watch isn't handling errors gracefully, applications can get into an inconsistent state. Impacted applications include, but are not limited to, nginx-ingress and tiller (helm).
A specific manifestation of this bug is the following error when attempting a helm deployment.
Error: watch closed before Until timeout
Root Cause
In-cluster traffic to the kube-apiserver goes through an extra hop (azureproxy / kube-svc-redirect), and watch streams passing through that hop see intermittent resets. The fix, like the workaround below, bypasses this hop by pointing clients directly at the AKS kube-apiserver FQDN.
Workaround
For the pods/containers that see the INTERNAL_ERROR in their logs, add the following environment variables to the container spec. Be sure to replace <your-fqdn-prefix> and <region> so the AKS kube-apiserver FQDN is correct.
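The environment variables referred to here are presumably the same ones given in the Helm workaround earlier in this thread; in container-spec form they look like this:
env:
- name: KUBERNETES_PORT_443_TCP_ADDR
  value: <your-fqdn-prefix>.hcp.<region>.azmk8s.io
- name: KUBERNETES_PORT
  value: tcp://<your-fqdn-prefix>.hcp.<region>.azmk8s.io:443
- name: KUBERNETES_PORT_443_TCP
  value: tcp://<your-fqdn-prefix>.hcp.<region>.azmk8s.io:443
- name: KUBERNETES_SERVICE_HOST
  value: <your-fqdn-prefix>.hcp.<region>.azmk8s.io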