Weave agents failing to connect for minikube started Kubernetes RBAC cluster #158

Closed
lilic opened this issue Apr 4, 2018 · 10 comments

@lilic
Contributor

lilic commented Apr 4, 2018

This issue happens when a user selects the Kubernetes platform and the minikube environment on Weave Cloud, and their cluster was started by minikube with RBAC enabled.

If the correct roles are not set up, the DNS pods error out because of permission problems. This is expected behaviour from minikube, and from what I understand it will only be fixed for kubeadm a few releases from now. On Weave Cloud, setup gets stuck on: Waiting for Weave Cloud agents to connect. Looking at the pod logs, the following error occurs:

time="2018-04-03T14:48:53Z" level=error msg="Failed to execute kubectl apply: Unable to connect to the server: dial tcp: i/o timeout\nFull output:\nUnable to connect to the server: dial tcp: i/o timeout"

The fix for this is to apply this RBAC manifest file, which grants the correct permissions so that the DNS pods come up and run. Because the weave-agent pod does not error out but simply logs the error, the weave-agent pod also needs to be deleted/restarted for the Weave Cloud setup to succeed.
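The two steps above (apply the manifest, then restart the agent) could be scripted roughly as follows. This is only a sketch: the function name `apply_rbac_and_restart_agent` is made up for illustration, and the namespace `weave` and label selector `name=weave-agent` are assumptions about how the agent is deployed, so check them with `kubectl get pods --all-namespaces --show-labels` first.

```shell
# Sketch of the workaround; not an official Weave Cloud script.
# ASSUMPTION: the agent pod lives in namespace "weave" and carries the
# label "name=weave-agent". Override both if your deployment differs.
AGENT_NAMESPACE="${AGENT_NAMESPACE:-weave}"
AGENT_SELECTOR="${AGENT_SELECTOR:-name=weave-agent}"

# apply_rbac_and_restart_agent <manifest-file-or-url>   (hypothetical helper)
apply_rbac_and_restart_agent() {
  if [ -z "${1:-}" ]; then
    echo "usage: apply_rbac_and_restart_agent <manifest-file-or-url>" >&2
    return 2
  fi
  # Step 1: grant the permissions the DNS pods are missing.
  kubectl apply -f "$1" || return 1
  # Step 2: the agent only logs the failure and keeps running, so delete
  # the pod and let its controller create a fresh one.
  kubectl delete pod --namespace "$AGENT_NAMESPACE" --selector "$AGENT_SELECTOR"
}
```

Calling `apply_rbac_and_restart_agent ./minikube-rbac.yaml` would run both steps against whatever cluster the current kubectl context points to; the file name here is only a placeholder.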

@lilic lilic self-assigned this Apr 4, 2018
@lilic lilic added the bug Something isn't working label Apr 4, 2018
@leth
Contributor

leth commented Apr 4, 2018

How do you think we should solve this issue? Detect broken DNS from the bootstrap program and error out, or ask the user whether they'd like us to fix it for them?

@lilic
Contributor Author

lilic commented Apr 4, 2018

@leth Yes, I would definitely do that in the bootstrap part. I'm not sure we should fix it behind the user's back. For now I think we can just check whether the DNS pods are up and running. If they error out, I would error out in the bootstrap part as well and leave a suggestion that the RBAC rules for the DNS pods may not be configured correctly, along with which manifest would need to be applied. But I would leave it up to the user to apply the RBAC manifest file. WDYT?
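A rough illustration of that check, written as shell for brevity rather than as the actual bootstrap code: it asks whether every container in the cluster DNS pods reports ready, and otherwise prints a nudge and fails without changing anything. The function names are hypothetical, and the selector `k8s-app=kube-dns` in `kube-system` is assumed to match the DNS pods on the cluster in question.

```shell
# Sketch only; the real bootstrap program may do this differently.
# ASSUMPTION: DNS pods are labelled k8s-app=kube-dns in kube-system.

# Succeeds only if at least one DNS container exists and all are ready.
dns_pods_ready() {
  statuses=$(kubectl get pods --namespace kube-system \
    --selector k8s-app=kube-dns \
    --output jsonpath='{.items[*].status.containerStatuses[*].ready}') || return 1
  # No containers reported at all counts as broken DNS.
  [ -n "$statuses" ] || return 1
  case " $statuses " in
    *" false "*) return 1 ;;
  esac
  return 0
}

# Errors out with a suggestion instead of fixing the cluster for the user.
check_dns_or_explain() {
  if dns_pods_ready; then
    return 0
  fi
  {
    echo "The cluster DNS pods in kube-system are not ready."
    echo "If this cluster was started with RBAC enabled, the DNS pods may"
    echo "lack the permissions they need. Apply a suitable RBAC manifest or"
    echo "role binding yourself, then restart the agent and try again."
  } >&2
  return 1
}
```

Keeping the check read-only matches the preference in this thread for nudging the user rather than modifying their cluster.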

@rade
Member

rade commented Apr 4, 2018

This "broken out of the box" behaviour of minikube --extra-config=apiserver.Authorization.Mode=RBAC surely is a bug. Is it recorded somewhere?

@lilic
Contributor Author

lilic commented Apr 4, 2018

@rade More details in the following issues. kubernetes/minikube#1734 (comment) and kubernetes/minikube#1722

@rade
Member

rade commented Apr 4, 2018

There's also kubernetes/minikube#2510. Looks like there's a fix that involves fiddling with roles rather than having to re-do the DNS config. This would match what we do on GKE.

@rade
Member

rade commented Apr 4, 2018

> Looks like there's a fix that involves fiddling with roles

i.e.

kubectl create clusterrolebinding add-on-cluster-admin --clusterrole=cluster-admin --serviceaccount=kube-system:default
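For anyone who wants to see the effect of that one-liner, a hedged sketch follows. It checks what the `kube-system:default` service account is allowed to do via `kubectl auth can-i` with impersonation, and creates the binding only if it does not exist yet. The helper names `sa_can` and `ensure_addon_cluster_admin` are invented for this example, and impersonation requires that your own user is permitted to use `--as`.

```shell
# Sketch only. Note that the binding grants cluster-admin to the default
# service account in kube-system, which is very broad.
SA="system:serviceaccount:kube-system:default"

# sa_can <verb> <resource>: succeeds if the service account is allowed.
sa_can() {
  kubectl auth can-i "$1" "$2" --as "$SA" >/dev/null 2>&1
}

# Creates the binding from the one-liner above unless it already exists.
ensure_addon_cluster_admin() {
  if kubectl get clusterrolebinding add-on-cluster-admin >/dev/null 2>&1; then
    echo "clusterrolebinding add-on-cluster-admin already exists"
    return 0
  fi
  kubectl create clusterrolebinding add-on-cluster-admin \
    --clusterrole=cluster-admin --serviceaccount=kube-system:default
}
```

Running `sa_can list endpoints` before and after `ensure_addon_cluster_admin` should show the permission flipping from denied to allowed on an affected cluster.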

@lilic
Contributor Author

lilic commented Apr 4, 2018

@rade Yes, that's more or less what the above RBAC manifest that I pasted does as well. https://raw.githubusercontent.com/coreos/prometheus-operator/master/scripts/minikube-rbac.yaml

@rade
Member

rade commented Apr 4, 2018

Right. I'd be reluctant to apply something as complicated as that to a user's cluster. But a one-liner would be fine. Indeed, as I mentioned, that's what we do on GKE. HOWEVER, the situation here is different since a) here the user made a conscious decision to enable RBAC, unlike on GKE where it's on by default, and b) here DNS is broken which has nothing to do with the Weave Cloud agent, whereas on GKE only the Weave Cloud agent is broken.

I am a bit puzzled why users are falling into this trap. Are there instructions out there that tell users to run minikube in this way but fail to mention the extra steps required to make DNS work?

@lilic
Contributor Author

lilic commented Apr 4, 2018

Exactly, that's why I am reluctant to just "fix" this for the user behind their back. I think giving them an option/nudge might be better. WDYT?

> I am a bit puzzled why users are falling into this trap. Are there instructions out there that tell users to run minikube in this way but fail to mention the extra steps required to make DNS work?

The docs only tell you how to start with RBAC, and not much else at a quick glance: https://kubernetes.io/docs/getting-started-guides/minikube/ And TBH I have seen other devs fall into this same trap: at first they had no need for DNS to work correctly, but later they spent some time trying to figure out why things stopped working.

@leth
Contributor

leth commented Apr 4, 2018

> Exactly, that's why I am reluctant to just "fix" this for the user behind their back. I think giving them an option/nudge might be better. WDYT?

That sounds good; we can make the nudge message helpful :)

> I am a bit puzzled why users are falling into this trap. Are there instructions out there that tell users to run minikube in this way but fail to mention the extra steps required to make DNS work?

Did someone mention this happens with kubeadm too?
