Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disable publicly available cAdvisor #356

Merged
merged 1 commit into from
Jun 29, 2017

Conversation

luxas
Copy link
Member

@luxas luxas commented Jun 26, 2017

cc @kubernetes/sig-auth-pr-reviews @kubernetes/sig-node-pr-reviews @piosz @DirectXMan12 @mwielgus @kubernetes/sig-cluster-lifecycle-pr-reviews @mtaufen

After chatting with @DirectXMan12 about what the cAdvisor port actually is used for, I realized that we might be able to turn it off. I tested it locally and it worked just fine, heapster could fetch metrics normally.

Problem: Many kubeadm users spawn their clusters on public VMs from providers like DigitalOcean, Packet, GCE or their own solution where the master has a public IP to the internet. Currently, the 4194 cAdvisor port, is exposed publicly to anyone. If the machine has a public IP address; it is publicly exposed to the internet. Most users don't know about this nor that they should block 4194 in their firewalls. However, that's unnecessarily complex when we can disable it on the kubelet side.

Proposed solution: Disabling the publicly available port. Only running cAdvisor internally in the kubelet. cAdvisor metrics are available anyway from {node-ip}:10250/stats, but this time with proper authentication and authorization.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jun 26, 2017
@piosz
Copy link
Member

piosz commented Jun 26, 2017

+1 for disabling this.
In general we should replace cadvisor in Kubelet by light weight monitoring library.

cc @dashpole

@liggitt
Copy link
Member

liggitt commented Jun 26, 2017

I'm definitely in favor of disabling this, but I'm a little unclear about what guarantees kubeadm makes about changes to enabled services.

@dims
Copy link
Member

dims commented Jun 26, 2017

@luxas @liggitt - seen this change? kubernetes/kubernetes#47195

@liggitt
Copy link
Member

liggitt commented Jun 26, 2017

@luxas @liggitt - seen this change? kubernetes/kubernetes#47195

Yes, I still wouldn't expose unnecessary services by default.

@mtaufen
Copy link
Contributor

mtaufen commented Jun 26, 2017

+1 turning this off. Are these really the only scripts that need to be edited?

Also I've no idea what might break when we disable this, but if we do this as soon as 1.8 opens, it should have plenty of soak time.

@luxas
Copy link
Member Author

luxas commented Jun 26, 2017

@mtaufen Actually I'm targeting this for v1.7...

@dims Yes, but kubelet listens to 0.0.0.0 by default (and it mostly should do, so that API Servers can contact it). Most often there is only one iface anyway; a public one so then that wouldn't change anything.

I tried to test things out, but I saw nothing that broke.
I think this is one of the more undocumented parts of kubernetes, but after grepping 4194 in the codebase I found these comments:

https://github.com/kubernetes/kubernetes/blob/master/test/e2e/cadvisor.go#L68

			// cadvisor is not accessible directly unless its port (4194 by default) is exposed.
			// Here, we access '/stats/' REST endpoint on the kubelet which polls cadvisor internally.
			statsResource := fmt.Sprintf("api/v1/proxy/nodes/%s/stats/", node.Name)
			By(fmt.Sprintf("Querying stats from node %s using url %s", node.Name, statsResource))
			_, err = c.Core().RESTClient().Get().AbsPath(statsResource).Timeout(timeout).Do().Raw()
			if err != nil {
				errors = append(errors, err)
			}

https://github.com/kubernetes/kubernetes/blob/master/test/e2e/proxy.go#L62

		It("should proxy logs on node with explicit kubelet port [Conformance]", func() { nodeProxyTest(f, prefix+"/proxy/nodes/", ":10250/logs/") })
		It("should proxy logs on node [Conformance]", func() { nodeProxyTest(f, prefix+"/proxy/nodes/", "/logs/") })
		It("should proxy to cadvisor", func() { nodeProxyTest(f, prefix+"/proxy/nodes/", ":4194/containers/") })

		It("should proxy logs on node with explicit kubelet port using proxy subresource [Conformance]", func() { nodeProxyTest(f, prefix+"/nodes/", ":10250/proxy/logs/") })
		It("should proxy logs on node using proxy subresource [Conformance]", func() { nodeProxyTest(f, prefix+"/nodes/", "/proxy/logs/") })
		It("should proxy to cadvisor using proxy subresource", func() { nodeProxyTest(f, prefix+"/nodes/", ":4194/proxy/containers/") })

So to me it seems like all consumers (like heapster) use {node}:10250/stats which internally redirects directly to cAdvisor, regardless of whether cAdvisor listens publically on :4194 or not.

@mtaufen
Copy link
Contributor

mtaufen commented Jun 26, 2017

@luxas @dchen1107 would need to make the call on this for 1.7.

@ixdy
Copy link
Member

ixdy commented Jun 26, 2017

@luxas is there an associated issue open in kubernetes/kubernetes?

@luxas
Copy link
Member Author

luxas commented Jun 26, 2017

@ixdy Not in k/k, but here: kubernetes/kubeadm#321

@dchen1107
Copy link
Member

@luxas and I discussed at slack, and we agreed to proceed with this change.

@luxas
Copy link
Member Author

luxas commented Jun 27, 2017

Attack plan is:

  1. Merge Reflect kubeadm-specific kubelet changes in the bazel debs kubernetes#48042 to get a signal from the kubeadm e2e CI
  2. If still green (which I think and hope), merge this one as well
  3. Update docs and release notes
    • It's easy for the user to enable cAdvisor again, just a
      sed -e "/CADVISOR_ARGS=/d" -i /etc/systemd/system/kubelet.service.d/10-kubeadm.conf,
      which will be documented

@luxas
Copy link
Member Author

luxas commented Jun 29, 2017

Both CI at head and CI for v1.7 have been green as before since kubernetes/kubernetes#48042 merged, so this is good to go.

Documentation is here: kubernetes/website#4229

And I'll add a release note about this.

Given the discussion with and approval by @dchen1107 and the other LGTM's here in this thread, I'm merging this to move forward.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants