
Enable Kubernetes objects to be reported just once in a cluster #3274

Merged
bboreham merged 8 commits from kubernetes-tagger into master on Oct 19, 2018

Conversation

bboreham (Collaborator):

This set of changes allows you to configure the probe on each node with Kubernetes probing disabled, and run one extra probe with Kubernetes enabled and processes, containers, etc., disabled. This is quite simple to arrange with one DaemonSet and one Deployment.

Benefits:

  • less impact on the Kubernetes api-server
  • less work on each node gathering and reporting the data in each probe
  • less work in the app to merge N copies of identical nodes
  • less network traffic to send them to the app

Disappointingly, the CPU-usage benefit wasn't huge when I tried it in our staging cluster, but I didn't spend long looking into why.

It has one disruptive change: pods are no longer tagged with a host ID, because when Kubernetes is reported from just one node in the cluster, that probe cannot know what host ID has been given to any other node. The rendering code is changed to find the host ID on a child container node instead.
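To make the idea concrete, here is a minimal sketch of that lookup. The Node shape and the host_node_id key are simplified stand-ins for illustration, not Scope's actual renderer types:

```go
package main

import "fmt"

// Node is a simplified stand-in for a topology node.
type Node struct {
	ID       string
	Latest   map[string]string // latest key/value metadata reported for the node
	Children []string          // IDs of child nodes, e.g. a pod's containers
}

// hostIDForPod walks a pod's child containers and returns the first host ID
// found, since pods themselves no longer carry one.
func hostIDForPod(pod Node, containers map[string]Node) (string, bool) {
	for _, id := range pod.Children {
		if c, ok := containers[id]; ok {
			if hostID, ok := c.Latest["host_node_id"]; ok && hostID != "" {
				return hostID, true
			}
		}
	}
	return "", false
}

func main() {
	containers := map[string]Node{
		"c1": {ID: "c1", Latest: map[string]string{"host_node_id": "node-1"}},
	}
	pod := Node{ID: "pod-1", Children: []string{"c1"}}
	if hostID, ok := hostIDForPod(pod, containers); ok {
		fmt.Printf("pod %s runs on host %s\n", pod.ID, hostID)
	}
}
```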

Alternatives considered to the above change:

  • Add the Kubernetes node-name to each host node, then have the renderer match them up.
  • Use the Kubernetes node name instead of the OS-supplied hostname as the host ID. This seems like a better idea, but it is a more disruptive change if people are used to seeing a certain format on screen. Maybe we can separate the internal ID from the on-screen presentation?

2opremio self-requested a review on August 14, 2018.
2opremio (Contributor) commented on Aug 15, 2018:

> The rendering code is changed to find this on a container node

I think this is acceptable, and not worse than the other solutions.

2opremio (Contributor) commented on Aug 16, 2018:

A few comments:

  1. Shouldn't this PR remove https://github.com/weaveworks/scope/blob/052ff39bf1b1e92f3127aeb4cfbe9cf554d105b9/probe/kubernetes/kubelet.go (and its test file)?

  2. I think it's important to make sure that the apps in Weave Cloud are updated ASAP (as I understand it, k8s rendering will break for probes using this code until the counterpart app is updated).

  3. Did you test backwards compatibility with probes using the hostID?

```diff
@@ -320,7 +322,7 @@ func setupFlags(flags *flags) {
 	flag.StringVar(&flags.probe.kubernetesClientConfig.User, "probe.kubernetes.user", "", "The name of the kubeconfig user to use")
 	flag.StringVar(&flags.probe.kubernetesClientConfig.Username, "probe.kubernetes.username", "", "Username for basic authentication to the API server")
 	flag.StringVar(&flags.probe.kubernetesNodeName, "probe.kubernetes.node-name", "", "Name of this node, for filtering pods")
-	flag.UintVar(&flags.probe.kubernetesKubeletPort, "probe.kubernetes.kubelet-port", 10255, "Node-local TCP port for contacting kubelet")
+	flag.UintVar(&flags.probe.kubernetesKubeletPort, "probe.kubernetes.kubelet-port", 10255, "Node-local TCP port for contacting kubelet (zero to disable)")
```


2opremio (Contributor) commented:

If I understood properly, after this PR:

  1. The normal probes (daemon sets) will run with --probe.kubernetes-tag=true
  2. The deployment gathering k8s information will run with --probe.kubernetes=true

If that's correct, I think it would be more intuitive to do something like:

  1. Run the daemon set with --probe.kubernetes=true --probe.kubernetes.role=host-agent
  2. Run the deployment with --probe.kubernetes=true --probe.kubernetes.role=api-server-agent

(We could optionally drop --probe.kubernetes=true since it's redundant.)

I think that exposing whether it tags or not is an implementation detail which may be confusing for the end user. Indicating what the probe is used for would be more useful.

```diff
@@ -552,7 +559,7 @@ func (r *Reporter) podTopology(services []Service, deployments []Deployment, dae
 	}

 	var localPodUIDs map[string]struct{}
-	if r.nodeName == "" {
+	if r.nodeName == "" && r.kubeletPort != 0 {
 		// We don't know the node name: fall back to obtaining the local pods from kubelet
 		var err error
 		localPodUIDs, err = GetLocalPodUIDs(fmt.Sprintf("127.0.0.1:%d", r.kubeletPort))
```
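Reading the two hunks together: a probe run with kubelet-port set to zero now skips the kubelet fallback entirely. A condensed sketch of that logic, with a stub standing in for Scope's GetLocalPodUIDs helper (the real signature may differ):

```go
package main

import "fmt"

// GetLocalPodUIDs is a stub standing in for Scope's kubelet client.
func GetLocalPodUIDs(addr string) (map[string]struct{}, error) {
	fmt.Println("querying kubelet at", addr)
	return map[string]struct{}{}, nil
}

// localPods condenses the changed condition: the kubelet fallback only fires
// when no node name is known AND the port is non-zero; a port of zero
// disables kubelet polling entirely.
func localPods(nodeName string, kubeletPort uint) (map[string]struct{}, error) {
	if nodeName == "" && kubeletPort != 0 {
		return GetLocalPodUIDs(fmt.Sprintf("127.0.0.1:%d", kubeletPort))
	}
	return nil, nil // filter pods by node name instead, or don't filter at all
}

func main() {
	uids, _ := localPods("", 10255) // no node name, kubelet enabled: fall back
	fmt.Println("local pods:", uids)
	uids, _ = localPods("", 0) // kubelet disabled: nil, no local filtering
	fmt.Println("local pods:", uids)
}
```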


Commits pushed to the branch:

  • This enables us to run Kubernetes probing on one node for the whole cluster.
  • This gives us the option of disabling the function.
  • So they can be reported centrally, find the pod host ID from the child containers.
bboreham (Collaborator, Author) commented:

I have rebased and updated the flag settings in line with what @2opremio suggested. Now it is:

  1. Run the daemon set with --probe.kubernetes.role=host
  2. Run the deployment with --probe.kubernetes.role=cluster

Talking directly to kubelet is now disabled in both modes, although the code remains; removing it would be a separate PR.
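For illustration only, a minimal sketch of how the role flag might select between the two behaviours. The flag name and role values match this PR, but the switch body and variable names are guesses, not Scope's actual wiring:

```go
package main

import (
	"flag"
	"fmt"
)

func main() {
	// Flag name and values ("host", "cluster") are as described in this PR.
	role := flag.String("probe.kubernetes.role", "", `"host" or "cluster"`)
	flag.Parse()

	var reportKubernetes, reportNodeLocal bool
	switch *role {
	case "host":
		// Per-node probe (the DaemonSet): processes, containers, etc.,
		// with no Kubernetes API access.
		reportNodeLocal = true
	case "cluster":
		// One probe for the whole cluster (the Deployment): Kubernetes
		// objects only.
		reportKubernetes = true
	}
	fmt.Printf("kubernetes=%v node-local=%v\n", reportKubernetes, reportNodeLocal)
}
```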

Commit pushed: Now you specify a role instead of controlling the internal behaviour.
bboreham (Collaborator, Author) commented:

I tested this again in our staging cluster; backwards compatibility is fine.
However, I hadn't quite got the host-parent propagation right, so I have fixed that.

Impact on our staging cluster was about a 10% reduction in CPU usage by probes. In bigger clusters the net impact should be better.

I think this is good to go now.

2opremio (Contributor) commented:

Are there follow-up issues/PRs for:

bboreham (Collaborator, Author) commented:

#3242 is the latter. Yes, we are very keen to update the cloud.weave.works config, but we need to do a Scope release first, which was waiting on a review here.

bboreham merged commit 8cccbb6 into master on Oct 19, 2018.
bboreham deleted the kubernetes-tagger branch on October 19, 2018.
bboreham mentioned this pull request on Nov 22, 2018.
bboreham added a commit that referenced this pull request on Feb 13, 2019:
We stop the per-host probes talking to Kubernetes and run an extra
Deployment with one more probe process to collect all information for
the cluster, which is less resource-intensive overall.

This feature was added at #3274