Add documentation about controller metrics #4138

Merged
merged 1 commit into from
Jun 26, 2017

Conversation

gnufied
Member

@gnufied gnufied commented Jun 19, 2017

Add documentation for controller metrics.

cc @chenopis @saad-ali



@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jun 19, 2017
@gnufied gnufied force-pushed the controller-metrics branch from ea94f6d to 4bd17b2 Compare June 19, 2017 22:58
@chenopis chenopis assigned chenopis and unassigned kelseyhightower Jun 19, 2017
@chenopis chenopis requested a review from saad-ali June 19, 2017 23:17
@chenopis chenopis added this to the 1.7 milestone Jun 19, 2017
@gnufied
Member Author

gnufied commented Jun 20, 2017

cc @brancz @piosz

@brancz
Member

brancz commented Jun 20, 2017

Prometheus configuration wise this looks ok.

I'm not sure what the completeness goal for this PR is. I just did a quick query to figure out the metrics my currently running 1.6.4 kube-controller-manager exposes. This is my result.
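
A quick survey like this can be approximated without a Prometheus server by reading the `# TYPE` lines of a scrape. The following is a minimal sketch, not the query used above: the function name `metric_types` and the `SAMPLE` text are illustrative assumptions, and the sample is not real output from a 1.6.4 kube-controller-manager.

```python
def metric_types(exposition_text):
    """Map each metric family name to its declared type, using `# TYPE` lines."""
    types = {}
    for line in exposition_text.splitlines():
        parts = line.split()
        # A type declaration has the shape: # TYPE <metric_name> <type>
        if len(parts) == 4 and parts[0] == "#" and parts[1] == "TYPE":
            types[parts[2]] = parts[3]
    return types


# Illustrative sample in the Prometheus text format (assumed, not captured output).
SAMPLE = """\
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 42
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 12.5
"""

if __name__ == "__main__":
    for name, kind in sorted(metric_types(SAMPLE).items()):
        print(f"{name}\t{kind}")
```

Feeding it the text returned by the controller manager's metrics endpoint would give a name-to-type listing for whatever that version exposes.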

@gnufied
Member Author

gnufied commented Jun 20, 2017

I agree with @piosz that we may not want to list all metrics, because maintaining such a list is hard and metric names or types can change. I would limit the scope of this document to a preview of what a cluster admin can expect from these metrics, rather than making the list exhaustive.

@piosz
Member

piosz commented Jun 20, 2017

How about instead writing a doc that is not specific to one component (controller-manager) and one collection pipeline (Prometheus)?

@gnufied
Member Author

gnufied commented Jun 20, 2017

@piosz I do not disagree with you there, but I do not think such a document should be owned by #sig-storage. Also, Prometheus's configuration of Kubernetes service discovery has been a moving target between Prometheus versions. In fact, I found most online docs on Prometheus kube SD configuration to be subtly wrong: values changed from arrays to strings, the api_servers key was renamed to api_server, and so on. So if we are going to document the collection pipeline, we will have to pick a version of Prometheus and a version of Kubernetes and stick with them. I feel #sig-instrumentation should decide which Prometheus version gets documented.

@gnufied
Member Author

gnufied commented Jun 20, 2017

Also, there is a bit of opaqueness that needs documenting in some form or other. Most Kubernetes users I have spoken to are unaware that controller metrics are available at http://localhost:10252/metrics in the default configuration. AFAIK we don't document this anywhere... :(

So I wanted to make that information part of the end-user documentation. I think most users can configure the metrics pipeline from that point onwards. It is tricky to pick a particular version of Prometheus to document (at the risk of repeating myself).
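
As a starting point independent of any Prometheus version, the endpoint can be read with nothing but the Python standard library. This is a minimal sketch under stated assumptions: the URL is the default quoted above, the helper names `fetch_metrics` and `parse_samples` are hypothetical, and the parser handles only plain sample lines in the Prometheus text format, leaving label sets as unparsed strings.

```python
import urllib.request

# Default controller manager metrics URL quoted in the comment above.
METRICS_URL = "http://localhost:10252/metrics"


def fetch_metrics(url=METRICS_URL, timeout=5.0):
    """Return the raw metrics text served at `url`."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def parse_samples(text):
    """Yield (name, labels, value) per sample line; labels stay an unparsed string."""
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and `# HELP` / `# TYPE` comment lines.
        if not line or line.startswith("#"):
            continue
        if "{" in line:
            name, rest = line.split("{", 1)
            # Everything up to the last "}" is the label set.
            labels, _, tail = rest.rpartition("}")
        else:
            name, _, tail = line.partition(" ")
            labels = ""
        # The first field after the name/labels is the value; a timestamp may follow.
        value = float(tail.split()[0])
        yield name, labels, value
```

On the host running the controller manager, `list(parse_samples(fetch_metrics()))` would return the current samples; anything beyond that (storage, alerting, service discovery) is left to whichever collection pipeline is in use.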

@saad-ali
Member

CC @bowei @msau42

protocol: TCP
```

After that prometheus's service discovery mechanism can automatically discover controller metrics and scrap them periodically as per configuration. Please refer to [Prometheus Configuration](https://prometheus.io/docs/operating/configuration/) for
Member

scrap -> scrape

{% endcapture %}

{% capture body %}
## What are controller metrics
Member

What are controller manager metrics

{% capture body %}
## What are controller metrics

Controller manager metrics provide important insight into performance and health of controller manager.
Member

... health of the controller manager


Controller manager metrics provide important insight into performance and health of controller manager.
These metrics include common Go language runtime metrics such as go_routine count and controller specif
ic metrics such as
Member

specif\nic unintentional newline?

Controller manager metrics provide important insight into performance and health of controller manager.
These metrics include common Go language runtime metrics such as go_routine count and controller specif
ic metrics such as
etcd request latencies or cloudprovider (AWS, GCE, Openstack) api latencies that can be used
Member

api -> API

to gauge health of cluster.

Starting from Kubernetes 1.7, detailed Cloudprovider metrics are available for storage operations for GCE, AWS, Vsphere and Openstack.
These metrics can be used to monitor health of persistent volume operations.
Member

The metrics cover all cloud operations, not just PV?

Member Author

Only for GCE are metrics available for all operations, not for the other cloud providers.


## Configuration

Typically in a cluster, controller metrics are available at - `http://localhost:10252/metrics` assuming
Member

Seems simpler to just say:

Controller metrics are available from `http://localhost:10252/metrics` from the host where the controller manager is running.

Typically in a cluster, controller metrics are available at - `http://localhost:10252/metrics` assuming
metrics are being retrieved locally from host where controller manager is running.

The metrics are emitted in prometheus format and are human readable (go ahead curl that url!).
Member

Include a link to the Prometheus format. Remove "(go ahead curl that url!)".


The metrics are emitted in prometheus format and are human readable (go ahead curl that url!).

In production environment though - you may want to configure prometheus or some other metrics scraper
Member

In a production environment you may ...

protocol: TCP
```

After that prometheus's service discovery mechanism can automatically discover controller metrics and scrap them periodically as per configuration. Please refer to [Prometheus Configuration](https://prometheus.io/docs/operating/configuration/) for
Member

Is it possible to give a snippet of Prometheus configuration for accessing the service endpoint? I remember it wasn't super obvious from reading the Prometheus docs.

The Prometheus service discovery mechanism can scrape the service endpoint defined above.

Member Author

@gnufied gnufied Jun 20, 2017

It is possible, yes, but it varies between versions. I have run prometheus-1.1 in a pod, and it was different from running prometheus-1.7 outside. It also depends on things like whether controller-manager is running in a pod or not. I am a bit hesitant to document something that will soon be out of date and doesn't necessarily align with the release cycles of Kubernetes.

Member

ok

Contributor

@chenopis chenopis left a comment

Some minor grammar nits.


{% capture overview %}
Controller manager metrics provide important insight into performance and health of
controller manager.
Contributor

add "the": "...the controller manager."

---

{% capture overview %}
Controller manager metrics provide important insight into performance and health of
Contributor

add "the": "...insight into the performance and health..."

{% capture body %}
## What are controller manager metrics

Controller manager metrics provide important insight into performance and health of the controller manager.
Contributor

add "the": "...insight into the performance and health..."

Controller manager metrics provide important insight into performance and health of the controller manager.
These metrics include common Go language runtime metrics such as go_routine count and controller specific metrics such as
etcd request latencies or cloudprovider (AWS, GCE, Openstack) API latencies that can be used
to gauge health of cluster.
Contributor

add "the" and "a": "...to gauge the health of a cluster."


Controller manager metrics provide important insight into performance and health of the controller manager.
These metrics include common Go language runtime metrics such as go_routine count and controller specific metrics such as
etcd request latencies or cloudprovider (AWS, GCE, Openstack) API latencies that can be used
Contributor

For consistency, Cloudprovider should be capitalized here.

Starting from Kubernetes 1.7, detailed Cloudprovider metrics are available for storage operations for GCE, AWS, Vsphere and Openstack.
These metrics can be used to monitor health of persistent volume operations.

For example for GCE these metrics are called:
Contributor

add comma: "For example, for GCE..."


The metrics are emitted in [prometheus format](https://prometheus.io/docs/instrumenting/exposition_formats/) and are human readable.

In production environment you may want to configure prometheus or some other metrics scraper
Contributor

add "a" and comma: "In a production environment, you may..."

to periodically gather these metrics and make them available in some kind of time series database.


Prometheus itself can gather controller metrics via built-in service discovery mechanism provided
Contributor

add "its", "that the", and a comma: "...controller metrics via its built-in service discovery mechanism, provided that the controller's metrics..."

protocol: TCP
```

After that prometheus's service discovery mechanism can automatically discover controller metrics and scrape them periodically as per configuration. Please refer to [Prometheus Configuration](https://prometheus.io/docs/operating/configuration/) for
Contributor

add comma and "the": "After that, prometheus's service discovery mechanism can automatically discover controller metrics and scrape them periodically as per the configuration."

@gnufied
Member Author

gnufied commented Jun 21, 2017

@chenopis sorted the requested changes. Thanks for pointing them out!

@chenopis
Contributor

No worries. Ping me when we're good on the tech side and I'll merge this.

@gnufied
Member Author

gnufied commented Jun 22, 2017

I am hoping we get lgtm from someone in #sig-instrumentation. cc @piosz @brancz

Member

@piosz piosz left a comment

As I mentioned, I'm not a big fan of documenting a subset of metrics from one component in a separate page, but I can't find a better spot for now. Long term, we should introduce a page about monitoring Kubernetes components.

Please address the issue I raised below. Otherwise LGTM

to periodically gather these metrics and make them available in some kind of time series database.


Prometheus itself can gather controller metrics via its built-in service discovery mechanism, provided
Member

I'd prefer to have this monitoring solution agnostic, so how about skipping the rest of this doc (starting with this line)?

cc @brancz @fabxc

Member

Agreed. I think it would be good to have a document describing metric collection for different mechanisms, but I don't think this document is the place for it.

Member Author

okay, I dropped that whole section.

@chenopis
Contributor

@gnufied FYI, all feedback must be addressed and LGTMs given by EOD Tue, June 27th so that this can be merged for the 1.7 release on June 28th.

@gnufied
Member Author

gnufied commented Jun 26, 2017

@chenopis I believe I have addressed comments by @piosz . PTAL

@gnufied
Member Author

gnufied commented Jun 26, 2017

As strange as it may sound, GitHub just ate a commit I pushed to fix the problem. Trying to figure out what's going on.

@gnufied gnufied force-pushed the controller-metrics branch 2 times, most recently from 5676dfd to 99ff364 Compare June 26, 2017 17:44
@chenopis
Contributor

Oh, GitHub. :(

@chenopis
Contributor

@gnufied I see the commit now, so is this ready to be merged then?

@gnufied
Member Author

gnufied commented Jun 26, 2017

@chenopis yes, I think GitHub was having issues. It all looks good now.

@piosz
Member

piosz commented Jun 26, 2017

/lgtm

@chenopis chenopis merged commit 3d41757 into kubernetes:release-1.7 Jun 26, 2017
9 participants