Prometheus metrics for docker registry & haproxy #3916

Closed
jimmidyson opened this issue Jul 27, 2015 · 24 comments
Comments

@jimmidyson
Contributor

Most components expose metrics for ingestion into Prometheus. I'd like to see the same for haproxy & the docker registry.

HAProxy should be simple enough, running a container in the same pod using the Prometheus HAProxy exporter (https://github.com/prometheus/haproxy_exporter/).

Does the docker registry expose prometheus metrics natively?

@smarterclayton
Contributor

For the routers we'd also like to expose tenant metrics so we have a central gathering point. @ramr I would prefer to expose a stats endpoint via prometheus so that we don't have two stat gathering technologies at play. If you think there is something that would work better let me know - raw stats are not as effective and I'd like to correlate service+namespace to the traffic metric.

@ncdc On the registry, I think the answer is no, but I think we should do so.

@jimmidyson
Contributor Author

We should be able to use Prometheus relabelling to add service & namespace as labels, as long as we can parse out that info from backend/frontend config. Alternatively it would be quite simple to write a custom bridge to add this metadata in.

@smarterclayton
Contributor

With routers we need to be sensitive to scale - we may have a hundred thousand or more routes to a single router pair, and in HA setups we'll want to gather from both. We also need to be able to support metrics for other kinds of front ends like Apache and Nginx, even if we don't do that in the initial implementation. It seems like the router manager proc is going to sample the stats endpoint anyway.

Any alternative solution will have to be scalable and flexible in a similar way. I know there was a simple HAProxy scraper for Prometheus but I have no idea what its gaps would be.


@ncdc
Contributor

ncdc commented Aug 10, 2015

Correct, the registry doesn't have any Prometheus integration at the moment. It currently has reporting integration points with bugsnag and newrelic. What is needed to add support for Prometheus? What sort of data are you looking for?

@jimmidyson
Contributor Author

@ncdc Details of how to add Prometheus metrics & expose them are at https://godoc.org/github.com/prometheus/client_golang/prometheus. As for what data, I'm not really sure... Anything that can be used to monitor the performance of the registry - that requires knowledge of the internals of the registry I guess. Stuff like response times, number of images per namespace, storage used, etc. sound like good candidates, but as I said, anything that could be used to monitor the registry, both for alerting on issues and for building trends over time.
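
For illustration, a minimal sketch of what instrumenting a Go service like the registry with client_golang could look like (using the current client_golang API; the metric name, labels, path and port below are invented, not actual registry metrics):

package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// Illustrative metric only - the real registry would pick names and labels
// that match its internals (response times, images per namespace, storage
// used, ...).
var requestDuration = prometheus.NewHistogramVec(
	prometheus.HistogramOpts{
		Name: "registry_http_request_duration_seconds",
		Help: "Time taken to serve registry HTTP requests.",
	},
	[]string{"handler", "method"},
)

func handleBase(w http.ResponseWriter, r *http.Request) {
	// Record how long this request took.
	timer := prometheus.NewTimer(requestDuration.WithLabelValues("base", r.Method))
	defer timer.ObserveDuration()
	w.Write([]byte("{}"))
}

func main() {
	prometheus.MustRegister(requestDuration)

	http.HandleFunc("/v2/", handleBase)
	// Expose all registered metrics for Prometheus to scrape.
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":5001", nil))
}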

@ncdc
Contributor

ncdc commented Aug 10, 2015

@jimmidyson ok, we'll want to ultimately turn this into an upstream proposal for docker/distribution. At the very least, we could probably wrap the main app http.Handler with https://godoc.org/github.com/prometheus/client_golang/prometheus#InstrumentHandler, similar to how they already are doing for bugsnag and newrelic.
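
Roughly, and only as a sketch, the wrapping would be something like the following (assuming "app" is the registry's main http.Handler; note that InstrumentHandler has since been deprecated in client_golang in favour of the promhttp middleware):

package main

import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
)

// wrapWithMetrics wraps the registry's main handler so that request counts,
// durations and sizes are recorded under the "registry" handler name.
func wrapWithMetrics(app http.Handler) http.Handler {
	return prometheus.InstrumentHandler("registry", app)
}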

@jimmidyson
Contributor Author

@ncdc That sounds like a quick (hopefully easy) win.

@jimmidyson
Contributor Author

@smarterclayton Here's an example output (http://git.io/v3GTb) from the Prometheus exporter for HAProxy. There's only one route in there - fabric8 with only one endpoint - 172.17.0.5:9090. You can see that the metrics are labelled appropriately, e.g.:

haproxy_server_bytes_in_total{backend="be_http_default-fabric8",server="172.17.0.5:9090"} 22020

During Prometheus relabelling when ingesting metrics, we could roll stats up to namespace (default in this case) & service (fabric8 in this case), dropping labels we're not interested in, perhaps server (endpoint). We can also aggregate these metrics on ingestion so that we can have stats per namespace, etc. as required.

What do you think? Adding the Prometheus haproxy_exporter as a sidecar container in the router pod would be simplest, although it could also scrape the stats endpoint remotely if need be.

@smarterclayton
Contributor

Sidecar is a good place to start - because that decouples the router component from the Go code we use (that way you can switch to Apache and you just need to get its own sidecar).


@ramr
Contributor

ramr commented Aug 10, 2015

Just saw this - bad filter rules!! Yeah, given that there may be different router implementations, exposing the metrics via some standard interface a la Prometheus is definitely better. Just fyi, we do expose the stats host/port for haproxy today, so collecting the metrics is easy enough with a prometheus ${router-type}.exporter sidecar container.
Though that said, the main router command code is sorta generic and it creates the deployment configuration, so adding a sidecar container for one type of router (haproxy) and not for the other might be somewhat clunky. An alternative might be to run the infra router (which runs as the docker container that watches for routes/endpoints and launches/reconfigures haproxy) with the collection sidecar code - for the specific plugin type - running in-process rather than outside as a sidecar. That might work better from a process management standpoint as well.

@smarterclayton
Contributor

Ultimately the router command probably should just be a template. It was kind of a bridge until we had service accounts and some other tools.

Where possible, I would prefer not to have to have code plugins for the router, because it requires a much higher bar for 3rd parties.


@jimmidyson
Contributor Author

@ramr Using the HAProxy stats endpoint & the Prometheus haproxy_exporter as a sidecar is exactly how I ingested metrics into Prometheus - it worked nicely & allows us to re-label metrics with namespace & service, which is nice.

I prefer the idea of running the exporter as a sidecar container - for one thing, it allows us to swap/upgrade impls if need be without affecting the core infra router code. Also, getting fixes/features into exporters as required (which I'm sure there will be) without vendoring & carrying them in the infra router is going to be simpler.

@jimmidyson
Contributor Author

@ramr @smarterclayton Any news on this? I'd like to get this in, but with the current implementation of oadm router cmd this is pretty tricky.

I could make the addition of the prometheus exporter sidecar optional via a flag (defaulted to true?). We could also only add the sidecar if there's a compatible exporter for the router type, so only for haproxy & nginx to begin with.

Thoughts?

@ramr
Contributor

ramr commented Sep 24, 2015

@jimmidyson - I can only look at it sometime towards the end of next week. But that plan does sound good - doing it only for the compatible router and adding a flag to turn it on (I'm on the fence about the default, but true should be ok I think).

@jimmidyson
Contributor Author

We have metrics for the router now.

@ncdc Any thoughts on registry metrics?

@ncdc
Contributor

ncdc commented Nov 30, 2015

@pweil- @miminar for registry metrics ideas

@ramr
Contributor

ramr commented Dec 17, 2015

@danmcp the router bits are complete - I guess the registry bits are pending, so can you please assign to @pweil- or @miminar? Thx

pweil- assigned miminar and unassigned ramr Dec 17, 2015
@miminar

miminar commented Dec 18, 2015

There is already an upstream request for providing Prometheus metrics, which was turned down. Upstream prefers to stay metrics-backend agnostic and suggests processing the registry log, which contains all the information needed.

The registry's logging framework supports a wide variety of logging sinks. We could use another sidecar container inside the registry pod to process the log and provide the metrics.

Also, there are webhooks that could be used to gather metrics. I would have to do a deeper analysis because I'm not sure they provide all the data needed.

Other ideas?

@jimmidyson
Contributor Author

Using logs sounds fine - might be worth looking at https://github.com/google/mtail?

@smarterclayton
Contributor

Can we easily convert expvar to prometheus? If not, let's just expose a simple prometheus endpoint and collect the metrics we do have.

@jimmidyson
Contributor Author

Prometheus does have an expvar collector (https://godoc.org/github.com/prometheus/client_golang/prometheus#ExpvarCollector):

ExpvarCollector collects metrics from the expvar interface. It provides a quick way to expose numeric values that are already exported via expvar as Prometheus metrics. Note that the data models of expvar and Prometheus are fundamentally different, and that the ExpvarCollector is inherently slow. Thus, the ExpvarCollector is probably great for experiments and prototyping, but you should seriously consider a more direct implementation of Prometheus metrics for monitoring production systems.

I guess we'd need to quantify what slow means & what the impact is. It's a shame we can't do more direct instrumentation of course.
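
For reference, wiring that up would look something like this (the expvar keys and metric names below are invented for illustration, not values the registry actually exports):

package main

import (
	"github.com/prometheus/client_golang/prometheus"
)

// registerExpvarBridge bridges a couple of (hypothetical) expvar values
// already published by the registry into Prometheus metrics.
func registerExpvarBridge() {
	collector := prometheus.NewExpvarCollector(map[string]*prometheus.Desc{
		// expvar key -> Prometheus metric description
		"registry.requests": prometheus.NewDesc(
			"registry_requests_total",
			"Requests served, as published via expvar.",
			nil, nil,
		),
		"registry.storage.bytes": prometheus.NewDesc(
			"registry_storage_bytes",
			"Bytes stored, as published via expvar.",
			nil, nil,
		),
	})
	prometheus.MustRegister(collector)
}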

@smarterclayton
Contributor

We could simply have the prometheus expvar collector shim inside of the registry code.


@smarterclayton
Contributor

Router now has metrics as of v3.6.0-alpha.1. Registry is in the process of getting some.

@pweil-
Contributor

pweil- commented Jun 26, 2017

registry metrics implemented in #12711

pweil- closed this as completed Jun 26, 2017