Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding support for --listen-metrics-url parameter #157

Closed
suhrud-kumar opened this issue Apr 26, 2020 · 8 comments
Closed

Adding support for --listen-metrics-url parameter #157

suhrud-kumar opened this issue Apr 26, 2020 · 8 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete.

Comments

@suhrud-kumar
Copy link

suhrud-kumar commented Apr 26, 2020

etcd has support for providing separate address for metrics endpoint.

This can be useful if the standard endpoint is configured with mutual (client) TLS authentication, but a load balancer or monitoring service still needs access to the metrics or health check. (etcd api will be served over https but /metrics endpoint will served over http )

Since etcdadm creates etcd in TLS be default, this will be useful for folks who don't to add etcd certs in their monitoring systems (for security reasons).

Currently we implemented a small hack to get around this, but it would be great if this is supported by etcdadm.

Let me know what you guys think, I can try to implement this with some guidance.

References:
https://github.com/etcd-io/etcd/blob/master/Documentation/op-guide/configuration.md#--listen-metrics-urls
https://github.com/etcd-io/etcd/blob/master/Documentation/op-guide/monitoring.md#metrics-endpoint

@dlipovetsky
Copy link
Contributor

dlipovetsky commented Apr 27, 2020

I agree, exposing the metrics endpoint would be useful. It's a great idea, thanks for suggesting it @suhrud-kumar!

Since etcdadm creates etcd in TLS be default, this will be useful for folks who don't to add etcd certs in their monitoring systems (for security reasons).

I'm skeptical of this motivation. Can you explain why someone might not want to "add etcd certs in their monitoring system?" (Thinking out loud: The etcd metrics endpoint could be used by a "pull" monitoring system (like Prometheus), and that system would only need a client certificate. The metrics client certificate, as far as I can tell, would have to be the same as the normal client certificate. So, unless etcd authorization is enabled, the monitoring system would have access to etcd data. I see why that's not desirable.)

I found some related discussion here: etcd-io/etcd#8060 (comment)

If you're able to work on it, please use the "/lifecycle active" command. If you have any implementation questions, please ask in #etcdadm channel in Kubernetes slack.

/priority important-longterm

@k8s-ci-robot k8s-ci-robot added the priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. label Apr 27, 2020
@suhrud-kumar
Copy link
Author

I agree, exposing the metrics endpoint would be useful. It's a great idea, thanks for suggesting it @suhrud-kumar!

Just to clarify, metrics endpoint will be exposed by default but in https (if installed using etcdadm), by adding this parameter an additional metrics endpoint will be exposed in http

Since etcdadm creates etcd in TLS be default, this will be useful for folks who don't to add etcd certs in their monitoring systems (for security reasons).

I'm skeptical of this motivation. Can you explain why someone might not want to "add etcd certs in their monitoring system?" (Thinking out loud: The etcd metrics endpoint could be used by a "pull" monitoring system (like Prometheus), and that system would only need a client certificate. The metrics client certificate, as far as I can tell, would have to be the same as the normal client certificate. So, unless etcd authorization is enabled, the monitoring system would have access to etcd data. I see why that's not desirable.)

Yes that's correct, since these certs can be used to access etcd data (thereby k8s secrets), we don't want to use them in other places unless absolutely required. And etcd provides a way to not use them for metrics endpoint (using above parameter).

Before starting working on this, I have a query on implementation. I have thought of adding one of three below arguments to implement this parameter in etcdadm.

  1. --listen-metrics-url
    Use etcd parameter as-is. But user would have to specify complete address of this endpoint including scheme (http or https), ip address (mostly node ip), and port
    (for example --listen-metrics-url http://:2381)

This requires user to know ip address of node for etcd init or join (can be done using automation scripts) but most configurable.

  1. --enable-http-metrics-on-port
    User only needs to specify port, we can create endpoint in http and use node ip by default

  2. --enable-http-metrics
    If user sets this to true, create additional metrics endpoint in http, on port 2381 (since etcd already uses 2379 & 2380) and use node ip by default

@dlipovetsky
Let me know your thoughts on this.

@dlipovetsky
Copy link
Contributor

--listen-metrics-url
Use etcd parameter as-is. But user would have to specify complete address of this endpoint including scheme (http or https), ip address (mostly node ip), and port
(for example --listen-metrics-url http://:2381)

As you point out, this is the most configurable option. And etcdadm can provide a sane default (as it would have to do, if we chose one of the other options).

The URL breaks down into scheme://host:port.

  • Scheme: Has to be http. I don't see any way to provide a cert the metrics URL, so I don't think etcd supports https here. Corect me if I'm wrong, please.
  • Host: I think the default URL host can be equal to the default advertised client URL host.
  • Port: The default port, I think, could be the one "officially" allocated to the etcd exporter in the Prometheus wiki, 9379. (As an aside, 2381 looks like it was once officially allocated, but is not anymore).

@suhrud-kumar
Copy link
Author

suhrud-kumar commented May 7, 2020

The URL breaks down into scheme://host:port.

  • Scheme: Has to be http. I don't see any way to provide a cert the metrics URL, so I don't think etcd supports https here. Corect me if I'm wrong, please.

Well technically it can also be https, and users will have to access endpoint using etcd client certs. But a https /metrics endpoint will be available by default anyway, so this may not be much useful. We can keep default scheme as http.

/lifecycle active

@k8s-ci-robot k8s-ci-robot added the lifecycle/active Indicates that an issue or PR is actively being worked on by a contributor. label May 7, 2020
@fejta-bot
Copy link

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. and removed lifecycle/active Indicates that an issue or PR is actively being worked on by a contributor. labels Aug 5, 2020
@fejta-bot
Copy link

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Sep 4, 2020
@fejta-bot
Copy link

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Copy link
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete.
Projects
None yet
Development

No branches or pull requests

4 participants