
[kube-prometheus-stack] tlsConfig support for servicemonitors and default alertingEndpoints to work with istio strict mtls #145

Closed
infa-ddeore opened this issue Sep 27, 2020 · 27 comments
Labels: enhancement (New feature or request), lifecycle/stale

Comments


infa-ddeore commented Sep 27, 2020

Is your feature request related to a problem? Please describe.
The Prometheus stack is installed with the Istio sidecar injected and a STRICT mTLS policy:

  1. Prometheus can't scrape Grafana, Alertmanager, kube-state-metrics or prometheus-operator.
  2. Prometheus can't connect to Alertmanager.

Describe the solution you'd like

  1. Provide scheme and mTLS options in the Grafana, Alertmanager, kube-state-metrics and prometheus-operator ServiceMonitor definitions.
  2. Provide an option to update Alertmanager's scheme and tlsConfig.

Describe alternatives you've considered

  1. Manually update the default ServiceMonitors with the https scheme and a tlsConfig that uses the Istio TLS certs.
  2. Use prometheus.prometheusSpec.alertingEndpoints with https and a tlsConfig.

Additional context
Below are the override values (assuming "prom" is the installed Helm release name, e.g. helm install prom) that make Istio write the TLS certs to an in-memory volume, which Prometheus then mounts.

NOTE:

Prerequisites:

  1. Install Istio 1.6.1 or higher.
  2. Enable Istio sidecar injection (e.g. label the namespace with istio-injection=enabled, as shown below).
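
For example, assuming the stack runs in the default namespace (as the alertingEndpoints values below assume):

kubectl label namespace default istio-injection=enabled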

Istio 1.6.1 or higher is required for this to work; Prometheus will then have the Istio certs mounted under the /etc/istio-certs/ directory. Once PeerAuthentication is set to STRICT, Prometheus fails to scrape or connect to the other components such as Grafana, the operator and Alertmanager.

prometheusOperator:
  podAnnotations:
    sidecar.istio.io/rewriteAppHTTPProbers: "true"
  admissionWebhooks:
    patch:
      podAnnotations:
        sidecar.istio.io/inject: "false"

grafana:
  podAnnotations:
    sidecar.istio.io/rewriteAppHTTPProbers: "true"

prometheus:
  prometheusSpec:
    logLevel: info
    alertingEndpoints:
    - name: prom-prometheus-operator-alertmanager
      namespace: default
      port: web
      scheme: https
      pathPrefix: /
      apiVersion: v2
      tlsConfig:
        caFile: /etc/istio-certs/root-cert.pem
        keyFile: /etc/istio-certs/key.pem
        certFile: /etc/istio-certs/cert-chain.pem
        insecureSkipVerify: true    
    volumeMounts:
    - name: istio-certs
      mountPath: /etc/istio-certs/
      readOnly: true
    podMetadata:
      annotations:
        sidecar.istio.io/rewriteAppHTTPProbers: "true"
        sidecar.istio.io/userVolume: '[{"name": "istio-certs", "emptyDir": {"medium": "Memory"}}]'
        sidecar.istio.io/userVolumeMount: '[{"name": "istio-certs", "mountPath": "/etc/istio-certs/"}]'

        proxy.istio.io/config: |-
          proxyMetadata:
            OUTPUT_CERTS: /etc/istio-certs/     

alertmanager:
  alertmanagerSpec:
    podMetadata:
      annotations:
        sidecar.istio.io/rewriteAppHTTPProbers: "true"

  serviceMonitor:
    tlsConfig:
      caFile: /etc/istio-certs/root-cert.pem
      keyFile: /etc/istio-certs/key.pem
      certFile: /etc/istio-certs/cert-chain.pem
      insecureSkipVerify: true

kube-state-metrics:
  podAnnotations:
    sidecar.istio.io/rewriteAppHTTPProbers: "true"

Edit the ServiceMonitor so Prometheus scrapes the target over https with the tlsConfig below (kubectl edit servicemonitors.monitoring.coreos.com prom-prometheus-operator-alertmanager); Prometheus can then scrape the target over mTLS.

spec:
  endpoints:
  - path: /metrics
    port: web
    scheme: https
    tlsConfig:
      caFile: /etc/istio-certs/root-cert.pem
      certFile: /etc/istio-certs/cert-chain.pem
      insecureSkipVerify: true
      keyFile: /etc/istio-certs/key.pem
  namespaceSelector:
@infa-ddeore infa-ddeore added the enhancement New feature or request label Sep 27, 2020
@infa-ddeore infa-ddeore changed the title [kube-prometheus-stack] tlsConfig support for servicemonitors and default alertingEndpoints [kube-prometheus-stack] tlsConfig support for servicemonitors and default alertingEndpoints to work with istio strict mtls Sep 27, 2020

stale bot commented Oct 27, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.

@infa-ddeore
Author

@bismarck @gianrubio @gkarthiks @scottrigby @vsliouniaev @Xtigyro
Could you take a look at this feature request?

It will help many who want to use Prometheus Operator with Istio STRICT mTLS between the Prometheus components.

@stale stale bot removed the lifecycle/stale label Oct 27, 2020
@bismarck
Contributor

@infa-ddeore We would be happy to accept a PR implementing this feature.


stale bot commented Nov 26, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.


stale bot commented Dec 10, 2020

This issue is being automatically closed due to inactivity.

@stale stale bot closed this as completed Dec 10, 2020
@stevehipwell
Contributor

Is nobody still interested in this?

@RenaudDemarneffe

Yes, I'm very interested in this.


uvw commented Jan 13, 2021

I ended up using the Helm post-renderer approach with kustomize to patch the chart on the fly. Here is a distilled example repo: https://github.com/uvw/kube-prometheus-stack-istio.
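
A minimal sketch of that pattern, assuming a kustomization.yaml next to a small wrapper script (file names here are illustrative; see the repo above for the complete setup):

#!/bin/sh
# kustomize-post-renderer.sh: Helm pipes the rendered chart manifests in on stdin;
# write them to a file, apply the kustomize patches, and print the result on stdout.
cat > rendered.yaml
exec kustomize build .

# kustomization.yaml
resources:
  - rendered.yaml
patchesStrategicMerge:
  - servicemonitor-tls-patch.yaml

The chart is then installed with, e.g.:

helm install prom prometheus-community/kube-prometheus-stack --post-renderer ./kustomize-post-renderer.sh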

@stevehipwell
Contributor

Looking into the Istio code and docs further, it looks like metrics merging is the supported way of handling metrics with mTLS set to STRICT. I've asked for clarification on the correct pattern to implement this.

@Crazyigor1987

@infa-ddeore have you tested it with the latest kube-prometheus-stack?
I've followed your settings, but unfortunately the Prometheus container cannot load the Istio certs because they have UID 1337 instead of 1000, so I get a "permission denied".

@stevehipwell
Contributor

@Crazyigor1987 have you set a fsGroup for the containers?
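
For reference, a sketch of where that could go in the chart values (the IDs here are illustrative; the Istio sidecar runs as UID/GID 1337):

prometheus:
  prometheusSpec:
    securityContext:
      runAsUser: 1337
      runAsGroup: 1337
      runAsNonRoot: true
      fsGroup: 1337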

@Crazyigor1987

@stevehipwell THANK YOU, that was the right keyword! Finally kube-prometheus-stack works like I wanted it to.

@kharouny

How come Prometheus needs the TLS configs for the Service Monitors if it has the side-car already injected? Shouldn't that allow scraping to happen over plain http?

@infa-ddeore
Author

How come Prometheus needs the TLS configs for the Service Monitors if it has the side-car already injected? Shouldn't that allow scraping to happen over plain http?

Prometheus scrapes pod IPs directly, and Istio doesn't interfere with connections that go directly to pod IPs.

Check this discussion, specifically this comment.

@stevehipwell
Contributor

You can use metric relabelling (we do this currently) to have the Istio proxy allow non-mTLS scraping, but you have to live with all metrics being associated with the Istio proxy pod. If Istio added originating-pod labels we could rewrite the metrics to get them back under the correct pod, which would be a pretty good solution.


shadjac commented Apr 14, 2021

You can use metric relabelling (we do this currently) to have the Istio proxy allow non-mTLS scraping, but you have to live with all metrics being associated with the Istio proxy pod. If Istio added originating-pod labels we could rewrite the metrics to get them back under the correct pod, which would be a pretty good solution.

@stevehipwell This sounds interesting. Would you please give more details about this approach? Thanks already for the idea!

@stevehipwell
Contributor

@shadjac you can set up Prometheus Operator for Istio via the control plane (istiod) service monitor and data plane (istio-proxy) pod monitor from the example template. Once you've got this set up you can enable metrics merging to collect your container metrics directly from the data plane. With merging enabled the data plane will merge pod metrics if you annotate the pod with the standard (non-operator) Prometheus annotations (prometheus.io/scrape, prometheus.io/path & prometheus.io/port).
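
A rough sketch of those annotations on an application pod, assuming its own metrics endpoint is /metrics on port 8080 (with merging enabled the sidecar exposes the combined metrics on :15020/stats/prometheus, which the data plane pod monitor scrapes):

metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/path: "/metrics"
    prometheus.io/port: "8080"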


shadjac commented Apr 15, 2021

@stevehipwell Thanks for elaborating on this. However, in our case, the application exposes its metrics on port 8080 at path /metrics/prometheus. I assume the default values of the annotations are -

prometheus.io/path: /stats/prometheus
prometheus.io/port: "15020"

and that they can't be changed.
Can I still make it work?

@stevehipwell
Contributor

@shadjac the annotations that you add to your pod should be set to your values; the istio-proxy will rewrite the annotations to its own values (which will be ignored by Prometheus Operator) but will merge the metrics from your endpoint with its own, to be collected by the pod monitor.


anoop-l commented May 4, 2021

@infa-ddeore we implemented it the way you suggested but we cannot read the mounted cert/key files. I can see them at the path /etc/istio-certs.

err="error creating HTTP client: unable to load specified CA cert /etc/istio-certs/root-cert.pem: open /etc/istio-certs/root-cert.pem: permission denied" scrape_pool=serviceMonitor/monitoring/test/0

Do we have to do something with runAsUser? It's set to 1000 now. Should it be 0?

@infa-ddeore
Author

@infa-ddeore we implemented it the way you suggested but we cannot read the mounted cert/key files. I can see them at the path /etc/istio-certs.

err="error creating HTTP client: unable to load specified CA cert /etc/istio-certs/root-cert.pem: open /etc/istio-certs/root-cert.pem: permission denied" scrape_pool=serviceMonitor/monitoring/test/0

Do we have to do something with runAsUser? It's set to 1000 now. Should it be 0?

I haven't come across such an issue; maybe the "kube-prometheus-stack" I used had a different user ID which allowed the access.
Can you try fsGroup as per #145 (comment) or https://stackoverflow.com/a/43545277/3397719?


anoop-l commented May 4, 2021

Thanks @infa-ddeore
Finally I got it working with user ID 1337, which is used by Istio:

securityContext:
  runAsGroup: 1337
  runAsNonRoot: true
  runAsUser: 1337
  fsGroup: 1337

@frankjkelly

Can someone comment why insecureSkipVerify is set to true? Is that required for this to work and if so why is that? Thanks in advance

@infa-ddeore
Author

Can someone comment why insecureSkipVerify is set to true? Is that required for this to work and if so why is that? Thanks in advance

Yes, it is required because the cert is signed by Istio and Prometheus won't trust it.

@frankjkelly

Thank you @infa-ddeore


TamasNeumer commented Aug 30, 2021

@infa-ddeore

Update:
On the Alertmanager pods I started excluding port 9094 from proxying, and this enabled the Alertmanager instances to do the initial sync among each other (a sketch of the annotations is below).
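
A sketch of that exclusion, assuming the standard Istio traffic annotations and the chart values structure used earlier in this thread (9094 is Alertmanager's cluster gossip port):

alertmanager:
  alertmanagerSpec:
    podMetadata:
      annotations:
        traffic.sidecar.istio.io/excludeInboundPorts: "9094"
        traffic.sidecar.istio.io/excludeOutboundPorts: "9094"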

Now my problem is that Prometheus tries to fire an alert by sending it to Alertmanager over HTTPS, which Alertmanager apparently doesn't like...

"https://10.9.13.20:9093/api/v2/alerts\": http: server gave HTTP response to HTTPS client" Any ideas how I can force AlertManager to cooperate?

Background: as described here, I'm trying to force Prometheus to use mTLS when communicating with Alertmanager.

====

I have followed your guide; however, the Alertmanagers don't seem to be able to discover each other. I get errors like these:

Failed to join 10.9.12.9: read tcp 10.9.11.9:56142->10.9.12.9:9094: read: connection reset by peer

I believe it happens when an Alertmanager instance (I have three running) starts up, fetches the IPs of the other two and tries syncing with them.

Any idea why this might happen?

As you previously described, pod-to-pod communication using IPs should not be interfered with by the Istio sidecar.

In the envoy proxy config of one of the AlertManagers I see the following:

10.8.30.232   9093  Trans: raw_buffer; App: HTTP  Route: alertmanager-main.monitoring.svc.cluster.local:9093
10.8.30.232   9093  ALL                           Cluster: outbound|9093||alertmanager-main.monitoring.svc.cluster.local
10.9.11.9     9093  Trans: raw_buffer; App: HTTP  Route: alertmanager-operated.monitoring.svc.cluster.local:9093
10.9.11.9     9093  ALL                           Cluster: outbound|9093||alertmanager-operated.monitoring.svc.cluster.local
10.9.12.9     9093  Trans: raw_buffer; App: HTTP  Route: alertmanager-operated.monitoring.svc.cluster.local:9093
10.9.12.9     9093  ALL                           Cluster: outbound|9093||alertmanager-operated.monitoring.svc.cluster.local
10.9.13.10    9093  Trans: raw_buffer; App: HTTP  Route: alertmanager-operated.monitoring.svc.cluster.local:9093
10.9.13.10    9093  ALL                           Cluster: outbound|9093||alertmanager-operated.monitoring.svc.cluster.local
10.9.11.9     9094  ALL                           Cluster: outbound|9094||alertmanager-operated.monitoring.svc.cluster.local
10.9.12.9     9094  ALL                           Cluster: outbound|9094||alertmanager-operated.monitoring.svc.cluster.local
10.9.13.10    9094  ALL                           Cluster: outbound|9094||alertmanager-operated.monitoring.svc.cluster.local


mjhoffman65 commented Dec 6, 2021

Can this be re-opened?
