Deploy ingress-nginx ServiceMonitor in Rancher v2.5 for scraping by Monitoring v2 #30126

Closed
axeal opened this issue Nov 18, 2020 · 17 comments
Labels: internal, kind/enhancement, QA/S

axeal (Contributor) commented Nov 18, 2020

What kind of request is this (question/bug/enhancement/feature request):

Enhancement

Steps to reproduce (least amount of steps as possible):

ingress-nginx-controller metrics are not scraped by default by Monitoring v2 in Rancher v2.5. For Rancher-provisioned clusters with the ingress-nginx ingress controller, would it be possible to auto-deploy a ServiceMonitor and metrics Service similar to the ones below, which would enable scraping by Prometheus:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: ingress-nginx-controller
  namespace: ingress-nginx
spec:
  endpoints:
  - interval: 30s
    port: metrics
  namespaceSelector:
    matchNames:
    - ingress-nginx
  selector:
    matchLabels:
      app: ingress-nginx
---
apiVersion: v1
kind: Service
metadata:
  name: ingress-nginx-controller-metrics
  namespace: ingress-nginx
  labels:
    app: ingress-nginx
spec:
  ports:
  - name: metrics
    port: 9913
    protocol: TCP
    targetPort: 10254
  selector:
    app: ingress-nginx
  type: ClusterIP
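
For reference, a minimal sketch of how these resources could be applied and verified (this assumes kubectl access to the cluster and that the Prometheus Operator CRDs from Monitoring v2 are already installed; the file name is illustrative):

# Apply the ServiceMonitor and metrics Service above
kubectl apply -f ingress-nginx-metrics.yaml

# Confirm both objects exist
kubectl -n ingress-nginx get servicemonitor ingress-nginx-controller
kubectl -n ingress-nginx get service ingress-nginx-controller-metrics

# The Service should have endpoints, i.e. its selector matches the controller pods
kubectl -n ingress-nginx get endpoints ingress-nginx-controller-metrics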

Environment information

  • Rancher v2.5.2

gz#11621

gz#13981

aiyengar2 (Contributor) commented Feb 24, 2021

@axeal, yes this is possible. Thanks for providing the Service / ServiceMonitor!

This issue will also encompass adding Grafana Dashboards, sourced from nginx.json and request-handling-performance.json

k8w commented Mar 1, 2021

@aiyengar2 Where should these JSON files be added?
Can this be done in the Grafana GUI?

aiyengar2 (Contributor) commented

@k8w you can add these JSON files via the Rancher UI; the relevant docs are here: https://rancher.com/docs/rancher/v2.x/en/monitoring-alerting/v2.5/persist-grafana/.

Since you already have the JSON, all you need to do is the last step.
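
For reference, that last step boils down to getting the dashboard JSON into a ConfigMap that Grafana's sidecar discovers. A minimal sketch, assuming the defaults used by Rancher Monitoring v2 (the cattle-dashboards namespace and the grafana_dashboard: "1" label; check the linked docs for the exact values in your version):

apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-ingress-dashboard   # illustrative name
  namespace: cattle-dashboards    # namespace watched by the Grafana sidecar by default
  labels:
    grafana_dashboard: "1"        # label the sidecar uses to pick up dashboards
data:
  nginx.json: |-
    { "...": "paste the contents of nginx.json here" }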

thehejik commented

@aiyengar2 @sowmyav27 could you please point me to how to install the enhanced chart? IIUC it should be available in the dev-v2.5 branch of rancher/charts.git, which is present in my cluster (well, it is https://git.rancher.io/charts - is it the same?), but the latest monitoring chart version in the v2 dropdown is 9.4.203-rc04 even after a repo refresh. I was trying with rancher:v2.5-head. The new monitoring chart version should be 9.4.204-rc01.

Should I clone charts.git locally and deploy it manually, or is there a way to do it from the web UI?

thehejik commented

I just successfully installed the testing monitoring chart from my local git checkout.

git clone https://git.rancher.io/charts --branch dev-v2.5
cd charts/charts/rancher-monitoring/rancher-monitoring-crd/9.4.204-rc01
helm install rancher-monitoring-crd ./
cd ../../rancher-monitoring/9.4.204-rc01/
helm install rancher-monitoring ./

aiyengar2 (Contributor) commented

@thehejik weird, I see that https://github.com/rancher/charts/tree/dev-v2.5/assets/rancher-monitoring contains 9.4.204-rcXX versions, so it should be working. But the approach you took via bash works too.

sowmyav27 (Contributor) commented

@thehejik The RC version of the chart is not available on 2.5-head because the Rancher charts point to release-v2.5 by default. We can manually change it to point to dev-v2.5 for now and proceed with testing. I have asked the devs to look into it.

thehejik commented Mar 16, 2021

Thanks, I had some problems with DNS but now it works correctly.

The feature is working as described, but so far the user has to set ingressNginx.enabled=true (rancher/dashboard#2485) in the chart values YAML (Apps -> Monitoring -> General -> Edit as YAML); a minimal values sketch follows the list below.

  • The corresponding ServiceMonitor and Service are being created
  • New monitoring-ingress targets are all UP in Prometheus: ingress-nginx/rancher-monitoring-ingress-nginx/0 (3/3 up)
  • There is a new Grafana dashboard named NGINX / Ingress Controller available
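
For reference, a minimal sketch of the relevant values (keys as they appear later in this thread; defaults may differ by chart version):

ingressNginx:
  enabled: true
  namespace: ingress-nginx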

I was testing on Rancher 2.5.7 with the v2.5-head image tag and used the dev-v2.5 branch of http://git.rancher.io/charts. The monitoring Helm chart was version 9.4.204-rc04.

Edit: tested on rancher v2.5-868c715554ed7b2de7e7b89595487df66c428a41-head

h0jeZvgoxFepBQ2C commented May 6, 2021

I just updated our Rancher installation from 2.5.7 to 2.5.8 and also updated our rancher-monitoring, but somehow I can't get it running.

I've enabled ingressNginx.enabled: true and also set a different namespace (we installed the nginx ingress controller into the "default" namespace), but in the end I only get the following error:

Error: UPGRADE FAILED: cannot patch "rancher-monitoring-alertmanager.rules" with kind PrometheusRule: Internal error occurred: failed calling webhook "prometheusrulemutate.monitoring.coreos.com": Post "https://rancher-monitoring-operator.cattle-monitoring-system.svc:443/admission-prometheusrules/validate?timeout=30s": dial tcp 10.245.26.67:443: connect: connection refused 
&& cannot patch "rancher-monitoring-general.rules" with kind PrometheusRule: Internal error occurred: failed calling webhook "prometheusrulemutate.monitoring.coreos.com": Post "https://rancher-monitoring-operator.cattle-monitoring-system.svc:443/admission-prometheusrules/validate?timeout=30s": dial tcp 10.245.26.67:443: connect: connection refused 
&& cannot patch "rancher-monitoring-k8s.rules" with kind PrometheusRule: Internal error occurred: failed calling webhook "prometheusrulemutate.monitoring.coreos.com": Post "https://rancher-monitoring-operator.cattle-monitoring-system.svc:443/admission-prometheusrules/validate?timeout=30s": dial tcp 10.245.26.67:443: connect: connection refused 
&& cannot patch "rancher-monitoring-kube-apiserver-availability.rules" with kind PrometheusRule: Internal error occurred: failed calling webhook "prometheusrulemutate.monitoring.coreos.com": Post "https://rancher-monitoring-operator.cattle-monitoring-system.svc:443/admission-prometheusrules/validate?timeout=30s": dial tcp 10.245.26.67:443: connect: connection refused 
&& cannot patch "rancher-monitoring-kube-apiserver-slos" with kind PrometheusRule: Internal error occurred: failed calling webhook "prometheusrulemutate.monitoring.coreos.com": Post "https://rancher-monitoring-operator.cattle-monitoring-system.svc:443/admission-prometheusrules/validate?timeout=30s": dial tcp 10.245.26.67:443: connect: connection refused 
&& cannot patch "rancher-monitoring-kube-apiserver.rules" with kind PrometheusRule: Internal error occurred: failed calling webhook "prometheusrulemutate.monitoring.coreos.com": Post "https://rancher-monitoring-operator.
cattle-monitoring-system.svc:443/admission-prometheusrules/validate?timeout=30s": dial tcp 10.245.26.67:443: connect: connection refused 
&& cannot patch "rancher-monitoring-kube-prometheus-general.rules" with kind PrometheusRule: Internal error occurred: failed calling webhook "prometheusrulemutate.monitoring.coreos.com": Post "https://rancher-monitoring-operator.cattle-monitoring-system.svc:443/admission-prometheusrules/validate?timeout=30s": dial tcp 10.245.26.67:443: connect: connection refused 
&& cannot patch "rancher-monitoring-kube-prometheus-node-recording.rules" with kind PrometheusRule: Internal error occurred: failed calling webhook "prometheusrulemutate.monitoring.coreos.com": Post "https://rancher-monitoring-operator.cattle-monitoring-system.svc:443/admission-prometheusrules/validate?timeout=30s": dial tcp 10.245.26.67:443: connect: connection refused 
&& cannot patch "rancher-monitoring-kube-state-metrics" with kind PrometheusRule: Internal error occurred: failed calling webhook "prometheusrulemutate.monitoring.coreos.com": Post "https://rancher-monitoring-operator.cattle-monitoring-system.svc:443/admission-prometheusrules/validate?timeout=30s": dial tcp 10.245.26.67:443: connect: connection refused 
&& cannot patch "rancher-monitoring-kubelet.rules" with kind PrometheusRule: Internal error occurred: failed calling webhook "prometheusrulemutate.monitoring.coreos.com": Post "https://rancher-monitoring-operator.cattle-monitoring-system.svc:443/admission-prometheusrules/validate?timeout=30s": dial tcp 10.245.26.67:443: connect: connection refused 
&& cannot patch "rancher-monitoring-kubernetes-apps" with kind PrometheusRule: Internal error occurred: failed calling webhook "prometheusrulemutate.monitoring.coreos.com": Post "https://rancher-monitoring-operator.cattle-monitoring-system.svc:443/admission-prometheusrules/validate?timeout=30s": dial tcp 10.245.26.67:443: connect: connection refused 
&& cannot patch "rancher-monitoring-kubernetes-resources" with kind PrometheusRule: Internal error occurred: failed
calling webhook "prometheusrulemutate.monitoring.coreos.com": Post "https://rancher-monitoring-operator.cattle-monitoring-system.svc:443/admission-prometheusrules/validate?timeout=30s": dial tcp 10.245.26.67:443: connect: connection refused 
&& cannot patch "rancher-monitoring-kubernetes-storage" with kind PrometheusRule: Internal error occurred: failed calling webhook "prometheusrulemutate.monitoring.coreos.com": Post "https://rancher-monitoring-operator.cattle-monitoring-system.svc:443/admission-prometheusrules/validate?timeout=30s": dial tcp 10.245.26.67:443: connect: connection refused 

quangthe commented Jun 29, 2021

You need to define a selector for the rancher-monitoring-ingress-nginx Service in order to collect metrics from the ingress-nginx controller pods:

ingressNginx:
  enabled: true
  namespace: ingress-nginx
  service:
    port: 9913
    targetPort: 10254
    selector:
      app.kubernetes.io/component: controller
      app.kubernetes.io/instance: ingress-nginx
      app.kubernetes.io/name: ingress-nginx
  serviceMonitor:
    interval: 1m
    metricRelabelings: []
    relabelings: []
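
One way to check whether such a selector actually matches the controller pods in a given cluster (a sketch; it assumes the controller runs in the ingress-nginx namespace, and the Service name is taken from the targets mentioned earlier in this thread):

# Show the labels on the ingress controller pods; the Service selector must match them
kubectl -n ingress-nginx get pods --show-labels

# If the selector matches, the generated metrics Service has endpoints
kubectl -n ingress-nginx get endpoints rancher-monitoring-ingress-nginx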

aiyengar2 (Contributor) commented

> I just updated our 2.5.7 rancher installation to 2.5.8, and updated also our rancher monitoring, but can't get it running somehow. [...] Error: UPGRADE FAILED: cannot patch "rancher-monitoring-alertmanager.rules" with kind PrometheusRule: Internal error occurred: failed calling webhook "prometheusrulemutate.monitoring.coreos.com": [...] connect: connection refused [...] (full error quoted above)

@h0jeZvgoxFepBQ2C see #32416 (comment). If you re-trigger the install, it should complete successfully.
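
The "connection refused" errors above indicate that the Prometheus Operator's admission webhook Service was unreachable while the upgrade ran. A quick sanity check before re-triggering the install could look like this (a sketch; the Service and namespace names are taken from the error message above):

# Verify the webhook's backing Service exists and has endpoints
kubectl -n cattle-monitoring-system get service rancher-monitoring-operator
kubectl -n cattle-monitoring-system get endpoints rancher-monitoring-operator

# Verify the operator pod is running again
kubectl -n cattle-monitoring-system get pods | grep operator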

h0jeZvgoxFepBQ2C commented

> You need to define a selector for the rancher-monitoring-ingress-nginx Service in order to collect metrics from the ingress-nginx controller pods: [...] (values quoted above)

Do you have a link to where this is documented? Thanks!

aiyengar2 (Contributor) commented

Hmm, @thehejik, when we tested this, did we need to update the selector?

@quangthe sorry for the late ask, but could you share the environment information that required you to provide a selector in #30126 (comment), so we can attempt to reproduce? Specifically:

  • Rancher version: ``
  • Kubernetes Version: ``
  • Type of cluster
    • Custom: Running a docker command on a node
    • Imported: Running kubectl apply onto an existing k8s cluster
    • EKS
    • GKE
    • AKS
    • Infrastructure Provider: Rancher provisioning the nodes using different node drivers (e.g. AWS, Digital Ocean, etc)
    • Other: ``
  • Monitoring Chart Version: ``

Based on the labels shown in https://github.com/kubernetes/ingress-nginx/blob/8b3a6f02526e8bcf8b6fff2bf8d3613e20211b15/deploy/static/provider/cloud/deploy.yaml#L298, it seems you are right that the default should be app.kubernetes.io/name: ingress-nginx instead of app: ingress-nginx, as used in Rancher's chart.

But I wonder if RKE / RKE2 still deploy ingress-nginx with the other label, which is why we left it as app: ingress-nginx.

thehejik commented Sep 9, 2021

@aiyengar2 I don't think I was using the proposed selector; on the other hand, I'm not sure whether metrics from the ingress were shown.

aiyengar2 (Contributor) commented

Hmm, in that case @sowmyav27, I'm re-opening this issue and putting it back to test.

We need to confirm that setting ingressNginx.enabled: true shows the ServiceMonitor in the Prometheus targets UI and that it has active targets emitting metrics.

We should also double-check that the Grafana dashboards are eventually populated with metrics once an ingress is added to the cluster.

thehejik commented Sep 10, 2021

Retested on 2.5.9 with v2 monitoring 14.5.100 from the official release-v2.5 chart repo on an RKE1 cluster.

I was using the default values for the chart installation, where ingressNginx.enabled: true is enabled by default (I switched to Edit as YAML to validate). I didn't change any selector value - there is still only the original label app: nginx-ingress set on the nginx-ingress-controller-* pods.

This is what was used for monitoring installation:

ingressNginx:
  enabled: true
  namespace: ingress-nginx
  service:
    port: 9913
    targetPort: 10254
  serviceMonitor:
    interval: ''
    metricRelabelings: []
    relabelings: []

Grafana
For testing purposes I deployed an nginx Deployment and created an L7 Ingress entry for it using the default nginx-ingress as the load balancer, then ran curl <nginx_lb> in an endless loop while checking the Grafana dashboard NGINX / Ingress Controller and its graphs. The metrics data were being processed there:
[screenshot: NGINX / Ingress Controller Grafana dashboard]
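
For anyone reproducing this check, a rough sketch of the test setup and traffic loop described above (names and host are illustrative; kubectl create ingress requires a reasonably recent kubectl):

# Test workload behind the ingress controller
kubectl create deployment nginx --image=nginx
kubectl expose deployment nginx --port=80
kubectl create ingress nginx --rule="nginx.example.com/*=nginx:80"

# Generate traffic through the ingress while watching the NGINX / Ingress Controller dashboard
while true; do curl -s -H "Host: nginx.example.com" http://<nginx_lb>/ > /dev/null; sleep 1; done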

Prometheus

  • The ServiceMonitor for rancher-monitoring-ingress-nginx exists [screenshot]

  • The ingress-related targets are up, healthy, and returning data [screenshot]

I'd say everything is working as expected with default values.

aiyengar2 (Contributor) commented

Perfect! Thanks for checking @thehejik.

@h0jeZvgoxFepBQ2C @quangthe if you are still encountering this issue, please let us know your environment information! Perhaps there is something specific to the environment you are deploying Monitoring into that requires selectors and that we could document, but by default updating them should not be required.
