[BUG][opensearch] cant connect to metrics endpoint #590

Open
gabriel-suela opened this issue Sep 5, 2024 · 11 comments
Labels
bug Something isn't working

Comments

@gabriel-suela

Describe the bug
I'm trying to view the metrics for OpenSearch, but the metrics endpoint doesn't seem to be working.

To Reproduce

serviceMonitor:
  enabled: true


Expected behavior
See all metrics

Chart Name
Specify the Chart which is affected?

Screenshots
image

Host/Environment (please complete the following information):

  • Helm Version: 3.15.4
  • Kubernetes Version: 1.29.6

Additional context
Also, Prometheus is alerting that the metrics are unreachable.

@gabriel-suela gabriel-suela added bug Something isn't working untriaged Issues that have not yet been triaged labels Sep 5, 2024
@BernhardGruen

Same problem here. Thank you for your report.

@eyenx
Contributor

eyenx commented Sep 10, 2024

TBH this looks more like a configuration issue. I have the same problem. Looking into it, OpenSearch is probably missing either a configuration option or the prometheus-exporter plugin itself.

@eyenx
Contributor

eyenx commented Sep 10, 2024

Looking at the port that is configured by default (9600), I think the idea is to use the Performance Analyzer, which is already installed as a plugin in the OpenSearch container but, as far as I can tell, not enabled:
https://opensearch.org/docs/latest/monitoring-your-cluster/pa/index/

@eyenx
Contributor

eyenx commented Sep 10, 2024

Okay, when trying to start it inside the container, I see that the address is already bound but not reachable.

I'm going to continue troubleshooting this.

sh-5.2$ OPENSEARCH_HOME=/usr/share/opensearch/ OPENSEARCH_JAVA_HOME=/usr/share/opensearch/jdk/ OPENSEARCH_PATH_CONF=/usr/share/opensearch/config/ bin/opensearch-performance-analyzer/performance-analyzer-agent-cli &
[1] 1181
sh-5.2$ 09:05:50.833 [main] ERROR org.opensearch.performanceanalyzer.PerformanceAnalyzerWebServer - Could not create HttpServer on port 9600
java.net.BindException: Address already in use
        at java.base/sun.nio.ch.Net.bind0(Native Method) ~[?:?]
        at java.base/sun.nio.ch.Net.bind(Net.java:565) ~[?:?]
        at java.base/sun.nio.ch.ServerSocketChannelImpl.netBind(ServerSocketChannelImpl.java:344) ~[?:?]
        at java.base/sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:301) ~[?:?]
        at java.base/sun.nio.ch.ServerSocketAdaptor.bind(ServerSocke

@eyenx
Contributor

eyenx commented Sep 10, 2024

According to the initial PR, the SM is meant to be used in conjunction with https://github.com/Aiven-Open/prometheus-exporter-plugin-for-opensearch

I'll try to integrate it and make sure it's installed by default when enabling serviceMonitor

Reference: #537 (comment)

@gabriel-suela
Author

> According to the initial PR, the SM is meant to be used in conjunction with https://github.com/Aiven-Open/prometheus-exporter-plugin-for-opensearch
>
> I'll try to integrate it and make sure it's installed by default when enabling serviceMonitor
>
> Reference: #537 (comment)

With the plugin I managed to access the metrics on port 9200, but the ServiceMonitor manifest still has its endpoint pointing at the port named metrics:

spec:
  endpoints:
  - interval: 10s
    path: /_prometheus/metrics
    port: metrics

I believe that is why Prometheus keeps warning me that the OpenSearch metrics are unreachable, even though I can access them through a port-forward.
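
To make the suspected mismatch concrete, here is a minimal Python sketch of how a ServiceMonitor endpoint's port name resolves to a port number on the Service. The port table is an assumption pieced together from this thread (9200 for the exporter plugin, 9600 as the chart's default metrics port) plus the standard OpenSearch transport port 9300; it is not read from the chart itself.

```python
# Simplified model: a ServiceMonitor endpoint refers to a Service port by
# *name*, and Prometheus scrapes whatever port number that name maps to.

def resolve_scrape_port(service_ports, endpoint_port_name):
    """Return the port number the endpoint name resolves to, or None if unknown."""
    return service_ports.get(endpoint_port_name)

# Assumed Service port table (names and numbers are illustrative).
service_ports = {"http": 9200, "transport": 9300, "metrics": 9600}

# Port on which the exporter plugin answered in the port-forward test.
plugin_port = 9200

for name in ("metrics", "http"):
    resolved = resolve_scrape_port(service_ports, name)
    print(name, resolved, resolved == plugin_port)
```

Under these assumptions, the endpoint named metrics resolves to 9600, where the exporter plugin is not listening, while http resolves to the port that actually serves /_prometheus/metrics.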

@eyenx
Contributor

eyenx commented Sep 10, 2024

You can configure the port with metricsPort

@prudhvigodithi prudhvigodithi removed the untriaged Issues that have not yet been triaged label Sep 12, 2024
@KristijanL

I have been trying to make the ServiceMonitor work for a while now, but I think the implementation is broken. We have
service.metricsPortName and metricsPort,

but setting either of those breaks the deployment. Example:

service:
  metricsPortName: "http"
metricsPort: 9200
* Service "opensearch-cluster-master-headless" is invalid: spec.ports[2].name: Duplicate value: "http"
* Service "opensearch-cluster-master" is invalid: spec.ports[2].name: Duplicate value: "http"
* Service "opensearch-cluster-master-headless" is invalid: spec.ports[2]: Duplicate value: core.ServicePort{Name:"", Protocol:"TCP", AppProtocol:(*string)(nil), Port:9200, TargetPort: intstr.IntOrString{Type:0, IntVal:0, StrVal:""}, NodePort:0}                                                                                                                                                       
* Service "opensearch-cluster-master" is invalid: spec.ports[2]: Duplicate value: core.ServicePort{Name:"", Protocol:"TCP", AppProtocol:(*string)(nil), Port:9200, TargetPort:intstr.IntOrString{Type:0, IntVal:0, StrVal:""}, NodePort:0}
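
The errors above are consistent with the metrics port being appended as a third entry that repeats both the name and the port number of the existing HTTP port. The following Python sketch is a simplified model of that uniqueness check, not the actual Kubernetes validation code; the port list is an assumption based on the error output.

```python
def find_duplicate_ports(ports):
    """Return (index, reason) pairs for ports that repeat an earlier
    port name or an earlier (protocol, port) combination."""
    seen_names, seen_pairs, problems = set(), set(), []
    for i, p in enumerate(ports):
        name = p.get("name", "")
        pair = (p.get("protocol", "TCP"), p["port"])
        if name and name in seen_names:
            problems.append((i, f'duplicate name "{name}"'))
        if pair in seen_pairs:
            problems.append((i, f"duplicate {pair[0]} port {pair[1]}"))
        seen_names.add(name)
        seen_pairs.add(pair)
    return problems

# Assumed rendering of the Service ports with the values shown above.
ports = [
    {"name": "http", "port": 9200},
    {"name": "transport", "port": 9300},
    {"name": "http", "port": 9200},  # metrics entry: metricsPortName=http, metricsPort=9200
]

print(find_duplicate_ports(ports))
# [(2, 'duplicate name "http"'), (2, 'duplicate TCP port 9200')]
```

Both findings land on index 2, which matches the spec.ports[2] path in the error messages.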

The best way to make this work would be to let users edit the ServiceMonitor while providing defaults in values.
Example:

  endpoints:
  - targetPort: {{ .Values.serviceMonitor.targetPort }}
    path: {{ .Values.serviceMonitor.path }}
    interval: {{ .Values.serviceMonitor.interval }}
    scrapeTimeout: {{ .Values.serviceMonitor.scrapeTimeout }}
    honorLabels: {{ .Values.serviceMonitor.honorLabels }}
    {{- with .Values.serviceMonitor.endpointAdditionalProperties }}
    {{- toYaml . | nindent 4 }}
    {{- end }}

Another issue I noticed: I get duplicate endpoints because the selector matches both the headless and the non-headless Service, so the selector needs to be narrowed to a single Service.
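
As an illustration of the duplicate-endpoint problem, here is a small Python sketch of equality-based label matching. The Service names come from the error messages earlier in this comment; the label sets, and in particular the `service-type` label, are hypothetical and only serve to show how a narrower selector picks out one Service.

```python
def matches(selector, labels):
    """True if every key/value pair in the selector is present in the labels."""
    return all(labels.get(key) == value for key, value in selector.items())

def selected(selector, services):
    """Return the sorted names of the Services matched by the selector."""
    return sorted(name for name, labels in services.items() if matches(selector, labels))

common = {
    "app.kubernetes.io/name": "opensearch",
    "app.kubernetes.io/instance": "opensearch",
}
services = {
    "opensearch-cluster-master": dict(common),
    # "service-type" is a hypothetical label used only for this example.
    "opensearch-cluster-master-headless": {**common, "service-type": "headless"},
}

# A selector built only from the shared labels matches both Services,
# so every pod would be discovered twice.
print(selected(common, services))

# Adding a label that only one Service carries narrows the match to one.
print(selected({**common, "service-type": "headless"}, services))
```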


@gabriel-suela
Author

gabriel-suela commented Oct 1, 2024

@KristijanL I found out that the ServiceMonitor needs to use basic auth to connect to the Prometheus metrics endpoint:

  - apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      namespace: opensearch
      name: opensearch-cluster-master-service-monitor
      # object labels live under metadata
      labels:
        app.kubernetes.io/component: opensearch-cluster-master
        app.kubernetes.io/instance: opensearch
    spec:
      endpoints:
        # credentials are read from a Secret in the ServiceMonitor's namespace
        - basicAuth:
            password:
              key: password
              name: '{{ .Release.Name }}-initial-admin-password'
            username:
              key: username
              name: '{{ .Release.Name }}-initial-admin-password'
          interval: 30s
          path: /_prometheus/metrics
          # scrape the Service port named http instead of metrics
          port: http
      selector:
        matchLabels:
          app.kubernetes.io/instance: opensearch
          app.kubernetes.io/name: opensearch
          name: opensearch-headless

You can disable the chart's default ServiceMonitor and add a custom one via extraObjects.
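
To check by hand what Prometheus would have to send, the following Python sketch builds the same basic-auth request against a port-forwarded endpoint. The base URL, the https scheme, and the credentials are placeholders assumed for illustration; only the /_prometheus/metrics path comes from this thread.

```python
import base64
import urllib.request

def build_metrics_request(base_url, username, password, path="/_prometheus/metrics"):
    """Build (but do not send) a GET request carrying an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(base_url.rstrip("/") + path)
    request.add_header("Authorization", f"Basic {token}")
    return request

# Placeholder values: a local port-forward and example credentials.
req = build_metrics_request("https://localhost:9200", "admin", "example-password")
print(req.full_url)
print(req.get_header("Authorization"))
```

Sending the request with `urllib.request.urlopen(req)` requires an active port-forward, and a cluster using a self-signed certificate would additionally need a suitable SSL context.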

@KristijanL

KristijanL commented Oct 1, 2024

That is certainly an option. However, at present the chart's ServiceMonitor appears to be non-functional and effectively redundant. If it is intended to be used, it needs changes to make it work, or detailed documentation on how to use it, because its current purpose is unclear.

@gabriel-suela
Author

totally agree

Projects
Status: Backlog
Development

No branches or pull requests

5 participants