Exporter suddenly fails to show any metrics. #86

Closed
tanisdlj opened this issue Jan 22, 2021 · 8 comments

Comments

@tanisdlj

Ok, I've got myself an X Files case here.
Our druid-exporter suddenly died yesterday around 20:00 (CET).
These are the logs:

Jan 21 18:55:31 druid-exporter-1 druid-prometheus-exporter[2114]: time="2021-01-21T18:55:31Z" level=info msg="Successfully collected data from druid emitter, druid/historical"
Jan 21 18:56:12 druid-exporter-1 systemd[1]: druid-prometheus-exporter.service: Main process exited, code=killed, status=9/KILL
Jan 21 18:56:12 druid-exporter-1 systemd[1]: druid-prometheus-exporter.service: Failed with result 'signal'.
Jan 21 18:56:12 druid-exporter-1 systemd[1]: druid-prometheus-exporter.service: Service RestartSec=100ms expired, scheduling restart.
Jan 21 18:56:12 druid-exporter-1 systemd[1]: druid-prometheus-exporter.service: Scheduled restart job, restart counter is at 1.
Jan 21 18:56:12 druid-exporter-1 systemd[1]: Stopped Apache Druid Prometheus Exporter..
Jan 21 18:56:12 druid-exporter-1 systemd[1]: Started Apache Druid Prometheus Exporter..
Jan 21 18:56:13 druid-exporter-1 druid-prometheus-exporter[11122]: time="2021-01-21T18:56:13Z" level=info msg="Druid exporter started listening on: 8080"
Jan 21 18:56:13 druid-exporter-1 druid-prometheus-exporter[11122]: time="2021-01-21T18:56:13Z" level=info msg="Metrics endpoint - http://0.0.0.0:8080/metrics"
Jan 21 18:56:13 druid-exporter-1 druid-prometheus-exporter[11122]: time="2021-01-21T18:56:13Z" level=info msg="Druid emitter endpoint - http://0.0.0.0:8080/druid"

Since then, we've got 0 metrics. Tried reinstalling, rebooting the instance, restarting the router, broker, overlord, coordinator... 0 metrics.

Logs are empty, besides the typical "starting 8080, endpoint X, endpoint Y"

A tcpdump shows that the instance is receiving metrics from druid.
Curling the metrics endpoint on druid-exporter does return metrics, but it takes ages. I just timed it:

$ time curl localhost:8080/metrics
[...]
real	1m38.608s
user	0m0.007s
sys	0m0.011s

Any idea what might have happened?
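
For anyone reproducing this, the `time curl` check above can be scripted so the scrape duration and the number of returned samples are recorded together. The following is only a minimal sketch: the helper names `count_samples` and `time_scrape` are made up for illustration, and the URL is the one from this report. If Prometheus is scraping with its default 10s scrape timeout, a response this slow would plausibly be dropped, which might explain seeing 0 metrics even though curl eventually returns data.

```python
import time
import urllib.request
from typing import Tuple


def count_samples(exposition_text: str) -> int:
    """Count sample lines (non-blank, not starting with '#') in Prometheus text output."""
    return sum(
        1
        for line in exposition_text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    )


def time_scrape(url: str, timeout: float = 300.0) -> Tuple[float, int]:
    """Fetch `url` once and return (elapsed seconds, number of sample lines).

    The generous default timeout is an assumption, chosen so that a scrape
    as slow as the 1m38s one in this report still completes.
    """
    start = time.monotonic()
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        body = resp.read().decode("utf-8", errors="replace")
    return time.monotonic() - start, count_samples(body)
```

Calling `time_scrape("http://localhost:8080/metrics")` on the exporter host returns the elapsed time and sample count, which can then be compared with the scrape timeout configured in Prometheus.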

@tanisdlj
Author

As additional data: the druid-exporter host stays near 0% CPU and 0% RAM usage the whole time (monitored with Prometheus and htop while querying /metrics), and it's a dedicated host, so that's not the issue either :(

@tanisdlj
Author

Downloaded v0.8 of the exporter and the issue stops happening. Looks like something is wrong with v0.9.

@tanisdlj
Author

Ignore this issue. Apparently the host invoked the oom-killer and druid-exporter was the target. For reasons I cannot see, the service we created (which just runs druid-exporter) is not working, and it doesn't matter which version I run. If I run the exporter myself (or as any other user), it works. Time to investigate this on my own. Sorry for bothering you! :)
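
The hint was already in the systemd lines from the first comment: `code=killed, status=9/KILL` means the main process received SIGKILL, which is the signal the oom-killer uses. Other things can send SIGKILL too, so the kernel log still has to confirm it. A rough sketch for pulling those events out of a journal dump, where `KillEvent` and `find_kill_events` are hypothetical names and the regex is based only on the line format shown in this issue:

```python
import re
from typing import List, NamedTuple


class KillEvent(NamedTuple):
    unit: str    # systemd unit name, e.g. "druid-prometheus-exporter.service"
    status: str  # status field as logged by systemd, e.g. "9/KILL"


# Matches systemd lines shaped like the one in this report:
#   "... systemd[1]: <unit>: Main process exited, code=killed, status=9/KILL"
_KILLED = re.compile(
    r"systemd\[\d+\]: (?P<unit>\S+): Main process exited, "
    r"code=killed, status=(?P<status>\S+)"
)


def find_kill_events(log_text: str) -> List[KillEvent]:
    """Return one KillEvent per 'Main process exited, code=killed' line."""
    events = []
    for line in log_text.splitlines():
        match = _KILLED.search(line)
        if match:
            events.append(KillEvent(match.group("unit"), match.group("status")))
    return events
```

Any event with status `9/KILL` is worth cross-checking against the kernel log for the same timestamp.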

@tanisdlj tanisdlj reopened this Jan 22, 2021
@tanisdlj
Author

Sorry for all the back and forth. Apparently yes, it only happens with the released v0.9; v0.9-beta works OK.

@iamabhishek-dubey
Member

I'll check this over the weekend

@iamabhishek-dubey
Member

Fixed in #89

@iamabhishek-dubey
Member

@tanisdlj
Author

The release build was the one affected by this issue. How is it fixed in that one? 🤔
