Add alert on error creating a POD #290

Closed
2opremio opened this issue Jan 14, 2016 · 3 comments

@2opremio

As a continuation of #280

We have already hit two problems creating pods in Kubernetes that should have alerted us automatically:

@2opremio

Copying a Slack conversation on this with @peterbourgon for further reference: https://weaveworks.slack.com/archives/service/p1452792646004629

​[5:30] 
how would you go about generating an alert when kubernetes fails to create new pods?

​[5:31] 
would you know how to do this with prometheus?

​[5:32] 
it already talks to the API server, so I guess it should also be receiving error events like “no more available IPs” or “I am not going to run any more pods because the CPU consumption is too high”

peter [5:41 PM] 
I guess we'd need to find out if Kubernetes exports a metric that maps to that condition

​[5:41] 
If so, Prometheus is already scraping it...

​[5:43] 
kubelet_docker_errors is a fun one

fons [5:43 PM] 
_Exports a metric_ <- how can we check?

peter [5:43 PM] 
well, reading the source, or seeing what timeseries pop up in the Prometheus graph dropdown:
http://monitoring.default.svc.cluster.local:9090/graph

​[5:44] 
`rate(kubelet_docker_errors[1m])` might be a good general candidate

​[5:45] 
that reads as: how many kubelet_docker_errors per second (1 minute avg) do we see?
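
To make the "per second, 1 minute avg" reading concrete, here is a minimal Python sketch of the arithmetic behind a `rate(...[1m])` style calculation. It is a simplification under stated assumptions, not Prometheus's actual implementation: the real `rate()` also extrapolates to the edges of the window, which is ignored here. The function name `simple_rate` and the sample values are hypothetical.

```python
def simple_rate(samples, window_seconds=60.0):
    """Approximate per-second increase of a counter over the trailing window.

    samples: list of (timestamp_seconds, counter_value), sorted by time.
    Simplified: no extrapolation to the window edges.
    """
    if not samples:
        return 0.0
    end = samples[-1][0]
    # Keep only the samples that fall inside the trailing window.
    in_window = [(t, v) for t, v in samples if t >= end - window_seconds]
    if len(in_window) < 2:
        return 0.0
    increase = 0.0
    for (_, prev), (_, cur) in zip(in_window, in_window[1:]):
        # A drop means the counter was reset (e.g. the process restarted),
        # so count the new value as the increase since the reset.
        increase += cur - prev if cur >= prev else cur
    elapsed = in_window[-1][0] - in_window[0][0]
    return increase / elapsed if elapsed > 0 else 0.0


# Hypothetical scrapes of an error counter, one every 15 seconds.
scrapes = [(0, 10), (15, 10), (30, 13), (45, 13), (60, 16)]
print(simple_rate(scrapes))  # 6 new errors over 60 seconds
```

An alert would then fire when this value stays above some threshold, which is a choice the thread does not settle.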

fons [5:45 PM] 
OK, I will check

peter [5:47 PM] 
`sort_desc(sum(kubelet_docker_errors) by (operation_type))` — also a fun one

​[5:47] 
this will definitely capture it
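
As a rough illustration of what `sort_desc(sum(...) by (operation_type))` computes, the sketch below groups series by one label, sums each group, and orders the totals from highest to lowest. It only mirrors the aggregation logic in plain Python. The helper name `sum_by_sorted_desc`, the `node` label, and the label values are hypothetical and not taken from a real cluster.

```python
from collections import defaultdict


def sum_by_sorted_desc(series, label):
    """Sum series values grouped by one label, highest total first.

    series: list of (labels_dict, value) pairs, one per time series.
    """
    totals = defaultdict(float)
    for labels, value in series:
        # Series missing the label are grouped under the empty string.
        totals[labels.get(label, "")] += value
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical current values of an error counter on two nodes.
series = [
    ({"operation_type": "create_container", "node": "a"}, 3),
    ({"operation_type": "pull_image", "node": "a"}, 1),
    ({"operation_type": "create_container", "node": "b"}, 2),
]
for operation_type, total in sum_by_sorted_desc(series, "operation_type"):
    print(operation_type, total)
```

Grouping by operation type puts the failing operation at the top of the list, regardless of which node reported it.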

@errordeveloper

Is this to do with spawning app pods?

@tomwilkie

We don't need this anymore (with m/t)
