cmd/scollector: Send a hi metric with a value of 1 #1319

Merged 1 commit on Sep 15, 2015
kylebrandt (Member)

This can be used to create an alert for scollector being alive and then used as a dependency.

gbrayut (Contributor) commented Sep 15, 2015

Code looks good. I'm not sure what your plan is, but you might get better results using scollector.collect.queued instead. It should be sent at the same frequency as this metric (every 15s by default), and a high count tells you there is more data waiting to be sent. In the average case scollector.collect.queued should work exactly like this metric (just with a value of 0 instead of 1), and after an outage it gives a better signal because it shows how many data points are still waiting to be sent. Also, alerts on unknown values are tricky, since you might just be seeing old data points that are being burned down from the queue (I think we are FIFO, not LIFO).

Now that I think about it, something like a scollector.collect.queued_since metric, holding the number of seconds since the latest timestamp in the last batch of sent metrics, might work best. It could easily be used in the alert logic to decide whether to go crit or just warn that scollector is back online and burning down the queue. I'm not sure how long the burndown takes on average, but I know that if things get backed up significantly it can take a while to return to 0.

kylebrandt (Member, Author)

The timestamp on the value 1 should be the time the collector created it. What I like about the way I did it is that it is very straightforward in what it does. Using scollector.collect.queued as an additional check (or as additional logic) makes sense to me though, as does adding the scollector.collect.queued_since metric (although I'm not entirely sure which timestamp you mean there).

My plan is that this can be a dependency: if the host can be pinged but this metric isn't there, the problem isn't that the host is down but rather that scollector is down, not current, etc.
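
The dependency idea can be sketched as a small decision function in Go. The `State` type, its values, and the `diagnose` helper are hypothetical names for this example; a real alert would express the same logic in the alerting system's own rules.

```go
package main

// State is a hypothetical label for the outcome of the combined check.
type State string

const (
	Healthy        State = "healthy"
	HostDown       State = "host down"
	ScollectorDown State = "scollector down or not current"
)

// diagnose combines a ping result with the presence of a current hi metric.
// A failed ping means the host itself is down; a successful ping without a
// current hi metric points at scollector instead.
func diagnose(pingOK, hiCurrent bool) State {
	switch {
	case !pingOK:
		return HostDown
	case !hiCurrent:
		return ScollectorDown
	default:
		return Healthy
	}
}
```

Checking the ping first means a missing hi metric is only blamed on scollector when the host is known to be reachable.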

kylebrandt added a commit that referenced this pull request Sep 15, 2015
cmd/scollector: Send a hi metric with a value of 1
kylebrandt merged commit b525d33 into master Sep 15, 2015
kylebrandt deleted the sHi branch September 15, 2015 19:49