add a metrics to indicate store replication job run #286
@@ -267,6 +267,7 @@ func (runner *replicationJobRunner) run() {
runner.m3Client.UpdateGauge(metrics.ReplicateExtentScope, metrics.StorageReplicationJobCurrentFailures, int64(len(currentFailedJobs)))
runner.m3Client.UpdateGauge(metrics.ReplicateExtentScope, metrics.StorageReplicationJobMaxConsecutiveFailures, int64(maxConsecutiveFailures))
runner.m3Client.UpdateGauge(metrics.ReplicateExtentScope, metrics.StorageReplicationJobCurrentSuccess, int64(jobsStarted))
runner.m3Client.UpdateGauge(metrics.ReplicateExtentScope, metrics.StorageReplicationJobRun, 1)
this 'gauge' would be stuck at '1'. You need to increment a counter instead.

This routine runs only once every ten minutes. The '1' here indicates that the goroutine is running rather than stuck somewhere, so I can use something like 'movingSum 1h' to set up an alert.

I am still a little confused. With this, don't you need to reset the gauge to zero when the run has finished? And perhaps check for a "0" to indicate that the run is completing and restarting, etc.? I still feel a 'counter' would probably help more: if the rate of change of the count is zero, that indicates a problem.

No, I don't need to reset it. After every run, the gauge metric reports a 1, and I then use 'movingSum 1h' to get how many times the routine has run in the past hour.

Okay, as discussed offline, that should work, since "no report"/null corresponds to "0".
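To make the reasoning in this thread concrete, here is a minimal sketch of why a gauge that always reports 1 can still drive an alert. It does not use the real metrics client or query engine: `simulateRuns` and `movingSum` are hypothetical helpers, and the model assumes one sample slot per ten-minute interval, with a missing report (nil) counted as 0.

```go
package main

import "fmt"

// simulateRuns is a hypothetical helper: it returns one sample per
// 10-minute interval, a pointer to 1 when the job ran and nil when
// nothing was reported (for example, the goroutine is stuck).
func simulateRuns(ran []bool) []*int64 {
	samples := make([]*int64, len(ran))
	for i, ok := range ran {
		if ok {
			one := int64(1)
			samples[i] = &one
		}
	}
	return samples
}

// movingSum is a hypothetical stand-in for a 'movingSum' query function.
// It sums the samples in the trailing window that ends at index end.
// A nil sample ("no report") contributes 0 to the sum.
func movingSum(samples []*int64, end, window int) int64 {
	start := end - window + 1
	if start < 0 {
		start = 0
	}
	var sum int64
	for i := start; i <= end && i < len(samples); i++ {
		if samples[i] != nil {
			sum += *samples[i]
		}
	}
	return sum
}

func main() {
	// Six healthy intervals, then six intervals where the job never runs.
	ran := []bool{true, true, true, true, true, true, false, false, false, false, false, false}
	samples := simulateRuns(ran)

	const window = 6 // six 10-minute intervals, i.e. one hour
	for end := range samples {
		fmt.Printf("interval %2d: runs in past hour = %d\n", end, movingSum(samples, end, window))
	}
}
```

In this model the one-hour sum climbs to 6 while the job is healthy and decays to 0 once it stops reporting, so an alert on a low sum would fire without ever resetting the gauge. Whether a real backend treats missing points as 0 is the assumption the thread settles on offline; it is worth confirming for the system in use.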
runner.logger.WithFields(bark.Fields{
`stats`: fmt.Sprintf(`total extents: %v, remote extents:%v, opened for replication: %v, primary: %v, secondary: %v, failed: %v, success: %v`,
i think you need a "counter" instead of a "gauge"?