-
Notifications
You must be signed in to change notification settings - Fork 2.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Stn/correctly mark api silences #1733
Conversation
Signed-off-by: stuart nelson <stuartnelson3@gmail.com>
ee85580
to
5786f49
Compare
On my review queue. Note about the plans to make this more efficient: My idea would be to not just add some kind of freshness (as this would still allow out-of-date results, i.e. a naive user will still be confused if they create a silence, immediately after that reload the alerts page, and still see the silenced alerts as not-silenced). My idea is around establishing a versioning of the silence state, to be increased each time a new silence is created or an old silence is expired. Then record for each alert the highest version against which its silenced status was checked. Then, we only need to re-check if the current silence state version is higher than the recorded version for the alert. This will also help during the normal sending of notifications (where we currently have to check against all silences each time a notification is due for sending). A similar technique can be applied for inhibitions. |
api/v2/api.go
Outdated
@@ -53,6 +53,8 @@ type API struct { | |||
peer *cluster.Peer | |||
silences *silence.Silences | |||
alerts provider.Alerts | |||
muter types.Muter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If muter
is only about inhibition, let's also call it that way. In addition, this could also just be a function. What do you think?
muter types.Muter | |
isInhibited func(model.LabelSet) bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is already type MuteFunc
. A function f
of the suggested signature can be converted into a Muter
by MuteFunc(f)
. I'd say accepting the interface here (which could be satisfied by the function you suggest) is preferred.
But I'm with you WRT naming the member variable with something that suggests this is only about inhibitions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Muter
interface's description of its method, and its single implementater Inhibitor
, is a bit confusing. Our implementation relies on a side effect of the particular impl from Inhibitor
, and isn't actually checking if the alert is inhibited -- instead, if it is, internal to the method, the alert's status is updated.
This is a long way of saying I'm thinking of naming the struct member something along the lines of setInhibitedStatus
, and separately (an issue or something) discuss how we should either drop or update this interface.
Because we're relying on a particular implementation detail, I think the "correct" thing to do is NOT use the interface, but the actual concrete implementation.
Comments?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quite a bit has changed now. It looks good to me as it is now, but I leave the final call to @mxinden as he raised this point in the first place.
api/v2/api.go
Outdated
// Will check if the alert is inhibited, and update its status. | ||
api.muter.Mutes(a.Labels) | ||
// Will check if the alert is silenced, and update its status. | ||
sils, err := api.silences.Query( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand correctly, this logic determines whether a given alert is silenced or not. If so, that domain logic should rather live in github.com/prometheus/alertmanager/silence
. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup. this is now mostly (but not exactly) replicated from https://github.com/prometheus/alertmanager/blob/master/notify/notify.go#L407-L425
Perhaps this could be unified in some way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The silences
package exposes this Query
api so as to be externally consumed. It seems to me that this is being used correctly. If we want to change/simplify the silence
package's Query
api, such as adding a QueryActiveAlerts
method or something, maybe that should be a separate PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with solving this later, but perhaps add a TODO then in the setAlertStatus
function to somehow dedupe the code with https://github.com/prometheus/alertmanager/blob/master/notify/notify.go#L407-L425 eventually?
@mxinden is currently working on un-munging the many pieces of logic munged together in main.go
, and this puts this quite significant (and partially duplicated) piece of logic somewhere in main.go
. I can believe that doing the un-munging in a separate and coordinated effort might even make more sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That looks like a good start. Let's tackle the performance optimizations later once this has proven to do the right thing.
api/v2/api.go
Outdated
@@ -53,6 +53,8 @@ type API struct { | |||
peer *cluster.Peer | |||
silences *silence.Silences | |||
alerts provider.Alerts | |||
muter types.Muter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is already type MuteFunc
. A function f
of the suggested signature can be converted into a Muter
by MuteFunc(f)
. I'd say accepting the interface here (which could be satisfied by the function you suggest) is preferred.
But I'm with you WRT naming the member variable with something that suggests this is only about inhibitions.
api/v2/api.go
Outdated
// Will check if the alert is inhibited, and update its status. | ||
api.muter.Mutes(a.Labels) | ||
// Will check if the alert is silenced, and update its status. | ||
sils, err := api.silences.Query( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup. this is now mostly (but not exactly) replicated from https://github.com/prometheus/alertmanager/blob/master/notify/notify.go#L407-L425
Perhaps this could be unified in some way?
I guess @stuartnelson3 is not planning to work on this during skiing. But let's see… :o) |
This encapsulates the actual logic being done previously into a single closure that gets passed to the api when loaded/reloaded. Additionally, the naming is now a bit clearer. The use of the Muter interface is a bit confusing as it was previously being used for its side effect of marking an alert as inhibited or not, and the return value was being ignored. Signed-off-by: stuart nelson <stuartnelson3@gmail.com>
00967ec
to
e4e404f
Compare
Signed-off-by: stuart nelson <stuartnelson3@gmail.com>
Signed-off-by: stuart nelson <stuartnelson3@gmail.com>
85d86fa
to
a26e464
Compare
Thanks, @stuartnelson3 , for doing this in your one day in between vacations. @mxinden what do you think? I'd go for merging this (more or less) as is, and then get it straightened in the planned refactoring. |
I merged master into this to now check if things work as intended with the recent changes to master. |
Signed-off-by: beorn7 <beorn@soundcloud.com>
I added a commit for comment nits, but also found an issue. Will comment in the code. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have a question after all.
if err != nil { | ||
return fmt.Errorf("failed to query silences: %v", err) | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to the code in notify.go
, don't we have to unset the silence marker of no silences match the alert? I.e.
n.marker.SetSilenced(labels.Fingerprint())
. (And if that's true, it would prove the point that we should not replicate the code but have it shared somewhere.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is true, good catch. Updating the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. (Would prefer to actually de-replicate the code, but we can do that later.)
This was forgotten in the previous code. If an alert is not silenced, we need to make sure the status is updated as such when the API returns the alert. Signed-off-by: stuart nelson <stuartnelson3@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the late review @stuartnelson3, last week has been pretty busy at Red Hat.
I got one code-placement comment. Thanks for investing into the thorough end-to-end test!
@@ -476,3 +478,28 @@ func md5HashAsMetricValue(data []byte) float64 { | |||
copy(bytes, smallSum) | |||
return float64(binary.LittleEndian.Uint64(bytes)) | |||
} | |||
|
|||
func setAlertStatus(inhibitor *inhibit.Inhibitor, marker types.Marker, silences *silence.Silences) func(model.LabelSet) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need to be inside main.go
? I feel like it would be better placed close to types.Marker
, given that marker.Status
represents the getAlertStatus
call. What do you think? Might be included in the TODO below.
// TODO(beorn7): The following code is almost exactly replicated in notify/notify.go.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main reason I put this here is because it seemed like a bundling of imperative things specific to this use case, and then passing them on down to the api. I sort of view marker.Status
being getAlertStatus
as a coincidence, and that we created the getAlertStatus
type as a way to abstract away which specific impl we're using.
Since this particular function is tying together three different packages (silences
, inhibit
, and types
), it felt more natural to me to happen in main. My personal (and debatable, of course) philosophy is that it's easier to define interfaces (easier to unit test), and then assemble all of them together at the top level (for a full acceptance test).
This was original inspired by a destroy all software screencast, but I've probably diverged from the author's original intent. Happy to talk more, but maybe not in a PR :)
The short answer is: I'm happy to move it if you feel strongly about it. Binding several different packages together didn't leave me with a clear signal where to put it, so main
felt fine for now. When we figure out how to deduplicate this code this should be easy to remove/replace.
Definitely! I'm hoping that we can start adding more now that there's sort of a "model" for how it works, and turn some of the verbose testing into helper methods as we see more patterns. The actual testing is fairly simple, it's just the struct creation + api call generates a lot of visual noise. |
fixes #979
to note: