-
Notifications
You must be signed in to change notification settings - Fork 2.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Stn/correctly mark api silences #1733
Changes from 2 commits
618cb70
5786f49
e4e404f
2e75c64
a26e464
37d66a8
d32e8dc
26b504b
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -53,6 +53,8 @@ type API struct { | |
peer *cluster.Peer | ||
silences *silence.Silences | ||
alerts provider.Alerts | ||
muter types.Muter | ||
marker types.Marker | ||
getAlertStatus getAlertStatusFn | ||
uptime time.Time | ||
|
||
|
@@ -72,9 +74,17 @@ type API struct { | |
type getAlertStatusFn func(prometheus_model.Fingerprint) types.AlertStatus | ||
|
||
// NewAPI returns a new Alertmanager API v2 | ||
func NewAPI(alerts provider.Alerts, sf getAlertStatusFn, silences *silence.Silences, peer *cluster.Peer, l log.Logger) (*API, error) { | ||
func NewAPI( | ||
alerts provider.Alerts, | ||
marker types.Marker, | ||
sf getAlertStatusFn, | ||
stuartnelson3 marked this conversation as resolved.
Show resolved
Hide resolved
|
||
silences *silence.Silences, | ||
peer *cluster.Peer, | ||
l log.Logger, | ||
) (*API, error) { | ||
api := API{ | ||
alerts: alerts, | ||
marker: marker, | ||
getAlertStatus: sf, | ||
peer: peer, | ||
silences: silences, | ||
|
@@ -115,13 +125,14 @@ func NewAPI(alerts provider.Alerts, sf getAlertStatusFn, silences *silence.Silen | |
} | ||
|
||
// Update sets the configuration string to a new value. | ||
stuartnelson3 marked this conversation as resolved.
Show resolved
Hide resolved
|
||
func (api *API) Update(cfg *config.Config, resolveTimeout time.Duration) error { | ||
func (api *API) Update(cfg *config.Config, resolveTimeout time.Duration, muter types.Muter) error { | ||
api.mtx.Lock() | ||
defer api.mtx.Unlock() | ||
|
||
api.resolveTimeout = resolveTimeout | ||
api.alertmanagerConfig = cfg | ||
api.route = dispatch.NewRoute(cfg.Route, nil) | ||
api.muter = muter | ||
return nil | ||
} | ||
|
||
|
@@ -238,6 +249,28 @@ func (api *API) getAlertsHandler(params alert_ops.GetAlertsParams) middleware.Re | |
continue | ||
} | ||
|
||
// TODO: Add a timestamp now()+X, after which the Mutes() will | ||
// be refreshed? | ||
stuartnelson3 marked this conversation as resolved.
Show resolved
Hide resolved
|
||
// Will check if the alert is inhibited, and update its status. | ||
api.muter.Mutes(a.Labels) | ||
// Will check if the alert is silenced, and update its status. | ||
sils, err := api.silences.Query( | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If I understand correctly, this logic determines whether a given alert is silenced or not. If so, that domain logic should rather live in There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yup. this is now mostly (but not exactly) replicated from https://github.com/prometheus/alertmanager/blob/master/notify/notify.go#L407-L425 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm fine with solving this later, but perhaps add a TODO then in the @mxinden is currently working on un-munging the many pieces of logic munged together in |
||
silence.QState(types.SilenceStateActive), | ||
silence.QMatches(a.Labels), | ||
) | ||
if err != nil { | ||
level.Error(api.logger).Log("msg", "Querying silences failed", "err", err) | ||
stuartnelson3 marked this conversation as resolved.
Show resolved
Hide resolved
|
||
} | ||
|
||
if len(sils) > 0 { | ||
ids := make([]string, len(sils)) | ||
for i, s := range sils { | ||
ids[i] = s.Id | ||
} | ||
api.marker.SetSilenced(a.Labels.Fingerprint(), ids...) | ||
} | ||
|
||
// Get alert's current status after seeing if it is suppressed. | ||
status := api.getAlertStatus(a.Fingerprint()) | ||
|
||
if !*params.Active && status.State == types.AlertStateActive { | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,150 @@ | ||
// Copyright 2018 Prometheus Team | ||
// Licensed under the Apache License, Version 2.0 (the "License"); | ||
// you may not use this file except in compliance with the License. | ||
// You may obtain a copy of the License at | ||
// | ||
// http://www.apache.org/licenses/LICENSE-2.0 | ||
// | ||
// Unless required by applicable law or agreed to in writing, software | ||
// distributed under the License is distributed on an "AS IS" BASIS, | ||
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
// See the License for the specific language governing permissions and | ||
// limitations under the License. | ||
|
||
package test | ||
|
||
import ( | ||
"fmt" | ||
"testing" | ||
"time" | ||
|
||
"github.com/go-openapi/strfmt" | ||
a "github.com/prometheus/alertmanager/test/with_api_v2" | ||
"github.com/prometheus/alertmanager/test/with_api_v2/api_v2_client/client/alert" | ||
"github.com/prometheus/alertmanager/test/with_api_v2/api_v2_client/client/silence" | ||
"github.com/prometheus/alertmanager/test/with_api_v2/api_v2_client/models" | ||
"github.com/prometheus/common/model" | ||
"github.com/stretchr/testify/require" | ||
) | ||
|
||
// TestAlertGetReturnsCurrentStatus checks that querying the API returns the | ||
// current status of each alert, i.e. if it is silenced or inhibited. | ||
func TestAlertGetReturnsCurrentAlertStatus(t *testing.T) { | ||
t.Parallel() | ||
|
||
conf := ` | ||
route: | ||
receiver: "default" | ||
group_by: [] | ||
group_wait: 1s | ||
group_interval: 10m | ||
repeat_interval: 1h | ||
|
||
inhibit_rules: | ||
- source_match: | ||
severity: 'critical' | ||
target_match: | ||
severity: 'warning' | ||
equal: ['alertname'] | ||
|
||
receivers: | ||
- name: "default" | ||
webhook_configs: | ||
- url: 'http://%s' | ||
` | ||
|
||
at := a.NewAcceptanceTest(t, &a.AcceptanceOpts{ | ||
Tolerance: 1 * time.Second, | ||
}) | ||
co := at.Collector("webhook") | ||
wh := a.NewWebhook(co) | ||
|
||
amc := at.AlertmanagerCluster(fmt.Sprintf(conf, wh.Address()), 1) | ||
require.NoError(t, amc.Start()) | ||
defer amc.Terminate() | ||
|
||
am := amc.Members()[0] | ||
|
||
labelName := "alertname" | ||
labelValue := "test1" | ||
|
||
now := time.Now() | ||
startsAt := strfmt.DateTime(now) | ||
endsAt := strfmt.DateTime(now.Add(5 * time.Minute)) | ||
|
||
labels := models.LabelSet(map[string]string{labelName: labelValue, "severity": "warning"}) | ||
fp := model.LabelSet{model.LabelName(labelName): model.LabelValue(labelValue), "severity": "warning"}.Fingerprint() | ||
pa := &models.PostableAlert{ | ||
StartsAt: startsAt, | ||
EndsAt: endsAt, | ||
Alert: models.Alert{Labels: labels}, | ||
} | ||
alertParams := alert.NewPostAlertsParams() | ||
alertParams.Alerts = models.PostableAlerts{pa} | ||
_, err := am.Client().Alert.PostAlerts(alertParams) | ||
require.NoError(t, err) | ||
|
||
resp, err := am.Client().Alert.GetAlerts(nil) | ||
require.NoError(t, err) | ||
// No silence has been created or inhibiting alert sent, alert should | ||
// be active. | ||
for _, alert := range resp.Payload { | ||
require.Equal(t, models.AlertStatusStateActive, *alert.Status.State) | ||
} | ||
|
||
// Wait for group_wait, so that we are in the group_interval period, | ||
// when the pipeline won't update the alert's status. | ||
time.Sleep(2 * time.Second) | ||
|
||
// Create silence and verify that the alert is immediately marked | ||
// silenced via the API. | ||
silenceParams := silence.NewPostSilencesParams() | ||
|
||
cm := "a" | ||
isRegex := false | ||
ps := &models.PostableSilence{ | ||
Silence: models.Silence{ | ||
StartsAt: &startsAt, | ||
EndsAt: &endsAt, | ||
Comment: &cm, | ||
CreatedBy: &cm, | ||
Matchers: models.Matchers{ | ||
&models.Matcher{Name: &labelName, Value: &labelValue, IsRegex: &isRegex}, | ||
}, | ||
}, | ||
} | ||
silenceParams.Silence = ps | ||
silenceResp, err := am.Client().Silence.PostSilences(silenceParams) | ||
require.NoError(t, err) | ||
silenceID := silenceResp.Payload.SilenceID | ||
|
||
resp, err = am.Client().Alert.GetAlerts(nil) | ||
require.NoError(t, err) | ||
for _, alert := range resp.Payload { | ||
require.Equal(t, models.AlertStatusStateSuppressed, *alert.Status.State) | ||
require.Equal(t, fp.String(), *alert.Fingerprint) | ||
require.Equal(t, 1, len(alert.Status.SilencedBy)) | ||
require.Equal(t, silenceID, alert.Status.SilencedBy[0]) | ||
} | ||
|
||
// Create inhibiting alert and verify that original alert is | ||
// immediately marked as inhibited. | ||
labels["severity"] = "critical" | ||
_, err = am.Client().Alert.PostAlerts(alertParams) | ||
require.NoError(t, err) | ||
|
||
inhibitingFP := model.LabelSet{model.LabelName(labelName): model.LabelValue(labelValue), "severity": "critical"}.Fingerprint() | ||
|
||
resp, err = am.Client().Alert.GetAlerts(nil) | ||
require.NoError(t, err) | ||
for _, alert := range resp.Payload { | ||
require.Equal(t, 1, len(alert.Status.SilencedBy)) | ||
require.Equal(t, silenceID, alert.Status.SilencedBy[0]) | ||
if fp.String() == *alert.Fingerprint { | ||
require.Equal(t, models.AlertStatusStateSuppressed, *alert.Status.State) | ||
require.Equal(t, fp.String(), *alert.Fingerprint) | ||
require.Equal(t, 1, len(alert.Status.InhibitedBy)) | ||
require.Equal(t, inhibitingFP.String(), alert.Status.InhibitedBy[0]) | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If
muter
is only about inhibition, let's also call it that way. In addition, this could also just be a function. What do you think?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is already
type MuteFunc
. A functionf
of the suggested signature can be converted into aMuter
byMuteFunc(f)
. I'd say accepting the interface here (which could be satisfied by the function you suggest) is preferred.But I'm with you WRT naming the member variable with something that suggests this is only about inhibitions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The
Muter
interface's description of its method, and its single implementaterInhibitor
, is a bit confusing. Our implementation relies on a side effect of the particular impl fromInhibitor
, and isn't actually checking if the alert is inhibited -- instead, if it is, internal to the method, the alert's status is updated.This is a long way of saying I'm thinking of naming the struct member something along the lines of
setInhibitedStatus
, and separately (an issue or something) discuss how we should either drop or update this interface.Because we're relying on a particular implementation detail, I think the "correct" thing to do is NOT use the interface, but the actual concrete implementation.
Comments?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quite a bit has changed now. It looks good to me as it is now, but I leave the final call to @mxinden as he raised this point in the first place.