Skip to content
This repository has been archived by the owner on Jun 29, 2022. It is now read-only.

metallb: Add alerts for metallb #140

Merged
merged 5 commits into from
Mar 31, 2020
Merged

metallb: Add alerts for metallb #140

merged 5 commits into from
Mar 31, 2020

Conversation

surajssd
Copy link
Member

This commit adds a prometheus operator CR PrometheusRule which includes
alertmanager rules for metallb.

  • This sends an alert when it sees that a session is not established for
    more than 2mins.

  • This sends an alert when any of the pods have stale config.

Signed-off-by: Suraj Deshmukh suraj@kinvolk.io

@surajssd
Copy link
Member Author

TODO: Add retry logic to the tests. This will buy some time for the prometheus to reconcile.

@surajssd surajssd marked this pull request as ready for review March 20, 2020 13:21
@invidian
Copy link
Member

The alternative for PLATFORM environment variable would be something like this:

new file mode 100644
index 00000000..4939ffc1
--- /dev/null
+++ b/test/components/util/platform_generic.go
@@ -0,0 +1,20 @@
+// Copyright 2020 The Lokomotive Authors
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//     http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+// +build !packet !aws
+
+package util
+
+const Platform = PlatformAll
diff --git a/test/components/util/platform_aws.go b/test/components/util/platform_aws.go
new file mode 100644
index 00000000..ad6050f9
--- /dev/null
+++ b/test/components/util/platform_aws.go
@@ -0,0 +1,20 @@
+// Copyright 2020 The Lokomotive Authors
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//     http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+// +build aws
+
+package util
+
+const Platform = PlatformAWS
diff --git a/test/components/util/platform.go b/test/components/util/platform.go
new file mode 100644
index 00000000..ad6050f9
--- /dev/null
+++ b/test/components/util/platform.go
@@ -0,0 +1,20 @@
+// Copyright 2020 The Lokomotive Authors
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//     http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package util
+
+const (
+  PlatformAWS = "aws"
+  PlatformPacket = "packet"
+  PlatformAll = "generic"
+)
+
+type PlatformSelector struct {
+  Include []string
+  Exlude []string
+}
+
+func IsPlatformSupported(t *testing.T, ps *PlatformSelector) {
+  if ps == nil {
+    return
+  }
+
+  for _, p := range ps.Include {
+    if p == Platform {
+      return
+    }
+  }
+
+  for _, p := range ps.Exclude {
+    if p == Platform {
+      t.Skip("Not supported on platform %q", Platform)
+    }
+  }
+
+  return t.Skip("Not supported on platform %q", Platform)
+  }

This way, the test would define following value in test matrix:

PlatformSelector: util.PlatformSelector{
  Exclude: []string{
    util.PlatformBareMetal,
  },
}

Or the field would be skipped if test is intended to run on all platforms.

Then, when running tests, following needs to be used:

  util.IsPlatformSupported(t, testCase.PlatformSelector)

@surajssd surajssd force-pushed the surajssd/add-metallb-alert branch 3 times, most recently from f3e525f to e207e51 Compare March 24, 2020 10:47
@surajssd surajssd requested a review from iaguis March 24, 2020 11:04
test/monitoring/component-alerts_test.go Outdated Show resolved Hide resolved
pkg/components/metallb/manifests.go Outdated Show resolved Hide resolved
pkg/components/metallb/manifests.go Outdated Show resolved Hide resolved
pkg/components/metallb/manifests.go Outdated Show resolved Hide resolved
test/monitoring/component-alerts_test.go Outdated Show resolved Hide resolved
@surajssd
Copy link
Member Author

I think we are using build-time information for something that should be run-time information. IMO tags should represent things like the kind of tests (integration, unit, e2e, and so on) and for run-time information (like the platform we're running on) we should use env variables that we can check at run-time.

The platform is already passed as an env variable to Make, we're just placing that into build tags, which I don't think makes sense if we want to have shared logic in tests. I'd remove all build tags and use env variables, but that's a bigger task.

About the structuring of tests in this PR it seems OK to me.

@iaguis Most of these concerns will be addressed in #212.

@surajssd surajssd force-pushed the surajssd/add-metallb-alert branch 5 times, most recently from b6dd1ff to 0cb5014 Compare March 27, 2020 09:05
@surajssd surajssd force-pushed the surajssd/add-metallb-alert branch 2 times, most recently from d2200bd to 8b47a07 Compare March 31, 2020 09:03
Copy link
Member

@invidian invidian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one last nit, otherwise LGTM

test/monitoring/components_alerts_test.go Outdated Show resolved Hide resolved
test/monitoring/components_alerts_test.go Outdated Show resolved Hide resolved
invidian
invidian previously approved these changes Mar 31, 2020
This commit adds a prometheus operator CR PrometheusRule which includes
alertmanager rules for metallb.

* This sends an alert when it sees that a session is not established for
more than 2mins.

* This sends an alert when any of the pods have stale config.

Signed-off-by: Suraj Deshmukh <suraj@kinvolk.io>
When metallb controller deployment pod is not available for more than a
minute then an alert will be triggered.

Signed-off-by: Suraj Deshmukh <suraj@kinvolk.io>
When metallb speaker daemonset pods are not available for more than a
minute then an alert will be triggered.

Signed-off-by: Suraj Deshmukh <suraj@kinvolk.io>
Signed-off-by: Suraj Deshmukh <suraj@kinvolk.io>
Signed-off-by: Suraj Deshmukh <suraj@kinvolk.io>
@iaguis
Copy link
Contributor

iaguis commented Mar 31, 2020

@invidian can you review again?

@iaguis iaguis merged commit 24f0c13 into master Mar 31, 2020
@iaguis iaguis deleted the surajssd/add-metallb-alert branch March 31, 2020 15:42
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants