operator: Use cluster monitoring alertmanager by default on openshift clusters #7272
./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki
Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.
- ingester -0.1%
+ distributor 0%
+ querier 0%
+ querier/queryrange 0%
+ iter 0%
+ storage 0%
+ chunkenc 0%
+ logql 0%
+ loki 0.6%
./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki
Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.
+ ingester 0%
+ distributor 0%
+ querier 0%
+ querier/queryrange 0%
+ iter 0%
+ storage 0%
+ chunkenc 0%
+ logql 0%
+ loki 0%
In general it looks good to me. Only a couple of code separation improvements suggested to keep openshift things in a single package.
if stack.Spec.Tenants != nil && opts.Stack.Tenants.Mode == lokiv1.OpenshiftLogging {
	var svc corev1.Service
	key := client.ObjectKey{Name: manifests.MonitoringSVCOperated, Namespace: manifests.MonitoringNS}

	err = k.Get(ctx, key, &svc)
	if err != nil && !apierrors.IsNotFound(err) {
		return kverrors.Wrap(err, "failed to lookup alertmanager service", "name", key)
	}

	if err == nil {
		opts.Ruler.OCPAlertManagerEnabled = true
	}
}
I suggest moving this code block into a separate package internal to the handlers package. This handler is already too big and needs some uncluttering, so let's start with new additions like this one.
if opt.Stack.Tenants != nil && opt.Ruler.OCPAlertManagerEnabled && opt.Stack.Tenants.Mode == lokiv1.OpenshiftLogging {
	rulerEnabled = true

	if opt.Ruler.Spec == nil {
		opt.Ruler.Spec = &lokiv1beta1.RulerConfigSpec{
			AlertManagerSpec: &lokiv1beta1.AlertManagerSpec{},
		}
	}

	if opt.Ruler.Spec.AlertManagerSpec == nil || len(opt.Ruler.Spec.AlertManagerSpec.Endpoints) == 0 {
		ams := &lokiv1beta1.AlertManagerSpec{
			Endpoints: []string{"https://_web._tcp.alertmanager-operated.openshift-monitoring.svc"},
			EnableV2:  true,
			DiscoverySpec: &lokiv1beta1.AlertManagerDiscoverySpec{
				EnableSRV:       true,
				RefreshInterval: "1m",
			},
		}

		if err := mergo.Merge(opt.Ruler.Spec.AlertManagerSpec, ams); err != nil {
			return config.Options{}, kverrors.Wrap(err, "failed merging RulerSpec options")
		}
	}
}
The general pattern to apply openshift partial patches for automation looks like this:
- Create a configure method in the openshift package
- Apply the configure after ConfigOptions() returns and handle any error
See ConfigureDeploymentForTenantMode.
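The pattern described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the operator's actual code: `Mode`, `Options`, `configOptions`, `configureOpenShift`, and `buildOptions` are hypothetical stand-ins for the real tenant mode type, `config.Options`, `ConfigOptions()`, and a configure method in the openshift package. Only the endpoint string is taken from the diff under review.

```go
package main

import (
	"errors"
	"fmt"
)

// Mode is a hypothetical stand-in for the tenant mode type.
type Mode string

const (
	Static           Mode = "static"
	OpenshiftLogging Mode = "openshift-logging"
)

// Options is a hypothetical stand-in for config.Options.
type Options struct {
	AlertManagerEndpoints []string
}

// configOptions plays the role of ConfigOptions(): it builds the base
// options and knows nothing about OpenShift.
func configOptions() Options {
	return Options{}
}

// configureOpenShift plays the role of a configure method living in the
// openshift package. It only fills in what was left empty.
func configureOpenShift(cfg *Options) error {
	if cfg == nil {
		return errors.New("options must not be nil")
	}
	if len(cfg.AlertManagerEndpoints) == 0 {
		cfg.AlertManagerEndpoints = []string{
			"https://_web._tcp.alertmanager-operated.openshift-monitoring.svc",
		}
	}
	return nil
}

// buildOptions applies the configure step after configOptions() returns
// and propagates any error to the caller.
func buildOptions(mode Mode) (Options, error) {
	cfg := configOptions()
	if mode == OpenshiftLogging {
		if err := configureOpenShift(&cfg); err != nil {
			return Options{}, fmt.Errorf("configuring openshift options: %w", err)
		}
	}
	return cfg, nil
}

func main() {
	cfg, err := buildOptions(OpenshiftLogging)
	fmt.Println(cfg.AlertManagerEndpoints, err)
}
```

Keeping the OpenShift-specific step in its own function means the base builder stays free of platform checks, which is the separation the review asks for.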
operator/internal/manifests/var.go
Outdated
	monitoringSVCMain = "alertmanager-main"
	// MonitoringNS is the namespace containing cluster monitoring objects such as alertmanager.
	MonitoringNS = "openshift-monitoring"
	// MonitoringSVCOperated is the name of the alertmanager service used for alerts.
	MonitoringSVCOperated = "alertmanager-operated"
All of these belong in the openshift package.
	wantOptions: nil,
},
{
	desc: "openshift-logging mode",
We now have a fourth mode to test: openshift-network.
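A table that covers all four modes might look like the sketch below. Everything here is illustrative: `ModeType` and its constants stand in for the `lokiv1` mode type, `wantsOpenShiftDefaults` is a hypothetical helper, and the expected value for openshift-network is an assumption that mirrors how this PR's mode switch groups the two OpenShift modes. Adjust the `want` column to whatever behaviour is actually intended.

```go
package main

import "fmt"

// ModeType is a hypothetical stand-in for lokiv1.ModeType.
type ModeType string

const (
	Static           ModeType = "static"
	Dynamic          ModeType = "dynamic"
	OpenshiftLogging ModeType = "openshift-logging"
	OpenshiftNetwork ModeType = "openshift-network"
)

// wantsOpenShiftDefaults is a hypothetical helper that reports whether a
// tenant mode should receive OpenShift-specific options.
func wantsOpenShiftDefaults(mode ModeType) bool {
	switch mode {
	case OpenshiftLogging, OpenshiftNetwork:
		return true
	default:
		return false
	}
}

// modeCase is one row of the test table.
type modeCase struct {
	desc string
	mode ModeType
	want bool
}

// modeCases lists one case per mode so that a newly added mode is hard
// to forget.
func modeCases() []modeCase {
	return []modeCase{
		{desc: "static mode", mode: Static, want: false},
		{desc: "dynamic mode", mode: Dynamic, want: false},
		{desc: "openshift-logging mode", mode: OpenshiftLogging, want: true},
		{desc: "openshift-network mode", mode: OpenshiftNetwork, want: true},
	}
}

func main() {
	for _, tc := range modeCases() {
		got := wantsOpenShiftDefaults(tc.mode)
		fmt.Printf("%s: got=%t want=%t\n", tc.desc, got, tc.want)
	}
}
```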
./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki
Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.
+ ingester 0%
+ distributor 0%
+ querier 0%
- querier/queryrange -0.1%
+ iter 0%
+ storage 0%
+ chunkenc 0%
+ logql 0%
+ loki 0%
if opt.OpenShiftOptions.BuildOpts.OCPAlertManagerEnabled {
	if err = ConfigureOptionsForMode(&cfg, opt.Stack.Tenants.Mode); err != nil {
		return nil, "", err
	}
}
Can you move the outer if block inside openshift.ConfigureOptions(cfg)? This way we limit exposing OpenShiftOptions outside of the openshift package.
// OCPAlertManagerEnabled returns true if the Openshift AlertManager is present in the cluster.
func OCPAlertManagerEnabled(ctx context.Context, opts manifests.Options, k k8s.Client) (bool, error) {
	if opts.Stack.Tenants != nil && opts.Stack.Tenants.Mode == lokiv1.OpenshiftLogging {
		var svc corev1.Service
		key := client.ObjectKey{Name: manifests.MonitoringSVCOperated, Namespace: manifests.MonitoringNS}

		err := k.Get(ctx, key, &svc)
		if err != nil && !apierrors.IsNotFound(err) {
			return false, kverrors.Wrap(err, "failed to lookup alertmanager service", "name", key)
		}

		return err == nil, nil
	}

	return false, nil
}
I suggest we create a separate openshift package and place the code in a file alertmanager.go, naming the function as openshift.alertManagerSvcExists
func ConfigureOptionsForMode(cfg *config.Options, mode lokiv1.ModeType) error {
	switch mode {
	case lokiv1.Static, lokiv1.Dynamic:
		return nil // nothing to configure
	case lokiv1.OpenshiftLogging, lokiv1.OpenshiftNetwork:
		return openshift.ConfigureOptions(cfg)
	}

	return nil
}
Please move this to the gateway_tenants.go file, where all the other configure functions live.
Why gateway_tenants? It has nothing to do with the gateway.
Actually, the filename is outdated nowadays; it should be something more like tenants.go.
OK, I see. I will rename it then to avoid confusion.
Let's rename in a separate PR, because this is a crucial file that will break many PRs right now.
copy that!
	GatewaySvcTargetPort   string
	RulerName              string
	Labels                 map[string]string
	OCPAlertManagerEnabled bool
You can skip the OCP prefix here, as the field is accessible via openshift.BuiltOptions.AlertManagerEnabled
Great work!
	if opts.Stack.Tenants != nil && opts.Stack.Tenants.Mode == lokiv1.OpenshiftLogging {
		var svc corev1.Service
		key := client.ObjectKey{Name: openshift.MonitoringSVCOperated, Namespace: openshift.MonitoringNS}

		err := k.Get(ctx, key, &svc)
		if err != nil && !apierrors.IsNotFound(err) {
			return false, kverrors.Wrap(err, "failed to lookup alertmanager service", "name", key)
		}

		return err == nil, nil
	}

	return false, nil
-	if opts.Stack.Tenants != nil && opts.Stack.Tenants.Mode == lokiv1.OpenshiftLogging {
-		var svc corev1.Service
-		key := client.ObjectKey{Name: openshift.MonitoringSVCOperated, Namespace: openshift.MonitoringNS}
-		err := k.Get(ctx, key, &svc)
-		if err != nil && !apierrors.IsNotFound(err) {
-			return false, kverrors.Wrap(err, "failed to lookup alertmanager service", "name", key)
-		}
-		return err == nil, nil
-	}
-	return false, nil
+	if opts.Stack.Tenants == nil || opts.Stack.Tenants.Mode != lokiv1.OpenshiftLogging {
+		return false, nil
+	}
+	var svc corev1.Service
+	key := client.ObjectKey{Name: openshift.MonitoringSVCOperated, Namespace: openshift.MonitoringNS}
+	err := k.Get(ctx, key, &svc)
+	if err != nil && !apierrors.IsNotFound(err) {
+		return false, kverrors.Wrap(err, "failed to lookup alertmanager service", "name", key)
+	}
+	return err == nil, nil
Two minor bits and we are set to go.
	enabled, err := openshift.AlertManagerSVCExists(ctx, opts, k)
	if err != nil {
		ll.Error(err, "failed to check OCP AlertManager")
		return err
	}

	opts.OpenShiftOptions.BuildOpts.AlertManagerEnabled = enabled
Can we move this up into the if-branch on Line 173, so that this call is made only if rules are enabled?
@@ -45,7 +55,7 @@ func LokiConfigMap(opt Options) (*corev1.ConfigMap, string, error) {
 }

 // ConfigOptions converts Options to config.Options
-func ConfigOptions(opt Options) config.Options {
+func ConfigOptions(opt Options) (config.Options, error) {
Nothing is throwing an error here anymore. Let's remove the error return type.
…o operator_alertmanager
What this PR does / why we need it:
This PR makes the operator use the cluster monitoring Alertmanager by default on OpenShift clusters when applicable. The user's own configuration still overrides this default.
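The default-with-override behaviour can be illustrated with a small sketch. It is not the PR's implementation, which merges into the ruler spec with `mergo.Merge`. Here `AlertManagerSpec` is a trimmed hypothetical type and `withClusterDefaults` is a hand-written stand-in for that merge. The default values are the ones that appear in the diff under review.

```go
package main

import "fmt"

// AlertManagerSpec is a trimmed, hypothetical stand-in for the ruler's
// alertmanager settings.
type AlertManagerSpec struct {
	Endpoints       []string
	EnableV2        bool
	EnableSRV       bool
	RefreshInterval string
}

// withClusterDefaults leaves a spec with user-provided endpoints untouched.
// Otherwise it returns a copy that points at the cluster monitoring
// alertmanager and keeps any non-empty refresh interval the user set.
func withClusterDefaults(user *AlertManagerSpec) *AlertManagerSpec {
	if user != nil && len(user.Endpoints) > 0 {
		return user // user config wins
	}

	out := AlertManagerSpec{}
	if user != nil {
		out = *user
	}

	out.Endpoints = []string{"https://_web._tcp.alertmanager-operated.openshift-monitoring.svc"}
	out.EnableV2 = true
	out.EnableSRV = true
	if out.RefreshInterval == "" {
		out.RefreshInterval = "1m"
	}

	return &out
}

func main() {
	fmt.Printf("%+v\n", *withClusterDefaults(nil))

	custom := &AlertManagerSpec{Endpoints: []string{"https://alertmanager.example.test"}}
	fmt.Printf("%+v\n", *withClusterDefaults(custom))
}
```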
Special notes for your reviewer:
Checklist
- CONTRIBUTING.md guide
- CHANGELOG.md updated
- docs/sources/upgrading/_index.md