-
Notifications
You must be signed in to change notification settings - Fork 804
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make Alert Manager API Concurrency configurable #5412
Make Alert Manager API Concurrency configurable #5412
Conversation
a442430
to
35cbe38
Compare
@@ -416,6 +416,10 @@ cluster: | |||
# CLI flag: -experimental.alertmanager.enable-api | |||
[enable_api: <boolean> | default = false] | |||
|
|||
# Maximum number of concurrent GET API requests before returning 503. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should it be a limit that we can override in tenant limit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I think it makes sense that we can override this limit per tenant so we can adjust per tenant use case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved the configuration to tenant limit in the latest commit 🙏
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved back to static config because we don't create a new am instance when loading runtime config so we are not able to update this value at runtime.
0e5d3ef
to
a19569b
Compare
7f8ed47
to
394e166
Compare
@@ -3074,6 +3074,11 @@ The `limits_config` configures default and per-tenant limits imposed by Cortex s | |||
# alerts will fail with a log message and metric increment. 0 = no limit. | |||
# CLI flag: -alertmanager.max-alerts-size-bytes | |||
[alertmanager_max_alerts_size_bytes: <int> | default = 0] | |||
|
|||
# Maximum number of concurrent GET API requests before returning 503. If 0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally if we are specific about the 503, we should probably have a test around it in case prometheus/alertmanager changes the HTTP status code. This way we can catch the change and update this doc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, do you think changing the wording from "returning 503" to something like "returning an error" would be enough here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, I think "returning an error" is better because I don't think prometheus/alertmanager specifies the behaviour when concurrency is reached (503 is not defined in the API model at least :-) )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
pkg/alertmanager/alertmanager.go
Outdated
@@ -266,6 +268,7 @@ func New(cfg *Config, reg *prometheus.Registry) (*Alertmanager, error) { | |||
GroupFunc: func(f1 func(*dispatch.Route) bool, f2 func(*types.Alert, time.Time) bool) (dispatch.AlertGroups, map[model.Fingerprint][]string) { | |||
return am.dispatcher.Groups(f1, f2) | |||
}, | |||
Concurrency: apiConcurrency, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this won't respect the run-time configuration unless tenant re-created the am config
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for this insight. We are moving this back as a static config.
76613d8
to
70c4b62
Compare
Signed-off-by: Emmanuel Lodovice <lodovice@amazon.com>
70c4b62
to
13c81e4
Compare
What this PR does:
This PR makes the API Concurrency config value of alertmanager configurable via cortex alertmanager config.
Which issue(s) this PR fixes:
Fixes #
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]