Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make Alert Manager API Concurrency configurable #5412

Conversation

emanlodovice
Copy link
Contributor

@emanlodovice emanlodovice commented Jun 18, 2023

What this PR does:
This PR makes the API Concurrency config value of alertmanager configurable via cortex alertmanager config.

Which issue(s) this PR fixes:
Fixes #

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@emanlodovice emanlodovice force-pushed the expose-am-api-concurrency-config branch 2 times, most recently from a442430 to 35cbe38 Compare June 19, 2023 17:05
@@ -416,6 +416,10 @@ cluster:
# CLI flag: -experimental.alertmanager.enable-api
[enable_api: <boolean> | default = false]

# Maximum number of concurrent GET API requests before returning 503.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it be a limit that we can override in tenant limit?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I think it makes sense that we can override this limit per tenant so we can adjust per tenant use case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved the configuration to tenant limit in the latest commit 🙏

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved back to static config because we don't create a new am instance when loading runtime config so we are not able to update this value at runtime.

@emanlodovice emanlodovice force-pushed the expose-am-api-concurrency-config branch 2 times, most recently from 0e5d3ef to a19569b Compare June 20, 2023 01:22
@emanlodovice emanlodovice requested a review from alvinlin123 June 20, 2023 01:39
@emanlodovice emanlodovice force-pushed the expose-am-api-concurrency-config branch 2 times, most recently from 7f8ed47 to 394e166 Compare June 20, 2023 05:12
@@ -3074,6 +3074,11 @@ The `limits_config` configures default and per-tenant limits imposed by Cortex s
# alerts will fail with a log message and metric increment. 0 = no limit.
# CLI flag: -alertmanager.max-alerts-size-bytes
[alertmanager_max_alerts_size_bytes: <int> | default = 0]

# Maximum number of concurrent GET API requests before returning 503. If 0,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally if we are specific about the 503, we should probably have a test around it in case prometheus/alertmanager changes the HTTP status code. This way we can catch the change and update this doc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, do you think changing the wording from "returning 503" to something like "returning an error" would be enough here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, I think "returning an error" is better because I don't think prometheus/alertmanager specifies the behaviour when concurrency is reached (503 is not defined in the API model at least :-) )

Copy link
Contributor

@alvinlin123 alvinlin123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@@ -266,6 +268,7 @@ func New(cfg *Config, reg *prometheus.Registry) (*Alertmanager, error) {
GroupFunc: func(f1 func(*dispatch.Route) bool, f2 func(*types.Alert, time.Time) bool) (dispatch.AlertGroups, map[model.Fingerprint][]string) {
return am.dispatcher.Groups(f1, f2)
},
Concurrency: apiConcurrency,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this won't respect the run-time configuration unless tenant re-created the am config

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this insight. We are moving this back as a static config.

@emanlodovice emanlodovice force-pushed the expose-am-api-concurrency-config branch 2 times, most recently from 76613d8 to 70c4b62 Compare June 20, 2023 19:04
Signed-off-by: Emmanuel Lodovice <lodovice@amazon.com>
@emanlodovice emanlodovice force-pushed the expose-am-api-concurrency-config branch from 70c4b62 to 13c81e4 Compare June 20, 2023 20:33
@emanlodovice emanlodovice requested a review from qinxx108 June 20, 2023 21:25
docs/configuration/config-file-reference.md Outdated Show resolved Hide resolved
@alvinlin123 alvinlin123 merged commit 2b551b6 into cortexproject:master Jun 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants