Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for Prometheus rule query offset #6085

Merged
merged 3 commits into from
Jul 30, 2024

Conversation

mustafain117
Copy link
Contributor

@mustafain117 mustafain117 commented Jul 12, 2024

What this PR does:

This PR adds support for query_offset field on RuleGroup and global rule_query_offset configuration in Cortex Ruler.
This feature was added in Prometheus 2.53.0

Which issue(s) this PR fixes:

Fixes #6055

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

pkg/ruler/ruler.go Outdated Show resolved Hide resolved
@mustafain117 mustafain117 marked this pull request as ready for review July 12, 2024 20:46
@yeya24
Copy link
Contributor

yeya24 commented Jul 15, 2024

There is already a per tenant limit called -ruler.evaluation-delay-duration. Should we deprecate that in this PR?

@mustafain117
Copy link
Contributor Author

There is already a per tenant limit called -ruler.evaluation-delay-duration. Should we deprecate that in this PR?

Yes that will be a duplicate after this PR. Can I simply remove the usages of that limit and mention it as deprecated in the changelog?

@yeya24
Copy link
Contributor

yeya24 commented Jul 17, 2024

Yes I think that works. Please also mark the flag as deprecated. You can find some examples in the codebase.

Comment on lines 31 to 32
google.protobuf.Duration queryOffset = 11
[(gogoproto.nullable) = false, (gogoproto.stdduration) = true];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to use this branch to set a global rule query offset, and that setting isn't being honored in my local setup. In my cortex config file, I'm using:

limits:
  ruler_query_offset: 1m

I think the issue is that this queryOffset field needs to be nullable. Since it's non-nullable, it's being initialized asDuration(0) in compat.go, and the ruler only falls back to the global offset if the field is actually nil, which you can see here:

https://github.com/prometheus/prometheus/blob/b75e6353742e32c7e18997ebd6a75b77563e33b5/rules/group.go#L644-L654

This change seems to fix it:

diff --git a/pkg/ruler/ruler.go b/pkg/ruler/ruler.go
index d53f78506..8fec45acb 100644
--- a/pkg/ruler/ruler.go
+++ b/pkg/ruler/ruler.go
@@ -910,6 +910,10 @@ func (r *Ruler) getLocalRules(userID string, rulesRequest RulesRequest, includeB
 			}
 		}
 		interval := group.Interval()
+		var queryOffset *time.Duration
+		if offset := group.QueryOffset(); offset != 0 {
+			queryOffset = &offset
+		}
 
 		groupDesc := &GroupStateDesc{
 			Group: &rulespb.RuleGroupDesc{
@@ -918,7 +922,7 @@ func (r *Ruler) getLocalRules(userID string, rulesRequest RulesRequest, includeB
 				Interval:    interval,
 				User:        userID,
 				Limit:       int64(group.Limit()),
-				QueryOffset: group.QueryOffset(),
+				QueryOffset: queryOffset,
 			},
 
 			EvaluationTimestamp: group.GetLastEvaluation(),
diff --git a/pkg/ruler/rulespb/compat.go b/pkg/ruler/rulespb/compat.go
index 4f916d09c..7526062ad 100644
--- a/pkg/ruler/rulespb/compat.go
+++ b/pkg/ruler/rulespb/compat.go
@@ -13,9 +13,10 @@ import (
 
 // ToProto transforms a formatted prometheus rulegroup to a rule group protobuf
 func ToProto(user string, namespace string, rl rulefmt.RuleGroup) *RuleGroupDesc {
-	queryOffset := time.Duration(0)
+	var queryOffset *time.Duration
 	if rl.QueryOffset != nil {
-		queryOffset = time.Duration(*rl.QueryOffset)
+		offset := time.Duration(*rl.QueryOffset)
+		queryOffset = &offset
 	}
 	rg := RuleGroupDesc{
 		Name:        rl.Name,
@@ -48,13 +49,17 @@ func formattedRuleToProto(rls []rulefmt.RuleNode) []*RuleDesc {
 
 // FromProto generates a rulefmt RuleGroup
 func FromProto(rg *RuleGroupDesc) rulefmt.RuleGroup {
-	queryOffset := model.Duration(rg.QueryOffset)
+	var queryOffset *model.Duration
+	if rg.QueryOffset != nil {
+		offset := model.Duration(*rg.QueryOffset)
+		queryOffset = &offset
+	}
 	formattedRuleGroup := rulefmt.RuleGroup{
 		Name:        rg.GetName(),
 		Interval:    model.Duration(rg.Interval),
 		Rules:       make([]rulefmt.RuleNode, len(rg.GetRules())),
 		Limit:       int(rg.Limit),
-		QueryOffset: &queryOffset,
+		QueryOffset: queryOffset,
 	}
 
 	for i, rl := range rg.GetRules() {
diff --git a/pkg/ruler/rulespb/rules.proto b/pkg/ruler/rulespb/rules.proto
index 189c050eb..ed4e98a76 100644
--- a/pkg/ruler/rulespb/rules.proto
+++ b/pkg/ruler/rulespb/rules.proto
@@ -29,7 +29,7 @@ message RuleGroupDesc {
   repeated google.protobuf.Any options = 9;
   int64 limit =10;
   google.protobuf.Duration queryOffset = 11
-      [(gogoproto.nullable) = false, (gogoproto.stdduration) = true];
+      [(gogoproto.nullable) = true, (gogoproto.stdduration) = true];
 }
 
 // RuleDesc is a proto representation of a Prometheus Rule

Copy link
Contributor

@klingerf klingerf Jul 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a related discussion of this in the original prometheus PR, here: prometheus/prometheus#14061 (comment)

Signed-off-by: Mustafain Ali Khan <mustalik@amazon.com>
Signed-off-by: Mustafain Ali Khan <mustalik@amazon.com>
Signed-off-by: Mustafain Ali Khan <mustalik@amazon.com>
Copy link
Contributor

@yeya24 yeya24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. This LGTM!

@alanprot
Copy link
Member

LGTM!

@alanprot alanprot merged commit 85bc555 into cortexproject:master Jul 30, 2024
16 checks passed
@yeya24
Copy link
Contributor

yeya24 commented Jul 30, 2024

@mustafain117 Let's make sure that we create a follow up PR to deprecate the old configuration.

@klingerf
Copy link
Contributor

Hey folks, thanks for putting this change together. I'm still seeing the issue that I mentioned here #6085 (comment) on the version of this PR that merged to master. As far as I can tell, all rules are being initialized with a hardcoded query offset of 0, and that prevents them from falling back to the global rule query offset.

I can dig into it and try to submit a follow-up PR, but at least wanted to give you all a heads up.

klingerf added a commit to klingerf/cortex that referenced this pull request Jul 30, 2024
This is a follow-up to cortexproject#6085, which added support for setting the
`queryOffset` field on individual recording rules, as well as a global
per-tenant ruler query offset that should be used when no individual
rule offset is set.

It turns out that compatability code to convert from a protobuf
RuleGroup to a prometheus RuleGroup that's consumed by the ruler was
coercing null value query offsets to explicit 0s, which meant that no
rules would ever fallback to the global-offset.

This PR fixes that issue, and cleans up handling of the query offset in
a few other ruler files.

Signed-off-by: Kevin Ingelman <ki@buoyant.io>
klingerf added a commit to klingerf/cortex that referenced this pull request Jul 30, 2024
This is a follow-up to cortexproject#6085, which added support for setting the
`query_offset` field on individual recording rule groups, as well as a
per-tenant `ruler_query_offset` limit that should be used when no
individual recording rule group offset is set.

It turns out that compatibility code to convert from a protobuf
RuleGroup to a prometheus RuleGroup was coercing null-value query
offsets to explicit 0s, which meant that no rule groups would ever
fall back to the per-tenant offset.

This PR fixes that issue, and it cleans up handling of the query offset
in a few other ruler files.

Signed-off-by: Kevin Ingelman <ki@buoyant.io>
@klingerf
Copy link
Contributor

I created #6131 with a fix.

danielblando pushed a commit that referenced this pull request Aug 6, 2024
* Fix rule group query offset issue

This is a follow-up to #6085, which added support for setting the
`query_offset` field on individual recording rule groups, as well as a
per-tenant `ruler_query_offset` limit that should be used when no
individual recording rule group offset is set.

It turns out that compatibility code to convert from a protobuf
RuleGroup to a prometheus RuleGroup was coercing null-value query
offsets to explicit 0s, which meant that no rule groups would ever
fall back to the per-tenant offset.

This PR fixes that issue, and it cleans up handling of the query offset
in a few other ruler files.

Signed-off-by: Kevin Ingelman <ki@buoyant.io>

* Revert change to rules API response

Signed-off-by: Kevin Ingelman <ki@buoyant.io>

---------

Signed-off-by: Kevin Ingelman <ki@buoyant.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support for setting the ruler's rule query offset param
4 participants