Fix query offset issue on individual rule groups #6131

Merged
danielblando merged 2 commits into cortexproject:master from ki/offset-fix on Aug 6, 2024

Conversation

klingerf
Contributor

This is a follow-up to #6085, which added support for setting the query_offset field on individual recording rule groups, as well as a per-tenant ruler_query_offset limit that should be used when no individual recording rule group offset is set.

It turns out that the compatibility code that converts a protobuf RuleGroup to a Prometheus RuleGroup was coercing null query offsets to explicit 0s, which meant that no rule group would ever fall back to the per-tenant offset.

This PR fixes that issue, and it cleans up handling of the query offset in a few other ruler files.

Testing

To test this locally, I started cortex with a 1m per-tenant ruler_query_offset limit:

$ curl -s http://localhost:9009/config | grep query_offset
  ruler_query_offset: 1m

Then I created some recording rules that did not have their query_offset field set:

$ curl -sH "X-Scope-OrgID: oqCK3ORVEuV7kFT7" localhost:9009/api/v1/rules | grep rules: | wc -l
      51
$ curl -sH "X-Scope-OrgID: oqCK3ORVEuV7kFT7" localhost:9009/api/v1/rules | grep query_offset: | wc -l
       0

By contrast, for the version of cortex that's on master right now, with the same setup, I see:

$ curl -sH "X-Scope-OrgID: oqCK3ORVEuV7kFT7" localhost:9009/api/v1/rules | grep rules: | wc -l
      51
$ curl -sH "X-Scope-OrgID: oqCK3ORVEuV7kFT7" localhost:9009/api/v1/rules | grep query_offset: | wc -l
      51
$ curl -sH "X-Scope-OrgID: oqCK3ORVEuV7kFT7" localhost:9009/api/v1/rules | grep query_offset: | head -n5
      query_offset: 0s
      query_offset: 0s
      query_offset: 0s
      query_offset: 0s
      query_offset: 0s

Having query_offset explicitly set to 0s in that API response means that the ruler never falls back to the per-tenant offset, which is not the intended behavior.
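
The `grep ... | wc -l` check used above can also be sketched as a small Go helper, which may be handy in a scripted test. This is only an illustration: `countLinesContaining` is a hypothetical name, and the two sample bodies are made-up stand-ins for the rules API output, not real responses.

```go
package main

import (
	"fmt"
	"strings"
)

// countLinesContaining counts the lines of body that contain needle,
// roughly what `grep needle | wc -l` reports.
func countLinesContaining(body, needle string) int {
	n := 0
	for _, line := range strings.Split(body, "\n") {
		if strings.Contains(line, needle) {
			n++
		}
	}
	return n
}

func main() {
	// Made-up sample resembling the output on master (explicit 0s offsets).
	onMaster := "- name: a\n  query_offset: 0s\n  rules:\n- name: b\n  query_offset: 0s\n  rules:\n"
	// Made-up sample resembling the output with the fix (no offsets present).
	withFix := "- name: a\n  rules:\n- name: b\n  rules:\n"

	fmt.Println(countLinesContaining(onMaster, "query_offset:")) // 2
	fmt.Println(countLinesContaining(withFix, "query_offset:"))  // 0
}
```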

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

Signed-off-by: Kevin Ingelman <ki@buoyant.io>
@klingerf klingerf changed the title from "Fix query offset issue on indvidual recording rules" to "Fix query offset issue on individual rule groups" Jul 30, 2024
@yeya24
Contributor

yeya24 commented Jul 31, 2024

Thanks for the PR and for actively testing this change, @klingerf. I think we shouldn't add query_offset to the API response. That matches what Prometheus does: query_offset is not exposed as part of the list rules API.

https://github.com/prometheus/prometheus/blob/main/web/api/v1/api.go#L1347

Good catch of the query_offset: 0s issue!

Signed-off-by: Kevin Ingelman <ki@buoyant.io>
@pull-request-size pull-request-size bot added size/S and removed size/M labels Jul 31, 2024
@klingerf
Contributor Author

klingerf commented Jul 31, 2024

@yeya24 Great, thanks for reviewing. I've backed out the API response change.

Comment on lines +1069 to 1071
QueryOffset: group.QueryOffset,
},
// We are keeping default value for EvaluationTimestamp and EvaluationDuration since the backup is not evaluating
Contributor Author

@yeya24 I also wasn't sure about this change, but added the new field for consistency. That said, the comment below this says:

We are keeping default value for EvaluationTimestamp and EvaluationDuration since the backup is not evaluating

So maybe it deserves to be treated like EvaluationTimestamp and EvaluationDuration for the purpose of backups? If that's the case I can revert this change and update the comment.

Contributor

If I understand correctly, the backup stores the rules in memory only to serve List rules API requests. Since the query offset is not part of the response, we probably don't need to store it. I am also fine with storing it in case it is used in the future.

@rajagopalanand or @rapphil please correct me if I am wrong.

Contributor

It seems we also use the same path to reply to RPC requests, which export the entire object. This would add it to the RPC response.
https://github.com/cortexproject/cortex/blob/master/pkg/ruler/ruler.go#L1230

I am not sure how widely used the RPC API is, but maybe we want to keep both consistent.

Contributor Author

Thanks for the additional info. So it sounds like this change is fine to leave as is?

Contributor

It seems we also use the same path to reply to rpc requests which will export the entire obj

This RPC seems to be used only when listing rules, which doesn't need this field.
But let's leave it as is. There is another field used as response https://github.com/cortexproject/cortex/blob/master/pkg/ruler/api.go#L185 so we are good.

Contributor

@yeya24 yeya24 left a comment

Thanks so much for the contribution!

Contributor

@danielblando danielblando left a comment

Thanks

@danielblando danielblando merged commit df270ee into cortexproject:master Aug 6, 2024
16 checks passed
@klingerf klingerf deleted the ki/offset-fix branch August 6, 2024 17:43
3 participants