fix(query engine): Include lines with ts equal to end timestamp of the query range when executing range aggregations #13448

Merged
chaudum merged 3 commits into main from chaudum/fix-zero-range-bounds-check-2 on Jul 11, 2024

@chaudum (Contributor) commented Jul 8, 2024

What this PR does / Why we need it

Background

When performing range vector aggregations, such as count_over_time({env="dev"}[1h]), the query range is divided into multiple steps at which the aggregation operation (e.g. counting the log lines) is evaluated.
Each step covers the window from the current step minus the range interval (the [1h] in the example above) up to the current step, as depicted in the following chart. The select range for the logs is extended into the past by the same interval, so that the logs needed to calculate the first step are selected.

[Chart: evaluation steps of a range aggregation across the query range]

However, the select range for logs is start inclusive and end exclusive (written as [start, end)), but the evaluation of the steps for the range aggregation is start exclusive and end inclusive (written as (start, end]).

As a result, log lines with a timestamp exactly at the start or exactly at the end of the select range are not included in the range aggregation. The "missing" end timestamp is not a problem, because a) in an instant query it is not supposed to be included anyway, since the query range is [start, end), and b) in a range query the last point of the previous step is part of the next step evaluation.

Issue

The missing start timestamp, however, becomes a problem when executing an instant query where log timestamps fall exactly at the start of the query range. This happens when the query frontend splits the query into multiple smaller time ranges, e.g. 1h, 30m, ...
Since the sub queries are executed independently on the queriers, all logs whose timestamp is an exact multiple of the split interval, e.g. 00:00, 01:00, 02:00, ... for a 1h interval, are dropped and are therefore missing from the result over the full time range of the original query.

Fix

To avoid dropping logs whose timestamp is an exact multiple of the split interval in instant queries, the log select range needs to include the end timestamp as well (written as [start, end]). This is done by adding a "leap nanosecond" to the end timestamp of the log select range, so that the end timestamp, which the step evaluation already includes, is also part of the log selection. A log line exactly at a split boundary is then counted by the sub query that ends at that boundary, while the step window (start, end] of the following sub query still excludes it, so it is counted exactly once.

Checklist

  • Reviewed the CONTRIBUTING.md guide (required)
  • Documentation added
  • Tests updated
  • Title matches the required conventional commits format, see here
    • Note that Promtail is considered to be feature complete, and future development for logs collection will be in Grafana Alloy. As such, feat PRs are unlikely to be accepted unless a case can be made for the feature actually being a bug fix to existing behavior.
  • Changes that require user attention or interaction to upgrade are documented in docs/sources/setup/upgrade/_index.md
  • For Helm chart changes bump the Helm chart version in production/helm/loki/Chart.yaml and update production/helm/loki/CHANGELOG.md and production/helm/loki/README.md. Example PR
  • If the change is deprecating or removing a configuration option, update the deprecated-config.yaml and deleted-config.yaml files respectively in the tools/deprecated-config-checker directory. Example PR

@chaudum chaudum requested a review from a team as a code owner July 8, 2024 15:42
@chaudum chaudum requested a review from owen-d July 8, 2024 15:42
@trevorwhitney (Collaborator) left a comment


Should this be noted as a breaking change, and/or require a call out in the upgrading guide?

@chaudum (Contributor Author) commented Jul 9, 2024

> Should this be noted as a breaking change, and/or require a call out in the upgrading guide?

This is not a breaking change, but a fix to include chunks that were otherwise dismissed because the overlap check failed.

@chaudum chaudum marked this pull request as draft July 9, 2024 08:35
@pull-request-size pull-request-size bot added size/M and removed size/S labels Jul 9, 2024
@chaudum chaudum changed the title fix: Include start timestamp in overlap check fix: Include lines with ts equal to end timestamp of the query range when executing range aggregations Jul 9, 2024
chaudum added 3 commits July 9, 2024 17:35
So that the range is start inclusive and end is exclusive.

Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
@chaudum chaudum force-pushed the chaudum/fix-zero-range-bounds-check-2 branch from c778d52 to 6cb62f2 July 9, 2024 15:35
@chaudum chaudum requested a review from trevorwhitney July 9, 2024 19:59
@kavirajk (Contributor) left a comment


LGTM 👍

@@ -115,7 +115,7 @@ func (i *MultiIndex) forMatchingIndices(ctx context.Context, from, through model
 	queryBounds := newBounds(from, through)
 
 	return i.iter.For(ctx, i.maxParallel, func(ctx context.Context, idx Index) error {
-		if Overlap(queryBounds, idx) {
+		if Overlap(idx, queryBounds) {

q: wondering if any other places of Overlap usage need this swap of arguments as well?

@chaudum (Contributor Author)

I checked, but all other places already use the Overlap function with the correct argument order.
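The thread does not show the body of `Overlap`, so the following is only a hypothetical illustration of how argument order can matter when the two bounds have different inclusivity. It assumes the first argument is treated as half-open [from, through) and the second as closed [from, through]; `bounds` and `overlap` are made-up names, not Loki's tsdb types.

```go
package main

import "fmt"

// bounds is a hypothetical time range in milliseconds.
type bounds struct{ from, through int64 }

// overlap is a hypothetical asymmetric check: a is treated as half-open
// [from, through), b as closed [from, through]. They intersect iff a starts
// no later than b ends and a ends strictly after b starts.
func overlap(a, b bounds) bool {
	return a.from <= b.through && a.through > b.from
}

func main() {
	query := bounds{from: 0, through: 100}   // closed: includes 100
	index := bounds{from: 100, through: 200} // half-open: starts exactly at 100
	fmt.Println(overlap(index, query)) // prints true
	fmt.Println(overlap(query, index)) // prints false
}
```

Under these assumptions, an index that starts exactly at the end of the query bounds is only matched when it is passed as the first argument.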

@chaudum chaudum marked this pull request as ready for review July 11, 2024 08:00
@chaudum chaudum changed the title fix: Include lines with ts equal to end timestamp of the query range when executing range aggregations fix(query enging): Include lines with ts equal to end timestamp of the query range when executing range aggregations Jul 11, 2024
@chaudum chaudum changed the title fix(query enging): Include lines with ts equal to end timestamp of the query range when executing range aggregations fix(query engine): Include lines with ts equal to end timestamp of the query range when executing range aggregations Jul 11, 2024
@cyriltovena (Contributor) left a comment


LGTM

@chaudum chaudum merged commit e0ca67d into main Jul 11, 2024
60 checks passed
@chaudum chaudum deleted the chaudum/fix-zero-range-bounds-check-2 branch July 11, 2024 08:06

4 participants