Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix panic in instant query splitting when using unwrapped rate #6348

Merged
merged 3 commits into from
Jun 22, 2022

Conversation

chaudum
Copy link
Contributor

@chaudum chaudum commented Jun 9, 2022

What this PR does / why we need it:

The range aggregation rate() supports both log ranges and unwrapped ranges, e.g.

rate({app="foo"} [$__interval]) and rate({app="foo"} | unwrap bar [$__interval])

Since rate() was split into multiple count_over_time() over total duration, but count_over_time() does not support unwrap, unwrapped rate queries caused panics.

This fix changes the splitting of rate({app="foo"} | unwrap bar [$__interval] into multiple sum_over_time() over total duration.

Signed-off-by: Christian Haudum christian.haudum@gmail.com

Which issue(s) this PR fixes:

Fixes #6344

Special notes for your reviewer:

Checklist

  • Documentation added
  • Tests updated
  • Is this an important fix or new feature? Add an entry in the CHANGELOG.md.
  • Changes that require user attention or interaction to upgrade are documented in docs/sources/upgrading/_index.md

@chaudum chaudum requested a review from a team as a code owner June 9, 2022 08:34
@grafanabot
Copy link
Collaborator

./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki

Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.

+           ingester	0%
+        distributor	0%
+            querier	0%
+ querier/queryrange	0%
+               iter	0%
+            storage	0%
+           chunkenc	0%
-              logql	-0.1%
+               loki	0%

if expr.Left.Unwrap != nil {
// rate({app="foo"} | unwrap bar [2m])
// => (sum without (sum_over_time({app="foo"}[1m]) ++ sum_over_time({app="foo"}[1m]) offset 1m) / 120)
op = syntax.OpRangeTypeSum
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused 😕 are we sure this is equivalent to sum_over_time?
In this case, it would also be useful to add a test to pkg/logql/downstream_test.go to compare with the normal engine

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I should add a downstream test 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of counting log lines for the rate, it sums the individual unwrap values.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 Maybe not ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I understood unwrapped rate incorrectly. It does not sum the unwrapped values, but treats them as counter, which is treated as reset, when a subsequent value is smaller than the preceding one.
See
https://github.com/grafana/loki/blob/main/pkg/logql/range_vector.go#L227
and
https://github.com/grafana/loki/blob/main/pkg/logql/range_vector.go#L248-L256

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Waiting for #6361 to revert rate calculation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ssncferreira Rebased this PR with main and now the added tests pass :)

The range aggregation `rate()` supports both log ranges and unwrapped
ranges, e.g.

`rate({app="foo"} [$__interval])`
and
`rate({app="foo"} | unwrap bar [$__interval])`

Since `rate()` was split into multiple `count_over_time()` over total
duration, but `count_over_time()` does not support `unwrap`, unwrapped
rate queries caused panics.

This fix changes the splitting of `rate({app="foo"} | unwrap bar [$__interval]`
into multiple `sum_over_time()` over total duration.

Fixes #6344

Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
@grafanabot
Copy link
Collaborator

./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki

Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.

+           ingester	0%
+        distributor	0.3%
+            querier	0%
+ querier/queryrange	0%
+               iter	0%
+            storage	0%
+           chunkenc	0%
+              logql	0%
+               loki	0%

Copy link
Contributor

@ssncferreira ssncferreira left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Great job 🚀
Just a small nit on the test to ease readability

pkg/logql/rangemapper_test.go Outdated Show resolved Hide resolved
Co-authored-by: Susana Ferreira <susana.ferreira@grafana.com>
@grafanabot
Copy link
Collaborator

./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki

Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.

+           ingester	0%
+        distributor	0%
+            querier	0%
+ querier/queryrange	0%
+               iter	0%
+            storage	0%
+           chunkenc	0%
+              logql	0%
+               loki	0%

@slim-bean slim-bean merged commit 798677a into main Jun 22, 2022
@slim-bean slim-bean deleted the chaudum/issue-6344 branch June 22, 2022 14:38
@ssncferreira ssncferreira added the backport release-2.6.x Tag a PR with this label to create a PR which cherry pics it into the release-2.6.x branch label Jun 30, 2022
grafanabot pushed a commit that referenced this pull request Jun 30, 2022
* Fix panic in instant query splitting when using unwrapped rate

The range aggregation `rate()` supports both log ranges and unwrapped
ranges, e.g.

`rate({app="foo"} [$__interval])`
and
`rate({app="foo"} | unwrap bar [$__interval])`

Since `rate()` was split into multiple `count_over_time()` over total
duration, but `count_over_time()` does not support `unwrap`, unwrapped
rate queries caused panics.

This fix changes the splitting of `rate({app="foo"} | unwrap bar [$__interval]`
into multiple `sum_over_time()` over total duration.

Fixes #6344

Signed-off-by: Christian Haudum <christian.haudum@gmail.com>

* Add tests

Signed-off-by: Christian Haudum <christian.haudum@gmail.com>

* Integrate review feedback

Co-authored-by: Susana Ferreira <susana.ferreira@grafana.com>

Co-authored-by: Susana Ferreira <susana.ferreira@grafana.com>
(cherry picked from commit 798677a)
ssncferreira pushed a commit that referenced this pull request Jun 30, 2022
#6557)

* Fix panic in instant query splitting when using unwrapped rate

The range aggregation `rate()` supports both log ranges and unwrapped
ranges, e.g.

`rate({app="foo"} [$__interval])`
and
`rate({app="foo"} | unwrap bar [$__interval])`

Since `rate()` was split into multiple `count_over_time()` over total
duration, but `count_over_time()` does not support `unwrap`, unwrapped
rate queries caused panics.

This fix changes the splitting of `rate({app="foo"} | unwrap bar [$__interval]`
into multiple `sum_over_time()` over total duration.

Fixes #6344

Signed-off-by: Christian Haudum <christian.haudum@gmail.com>

* Add tests

Signed-off-by: Christian Haudum <christian.haudum@gmail.com>

* Integrate review feedback

Co-authored-by: Susana Ferreira <susana.ferreira@grafana.com>

Co-authored-by: Susana Ferreira <susana.ferreira@grafana.com>
(cherry picked from commit 798677a)

Co-authored-by: Christian Haudum <christian.haudum@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport release-2.6.x Tag a PR with this label to create a PR which cherry pics it into the release-2.6.x branch size/M
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Panic due to unwrap in mapped range vector in query planning
4 participants