-
Notifications
You must be signed in to change notification settings - Fork 3.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix panic in instant query splitting when using unwrapped rate #6348
Conversation
./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell. + ingester 0%
+ distributor 0%
+ querier 0%
+ querier/queryrange 0%
+ iter 0%
+ storage 0%
+ chunkenc 0%
- logql -0.1%
+ loki 0% |
if expr.Left.Unwrap != nil { | ||
// rate({app="foo"} | unwrap bar [2m]) | ||
// => (sum without (sum_over_time({app="foo"}[1m]) ++ sum_over_time({app="foo"}[1m]) offset 1m) / 120) | ||
op = syntax.OpRangeTypeSum |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit confused 😕 are we sure this is equivalent to sum_over_time
?
In this case, it would also be useful to add a test to pkg/logql/downstream_test.go
to compare with the normal engine
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I should add a downstream test 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of counting log lines for the rate, it sums the individual unwrap values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 Maybe not ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I understood unwrapped rate incorrectly. It does not sum the unwrapped values, but treats them as counter, which is treated as reset, when a subsequent value is smaller than the preceding one.
See
https://github.com/grafana/loki/blob/main/pkg/logql/range_vector.go#L227
and
https://github.com/grafana/loki/blob/main/pkg/logql/range_vector.go#L248-L256
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Waiting for #6361 to revert rate
calculation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ssncferreira Rebased this PR with main
and now the added tests pass :)
The range aggregation `rate()` supports both log ranges and unwrapped ranges, e.g. `rate({app="foo"} [$__interval])` and `rate({app="foo"} | unwrap bar [$__interval])` Since `rate()` was split into multiple `count_over_time()` over total duration, but `count_over_time()` does not support `unwrap`, unwrapped rate queries caused panics. This fix changes the splitting of `rate({app="foo"} | unwrap bar [$__interval]` into multiple `sum_over_time()` over total duration. Fixes #6344 Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
6b1b1cf
to
ef72a4c
Compare
./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell. + ingester 0%
+ distributor 0.3%
+ querier 0%
+ querier/queryrange 0%
+ iter 0%
+ storage 0%
+ chunkenc 0%
+ logql 0%
+ loki 0% |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Great job 🚀
Just a small nit on the test to ease readability
Co-authored-by: Susana Ferreira <susana.ferreira@grafana.com>
./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell. + ingester 0%
+ distributor 0%
+ querier 0%
+ querier/queryrange 0%
+ iter 0%
+ storage 0%
+ chunkenc 0%
+ logql 0%
+ loki 0% |
* Fix panic in instant query splitting when using unwrapped rate The range aggregation `rate()` supports both log ranges and unwrapped ranges, e.g. `rate({app="foo"} [$__interval])` and `rate({app="foo"} | unwrap bar [$__interval])` Since `rate()` was split into multiple `count_over_time()` over total duration, but `count_over_time()` does not support `unwrap`, unwrapped rate queries caused panics. This fix changes the splitting of `rate({app="foo"} | unwrap bar [$__interval]` into multiple `sum_over_time()` over total duration. Fixes #6344 Signed-off-by: Christian Haudum <christian.haudum@gmail.com> * Add tests Signed-off-by: Christian Haudum <christian.haudum@gmail.com> * Integrate review feedback Co-authored-by: Susana Ferreira <susana.ferreira@grafana.com> Co-authored-by: Susana Ferreira <susana.ferreira@grafana.com> (cherry picked from commit 798677a)
#6557) * Fix panic in instant query splitting when using unwrapped rate The range aggregation `rate()` supports both log ranges and unwrapped ranges, e.g. `rate({app="foo"} [$__interval])` and `rate({app="foo"} | unwrap bar [$__interval])` Since `rate()` was split into multiple `count_over_time()` over total duration, but `count_over_time()` does not support `unwrap`, unwrapped rate queries caused panics. This fix changes the splitting of `rate({app="foo"} | unwrap bar [$__interval]` into multiple `sum_over_time()` over total duration. Fixes #6344 Signed-off-by: Christian Haudum <christian.haudum@gmail.com> * Add tests Signed-off-by: Christian Haudum <christian.haudum@gmail.com> * Integrate review feedback Co-authored-by: Susana Ferreira <susana.ferreira@grafana.com> Co-authored-by: Susana Ferreira <susana.ferreira@grafana.com> (cherry picked from commit 798677a) Co-authored-by: Christian Haudum <christian.haudum@gmail.com>
What this PR does / why we need it:
The range aggregation
rate()
supports both log ranges and unwrapped ranges, e.g.rate({app="foo"} [$__interval])
andrate({app="foo"} | unwrap bar [$__interval])
Since
rate()
was split into multiplecount_over_time()
over total duration, butcount_over_time()
does not supportunwrap
, unwrapped rate queries caused panics.This fix changes the splitting of
rate({app="foo"} | unwrap bar [$__interval]
into multiplesum_over_time()
over total duration.Signed-off-by: Christian Haudum christian.haudum@gmail.com
Which issue(s) this PR fixes:
Fixes #6344
Special notes for your reviewer:
Checklist
CHANGELOG.md
.docs/sources/upgrading/_index.md