Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC]: Sum values in unwrapped rate aggregation instead of treating them as counter #6351

Open
chaudum opened this issue Jun 9, 2022 · 2 comments
Labels
component/logql rfc Request For Comment

Comments

@chaudum
Copy link
Contributor

chaudum commented Jun 9, 2022

Is your feature request related to a problem? Please describe.

When working on #6344 (#6348#discussion_r893370459) I discovered that the unwrapped rate aggregation (rate({} | unwrap x [$__interval]) behaves differently than expected.

The documentation on unwrapped range aggregations states:

rate(unwrapped-range): calculates per second rate of all values in the specified interval.

This clearly suggests that the individual values extracted from the log lines in a time range are summed up and divided by the range to get the rate per second.

The initial implementation before #5013 also was like that.

loki/pkg/logql/functions.go

Lines 126 to 137 in a046d79

func rateLogs(selRange time.Duration, computeValues bool) func(samples []promql.Point) float64 {
return func(samples []promql.Point) float64 {
if !computeValues {
return float64(len(samples)) / selRange.Seconds()
}
var total float64
for _, p := range samples {
total += p.V
}
return total / selRange.Seconds()
}
}

For example, executing rate({} | logfmt | unwrap subqueries [10s]) on following logs lines:

ts | line
---|---------------------------------------------------
0  | msg="something regarding subqueries" subqueries=4
1  | msg="something regarding subqueries" subqueries=3
2
3  | msg="something regarding subqueries" subqueries=5
4  | msg="something regarding subqueries" subqueries=2
5  | msg="something regarding subqueries" subqueries=8
6
7
8  | msg="something regarding subqueries" subqueries=2
9  | msg="something regarding subqueries" subqueries=3

One would expect to get as result 2.7, based on (4+3+5+2+8+2+3)/10.
However, what you actually get is 1.6, based on (3-4+4+5+8)/10. This is due to the implementation of rate() when using Counter type metrics, that has been copied from Prometheus.

// extrapolatedRate function is taken from prometheus code promql/functions.go:59
// extrapolatedRate is a utility function for rate/increase/delta.
// It calculates the rate (allowing for counter resets if isCounter is true),
// extrapolates if the first/last sample is close to the boundary, and returns
// the result as either per-second (if isRate is true) or overall.
func extrapolatedRate(samples []promql.Point, selRange time.Duration, isCounter, isRate bool) float64 {
// No sense in trying to compute a rate without at least two points. Drop
// this Vector element.
if len(samples) < 2 {
return 0
}
var (
rangeStart = samples[0].T - durationMilliseconds(selRange)
rangeEnd = samples[len(samples)-1].T
)
resultValue := samples[len(samples)-1].V - samples[0].V
if isCounter {
var lastValue float64
for _, sample := range samples {
if sample.V < lastValue {
resultValue += lastValue
}
lastValue = sample.V
}
}

However, Loki does not have the notion of "counters".

The reason why the implementation was changed, was because of a tiny subset of common use of the rate() aggregation, where the value in the log lines acts as a "counter" (which can reset to 0), as explained in the PR description of #5013.

Describe the solution you'd like

I propose to revert the implementation of the rate to the previous behaviour.
Additionally, to cover the special use-case of a counter value, LogQL can provide a new aggregation function counter_rate(), which takes an unwrapped range as input and treats it as a counter.

Example:

counter_rate({app="foo"} | logfmt | unwrap counter [1m])

Describe alternatives you've considered

Leave the implementation as is and change the documentation. This will probably cause a lot of confusion to people how want to rate "normal" values.

@owen-d
Copy link
Member

owen-d commented Jun 9, 2022

Agreed. I think treating the underlying as counters is a niche use case and shouldn't be the default

@slim-bean
Copy link
Collaborator

Very well summarized @chaudum!

I support your solution as well, however I would name it rate_counter because I think it's more discoverable when you are typing a query and know you want to rate something vs trying to remember that it's counter_rate

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component/logql rfc Request For Comment
Projects
None yet
Development

No branches or pull requests

3 participants