Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

basicstats aggregators plugin's diff and non_negative_diff provide misleading results #9715

Closed
m0 opened this issue Sep 3, 2021 · 4 comments
Labels
area/exec bug unexpected problem or unintended behavior help wanted Request for community participation, code, contribution size/m 2-4 day effort waiting for response waiting for response from contributor

Comments

@m0
Copy link
Contributor

m0 commented Sep 3, 2021

To my understanding, the basicstats aggregators plugin's diff and non_negative_diff provide at least misleading results. This might also apply to the rate stat which is computed based on diff.

The behaviour I expect from a diff is to provide the difference considering all measurements. Unfortunately the current implementation only provides diffs from the first to the last measurement of a timeframe that is shifted without overlapping (the last measurement of the previous window is not included). With A, B, C, D, E and F being measurements

A              B              C              D              E              F
|<- period 1 ->|<- period 2 ->|<- period 3 ->|<- period 4 ->|<- period 5 ->|

we currently only get a diff_value for the changes from A to B, C to D and E to F, but we are missing the changes from B to C and D to E! Taking only the current period into account is fine / required for stats like sum, max and count, but I think it does not make sense for deriving stats like diff and rate.

When we implemented the diff stat, I did not realize that the basicstats plugin exclusively moves the sliding window for measurements to be considered.

If there are usecases for the current behaviour, I am very interested in learning about those. Otherwise this behaviour should be regarded as a bug.

To my current understanding I assume this plugin should not implement deriving stats like diff and rate as this is not how the concept of the plugin works. Maybe there should be an extra plugin providing those stats or the existing derivative aggregator plugin could be extended accordingly.

Relevant telegraf.conf:

[agent]
  interval = "3s"
  flush_interval = "3s"
  omit_hostname = true
[[outputs.file]]
  files = ["stdout"]
  data_format = "influx"
[[aggregators.basicstats]]
  period = "6s"
  drop_original = false
  stats = ["count","min","max","diff"]
[[inputs.exec]]
  commands = [
    "bash -c 'echo $(($RANDOM % 10))'"
  ]
  data_format = "value"

System info:

telegraf version 1.19.3

Steps to reproduce:

start telegraf with the above config (telegraf --config telegraf.conf)

Actual behavior:

2021-09-03T09:25:02Z I! Starting Telegraf 1.19.3
2021-09-03T09:25:02Z I! Loaded inputs: exec
2021-09-03T09:25:02Z I! Loaded aggregators: basicstats
2021-09-03T09:25:02Z I! Loaded processors: 
2021-09-03T09:25:02Z I! Loaded outputs: file
2021-09-03T09:25:02Z I! Tags enabled: 
2021-09-03T09:25:02Z I! [agent] Config: Interval:3s, Quiet:false, Hostname:"", Flush Interval:3s
exec value=0i 1630661103000000000
exec value_count=1,value_min=0,value_max=0 1630661106000000000
exec value=7i 1630661106000000000
exec value=0i 1630661109000000000
exec value_count=2,value_min=0,value_max=7,value_diff=-7 1630661112000000000
exec value=3i 1630661112000000000
exec value=8i 1630661115000000000
exec value_diff=5,value_count=2,value_min=3,value_max=8 1630661118000000000
exec value=8i 1630661118000000000
exec value=5i 1630661121000000000
exec value_diff=-3,value_count=2,value_min=5,value_max=8 1630661124000000000
exec value=4i 1630661124000000000
exec value=9i 1630661127000000000
exec value_max=9,value_diff=5,value_count=2,value_min=4 1630661130000000000
exec value=5i 1630661130000000000
exec value=2i 1630661133000000000
exec value_count=2,value_min=2,value_max=5,value_diff=-3 1630661136000000000
exec value=8i 1630661136000000000
exec value=5i 1630661139000000000
^C2021-09-03T09:25:43Z I! [agent] Hang on, flushing any cached metrics before shutdown
exec value_count=2,value_min=5,value_max=8,value_diff=-3 1630661142000000000
exec value=3i 1630661142000000000
exec value_count=1,value_min=3,value_max=3 1630661144000000000
2021-09-03T09:25:43Z I! [agent] Stopping running outputs

Expected behavior:

Besides the provided value_diff entries, e.g. from 1630661106000000000 to 1630661109000000000 resulting in value_diff=-7 and from 1630661112000000000 to 1630661115000000000 resulting in value_diff=5 I would have expected additional value_diff entries, e.g. for the window from 1630661109000000000 to 1630661112000000000.

@m0 m0 added the bug unexpected problem or unintended behavior label Sep 3, 2021
@powersj powersj added the help wanted Request for community participation, code, contribution label Apr 25, 2022
@powersj
Copy link
Contributor

powersj commented Apr 25, 2022

next steps: look into writing a unit test to demonstrate this behavior and then look into what would be required to fix it

@m0 looks like you landed these features originally and have discovered the odd behavior as well. Do you have any thoughts on what needs to change to correct the behavior?

@powersj powersj added the size/m 2-4 day effort label Aug 9, 2022
@m0
Copy link
Contributor Author

m0 commented Sep 21, 2022

@powersj I think there are two issues that need to be considered:

  1. remove the broken implementation from the basicstats plugin as to my understanding it seems to be beyond the plugin's scope
  2. find a better place to implement the diff / non_negative_diff feature (maybe it even needs a separate plugin)

PS: Sorry for the late response, the notification about your post got lost on my side. 😞

@DStrand1
Copy link
Member

@m0 is this still an issue on the latest version of telegraf?

@DStrand1 DStrand1 added the waiting for response waiting for response from contributor label Oct 28, 2024
@telegraf-tiger
Copy link
Contributor

Hello! I am closing this issue due to inactivity. I hope you were able to resolve your problem, if not please try posting this question in our Community Slack or Community Forums or provide additional details in this issue and reqeust that it be re-opened. Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/exec bug unexpected problem or unintended behavior help wanted Request for community participation, code, contribution size/m 2-4 day effort waiting for response waiting for response from contributor
Projects
None yet
Development

No branches or pull requests

3 participants