Gauges appear to randomly jump around in Datadog and Prometheus #15683

Closed
derekhuizhang opened this issue Dec 21, 2022 · 5 comments
Labels
type: bug A code related bug.

derekhuizhang commented Dec 21, 2022

A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Problem

Gauges appear to randomly jump around in Datadog and Prometheus.

Steps to reproduce:

  1. Download and build (./build.sh): https://github.com/derekhuizhang/statsd-firehose
  2. Run a Vector instance listening on 8125
  3. Run the firehose for gauges with the random flag enabled. I've tried this with 10, 100, and 1000 gauges per second:
./statsd-firehose -countcount 0 -distcount 0 -histcount 0 -gaugecount 1 -gaugefreq 10 -random
./statsd-firehose -countcount 0 -distcount 0 -histcount 0 -gaugecount 1 -gaugefreq 100 -random
./statsd-firehose -countcount 0 -distcount 0 -histcount 0 -gaugecount 1 -gaugefreq 1000 -random
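
For readers who do not have the firehose built, a rough Python stand-in for step 3 might look like the sketch below. It assumes the firehose emits plain `name:value|g` gauge datagrams with random integer values. The metric name `firehose.gauge`, the value range, and the helper names are hypothetical and not taken from statsd-firehose.

```python
import random
import socket
import time


def format_gauge(name: str, value: int) -> bytes:
    """Encode one gauge sample in the statsd line format: name:value|g."""
    return f"{name}:{value}|g".encode("ascii")


def send_random_gauges(host="127.0.0.1", port=8125, per_second=10, seconds=1.0,
                       low=-5, high=5, name="firehose.gauge"):
    """Send random gauge values over UDP; returns the number of datagrams sent.

    The defaults (name, range, rate) are illustrative assumptions.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sent = 0
    deadline = time.monotonic() + seconds
    try:
        while time.monotonic() < deadline:
            value = random.randint(low, high)
            sock.sendto(format_gauge(name, value), (host, port))
            sent += 1
            time.sleep(1.0 / per_second)
    finally:
        sock.close()
    return sent
```

Note that any negative value is serialized with a leading minus sign, for example `firehose.gauge:-3|g`.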

Result in Datadog:
[Screenshot taken 2022-12-21 at 1:28:13 PM]

Result in Grafana Prometheus:
[Screenshot taken 2022-12-21 at 1:33:24 PM]

However, if you check the console output, you will notice that the actual values are usually between -5 and 5, and they never go below -20. The values shown in the graphs above should therefore never have been sent to Prometheus or Datadog.

I've also confirmed that I don't see the same behavior with datadog-agent.

Configuration

data_dir = "/var/lib/vector"

[api]
enabled = true

[sources.statsd_metrics]
type = "statsd"
mode = "udp"
address = "127.0.0.1:8125"

[transforms.aggregate_metrics]
type = "aggregate"
inputs = ["statsd_metrics"]
interval_ms = 10000

[sinks.datadog]
type = "datadog_metrics"
default_api_key = <redacted>
inputs = [ "aggregate_metrics" ]

[sinks.prometheus]
type = "prometheus_remote_write"
endpoint = <redacted>
inputs = [ "aggregate_metrics" ]

<redacted TLS info>

[sinks.console]
type = "console"
target = "stdout"
inputs = [ "aggregate_metrics" ]

[sinks.console.encoding]
codec = "json"

Version

0.26.0

Debug Output

No response

Example Data

No response

Additional Context

No response

References

No response

derekhuizhang added the "type: bug" label Dec 21, 2022
derekhuizhang (Author) commented Dec 21, 2022

Hm, I think this may be related to the way negative gauge values are treated as deltas rather than as independent values. It appears that positive values are reported fine, while each negative value is added as a delta on top of the last negative value.

https://github.com/statsd/statsd/blob/master/docs/metric_types.md#gauges

  1. My guess is that Vector is submitting negative gauge values as deltas on top of the last submitted negative gauge value for DD/Prometheus. This explains why the negative values seem to jump back up to 0 when I restart Vector, before becoming more and more negative again. However, this is unexpected behavior, as we end up sending very wrong-looking gauge values.
  2. Another possible explanation is that this is just the default behavior of the Datadog/Prometheus API: when a negative gauge value is submitted, it is added as a delta to the last submitted negative gauge value. If so, that's very weird and doesn't seem to be documented anywhere.

tobz (Contributor) commented Dec 21, 2022

Hey there!

Unfortunately, this is exactly how the statsd source is intended to work. When values are not sign-prefixed, they represent a "set" operation for gauges, whereas a sign prefix indicates that the value should be treated as a delta against the previous gauge value. This can be seen here in the original statsd source: https://github.com/statsd/statsd/blob/master/stats.js#L307-L311, but is also called out in the metrics type document that you linked.
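
To make the set-versus-delta rule concrete, here is a small Python sketch of the gauge semantics described above. It models only the rule from the statsd documentation (no sign means set, a leading `+` or `-` means adjust), not Vector's internals. The function names and the sample values are made up for illustration.

```python
def apply_gauge(current: float, raw: str) -> float:
    """Apply one statsd gauge value string to the current gauge value.

    A leading '+' or '-' means "adjust by this delta"; anything else means "set".
    """
    if raw[0] in "+-":
        return current + float(raw)
    return float(raw)


def replay(raw_values, start=0.0):
    """Return the gauge value after each sample is applied in order."""
    history = []
    current = start
    for raw in raw_values:
        current = apply_gauge(current, raw)
        history.append(current)
    return history


# Samples a sender might intend as absolute readings between -5 and 5.
samples = ["2", "-4", "-5", "-3", "-5"]
print(replay(samples))
# → [2.0, -2.0, -7.0, -10.0, -15.0]
```

Even though every sample lies between -5 and 5, the negative ones accumulate as deltas, so the resulting gauge drifts well outside that range.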

tobz closed this as not planned Dec 21, 2022
derekhuizhang (Author) commented

Gotcha, I think the confusion came about because my statsd firehose sends dogstatsd, which supports negative gauges, whereas statsd treats a leading negative sign on a gauge value as a delta.

  1. Is there any plan to support dogstatsd as a source, or a configuration option with statsd?
  2. If not, is there a workaround I can use to get the negative gauges to appear in DD or Prometheus?

I want to say I can use the lua transform, but I'm not sure whether that would work: it depends on whether the statsd transformation (e.g. the conversion to a delta) happens at the sink or at the source, given that we are supposed to be sending dogstatsd to Datadog. Are there any other alternatives or ideas you can think of?
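
One sender-side option, independent of the Lua transform, follows from the statsd gauge documentation linked earlier: because a sign-prefixed value is a delta, a gauge can only be set to a negative number by first setting it to zero. Below is a minimal Python sketch of that encoding. The helper name and metric name are hypothetical, and this is not the approach that was ultimately used in this thread.

```python
def encode_absolute_gauge(name: str, value: int) -> list:
    """Return the statsd lines needed to set a gauge to an absolute value.

    Under plain statsd semantics a negative value is a delta, so a negative
    reading is expressed as "set to 0" followed by "subtract |value|".
    """
    if value < 0:
        return [f"{name}:0|g", f"{name}:{value}|g"]
    return [f"{name}:{value}|g"]


# Hypothetical usage: join both lines with a newline to send them together.
payload = "\n".join(encode_absolute_gauge("my.gauge", -7))
print(payload)
```

This sketch assumes the receiver accepts newline-separated metrics in a single datagram and processes them in order. The gauge also briefly holds 0, so an aggregation or flush that lands between the two lines could report that intermediate value.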

derekhuizhang (Author) commented

I managed to figure it out in the Lua transform, but would still appreciate it if we could get dogstatsd as an option!

jszwedko (Member) commented

Glad you got it working, @derekhuizhang! I opened #15741 to track dogstatsd support.
