should step size conversions result in a warning? #356
Comments
Being able to specify the step size is useful when the metrics being stored arrive at a different interval than the step size of Atlas. It's also useful for Grafana integration :) For example, running a synthetic transaction every 2 minutes results in metrics being ingested every 2 minutes. When you look at the data at the 1 second interval, it looks like it has "holes", but at 2 minutes the results are correct, and any holes are "true" misses. I was surprised when my queries to Atlas auto-jumped from 1s to 5s when I specified a step size of 2s, and I ended up modifying the code to allow 1s, 2s, 3s, 4s, 5s (and a few more). Thanks! Brian
Thanks for the comment. How does Grafana use that setting?
We typically avoid doing that. One example is the CloudWatch S3 metrics that we import, which get updated once a day. We report them into Atlas at minute level, which has a number of benefits:
There is a bit of overhead with this, but for us it hasn't come up much, and it will get compressed to a constant block in storage, so the overhead isn't that high. For use cases where we do need different step sizes, we run those as separate stacks; we don't mix them in the same instance.
I haven't looked at the auto-selection in a while. In general, the step sizes were selected to divide evenly into common time units. For example, we wouldn't want 7s because, with 1m blocks, a consolidated data point would cross the block boundaries. We also reduced the number of available options to improve caching behavior. We could probably make it configurable.
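The divisibility constraint described above can be illustrated with a short sketch. The 60-second block size and the helper names `crosses_block_boundary` and `consolidated_windows` are assumptions made for illustration; this is not Atlas code.

```python
def crosses_block_boundary(step_seconds: int, block_seconds: int = 60) -> bool:
    """Return True if windows of width step_seconds do not align with blocks.

    A step aligns when it divides the block evenly or is a whole multiple of it.
    """
    big, small = max(step_seconds, block_seconds), min(step_seconds, block_seconds)
    return big % small != 0


def consolidated_windows(step_seconds: int, total_seconds: int):
    """List the [start, end) windows produced by a step over total_seconds."""
    return [(s, s + step_seconds) for s in range(0, total_seconds, step_seconds)]


# With a 7s step, one window straddles the block boundary at 60s.
straddling = [w for w in consolidated_windows(7, 120) if w[0] < 60 < w[1]]
print(crosses_block_boundary(7), straddling)
```

A 5s step, by contrast, produces windows that end exactly at 60s, so no consolidated point mixes data from two blocks.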
When you expand or decrease the time range of the metric you are viewing, the Grafana datasource plugin for Atlas adjusts the step size. It doesn't "have to" do that, but that's how it was implemented. What I see for the step-size question is this scenario: I have a metric collection script (a synthetic transaction, really) that can take more than a minute to execute, but always less than 2 minutes. This script is scheduled in Sensu to run every 2 minutes, and any timeouts are "nulls" in Atlas. I then have a check that queries Atlas (via Sensu) for the metric value with a step size of 2 minutes, and alerts if there is a null or if the metric exceeds a threshold value. This check is also run every 2 minutes. The CloudWatch example I can understand, since that's a bulk load of historical "minute stepped" data, but in my case it's always "2 minute stepped" data. It's nice to be able to set my step size to 2 minutes and get back a clean series of data, and if there are nulls, they are always timeouts. Thanks for all the hard work on Atlas, it's working great for us :)
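The scenario above can be sketched roughly as follows. This is a simplified model, not how Atlas or Sensu actually work: `view_at_step` and `evaluate_check` are hypothetical helpers, samples are keyed by seconds since an arbitrary start, and `None` stands in for a missing datapoint.

```python
from typing import Dict, List, Optional, Sequence


def view_at_step(samples: Dict[int, float], start: int, end: int, step: int) -> List[Optional[float]]:
    """Return one value per step boundary; None where nothing was reported."""
    return [samples.get(t) for t in range(start, end, step)]


def evaluate_check(values: Sequence[Optional[float]], threshold: float) -> str:
    """Classify a series: any None is a timeout, then compare to the threshold."""
    if any(v is None for v in values):
        return "timeout"
    if any(v > threshold for v in values):
        return "threshold_exceeded"
    return "ok"


# Metrics reported every 120s (values are made up for illustration).
samples = {0: 1.2, 120: 1.4, 240: 1.1}

at_one_minute = view_at_step(samples, 0, 360, 60)    # contains apparent "holes"
at_two_minutes = view_at_step(samples, 0, 360, 120)  # clean series

print(evaluate_check(at_one_minute, threshold=2.0))
print(evaluate_check(at_two_minutes, threshold=2.0))
```

In this model, the same data reads as a timeout when viewed at a 1 minute step and as healthy at a 2 minute step, which is why a null only carries meaning when the query step matches the collection interval.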
Seems sensible to warn about, if it's specified explicitly. AFAIK our UIs don't add step= unless a user specifies it, which is pretty rare. Most of the time, it's trying to get a step size smaller than the minimum dictated by the time interval, so it's probably best to be straightforward that it isn't going to work. The UI could also warn more directly, but currently I don't think there's a sound way for the UI to know the minimum step size for a given interval, plus there are still plenty of queries produced manually.
Currently if a user enters an invalid step size it will get silently converted to the next valid step. Should this result in a warning?
The side question here is usage of step in general. It is generally considered deprecated for direct use by the user.
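As a minimal sketch of what a warning on conversion could look like: the list of allowed steps below is an assumption (the thread only confirms that a requested 2s became 5s), and `resolve_step` is a hypothetical helper, not an Atlas function.

```python
import bisect
import warnings

# Hypothetical allowed step sizes in seconds; the real list in Atlas may differ.
ALLOWED_STEPS = [1, 5, 10, 30, 60, 300, 600, 3600]


def resolve_step(requested: int, allowed=ALLOWED_STEPS) -> int:
    """Round a requested step up to the next allowed step, warning if it changed."""
    if requested <= 0:
        raise ValueError("step must be a positive number of seconds")
    # Index of the first allowed step that is >= the requested step.
    i = bisect.bisect_left(allowed, requested)
    # Assumption: requests larger than every allowed step fall back to the largest.
    resolved = allowed[i] if i < len(allowed) else allowed[-1]
    if resolved != requested:
        warnings.warn(
            f"step {requested}s is not supported; using {resolved}s instead",
            UserWarning,
            stacklevel=2,
        )
    return resolved
```

Under these assumptions, `resolve_step(2)` returns 5 and emits a `UserWarning`, while `resolve_step(5)` returns 5 silently, so only explicit requests that were actually changed produce noise.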