Send metrics from oldest to newest, always #5633
Comments
I would like the batch to be ordered old to new, but I disagree that points should always be sent older to newer when catching up. After downtime, it seems to me that you would want to fill in the latest data immediately and then process the backlog, so that you do not need to wait for current data.
I do think there is an argument for processing the backlog in either order: you might want more recent data first because it's more temporally relevant, or you might want it in order for some of the reasons you mentioned. This is particularly true with certain outputs; some would prefer old-to-new ordering. For example, the prometheus output or the stackdriver output would always want data in add order. The current ordering is the cause of #5598.
I also think some inputs would prefer their data be sent in order. For example, when reading from a queue consumer input, old to new seems to make the most sense. I haven't thought about any solutions for this; normally with these plugins you should set a lower batch size and limit the input to the size of the batch using
What I have been thinking of changing for 1.11 is to provide old-to-new ordering within the batch, plus agent-level and per-output control over backlog order. Ordering in all cases means "add order", not timestamp order. I have some design bits worked out on paper for a rewrite of the metric buffer code, but that's as far as I have gotten so far.
Right now they should always be consistently newer to older. |
@PierreF Sorry, I'm going to have to push this to 1.12, I haven't been able to complete the work yet. Still very high on my list of tasks though. |
Even with the recentish changes to the Stackdriver output, we are still seeing regular out-of-order errors:
Interestingly enough these errors often seem to occur with exactly 1h intervals. Config as below.
When this happens, metrics end up being >5 min late, resulting in triggered alarms and other shenanigans. @danielnelson Do you think this comes from the same issue or should I open a new one? |
Can you open a new issue, this looks like something else. |
The old-to-new ordering is crucial in some cases. For example, InfluxDB guarantees that if datapoints are written for the same timestamp, the last written value will be the final result. When an Influx output is coupled to a Kafka input through Telegraf, and the Kafka input contains aggregations, which can hold multiple values for the same timestamp (one for each "recalculation" when new data is processed for the aggregation time window), the ordering must be preserved, or the end result in Influx is uncertain. #6784 seems to correspond to this bug. |
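To make the last-write-wins concern above concrete, here is a minimal Go sketch. It is not InfluxDB or Telegraf code: `Point` and `applyLastWriteWins` are hypothetical names, and the "store" is just a map keyed by timestamp, assumed to behave like a single series where a later write replaces an earlier one.

```go
package main

import "fmt"

// Point is a hypothetical, simplified data point for a single series.
type Point struct {
	Timestamp int64
	Value     float64
}

// applyLastWriteWins writes points in the given order into a map keyed by
// timestamp, so a later write for the same timestamp overwrites an earlier one.
func applyLastWriteWins(points []Point) map[int64]float64 {
	store := make(map[int64]float64)
	for _, p := range points {
		store[p.Timestamp] = p.Value
	}
	return store
}

func main() {
	// Two "recalculations" of the same aggregation window (timestamp 100).
	// The second value (2.0) is the one the producer intends to be final.
	inOrder := []Point{{100, 1.0}, {100, 2.0}}
	reversed := []Point{{100, 2.0}, {100, 1.0}}

	fmt.Println(applyLastWriteWins(inOrder)[100])  // 2
	fmt.Println(applyLastWriteWins(reversed)[100]) // 1 (stale value wins)
}
```

With the batch reversed, the stale recalculation is written last and becomes the stored result.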
I have a problem with newer-to-older ordering in the TICK Stack when the buffer is not empty and contains more than one gather cycle. My scenario:
The newest metric (for which state duration started counting) from the batch has
It'd be great to have a configuration option in Telegraf for older-to-newer and newer-to-older ordering. Is there any known workaround or set of configuration options to achieve older-to-newer order (other than downgrading Telegraf to 1.9.2)? |
@PierreF Sorry, I haven't been able to do this like I thought I would be able to. If your offer to open a PR to swap the order of the batches still stands, so that batches are in ascending order, I'd definitely take you up on it. |
Feature Request
Proposal:
Telegraf should always send metric points from older to newer, both during "normal" operation and while catching up on a backlog after an output becomes slow.
Current behavior:
Since #5287, when the buffer is not empty and contains more than one gather cycle, metrics are sent from newest to oldest.
Desired behavior:
Batch() should return the oldest metrics first, and metrics within the batch should be ordered from oldest to newest.
We should still drop the oldest metrics if the buffer becomes full, so #5194 stays fixed.
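The desired behavior can be sketched roughly as follows. This is a simplified illustration, not Telegraf's actual buffer: `Buffer`, `NewBuffer`, `Add` and the `Batch(n)` signature are assumptions made for the example, metrics are plain strings, and the real code would also need to handle failed writes.

```go
package main

import "fmt"

// Buffer is a hypothetical bounded queue that keeps metrics in add order.
type Buffer struct {
	items []string
	limit int
}

// NewBuffer returns a buffer holding at most limit metrics.
func NewBuffer(limit int) *Buffer {
	return &Buffer{limit: limit}
}

// Add appends a metric. If the buffer is full, the oldest metric is dropped
// to make room, so the newest data is never the one that is lost.
func (b *Buffer) Add(m string) {
	if b.limit <= 0 {
		return
	}
	if len(b.items) == b.limit {
		copy(b.items, b.items[1:])
		b.items = b.items[:len(b.items)-1]
	}
	b.items = append(b.items, m)
}

// Batch removes and returns up to n metrics, oldest first.
func (b *Buffer) Batch(n int) []string {
	if n < 0 {
		n = 0
	}
	if n > len(b.items) {
		n = len(b.items)
	}
	out := make([]string, n)
	copy(out, b.items[:n])
	b.items = b.items[n:]
	return out
}

func main() {
	b := NewBuffer(3)
	for _, m := range []string{"m1", "m2", "m3", "m4", "m5"} {
		b.Add(m) // m1 and m2 are dropped once the limit is reached
	}
	fmt.Println(b.Batch(2)) // [m3 m4]
	fmt.Println(b.Batch(2)) // [m5]
}
```

Both the order across batches and the order inside each batch follow add order, while overflow still discards the oldest entries.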
Use case:
Currently metric points are not in a consistent order. When the buffer is empty, the order is older to newer, but if the buffer starts to fill, batches will be in the opposite order (newer to older).
In practice, the buffer will often contain a few metrics from the previous gather cycle, so this may happen even if the output is not down.
In the end, this means that an output cannot rely on metrics being ordered, which was (mostly) true before #5287.
Having metrics ordered makes it easy to do some transformations on the fly (e.g. taking the difference between two points to compute a rate) and generally makes working with time series easier.
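As an illustration of the rate use case, here is a small Go sketch of an on-the-fly difference between consecutive points. `Sample` and `rates` are hypothetical names, and the function assumes a single counter series arriving in ascending time order. It simply skips pairs that are not in order, which is one possible policy among several.

```go
package main

import "fmt"

// Sample is a hypothetical counter reading.
type Sample struct {
	Time  int64   // seconds
	Value float64 // counter value
}

// rates returns the per-second rate between each pair of consecutive samples.
// It assumes ascending time order; pairs with a non-positive time delta are
// skipped because no meaningful rate can be derived from them.
func rates(samples []Sample) []float64 {
	var out []float64
	for i := 1; i < len(samples); i++ {
		dt := samples[i].Time - samples[i-1].Time
		if dt <= 0 {
			continue
		}
		dv := samples[i].Value - samples[i-1].Value
		out = append(out, dv/float64(dt))
	}
	return out
}

func main() {
	ordered := []Sample{{0, 100}, {10, 150}, {20, 250}}
	reversed := []Sample{{20, 250}, {10, 150}, {0, 100}}

	fmt.Println(rates(ordered))  // [5 10]
	fmt.Println(rates(reversed)) // []
}
```

With newest-to-oldest input, a streaming consumer like this either has to buffer and re-sort the points or give up on computing the rate.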
If this change seems good, I could come up with a PR.