Are huge amounts of 'sample timestamp out of order' logs normal? #3411
Comments
This happens when Prometheus sends old samples for series for which Cortex already has newer samples; Cortex can only ingest samples for a given series in timestamp order.
On restart, Prometheus tries to push all data from its WAL to the remote write endpoint; it doesn't remember how much it had already pushed before the restart. Another source of "sample timestamp out of order" is running multiple Prometheus servers that push the same metrics without proper HA deduplication configured in Cortex (https://cortexmetrics.io/docs/guides/ha-pair-handling/).
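For context, the HA deduplication described in that guide boils down to electing one replica per cluster and dropping writes from the other replicas before they reach the ingesters. A minimal sketch of the idea (this is not Cortex's actual ha_tracker implementation, which also handles leader failover via a KV store; the names here are illustrative):

```go
package main

import "fmt"

// haDedup is a toy version of HA replica election: the first replica seen for
// a cluster becomes the leader, and samples from any other replica of the same
// cluster are dropped instead of being ingested (ingesting them would produce
// "out of order" / duplicate errors, since both replicas scrape the same targets).
type haDedup struct {
	elected map[string]string // cluster label -> elected replica label
}

func newHADedup() *haDedup {
	return &haDedup{elected: map[string]string{}}
}

// accept reports whether a write from the given cluster/replica pair should be ingested.
func (h *haDedup) accept(cluster, replica string) bool {
	leader, ok := h.elected[cluster]
	if !ok {
		h.elected[cluster] = replica // elect the first replica we see
		return true
	}
	return leader == replica
}

func main() {
	d := newHADedup()
	fmt.Println(d.accept("dc1", "prom-a")) // true: prom-a becomes the leader for dc1
	fmt.Println(d.accept("dc1", "prom-b")) // false: non-leader replica, samples dropped
	fmt.Println(d.accept("dc1", "prom-a")) // true: the leader keeps writing
}
```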
See above.
It can indicate frequent Prometheus restarts or a misconfiguration on the Cortex side.
Not that I know of. But you're right that it's logging too much, and we can improve it.
Thanks for explaining, but this lasts for a LONG time. I restarted my Cortex 11 minutes ago and the logs are still full of these messages. How long should it take for them to stop?
That depends on the cause of the problem. If you have multiple Prometheus servers sending the same series (common in an HA setup), then it will only stop once Cortex is properly configured. See the linked article above.
No, I do not have a setup like that. I have a single Prometheus instance per DC, and there should be no duplicates. For example: I restarted 2 out of 3 Cortex instances, and the two restarted ones are spamming their logs with copious amounts of these warnings.
Do you have multiple DCs? Is it possible that they are sending the same series to Cortex? Those …
You may want to use …
No, it is not possible. Examples:
Labels include fleets and hostnames. Each DC has one Prometheus instance, and each scrapes only its own DC.
Looks like it calmed down for the most part on one of the hosts 30 minutes after the restart. The other one is still spamming. I might add that all Prometheus instances push to the same cluster, but the one Cortex instance that has not been restarted does not log any warnings like that. Zero.
What would be the point of that if they are already labeled with …? The issue is not about uniqueness. If it was, I would be getting these warnings ALL THE TIME, but I do not. I get them only when I restart a Cortex instance, and only on that instance. Why would only that instance get duplicates, when my Prometheus instances have all 3 Cortex instances configured to push to?
Like right now, the instance I restarted yesterday is fine:
So if the issue really was that I get duplicates from different DCs, these logs would be FULL of them.
We are experiencing this issue as well. Same symptoms as described above: running perfectly happily for months, then after an issue with the etcd database, Cortex was brought back into working order and now there are many "sample timestamp out of order for series" messages from all our ingesters (lasting for at least 6 hours now). According to the Prometheus instance suspected of sending this traffic, around 0.1% of its series are being rejected (based on the prometheus_remote_storage_failed_samples_total metric), but that is enough to trigger our alerting. Unsure if this is relevant, but an interesting note is that Cortex is set up for multi-tenancy and, as far as we can tell, all the series that get flagged as out of order are coming from our largest tenant and the Prometheus mentioned above. All of the smaller tenants (smaller by at least an order of magnitude) are not seeing any series rejected with this out-of-order message.
Today it was my turn to debug "sample timestamp out of order" errors. However, I discovered there were actually two Prometheus instances sending! Looking at …
I see the same issue; I am getting errors related to "out of bounds" and "duplicate sample for timestamp". My Cortex environment is still in the POC stage, so I shut it down for an hour and redeployed. Even then I can still see these errors. I would like to request answers to the questions below.
Current architecture: …
This typically happens when you have clashing series exported by the very same exporter. We've seen cAdvisor exporting clashing series in some conditions.
If it is clashing series from the same exporter, how do I address the "out of order sample" errors? Also, can someone help me ingest metrics at a one-minute interval (rather than a data point every 15s, I am interested in one every 60s)?
Some general guidance.
"sample timestamp out of order" means: for a single time series, Cortex has received a sample with a timestamp older than the latest timestamp already received. Cortex requires samples to be pushed in timestamp order.
"duplicate sample for timestamp" means: for a single time series, Cortex has received two samples with the exact same timestamp but different values. This is not allowed in Cortex.
"sample out of bounds" means: Cortex has received a sample with a timestamp "too far in the past" (e.g. in the blocks storage you can ingest samples up to 1h older than the most recent timestamp received for the same tenant).
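To make the three cases concrete, here is a minimal sketch of how an incoming sample can be classified, with the bound check done first (an illustration only, not the actual Cortex/Prometheus TSDB code; the names and thresholds are made up for the example):

```go
package main

import (
	"errors"
	"fmt"
)

var (
	errOutOfBounds = errors.New("sample out of bounds")
	errOutOfOrder  = errors.New("sample timestamp out of order")
	errDuplicate   = errors.New("duplicate sample for timestamp")
)

// checkSample mimics the order of validations described above: the sample is
// first checked against the lower time bound, then against the latest sample
// already ingested for the same series.
func checkSample(minValidTime, lastTS int64, lastVal float64, ts int64, val float64) error {
	if ts < minValidTime {
		return errOutOfBounds // too far in the past for the current head block
	}
	if ts < lastTS {
		return errOutOfOrder // older than the newest sample of this series
	}
	if ts == lastTS && val != lastVal {
		return errDuplicate // same timestamp, different value
	}
	return nil // in order (an exact duplicate is treated as a no-op here)
}

func main() {
	// The series already has a sample at t=1000 with value 1; the head accepts t >= 400.
	fmt.Println(checkSample(400, 1000, 1, 900, 2))  // sample timestamp out of order
	fmt.Println(checkSample(400, 1000, 1, 1000, 2)) // duplicate sample for timestamp
	fmt.Println(checkSample(400, 1000, 1, 300, 2))  // sample out of bounds
	fmt.Println(checkSample(400, 1000, 1, 1100, 2)) // <nil>: accepted
}
```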
Are there any plans to completely eliminate or at least relax this limitation? The existing behavior is especially problematic for Loki, as it leads to loss of log records that cannot be entirely fixed on the client side.
No. Time series are compressed by computing the delta from the previous sample, so re-ordering requires exponentially more work.
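As a rough illustration of why ordering matters for the storage format, here is a toy delta-encoded chunk (a simplified sketch, not the actual XOR/Gorilla chunk encoding Prometheus uses): the encoder only keeps the last timestamp around, so an older sample cannot be slotted in without decoding and rewriting everything after it.

```go
package main

import (
	"errors"
	"fmt"
)

// deltaChunk is a toy append-only chunk: it stores the first timestamp and
// then, for each subsequent sample, only the delta to the previous timestamp.
type deltaChunk struct {
	first  int64
	last   int64
	deltas []int64
	n      int
}

// append only accepts strictly increasing timestamps; inserting an older
// sample would mean recomputing every stored delta after the insertion point.
func (c *deltaChunk) append(ts int64) error {
	if c.n > 0 && ts <= c.last {
		return errors.New("sample timestamp out of order")
	}
	if c.n == 0 {
		c.first = ts
	} else {
		c.deltas = append(c.deltas, ts-c.last)
	}
	c.last = ts
	c.n++
	return nil
}

func main() {
	c := &deltaChunk{}
	for _, ts := range []int64{1000, 1015, 1030} {
		fmt.Println(c.append(ts)) // <nil> three times: appended in order
	}
	fmt.Println(c.append(1020)) // rejected: older than the last appended timestamp
	fmt.Println(c.deltas)       // [15 15]
}
```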
@bboreham Just a side note to this: what's the general fix / expectation on the Prometheus side for these errors? I have a tenancy proxy in between my Prometheus and Cortex, and when I have a bad series, this effectively backlogs and completely stops ingestion on the node until I have manually removed the series using the admin API on Prometheus. Is there an expectation or option in Prometheus to drop the remote_write for a series if it gets too many failures? In my case, I have a series that basically keeps sending and has flooded my proxy.
@DandyDeveloper the message in the original description says "sample timestamp out of order". If your symptoms are different, please open a different issue.
Closing this as I don't think there is anything to fix; it can serve as documentation if people need to dig in to what "sample out of order" does or doesn't mean. |
@pracucci - Sorry to put this on a closed thread, but figured it fits for documentation purposes. What is the "order" (pun intended) that these errors are raised in? For example, suppose I have a sample that is out of bounds, out of order, AND is a duplicate sample? |
A single timestamp can't be both duplicate and out of order. Out of bounds is checked before out of order, here: cortex/vendor/github.com/prometheus/prometheus/tsdb/head.go, lines 1319 to 1357 (commit 12d1cb1).
@bboreham - What about something like this sequence? Obviously data integrity issues aside, but curious what Cortex error would come up with something like this:
Would that third sample be "out of order" or "duplicate timestamp with new value"? Likewise, if for some reason the same metric(s) got processed twice:
What would the third & fourth samples be?
Out of order. It only keeps the last sample around for comparison; everything else is highly compressed and unavailable unless you run a query.
Came across "err: out of order sample" in the ingester following a Cortex maintenance window. During this time Prometheus was denied access to Cortex, with the expectation that after the maintenance, and within 2 hours, Prometheus would restore the missing values from its WAL. It looks like for the 5 minutes after Prometheus stops sending, Cortex replicates the last known value in the time series, effectively filling in bogus data. After Prometheus connectivity is restored, Cortex will fill in the later values that were missing, but the 5 minutes of auto-generated values remain and collide with the real values sent by Prometheus, hence the "err: out of order sample". This is all illustrated in the image below. I wonder if someone can shed some light on why Cortex maintains an interval of bogus values. If that is to approximate possible missing values in the time series, why aren't the values coming from Prometheus allowed to overwrite them after the fact?
Filling in empty values for up to 5 minutes is standard PromQL behavior - https://www.robustperception.io/staleness-and-promql. It is interesting to ponder why Prometheus did not supply values during that time; perhaps there are clues in your Prometheus logs. (Please open a new issue if you want to continue the conversation.)
Thanks for the insights. You are right, the Cortex query now shows the increasing pattern instead of a flat value; I guess there were cached query results lingering right after recovery that led me to believe otherwise.
Description
Every time I modify the Cortex configuration and restart the nodes, they generate ungodly amounts of logs like this:
I assume this is because the upstream Prometheus instance is retrying pushes of the metrics that failed while the node was down.
It generates quite a lot of them...
Questions