Docs: first draft, Loki accepts out-of-order writes #4237

Merged: 2 commits, Aug 31, 2021
13 changes: 2 additions & 11 deletions docs/sources/api/_index.md
@@ -534,13 +534,7 @@ JSON post body can be sent in the following format:

You can set `Content-Encoding: gzip` request header and post gzipped JSON.

> **NOTE**: logs sent to Loki for every stream must be in timestamp-ascending
> order; logs with identical timestamps are only allowed if their content
> differs. If a log line is received with a timestamp older than the most
> recent received log, it is rejected with an out of order error. If a log
> is received with the same timestamp and content as the most recent log, it is
> silently ignored. For more details on the ordering rules, refer to the
> [Loki Overview docs](../overview#timestamp-ordering).
Loki can be configured to [accept out-of-order writes](../configuration/#accept-out-of-order-writes).

In microservices mode, `/loki/api/v1/push` is exposed by the distributor.

@@ -772,10 +766,7 @@ JSON post body can be sent in the following format:
}
```

> **NOTE**: logs sent to Loki for every stream must be in timestamp-ascending
> order, meaning each log line must be more recent than the one last received.
> If logs do not follow this order, Loki will reject the log with an out of
> order error.
Loki can be configured to [accept out-of-order writes](../configuration/#accept-out-of-order-writes).

In microservices mode, `/api/prom/push` is exposed by the distributor.

13 changes: 8 additions & 5 deletions docs/sources/architecture/_index.md
@@ -161,13 +161,15 @@ deduplicated.

#### Timestamp Ordering

The ingester validates that ingested log lines are not out of order. When an
Loki can be configured to [accept out-of-order writes](../../configuration/#accept-out-of-order-writes).

When not configured to accept out-of-order writes, the ingester validates that ingested log lines are in order. When an
ingester receives a log line that doesn't follow the expected order, the line
is rejected and an error is returned to the user.

The ingester validates that ingested log lines are received in
timestamp-ascending order (i.e., each log has a timestamp that occurs at a later
time than the log before it). When the ingester receives a log that does not
The ingester validates that log lines are received in
timestamp-ascending order. Each log has a timestamp that occurs at a later
time than the log before it. When the ingester receives a log that does not
follow this order, the log line is rejected and an error is returned.

Logs from each unique set of labels are built up into "chunks" in memory and
@@ -176,7 +178,8 @@ then flushed to the backing storage backend.
If an ingester process crashes or exits abruptly, all the data that has not yet
been flushed could be lost. Loki is usually configured with a [Write Ahead Log](../operations/storage/wal) which can be _replayed_ on restart as well as with a `replication_factor` (usually 3) of each log to mitigate this risk.
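
A rough sketch of what that mitigation might look like in the Loki configuration; the exact field layout is an assumption based on the ingester configuration reference, so treat it as illustrative rather than definitive:

```yaml
ingester:
  wal:
    enabled: true
    dir: /loki/wal            # replayed on restart to recover chunks not yet flushed
  lifecycler:
    ring:
      replication_factor: 3   # each stream is written to 3 ingesters
```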

In general, all lines pushed to Loki for a given stream (unique combination of
When not configured to accept out-of-order writes,
all lines pushed to Loki for a given stream (unique combination of
labels) must have a newer timestamp than the line received before it. There are,
however, two cases for handling logs for the same stream with identical
nanosecond timestamps:
7 changes: 4 additions & 3 deletions docs/sources/best-practices/_index.md
@@ -67,7 +67,10 @@ the log line as a key=value pair you could write a query like this: `{logGroup="

Loki can cache data at many levels, which can drastically improve performance. Details of this will be in a future post.

## Logs must be in increasing time order per stream
## Time ordering of logs

Loki can be configured to [accept out-of-order writes](../configuration/#accept-out-of-order-writes).
This section identifies best practices when Loki is _not_ configured to accept out-of-order writes.

One issue many people have with Loki is their client receiving errors for out of order log entries. This happens because of this hard and fast rule within Loki:

@@ -100,8 +103,6 @@ What can we do about this? What if this was because the sources of these logs we

But what if the application itself generated logs that were out of order? Well, I'm afraid this is a problem. If you are extracting the timestamp from the log line with something like [the Promtail pipeline stage](https://grafana.com/docs/loki/latest/clients/promtail/stages/timestamp/), you could instead _not_ do this and let Promtail assign a timestamp to the log lines. Or you can hopefully fix it in the application itself.

But I want Loki to fix this! Why can’t you buffer streams and re-order them for me?! To be honest, because this would add a lot of memory overhead and complication to Loki, and as has been a common thread in this post, we want Loki to be simple and cost-effective. Ideally we would want to improve our clients to do some basic buffering and sorting as this seems a better place to solve this problem.

It's also worth noting that the batching nature of the Loki push API can lead to some instances of out-of-order errors being received which are really false positives. (Perhaps a batch partially succeeded and was already present; in that case, anything that had previously succeeded would return an out-of-order error, while anything new would be accepted.)

## Use `chunk_target_size`
32 changes: 17 additions & 15 deletions docs/sources/clients/fluentbit/_index.md
@@ -142,25 +142,27 @@ If you don't want the `kubernetes` and `HOSTNAME` fields to appear in the log li

### Buffering

Buffering refers to the ability to store the records somewhere, and while they are processed and delivered, still be able to store more. Loki output plugin in certain situation can be blocked by loki client because of its design:
Buffering refers to the ability to store the records somewhere, and while they are processed and delivered, still be able to store more. The Loki output plugin can be blocked by the Loki client because of its design:

- BatchSize is over limit, output plugin pause receiving new records until the pending batch is successfully sent to the server
- Loki server is unreachable (retry 429s, 500s and connection-level errors), output plugin blocks new records until loki server will be available again and the pending batch is successfully sent to the server or as long as the maximum number of attempts has been reached within configured back-off mechanism
- If the BatchSize is over the limit, the output plugin pauses receiving new records until the pending batch is successfully sent to the server
- If the Loki server is unreachable (retry 429s, 500s and connection-level errors), the output plugin blocks new records until the Loki server is available again and the pending batch is successfully sent to the server, or until the maximum number of attempts configured in the back-off mechanism has been reached

The blocking state with some of the input plugins is not acceptable because it can have a undesirable side effects on the part that generates the logs. Fluent Bit implements buffering mechanism that is based on parallel processing and it cannot send logs in order which is loki requirement (loki logs must be in increasing time order per stream).
The blocking state with some of the input plugins is not acceptable, because it can have an undesirable side effect on the part that generates the logs. Fluent Bit implements a buffering mechanism that is based on parallel processing. Therefore, it cannot send logs in order. There are two ways of handling the out-of-order logs:

Loki output plugin has buffering mechanism based on [`dque`](https://github.com/joncrlsn/dque) which is compatible with loki server strict time ordering and can be set up by configuration flag:
- Configure Loki to [accept out-of-order writes](../../configuration/#accept-out-of-order-writes); a minimal configuration sketch follows this list.

```properties
[Output]
Name grafana-loki
Match *
Url http://localhost:3100/loki/api/v1/push
Buffer true
DqueSegmentSize 8096
DqueDir /tmp/flb-storage/buffer
DqueName loki.0
```
- Configure the Loki output plugin to use the buffering mechanism based on [`dque`](https://github.com/joncrlsn/dque), which is compatible with the Loki server's strict time ordering:

```properties
[Output]
Name grafana-loki
Match *
Url http://localhost:3100/loki/api/v1/push
Buffer true
DqueSegmentSize 8096
DqueDir /tmp/flb-storage/buffer
DqueName loki.0
```
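
A minimal sketch of the first option, shown as a global setting in the Loki server's `limits_config` block (per-tenant overrides through a runtime configuration file are also possible; see the linked configuration page):

```yaml
# Loki server configuration (not Fluent Bit): accept out-of-order writes for all tenants
limits_config:
  unordered_writes: true
```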

### Configuration examples

2 changes: 1 addition & 1 deletion docs/sources/clients/fluentd/_index.md
@@ -145,7 +145,7 @@ Use with the `remove_keys kubernetes` option to eliminate metadata from the log.

### Multi-worker usage

Loki doesn't currently support out-of-order inserts - if you try to insert a log entry an earlier timestamp after a log entry with identical labels but a later timestamp, the insert will fail with `HTTP status code: 500, message: rpc error: code = Unknown desc = Entry out of order`. Therefore, in order to use this plugin in a multi worker Fluentd setup, you'll need to include the worker ID in the labels or otherwise [ensure log streams are always sent to the same worker](https://docs.fluentd.org/deployment/multi-process-workers#less-than-worker-n-greater-than-directive).
Out-of-order inserts may be configured for Loki; refer to [accept out-of-order writes](../../configuration/#accept-out-of-order-writes). If out-of-order inserts are not configured, inserting a log entry with an earlier timestamp after a log entry with identical labels but a later timestamp fails with `HTTP status code: 500, message: rpc error: code = Unknown desc = Entry out of order`. Therefore, in order to use this plugin in a multi-worker Fluentd setup, you'll need to include the worker ID in the labels or otherwise [ensure log streams are always sent to the same worker](https://docs.fluentd.org/deployment/multi-process-workers#less-than-worker-n-greater-than-directive).

For example, using [fluent-plugin-record-modifier](https://github.com/repeatedly/fluent-plugin-record-modifier):

4 changes: 2 additions & 2 deletions docs/sources/clients/lambda-promtail/_index.md
@@ -16,9 +16,9 @@ Ephemeral jobs can quite easily run afoul of cardinality best practices. During
Instead we can pipeline Cloudwatch logs to a set of Promtails, which can mitigate these problem in two ways:

1) Using Promtail's push api along with the `use_incoming_timestamp: false` config, we let Promtail determine the timestamp based on when it ingests the logs, not the timestamp assigned by cloudwatch. Obviously, this means that we lose the origin timestamp because Promtail now assigns it, but this is a relatively small difference in a real time ingestion system like this.
2) In conjunction with (1), Promtail can coalesce logs across Cloudwatch log streams because it's no longer susceptible to `out-of-order` errors when combining multiple sources (lambda invocations).
2) In conjunction with (1), Promtail can coalesce logs across Cloudwatch log streams because it's no longer susceptible to out-of-order errors when combining multiple sources (lambda invocations).

One important aspect to keep in mind when running with a set of Promtails behind a load balancer is that we're effectively moving the cardinality problems from the `number_of_log_streams` -> `number_of_promtails`. You'll need to assign a Promtail specific label on each Promtail so that you don't run into `out-of-order` errors when the Promtails send data for the same log groups to Loki. This can easily be done via a config like `--client.external-labels=promtail=${HOSTNAME}` passed to Promtail.
One important aspect to keep in mind when running with a set of Promtails behind a load balancer is that we're effectively moving the cardinality problems from the number of log streams -> number of Promtails. If you have not configured Loki to [accept out-of-order writes](../../configuration#accept-out-of-order-writes), you'll need to assign a Promtail-specific label on each Promtail so that you don't run into out-of-order errors when the Promtails send data for the same log groups to Loki. This can easily be done via a configuration like `--client.external-labels=promtail=${HOSTNAME}` passed to Promtail.
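
The same label can also be set in the Promtail configuration file instead of on the command line. A hedged sketch, assuming the standard `clients` block and that environment variable expansion is enabled with `-config.expand-env=true`:

```yaml
clients:
  - url: http://loki:3100/loki/api/v1/push
    external_labels:
      # distinguishes this Promtail from its peers behind the load balancer
      promtail: ${HOSTNAME}
```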

### Proof of concept Loki deployments

2 changes: 1 addition & 1 deletion docs/sources/clients/promtail/_index.md
@@ -43,7 +43,7 @@ There are a few instances where this might be helpful:

- complex network infrastructures where many machines having egress is not desirable.
- using the Docker Logging Driver and wanting to provide a complex pipeline or to extract metrics from logs.
- serverless setups where many ephemeral log sources want to send to Loki, sending to a Promtail instance with `use_incoming_timestamp` == false can avoid out of order errors and avoid having to use high cardinality labels.
- serverless setups where many ephemeral log sources want to send to Loki; sending to a Promtail instance with `use_incoming_timestamp` == false can avoid out-of-order errors and avoid having to use high cardinality labels (see the sketch below).
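
A sketch of such a receiver; the shape follows the `loki_push_api` scrape config, and the port, job name, and labels are placeholders:

```yaml
scrape_configs:
  - job_name: push_receiver
    loki_push_api:
      server:
        http_listen_port: 3500
      # stamp entries with the arrival time rather than trusting sender timestamps,
      # avoiding out-of-order errors across many ephemeral sources
      use_incoming_timestamp: false
      labels:
        source: serverless
```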

## Receiving logs From Syslog

4 changes: 2 additions & 2 deletions docs/sources/clients/promtail/scraping.md
@@ -212,8 +212,8 @@ It also supports `relabeling` and `pipeline` stages just like other targets.
When Promtail receives GCP logs the labels that are set on the GCP resources are available as internal labels. Like in the example above, the `__project_id` label from a GCP resource was transformed into a label called `project` through `relabel_configs`. See [Relabeling](#relabeling) for more information.

Log entries scraped by `gcplog` will add an additional label called `promtail_instance`. This label uniquely identifies each Promtail instance trying to scrape gcplog (from a single `subscription_id`).
We need this unique identifier to avoid out-of-order errors from Loki servers.
Because say two Promtail instances rewrite timestamp of log entries(with same labelset) at the same time may reach Loki servers at different times can cause Loki servers to reject it.
We need this unique identifier to avoid out-of-order errors from Loki servers when Loki is not configured to [accept out-of-order writes](../../../configuration/#accept-out-of-order-writes).
If two Promtail instances rewrite the timestamp of log entries (with the same labelset) at the same time, the log entries may reach Loki servers at different times. This can cause Loki servers to reject the out-of-order log entries.
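
As a hedged illustration of the relabeling mentioned above (standard Prometheus-style relabel rules, as used by Promtail), mapping the internal `__project_id` label to `project` might look like:

```yaml
relabel_configs:
  - source_labels: ['__project_id']
    target_label: 'project'
```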

## Syslog Receiver

2 changes: 1 addition & 1 deletion docs/sources/clients/promtail/stages/pack.md
Original file line number Diff line number Diff line change
Expand Up @@ -31,7 +31,7 @@ pack:
- [<string>]

# If the resulting log line should use any existing timestamp or use time.Now() when the line was processed.
# To avoid out of order issues with Loki, when combining several log streams (separate source files) into one
# To avoid out-of-order issues with Loki, when combining several log streams (separate source files) into one
# you will want to set a new timestamp on the log line, `ingest_timestamp: true`
# If you are not combining multiple source files or you know your log lines won't have interlaced timestamps
# you can set this value to false.
6 changes: 3 additions & 3 deletions docs/sources/clients/promtail/troubleshooting.md
@@ -185,9 +185,9 @@ from there. This means that if new log entries have been read and pushed to the
ingester between the last sync period and the crash, these log entries will be
sent again to the ingester on Promtail restart.

However, it's important to note that Loki will reject all log lines received in
what it perceives is [out of
order](../../../overview#timestamp-ordering). If Promtail happens to
If Loki is not configured to [accept out-of-order writes](../../../configuration/#accept-out-of-order-writes), it will reject all
log lines that it perceives to be out of order. If Promtail happens to
crash, it may re-send log lines that were sent prior to the crash. The default
behavior of Promtail is to assign a timestamp to logs at the time it read the
entry from the tailed file. This would result in duplicate log lines being sent
53 changes: 52 additions & 1 deletion docs/sources/configuration/_index.md
@@ -1826,7 +1826,7 @@ logs in Loki.
# CLI flag: -ingester.max-global-streams-per-user
[max_global_streams_per_user: <int> | default = 0]

# When true, out of order writes are accepted.
# When true, out-of-order writes are accepted.
# CLI flag: -ingester.unordered-writes
[unordered_writes: <bool> | default = false]

@@ -2125,3 +2125,54 @@ multi_kv_config:
primary: consul
```
### Generic placeholders

## Accept out-of-order writes

Since the beginning of Loki, log entries had to be written to Loki in order
by time.
This limitation has been lifted.
Out-of-order writes may be enabled globally for a Loki cluster
or enabled on a per-tenant basis.

- To enable out-of-order writes for all tenants,
place in the `limits_config` section:

```yaml
limits_config:
unordered_writes: true
```

- To enable out-of-order writes for specific tenants,
configure a runtime configuration file:

```yaml
runtime_config: overrides.yaml
```

In the `overrides.yaml` file, add `unordered_writes` for each tenant
permitted to have out-of-order writes:

```yaml
overrides:
"tenantA":
unordered_writes: true
```

How far back in time an out-of-order log entry may be and still be accepted
is controlled by `max_chunk_age`, which defaults to 1 hour.
Loki calculates the earliest timestamp that an out-of-order entry may have
and still be accepted as

```
time_of_most_recent_line - (max_chunk_age/2)
```

Log entries with timestamps that are after this earliest time are accepted.
Log entries further back in time return an out-of-order error.

For example, if `max_chunk_age` is 2 hours
and the stream `{foo="bar"}` has one entry at `8:00`,
Loki will accept data for that stream as far back in time as `7:00`.
If another log line is written at `10:00`,
Loki will accept data for that stream as far back in time as `9:00`.
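
Putting the pieces together, a minimal sketch that enables out-of-order writes globally and widens the accepted window by raising `max_chunk_age` (placing `max_chunk_age` in the `ingester` block is an assumption based on the ingester configuration reference):

```yaml
ingester:
  # entries up to max_chunk_age/2 (here, 1 hour) older than the most
  # recent line in a stream are accepted
  max_chunk_age: 2h

limits_config:
  # accept out-of-order writes for all tenants
  unordered_writes: true
```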
10 changes: 5 additions & 5 deletions docs/sources/operations/loki-canary.md
@@ -72,13 +72,13 @@ means that after 4 hours of running the canary will have a list of 16 entries
it will query every minute (default `spot-check-query-rate` interval is 1m),
so be aware of the query load this can put on Loki if you have a lot of canaries.

__NOTE:__ if you are using `out-of-order-percentage` to test ingestion of out of order
__NOTE:__ if you are using `out-of-order-percentage` to test ingestion of out-of-order
log lines, be sure not to set the two out-of-order time range flags too far in the past.
The defaults are already enough to test this functionality properly, and setting them
too far in the past can cause issues with the spot check test.

When using `out-of-order-percentage` you also need to make use of pipeline stages
in your promtail config in order to set the timestamps correctly as the logs are pushed
When using `out-of-order-percentage`, you also need to make use of pipeline stages
in your Promtail configuration in order to set the timestamps correctly as the logs are pushed
to Loki. The `client/promtail/pipelines` docs have examples of how to do this.
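
A hedged sketch of such a pipeline; the capture expression and timestamp format are assumptions (here, that the canary writes a Unix-nanosecond timestamp at the start of each line), so adapt them to the actual line layout:

```yaml
pipeline_stages:
  - regex:
      # capture the leading Unix-nanosecond timestamp (an assumed line format)
      expression: '^(?P<ts>\d+) '
  - timestamp:
      source: ts
      format: UnixNs
```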

#### Metric Test
@@ -310,9 +310,9 @@ All options:
-metric-test-range duration
The range value [24h] used in the metric test instant-query. Note: this value is truncated to the running time of the canary until this value is reached (default 24h0m0s)
-out-of-order-max duration
Maximum amount of time to go back for out of order entries (in seconds). (default 1m0s)
Maximum amount of time to go back for out-of-order entries (in seconds). (default 1m0s)
-out-of-order-min duration
Minimum amount of time to go back for out of order entries (in seconds). (default 30s)
Minimum amount of time to go back for out-of-order entries (in seconds). (default 30s)
-out-of-order-percentage int
Percentage (0-100) of log entries that should be sent out of order.
-pass string
12 changes: 0 additions & 12 deletions docs/sources/operations/ordering.md

This file was deleted.
