Promtail: Drop stage #2496

Merged: 5 commits merged on Aug 13, 2020
1 change: 1 addition & 0 deletions docs/sources/clients/promtail/pipelines.md
@@ -219,3 +219,4 @@ Action stages:
Filtering stages:

* [match](../stages/match/): Conditionally run stages based on the label set.
* [drop](../stages/drop/): Conditionally drop log lines based on several options.
1 change: 1 addition & 0 deletions docs/sources/clients/promtail/stages/_index.md
@@ -29,4 +29,5 @@ Action stages:
Filtering stages:

* [match](match/): Conditionally run stages based on the label set.
* [drop](drop/): Conditionally drop log lines based on several options.

193 changes: 193 additions & 0 deletions docs/sources/clients/promtail/stages/drop.md
@@ -0,0 +1,193 @@
---
title: drop
---
# `drop` stage

Contributor: Can you add a verb here? Or maybe change this to "Filter or drop logs"?

Collaborator Author: For now I would like to leave this as is, which is consistent with how we title all our other stages.


The `drop` stage is a filtering stage that lets you drop logs based on several options.

If you provide multiple options, they are treated like an AND clause,
where every option has to be true for the log line to be dropped.

If you wish to drop with an OR clause, specify multiple drop stages.

There are examples below to help explain.

## Drop stage schema

```yaml
drop:
  # Name from extracted data to parse. If empty, uses the log message.
  [source: <string>]

  # RE2 regular expression. If source is provided, the regex attempts to match the source.
  # If no source is provided, the regex attempts to match the log line.
  # If the regex matches the log line or the provided source, the line will be dropped.
  [expression: <string>]

  # value can only be specified when source is specified. It is an error to specify
  # both value and expression.
  # If the value provided is an exact match for the `source`, the line will be dropped.
  [value: <string>]

  # older_than will be parsed as a Go duration: https://golang.org/pkg/time/#ParseDuration
  # If the log line timestamp is older than the current time minus the provided duration, it will be dropped.
  [older_than: <duration>]

  # longer_than is a value in bytes; any log line longer than this value will be dropped.
  # Can be specified as an exact number of bytes in integer format: 8192
  # or expressed with a suffix such as 8kb.
  [longer_than: <string>|<int>]

  # Every time a log line is dropped the metric `logentry_dropped_lines_total`
  # will be incremented. By default the reason label will be `drop_stage`;
  # however, you can optionally specify a custom value to be used in the `reason`
  # label of that metric here.
  [drop_counter_reason: <string> | default = "drop_stage"]
```
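
The constraints stated in the schema comments can be expressed as a small validation function. This is a sketch only: `dropConfig` and `validate` are hypothetical names, not Promtail's source, and the checks cover just the rules written above (`value` requires `source`, `value` excludes `expression`, and `older_than` must parse with Go's `time.ParseDuration`).

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// dropConfig is a hypothetical Go mirror of the schema above.
type dropConfig struct {
	Source     string
	Expression string
	Value      string
	OlderThan  string
	LongerThan string
}

// validate checks the constraints stated in the schema comments.
func validate(c dropConfig) error {
	if c.Value != "" && c.Source == "" {
		return errors.New("value can only be specified when source is specified")
	}
	if c.Value != "" && c.Expression != "" {
		return errors.New("value and expression cannot both be specified")
	}
	if c.OlderThan != "" {
		// older_than is documented as a Go duration, e.g. "24h".
		if _, err := time.ParseDuration(c.OlderThan); err != nil {
			return fmt.Errorf("older_than: %w", err)
		}
	}
	return nil
}

func main() {
	fmt.Println(validate(dropConfig{Source: "level", Value: "error"})) // <nil>
	fmt.Println(validate(dropConfig{Value: "error"}) != nil)           // true
	fmt.Println(validate(dropConfig{OlderThan: "1 day"}) != nil)       // true
}
```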

## Examples
slim-bean marked this conversation as resolved.

The following are examples showing the use of the `drop` stage.

### Simple drops

Simple `drop` stage configurations only specify one of the options, or two options when using the `source` option.

#### Regex match a line

Given the pipeline:

```yaml
- drop:
    expression: ".*debug.*"
```

Would drop any log line with the word `debug` in it.
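
To see which lines such an expression catches, here is a sketch using Go's `regexp` package, which implements RE2 syntax. It assumes the stage matches the expression anywhere in the line; `dropLine` is a hypothetical helper, not part of Promtail.

```go
package main

import (
	"fmt"
	"regexp"
)

// Go's regexp package implements RE2 syntax, the same syntax the
// `expression` option documents.
var debugExpr = regexp.MustCompile(".*debug.*")

// dropLine is a hypothetical helper: true means the line would be dropped.
func dropLine(line string) bool {
	return debugExpr.MatchString(line)
}

func main() {
	fmt.Println(dropLine(`level=debug msg="cache miss"`)) // true
	fmt.Println(dropLine(`level=info msg="started"`))     // false
	fmt.Println(dropLine(`level=DEBUG msg="cache miss"`)) // false: case-sensitive
}
```

Because `MatchString` is unanchored, the plain expression `debug` would select the same lines in this sketch, and RE2's `(?i)debug` flag would also catch `DEBUG`.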

#### Regex match a source

Given the pipeline:

```yaml
- json:
    expressions:
      level:
      msg:
- drop:
    source: "level"
    expression: "(error|ERROR)"
```

Would drop both of these log lines:

```
{"time":"2019-01-01T01:00:00.000000001Z", "level": "error", "msg":"11.11.11.11 - \"POST /loki/api/push/ HTTP/1.1\" 200 932 \"-\" \"Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.1.7) Gecko/20091221 Firefox/3.5.7 GTB6\""}
{"time":"2019-01-01T01:00:00.000000001Z", "level": "ERROR", "msg":"11.11.11.11 - \"POST /loki/api/push/ HTTP/1.1\" 200 932 \"-\" \"Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.1.7) Gecko/20091221 Firefox/3.5.7 GTB6\""}
```
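
The difference from matching the whole line is that the expression is applied only to the extracted `level` value. The Go sketch below mimics the `json` plus `drop` pair; `dropBySource` is a hypothetical helper, and keeping lines that fail to parse as JSON is an assumption of this sketch.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
)

var levelExpr = regexp.MustCompile("(error|ERROR)")

// dropBySource is a hypothetical helper that mimics `json` + `drop`:
// it extracts the "level" field and matches the expression against that
// value only, not against the whole line.
func dropBySource(line string) bool {
	var fields map[string]interface{}
	if err := json.Unmarshal([]byte(line), &fields); err != nil {
		return false // assumption: unparseable lines are kept
	}
	level, ok := fields["level"].(string)
	if !ok {
		return false // the source is missing from the extracted data
	}
	return levelExpr.MatchString(level)
}

func main() {
	fmt.Println(dropBySource(`{"level": "error", "msg": "disk full"}`))      // true
	fmt.Println(dropBySource(`{"level": "ERROR", "msg": "disk full"}`))      // true
	fmt.Println(dropBySource(`{"level": "info", "msg": "error recovered"}`)) // false
}
```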

#### Value match a source

Given the pipeline:

```yaml
- json:
    expressions:
      level:
      msg:
- drop:
    source: "level"
    value: "error"
```

Would drop this log line:

```
{"time":"2019-01-01T01:00:00.000000001Z", "level": "error", "msg":"11.11.11.11 - \"POST /loki/api/push/ HTTP/1.1\" 200 932 \"-\" \"Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.1.7) Gecko/20091221 Firefox/3.5.7 GTB6\""}
```
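
Unlike `expression`, `value` is an exact, case-sensitive comparison. A minimal Go sketch, assuming the extracted data can be modeled as a string map; `dropByValue` is a hypothetical helper.

```go
package main

import "fmt"

// dropByValue is a hypothetical helper: the line is dropped only when the
// extracted source value equals the configured value exactly.
func dropByValue(extracted map[string]string, source, value string) bool {
	v, ok := extracted[source]
	return ok && v == value
}

func main() {
	fmt.Println(dropByValue(map[string]string{"level": "error"}, "level", "error"))  // true
	fmt.Println(dropByValue(map[string]string{"level": "ERROR"}, "level", "error"))  // false
	fmt.Println(dropByValue(map[string]string{"level": "errors"}, "level", "error")) // false
	fmt.Println(dropByValue(map[string]string{"msg": "error"}, "level", "error"))    // false
}
```

So, unlike the `(error|ERROR)` expression in the previous example, `value: "error"` would keep a line whose `level` is `ERROR`.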

#### Drop old log lines

**NOTE** For `older_than` to work, you must be using the [timestamp](timestamp.md) stage to set the timestamp from the ingested log line _before_ applying the `drop` stage.

Given the pipeline:

```yaml
- json:
    expressions:
      time:
      msg:
- timestamp:
    source: time
    format: RFC3339
- drop:
    older_than: 24h
    drop_counter_reason: "line_too_old"
```

With a current ingestion time of 2020-08-12T12:00:00Z, this pipeline would drop the following log line when read from a file:

```
{"time":"2020-08-11T11:00:00Z", "level": "error", "msg":"11.11.11.11 - \"POST /loki/api/push/ HTTP/1.1\" 200 932 \"-\" \"Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.1.7) Gecko/20091221 Firefox/3.5.7 GTB6\""}
```

However, it would _not_ drop this log line:

```
{"time":"2020-08-11T13:00:00Z", "level": "error", "msg":"11.11.11.11 - \"POST /loki/api/push/ HTTP/1.1\" 200 932 \"-\" \"Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.1.7) Gecko/20091221 Firefox/3.5.7 GTB6\""}
```

In this example the current time is 2020-08-12T12:00:00Z and `older_than` is 24h. All log lines which have a timestamp older than 2020-08-11T12:00:00Z will be dropped.

All lines dropped by this drop stage would also increment the `logentry_dropped_lines_total` metric with a label `reason="line_too_old"`.

#### Dropping long log lines

Given the pipeline:

```yaml
- drop:
    longer_than: 8kb
    drop_counter_reason: "line_too_long"
```

Would drop any log line longer than 8kb. This is useful when Loki would otherwise reject a line for being too long.

All lines dropped by this drop stage would also increment the `logentry_dropped_lines_total` metric with a label `reason="line_too_long"`.

### Complex drops

Complex `drop` stage configurations specify multiple options in one stage or specify multiple drop stages.

#### Drop logs by regex AND length

Contributor (suggested change): replace `#### Drop logs by regex AND length` with `#### Drop logs by regex and length`.

Given the pipeline:

```yaml
- drop:
    expression: ".*debug.*"
    longer_than: 1kb
```

Would drop all logs that contain the word _debug_ *AND* are longer than 1kb.

#### Drop logs by time OR length OR regex

Contributor (suggested change): replace `#### Drop logs by time OR length OR regex` with `#### Drop logs by time or length or regex`.

Contributor: Do not use capitalization for emphasis. Per the Documentation style guide, use italics for emphasis.

Collaborator Author: In this case the capitalization is less for emphasis and more to indicate that it's a binary operation. Where in a programming language you might write `if x > 1 && x < 5`, it's common to express this as `if x > 1 AND x < 5`, mostly to disambiguate the logical operation from the ordinary word (AND from and).

Given the pipeline:

```yaml
- json:
    expressions:
      time:
      msg:
- timestamp:
    source: time
    format: RFC3339
- drop:
    older_than: 24h
- drop:
    longer_than: 8kb
- drop:
    source: msg
    expression: ".*trace.*"
```

Would drop all logs older than 24h OR longer than 8kb OR whose JSON `msg` field contains the word _trace_.
10 changes: 9 additions & 1 deletion docs/sources/clients/promtail/stages/match.md
@@ -24,6 +24,12 @@ match:
  # and no later metrics will be recorded.
  # Stages must not be defined when dropping entries.
  [action: <string> | default = "keep"]

  # If you specify `action: drop` the metric `logentry_dropped_lines_total`
  # will be incremented for every line dropped. By default the reason
  # label will be `match_stage`; however, you can optionally specify a custom value
  # to be used in the `reason` label of that metric here.
  [drop_counter_reason: <string> | default = "match_stage"]

  # Nested set of pipeline stages only if the selector
  # matches the labels of the log entries:
@@ -72,6 +78,7 @@ pipeline_stages:
- match:
    selector: '{app="promtail"} |~ ".*noisy error.*"'
    action: drop
    drop_counter_reason: promtail_noisy_error
- output:
    source: msg
```
@@ -97,7 +104,8 @@ label of `app` whose value is `pokey`. This does **not** match in our case, so
the nested `json` stage is not run.

The fifth stage will drop any entries from the application `promtail` that match
the regex `.*noisy error`, and will also increment the `logentry_dropped_lines_total`
metric with a label `reason="promtail_noisy_error"`.

The final `output` stage changes the contents of the log line to be the value of
`msg` from the extracted map. In this case, the log line is changed to `app1 log