[libbeat] Add Syslog parser and processor #30541

Merged
merged 13 commits into from
Mar 21, 2022
1 change: 1 addition & 0 deletions CHANGELOG.next.asciidoc
@@ -87,6 +87,7 @@ https://github.com/elastic/beats/compare/v7.0.0-alpha2...main[Check the HEAD dif
- Add support for kafka message headers. {pull}29940[29940]
- Add FIPS configuration option for all AWS API calls. {pull}[28899]
- Add support for non-unique Kafka headers for output messages. {pull}30369[30369]
- Add syslog parser and processor. {issue}30139[30139] {pull}30541[30541]
- Add action_input_type for the .fleet-actions-results {pull}30562[30562]

*Auditbeat*
61 changes: 61 additions & 0 deletions filebeat/docs/inputs/input-filestream-reader-options.asciidoc
@@ -152,6 +152,7 @@ Available parsers:
* `multiline`
* `ndjson`
* `container`
* `syslog`

In this example, {beatname_uc} is reading multiline messages that consist of 3 lines
and are encapsulated in single-line JSON objects.
@@ -258,3 +259,63 @@ all containers under the default Kubernetes logs path:
- container:
    stream: stdout
----

[float]
===== `syslog`

The `syslog` parser parses RFC 3164 and/or RFC 5424 formatted syslog messages.

The supported configuration options are:

*`format`*:: (Optional) The syslog format to use: `rfc3164` or `rfc5424`. To automatically
detect the format from the log entries, set this option to `auto`. The default is `auto`.

*`timezone`*:: (Optional) IANA time zone name (for example, `America/New_York`) or a
fixed time offset (for example, `+0200`) to use when parsing syslog timestamps that do not contain
a time zone. `Local` may be specified to use the machine's local time zone. Defaults to `Local`.

*`log_errors`*:: (Optional) If `true`, the parser logs syslog parsing errors. Defaults to `false`.

*`add_error_key`*:: (Optional) If this setting is enabled, the parser adds or appends to an
`error.message` key with the parsing error that was encountered. Defaults to `true`.

Example configuration:

[source,yaml]
-------------------------------------------------------------------------------
- syslog:
    format: rfc3164
    timezone: America/Chicago
    log_errors: true
    add_error_key: true
-------------------------------------------------------------------------------
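
To illustrate what `format: auto` has to decide, here is a minimal Python sketch of one way to tell the two formats apart. It is an assumption-laden heuristic, not the parser's actual detection logic: it relies only on the fact that an RFC 5424 header carries a version number and a space directly after the `<PRI>` field, while an RFC 3164 message goes straight to the timestamp. The name `guess_syslog_format` is hypothetical.

```python
import re

# RFC 5424 header starts with "<PRI>VERSION " where VERSION is 1-3 digits
# and does not begin with zero. RFC 3164 has the timestamp right after "<PRI>".
_RFC5424_PREFIX = re.compile(r"^<\d{1,3}>[1-9]\d{0,2} ")


def guess_syslog_format(line):
    """Return "rfc5424" or "rfc3164" (hypothetical helper, heuristic only)."""
    return "rfc5424" if _RFC5424_PREFIX.match(line) else "rfc3164"


rfc5424_line = '<165>1 2022-01-11T22:14:15.003Z mymachine.example.com eventslog 1024 ID47 - this is the message'
rfc3164_line = "<13>Jan 23 14:09:01 mymachine app[42]: this is the message"
rfc3164_rfc3339_line = "<13>2003-10-11T22:14:15Z mymachine app: this is the message"

print(guess_syslog_format(rfc5424_line))          # rfc5424
print(guess_syslog_format(rfc3164_line))          # rfc3164
print(guess_syslog_format(rfc3164_rfc3339_line))  # rfc3164
```

The third case shows why the space after the version matters: a year such as `2003` would otherwise be mistaken for a version number. If the format of the input is known in advance, setting `format` explicitly avoids relying on any detection at all.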

*Timestamps*

The RFC 3164 format accepts the following forms of timestamps:

* Local timestamp (`Mmm dd hh:mm:ss`):
** `Jan 23 14:09:01`
* RFC-3339*:
** `2003-10-11T22:14:15Z`
** `2003-10-11T22:14:15.123456Z`
** `2003-10-11T22:14:15-06:00`
** `2003-10-11T22:14:15.123456-06:00`

*Note*: The local timestamp (for example, `Jan 23 14:09:01`) that accompanies an
RFC 3164 message lacks year and time zone information. The time zone is taken from
the `timezone` configuration option, and the year is taken from the
{beatname_uc} system's local time (accounting for time zones). Because of this, it is possible
for messages to appear in the future. For example, logs generated on December 31, 2021
and ingested on January 1, 2022 would be enriched with the year 2022 instead of 2021.

The RFC 5424 format accepts the following forms of timestamps:

* RFC-3339:
** `2003-10-11T22:14:15Z`
** `2003-10-11T22:14:15.123456Z`
** `2003-10-11T22:14:15-06:00`
** `2003-10-11T22:14:15.123456-06:00`

Formats with an asterisk (*) are a non-standard allowance.
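
The year enrichment described in the note above can be reproduced with a short Python sketch. It is only an illustration of the documented behavior, not the Beats implementation: the function name `enrich_rfc3164_timestamp` is hypothetical, and a fixed `-06:00` offset stands in for the `timezone` option.

```python
from datetime import datetime, timedelta, timezone


def enrich_rfc3164_timestamp(raw, now):
    """Attach the year and UTC offset of `now` to a year-less timestamp.

    `raw` looks like "Jan 23 14:09:01"; `now` is the timezone-aware ingest time.
    """
    # Prepend the year before parsing so the full date is validated at once.
    parsed = datetime.strptime(f"{now.year} {raw}", "%Y %b %d %H:%M:%S")
    return parsed.replace(tzinfo=now.tzinfo)


# Assumed stand-in for the `timezone` option: a fixed offset of -06:00.
tz = timezone(timedelta(hours=-6))

# A log line written just before midnight on December 31, 2021 ...
raw_timestamp = "Dec 31 23:59:58"
# ... is ingested a few minutes later, on January 1, 2022.
ingest_time = datetime(2022, 1, 1, 0, 5, 0, tzinfo=tz)

event_time = enrich_rfc3164_timestamp(raw_timestamp, ingest_time)
print(event_time.isoformat())    # 2022-12-31T23:59:58-06:00
print(event_time > ingest_time)  # True: the event appears to be in the future
```

Sources that emit RFC 3339 timestamps carry the year and offset themselves, so they are not affected by this rollover.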
1 change: 1 addition & 0 deletions libbeat/cmd/instance/imports_common.go
@@ -39,6 +39,7 @@ import (
_ "github.com/elastic/beats/v7/libbeat/processors/ratelimit"
_ "github.com/elastic/beats/v7/libbeat/processors/registered_domain"
_ "github.com/elastic/beats/v7/libbeat/processors/script"
_ "github.com/elastic/beats/v7/libbeat/processors/syslog"
_ "github.com/elastic/beats/v7/libbeat/processors/translate_sid"
_ "github.com/elastic/beats/v7/libbeat/processors/urldecode"
_ "github.com/elastic/beats/v7/libbeat/publisher/includes" // Register publisher pipeline modules
154 changes: 154 additions & 0 deletions libbeat/processors/syslog/docs/syslog.asciidoc
@@ -0,0 +1,154 @@
[[syslog]]
=== Syslog

++++
<titleabbrev>syslog</titleabbrev>
++++

experimental[]

[float]
==== Configuration

The `syslog` processor parses RFC 3164 and/or RFC 5424 formatted syslog messages
that are stored under the `field` key.

The supported configuration options are:

`field`:: (Required) Source field containing the syslog message. Defaults to `message`.

`format`:: (Optional) The syslog format to use: `rfc3164` or `rfc5424`. To automatically
detect the format from the log entries, set this option to `auto`. The default is `auto`.

`timezone`:: (Optional) IANA time zone name (for example, `America/New_York`) or a
fixed time offset (for example, `+0200`) to use when parsing syslog timestamps that do not contain
a time zone. `Local` may be specified to use the machine's local time zone. Defaults to `Local`.

`overwrite_keys`:: (Optional) A boolean that specifies whether keys that already
exist in the event are overwritten by keys from the syslog message. The
default value is `true`.

`ignore_missing`:: (Optional) If `true`, the processor does not return an error
when the specified field does not exist. Defaults to `false`.

`ignore_failure`:: (Optional) Ignore all errors produced by the processor.
Defaults to `false`.

`tag`:: (Optional) An identifier for this processor. Useful for debugging.

Example:

[source,yaml]
-------------------------------------------------------------------------------
processors:
  - syslog:
      field: message
-------------------------------------------------------------------------------

[source,json]
-------------------------------------------------------------------------------
{
  "message": "<165>1 2022-01-11T22:14:15.003Z mymachine.example.com eventslog 1024 ID47 [exampleSDID@32473 iut=\"3\" eventSource=\"Application\" eventID=\"1011\"][examplePriority@32473 class=\"high\"] this is the message"
}
-------------------------------------------------------------------------------

Will produce the following output:

[source,json]
-------------------------------------------------------------------------------
{
  "@timestamp": "2022-01-11T22:14:15.003Z",
  "log": {
    "syslog": {
      "priority": 165,
      "facility": {
        "code": 20,
        "name": "local4"
      },
      "severity": {
        "code": 5,
        "name": "Notice"
      },
      "hostname": "mymachine.example.com",
      "appname": "eventslog",
      "procid": "1024",
      "msgid": "ID47",
      "version": 1,
      "structured_data": {
        "exampleSDID@32473": {
          "iut": "3",
          "eventSource": "Application",
          "eventID": "1011"
        },
        "examplePriority@32473": {
          "class": "high"
        }
      }
    }
  },
  "message": "this is the message"
}
-------------------------------------------------------------------------------
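
The `facility` and `severity` codes in the output above follow from the `<165>` priority value. Both RFC 3164 and RFC 5424 define the priority as `facility * 8 + severity`, so the two codes can be recovered with an integer division and a remainder. The Python sketch below shows the arithmetic; the helper name `decode_priority` is hypothetical and is not part of the processor.

```python
from typing import Tuple


def decode_priority(priority: int) -> Tuple[int, int]:
    """Split a syslog PRI value into (facility code, severity code)."""
    # Facility ranges over 0-23 and severity over 0-7, so PRI is at most 191.
    if not 0 <= priority <= 191:
        raise ValueError(f"priority out of range: {priority}")
    return divmod(priority, 8)


facility, severity = decode_priority(165)
print(facility, severity)  # 20 5
```

Facility code 20 is `local4` and severity code 5 is `Notice`, which matches the `name` fields in the example output.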

[float]
==== Timestamps

The RFC 3164 format accepts the following forms of timestamps:

* Local timestamp (`Mmm dd hh:mm:ss`):
** `Jan 23 14:09:01`
* RFC-3339*:
** `2003-10-11T22:14:15Z`
** `2003-10-11T22:14:15.123456Z`
** `2003-10-11T22:14:15-06:00`
** `2003-10-11T22:14:15.123456-06:00`

*Note*: The local timestamp (for example, `Jan 23 14:09:01`) that accompanies an
RFC 3164 message lacks year and time zone information. The time zone is taken from
the `timezone` configuration option, and the year is taken from the
{beatname_uc} system's local time (accounting for time zones). Because of this, it is possible
for messages to appear in the future. For example, logs generated on December 31, 2021
and ingested on January 1, 2022 would be enriched with the year 2022 instead of 2021.

The RFC 5424 format accepts the following forms of timestamps:

* RFC-3339:
** `2003-10-11T22:14:15Z`
** `2003-10-11T22:14:15.123456Z`
** `2003-10-11T22:14:15-06:00`
** `2003-10-11T22:14:15.123456-06:00`

Formats with an asterisk (*) are a non-standard allowance.

[float]
==== Metrics

Internal metrics are available to assist with debugging efforts. The metrics
are served from the metrics HTTP endpoint (for example: `http://localhost:5066/stats`)
and are found under `processor.syslog.[instance ID]` or `processor.syslog.[tag]-[instance ID]`
if a *tag* is provided. See <<http-endpoint>> for more information on configuring the
metrics HTTP endpoint.

For example, here are metrics from a processor with a *tag* of `log-input` and an *instance ID* of `1`:

[source,json]
-------------------------------------------------------------------------------
{
  "processor": {
    "syslog": {
      "log-input-1": {
        "failure": 10,
        "missing": 0,
        "success": 3
      }
    }
  }
}
-------------------------------------------------------------------------------

`failure`:: Measures the number of occurrences where a message was unable to be parsed.

`missing`:: Measures the number of occurrences where an event was missing the required input field.

`success`:: Measures the number of successfully parsed syslog messages.
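
As a rough illustration of how these counters might be used while debugging, the Python sketch below builds the metrics key from a tag and instance ID and computes the share of failed parses. It works on a JSON document shaped like the example above rather than fetching the live endpoint. The helper names `syslog_metrics_key` and `failure_ratio` are hypothetical, and leaving `missing` events out of the ratio is a choice made for this sketch.

```python
import json


def syslog_metrics_key(instance_id, tag=None):
    """Build the key under `processor.syslog` for a processor instance."""
    return f"{tag}-{instance_id}" if tag else str(instance_id)


def failure_ratio(stats, key):
    """Fraction of parse attempts that failed (missing-field events excluded)."""
    counters = stats["processor"]["syslog"][key]
    attempts = counters["failure"] + counters["success"]
    return counters["failure"] / attempts if attempts else 0.0


# Same shape as the example metrics document above.
stats = json.loads("""
{
  "processor": {
    "syslog": {
      "log-input-1": {"failure": 10, "missing": 0, "success": 3}
    }
  }
}
""")

key = syslog_metrics_key(1, tag="log-input")
print(key)                                 # log-input-1
print(round(failure_ratio(stats, key), 3)) # 0.769
```

A high ratio like this one usually means the configured `format` or the source `field` does not match what the input actually delivers.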