Skip to content

Commit

Permalink
Add support for postgresql csv logs (#23334)
Browse files Browse the repository at this point in the history
Refactor PostgreSQL module to support logs in CSV format.

The fileset has three pipelines now, first one is executed
to parse the initial part of the lines, till it decides if the logs
are plain text or CSV, once decided it invokes one of the
other two pipelines, one specific for plain text logs and
the other for CSV.

Several test cases and documentation are added for CSV
support.

Additional fields are available now, and some others have
been renamed to represent more accurately the values
they store.

Co-authored-by: Jaime Soriano Pastor <jaime.soriano@elastic.co>
  • Loading branch information
azlev and jsoriano authored Feb 17, 2021
1 parent fb25ded commit bb78931
Show file tree
Hide file tree
Showing 34 changed files with 3,086 additions and 64 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.next.asciidoc
Original file line number Diff line number Diff line change
Expand Up @@ -48,6 +48,7 @@ https://github.com/elastic/beats/compare/v7.0.0-alpha2...master[Check the HEAD d

*Filebeat*

- Add fileset to ingest PostgreSQL CSV logs. {pull}23334[23334]
- Fix parsing of Elasticsearch node name by `elasticsearch/slowlog` fileset. {pull}14547[14547]
- Improve ECS field mappings in panw module. event.outcome now only contains success/failure per ECS specification. {issue}16025[16025] {pull}17910[17910]
- Improve ECS categorization field mappings for nginx module. http.request.referrer only populated when nginx sets a value {issue}16174[16174] {pull}17844[17844]
Expand Down
195 changes: 184 additions & 11 deletions filebeat/docs/fields.asciidoc
Original file line number Diff line number Diff line change
Expand Up @@ -112559,7 +112559,52 @@ The timestamp from the log line.
*`postgresql.log.core_id`*::
+
--
Core id

deprecated:[8.0.0]

Core id. (deprecated, there is no core_id in PostgreSQL logs, this is actually session_line_number).


type: alias

alias to: postgresql.log.session_line_number

--

*`postgresql.log.client_addr`*::
+
--
Host where the connection originated from.


example: 127.0.0.1

--

*`postgresql.log.client_port`*::
+
--
Port where the connection originated from.


example: 59700

--

*`postgresql.log.session_id`*::
+
--
PostgreSQL session.


example: 5ff1dd98.22

--

*`postgresql.log.session_line_number`*::
+
--
Line number inside a session. (%l in `log_line_prefix`).


type: long
Expand All @@ -112569,17 +112614,17 @@ type: long
*`postgresql.log.database`*::
+
--
Name of database
Name of database.


example: mydb
example: postgres

--

*`postgresql.log.query`*::
+
--
Query statement.
Query statement. In the case of CSV parse, look at command_tag to get more context.


example: SELECT * FROM users;
Expand All @@ -112589,7 +112634,7 @@ example: SELECT * FROM users;
*`postgresql.log.query_step`*::
+
--
Statement step when using extended query protocol (one of statement, parse, bind or execute)
Statement step when using extended query protocol (one of statement, parse, bind or execute).


example: parse
Expand All @@ -112606,30 +112651,153 @@ example: pdo_stmt_00000001

--

*`postgresql.log.error.code`*::
*`postgresql.log.command_tag`*::
+
--
Type of session's current command. The complete list can be found at: src/include/tcop/cmdtaglist.h


example: SELECT

--

*`postgresql.log.session_start_time`*::
+
--
Time when this session started.


type: date

--

*`postgresql.log.virtual_transaction_id`*::
+
--
Backend local transaction id.


--

*`postgresql.log.transaction_id`*::
+
--
Error code returned by Postgres (if any)
The id of current transaction.


type: long

--

*`postgresql.log.timezone`*::
*`postgresql.log.sql_state_code`*::
+
--
State code returned by Postgres (if any). See also https://www.postgresql.org/docs/current/errcodes-appendix.html


type: keyword

--

*`postgresql.log.detail`*::
+
--
More information about the message, parameters in case of a parametrized query. e.g. 'Role \"user\" does not exist.', 'parameters: $1 = 42', etc.


--

*`postgresql.log.hint`*::
+
--
A possible solution to solve an error.


--

*`postgresql.log.internal_query`*::
+
--
Internal query that led to the error (if any).


--

*`postgresql.log.internal_query_pos`*::
+
--
Character count of the internal query (if any).


type: long

--

*`postgresql.log.context`*::
+
--
Error context.


--

*`postgresql.log.query_pos`*::
+
--
Character count of the error position (if any).


type: long

--

*`postgresql.log.location`*::
+
--
Location of the error in the PostgreSQL source code (if log_error_verbosity is set to verbose).


--

*`postgresql.log.application_name`*::
+
--
Name of the application of this event. It is defined by the client.


--

*`postgresql.log.backend_type`*::
+
--
Type of backend of this event. Possible types are autovacuum launcher, autovacuum worker, logical replication launcher, logical replication worker, parallel worker, background writer, client backend, checkpointer, startup, walreceiver, walsender and walwriter. In addition, background workers registered by extensions may have additional types.


example: client backend

--

*`postgresql.log.error.code`*::
+
--

deprecated:[8.0.0]

Error code returned by Postgres (if any). Deprecated: errors can have letters. Use sql_state_code instead.


type: alias

alias to: event.timezone
alias to: postgresql.log.sql_state_code

--

*`postgresql.log.thread_id`*::
*`postgresql.log.timezone`*::
+
--
type: alias

alias to: process.pid
alias to: event.timezone

--

Expand All @@ -112645,8 +112813,13 @@ alias to: user.name
*`postgresql.log.level`*::
+
--
Valid values are DEBUG5, DEBUG4, DEBUG3, DEBUG2, DEBUG1, INFO, NOTICE, WARNING, ERROR, LOG, FATAL, and PANIC.


type: alias

example: LOG

alias to: log.level

--
Expand Down
34 changes: 32 additions & 2 deletions filebeat/docs/modules/postgresql.asciidoc
Original file line number Diff line number Diff line change
Expand Up @@ -18,8 +18,13 @@ include::../include/gs-link.asciidoc[]
[float]
=== Compatibility

The +{modulename}+ module was tested with logs from versions 9.5 on Ubuntu, 9.6
on Debian, and finally 10.11, 11.4 and 12.2 on Arch Linux 9.3.
This module comes in two flavours: a parser of log files based on Linux distribution
defaults, and a CSV log parser, that you need to enable in database configuration.

The +{modulename}+ module using `.log` was tested with logs from versions 9.5 on Ubuntu,
9.6 on Debian, and finally 10.11, 11.4 and 12.2 on Arch Linux 9.3.

The +{modulename}+ module using `.csv` was tested using versions 11 and 13 (distro is not relevant here).

include::../include/configuring-intro.asciidoc[]

Expand Down Expand Up @@ -71,6 +76,31 @@ image::./images/filebeat-postgresql-slowlog-overview.png[]

:has-dashboards!:

=== Using CSV logs

Since the PostgreSQL CSV log file is a well-defined format,
there is almost no configuration to be done in filebeat, just the filepath

On the other hand, it's necessary to configure postgresql to emit `.csv` logs.
The recommended parameters are:

```
logging_collector = 'on';
log_destination = 'csvlog';
log_statement = 'none';
log_checkpoints = on;
log_connections = on;
log_disconnections = on;
log_lock_waits = on;
log_min_duration_statement = 0;
```

In busy servers, `log_min_duration_statement` can cause contention, so you can assign
a value greater than 0.

Both `log_connections` and `log_disconnections` can cause a lot of events if you don't have
persistent connections, so enable with care.

:fileset_ex!:

:modulename!:
Expand Down
34 changes: 32 additions & 2 deletions filebeat/module/postgresql/_meta/docs.asciidoc
Original file line number Diff line number Diff line change
Expand Up @@ -13,8 +13,13 @@ include::../include/gs-link.asciidoc[]
[float]
=== Compatibility

The +{modulename}+ module was tested with logs from versions 9.5 on Ubuntu, 9.6
on Debian, and finally 10.11, 11.4 and 12.2 on Arch Linux 9.3.
This module comes in two flavours: a parser of log files based on Linux distribution
defaults, and a CSV log parser, that you need to enable in database configuration.

The +{modulename}+ module using `.log` was tested with logs from versions 9.5 on Ubuntu,
9.6 on Debian, and finally 10.11, 11.4 and 12.2 on Arch Linux 9.3.

The +{modulename}+ module using `.csv` was tested using versions 11 and 13 (distro is not relevant here).

include::../include/configuring-intro.asciidoc[]

Expand Down Expand Up @@ -66,6 +71,31 @@ image::./images/filebeat-postgresql-slowlog-overview.png[]

:has-dashboards!:

=== Using CSV logs

Since the PostgreSQL CSV log file is a well-defined format,
there is almost no configuration to be done in filebeat, just the filepath

On the other hand, it's necessary to configure postgresql to emit `.csv` logs.
The recommended parameters are:

```
logging_collector = 'on';
log_destination = 'csvlog';
log_statement = 'none';
log_checkpoints = on;
log_connections = on;
log_disconnections = on;
log_lock_waits = on;
log_min_duration_statement = 0;
```

In busy servers, `log_min_duration_statement` can cause contention, so you can assign
a value greater than 0.

Both `log_connections` and `log_disconnections` can cause a lot of events if you don't have
persistent connections, so enable with care.

:fileset_ex!:

:modulename!:
2 changes: 1 addition & 1 deletion filebeat/module/postgresql/fields.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Loading

0 comments on commit bb78931

Please sign in to comment.