Add docs in PostgreSQL modules about recommended configuration (#24588)
jsoriano authored Mar 24, 2021
1 parent b5e43fc commit 7f5a358
Showing 7 changed files with 208 additions and 52 deletions.
102 changes: 76 additions & 26 deletions filebeat/docs/modules/postgresql.asciidoc
@@ -26,6 +26,80 @@ The +{modulename}+ module using `.log` was tested with logs from versions 9.5 on

The +{modulename}+ module using `.csv` was tested using versions 11 and 13 (distro is not relevant here).

[float]
=== Supported log formats

This module can collect any logs from PostgreSQL servers, but to analyze their
contents better and extract more information, the logs should use a specific
format.

The following settings affect the log format.

Log lines should be prefixed with the timestamp with milliseconds, the process
ID, the user name, and the database name. This is the default in most
distributions, and corresponds to this setting in the configuration file:

["source","sh"]
----------------------------
log_line_prefix = '%m [%p] %q%u@%d '
----------------------------
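
To illustrate what this prefix produces, the following Python sketch parses
lines written with the `log_line_prefix` above. The sample lines and the
`parse_line` helper are hypothetical, and the regular expression is only an
approximation, not the pattern {beatname_uc} actually uses. Because of `%q`,
lines from non-session processes (such as the checkpointer) omit the user and
database part.

```python
import re

# Approximate pattern for lines written with log_line_prefix = '%m [%p] %q%u@%d '.
# %m = timestamp with milliseconds, %p = process ID, %u = user name, %d = database.
# %q makes non-session processes stop the prefix before "%u@%d", so that part is optional.
LINE_RE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3} \S+) "
    r"\[(?P<pid>\d+)\] "
    r"(?:(?P<user>[^@\s]*)@(?P<database>\S*) )?"
    r"(?P<level>[A-Z0-9]+):\s+(?P<message>.*)$"
)


def parse_line(line):
    """Return the prefix fields and message of one log line, or None if it does not match."""
    match = LINE_RE.match(line)
    return match.groupdict() if match else None


# Hypothetical sample lines: one from a client session, one from a background process.
session_line = "2021-03-24 10:15:32.123 UTC [1234] postgres@mydb LOG:  duration: 0.123 ms  statement: SELECT 1"
background_line = "2021-03-24 10:15:40.000 UTC [42] LOG:  checkpoint starting: time"

print(parse_line(session_line)["database"])  # mydb
print(parse_line(background_line)["user"])   # None
```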

The PostgreSQL server can be configured to log statements and their durations,
and this module can collect this information. To correlate each duration with
its statement, both must be logged on the same line. This happens when the
following options are used:

["source","sh"]
----------------------------
log_duration = 'on'
log_statement = 'none'
log_min_duration_statement = 0
----------------------------

Setting `log_min_duration_statement` to zero logs every statement executed by a
client. On busy servers this can cause contention, so you probably want to set
a higher value to log only slower statements. The value is expressed in
milliseconds.
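
As a rough illustration of how the threshold behaves, the hypothetical
`is_logged` helper below mirrors the documented rule: a completed statement is
logged when it ran for at least `log_min_duration_statement` milliseconds, and
`-1` disables this logging. It is a sketch of the rule, not PostgreSQL code.

```python
def is_logged(duration_ms, log_min_duration_statement):
    """Return True if a statement of this duration would be logged.

    Mirrors the documented semantics: -1 disables logging, 0 logs every
    statement, and a positive value logs statements that ran for at least
    that many milliseconds.
    """
    if log_min_duration_statement < 0:
        return False
    return duration_ms >= log_min_duration_statement


durations_ms = [0.2, 15.0, 250.0, 1200.0]

# With 0, every statement is logged; with 250, only the two slowest ones.
print([d for d in durations_ms if is_logged(d, 0)])    # [0.2, 15.0, 250.0, 1200.0]
print([d for d in durations_ms if is_logged(d, 250)])  # [250.0, 1200.0]
```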

When `log_statement` and `log_duration` are used together, statements and
durations are logged on different lines and {beatname_uc} cannot correlate
them. For this reason, it is recommended to disable `log_statement`.
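
The difference matters because a line produced by `log_min_duration_statement`
carries both values, while `log_statement` writes the statement on its own line
without a duration. The following Python sketch, which uses a hypothetical
`extract_duration_and_statement` helper and made-up messages, shows that only
the combined form yields both fields from a single line.

```python
import re

# Message format of a line logged by log_min_duration_statement:
# "duration: <ms> ms  statement: <query>"
DURATION_STATEMENT_RE = re.compile(
    r"^duration: (?P<duration_ms>[0-9.]+) ms\s+statement: (?P<statement>.*)$"
)


def extract_duration_and_statement(message):
    """Return (duration_ms, statement) when both are present in one message, else None."""
    match = DURATION_STATEMENT_RE.match(message)
    if match is None:
        return None
    return float(match.group("duration_ms")), match.group("statement")


# One line with both values: they can be correlated directly.
combined = "duration: 12.345 ms  statement: SELECT * FROM orders"
# With log_statement and log_duration, these values arrive as separate lines.
statement_only = "statement: SELECT * FROM orders"
duration_only = "duration: 12.345 ms"

print(extract_duration_and_statement(combined))        # (12.345, 'SELECT * FROM orders')
print(extract_duration_and_statement(statement_only))  # None
print(extract_duration_and_statement(duration_only))   # None
```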

NOTE: The PostgreSQL module of Metricbeat can also collect information about
all statements executed in the server. You can choose whichever fits your needs
better. An important difference is that the Metricbeat module collects
aggregated information for statements executed several times, but cannot know
when each individual execution happened. This information can be obtained from
the logs.

Other logging options you may want to enable are:

["source","sh"]
----------------------------
log_checkpoints = 'on'
log_connections = 'on'
log_disconnections = 'on'
log_lock_waits = 'on'
----------------------------

Both `log_connections` and `log_disconnections` can generate a large number of
events if clients don't use persistent connections, so enable them with care.
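
To get a feeling for the volume, the following back-of-the-envelope Python
sketch estimates the daily number of connection-related log lines. The figure of
three lines per connection (two from `log_connections`, for the "connection
received" and "connection authorized" messages, and one from
`log_disconnections`) and the connection rates are assumptions for
illustration; check them against your own server.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Assumed lines per short-lived connection: "connection received" and
# "connection authorized" from log_connections, plus one from log_disconnections.
LINES_PER_CONNECTION = 3


def connection_lines_per_day(new_connections_per_second):
    """Estimate log lines per day caused by log_connections and log_disconnections."""
    return round(new_connections_per_second * SECONDS_PER_DAY * LINES_PER_CONNECTION)


# A pooled application opening 1 new connection every 10 seconds
# versus one that opens 50 short-lived connections per second.
print(connection_lines_per_day(0.1))  # 25920
print(connection_lines_per_day(50))   # 12960000
```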

[float]
=== Using CSV logs

Because the PostgreSQL CSV log is a well-defined format, almost no
configuration is needed in {beatname_uc} apart from the file path.

However, PostgreSQL itself must be configured to emit `.csv` logs. The
recommended parameters are:

["source","sh"]
----------------------------
logging_collector = 'on'
log_destination = 'csvlog'
----------------------------
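
The CSV format is easy to consume because values are escaped following CSV
quoting rules, so commas or quotes inside a query do not break the record. The
Python sketch below reads one made-up `csvlog` record with the standard `csv`
module. The column names follow the order documented for PostgreSQL 13, but
only the first 14 columns are mapped here; treat the sample record and the
`COLUMNS` list as assumptions.

```python
import csv
import io

# First 14 columns of the csvlog format as documented for PostgreSQL 13
# (later columns such as detail, hint and query are ignored in this sketch).
COLUMNS = [
    "log_time", "user_name", "database_name", "process_id", "connection_from",
    "session_id", "session_line_num", "command_tag", "session_start_time",
    "virtual_transaction_id", "transaction_id", "error_severity",
    "sql_state_code", "message",
]

# Made-up record; note the comma and quotes inside the logged statement.
SAMPLE = (
    '2021-03-24 10:15:32.123 UTC,"postgres","mydb",1234,"[local]",'
    '605b1234.4d2,1,"SELECT",2021-03-24 10:15:30 UTC,3/4,0,LOG,00000,'
    '"duration: 0.123 ms  statement: SELECT \'a,b\', ""x""",,,,,,,,,"psql","client backend"\n'
)


def read_records(text):
    """Yield each csvlog record as a dict keyed by the mapped column names."""
    for row in csv.reader(io.StringIO(text)):
        # zip() stops at the shorter sequence, so unmapped columns are dropped.
        yield dict(zip(COLUMNS, row))


record = next(read_records(SAMPLE))
print(record["database_name"])  # mydb
print(record["message"])        # duration: 0.123 ms  statement: SELECT 'a,b', "x"
```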


include::../include/configuring-intro.asciidoc[]

The following example shows how to set paths in the +modules.d/{modulename}.yml+
@@ -69,38 +143,14 @@ The first dashboard is for regular logs.
[role="screenshot"]
image::./images/filebeat-postgresql-overview.png[]

The second one shows the slowlogs of PostgreSQL. If `log_min_duration_statement`
is not used, this dashboard will show incomplete or no data.

[role="screenshot"]
image::./images/filebeat-postgresql-slowlog-overview.png[]

:has-dashboards!:

:fileset_ex!:

:modulename!:
102 changes: 76 additions & 26 deletions filebeat/module/postgresql/_meta/docs.asciidoc
@@ -21,6 +21,80 @@ The +{modulename}+ module using `.log` was tested with logs from versions 9.5 on

The +{modulename}+ module using `.csv` was tested using versions 11 and 13 (distro is not relevant here).

[float]
=== Supported log formats

This module can collect any logs from PostgreSQL servers, but to analyze their
contents better and extract more information, the logs should use a specific
format.

The following settings affect the log format.

Log lines should be prefixed with the timestamp with milliseconds, the process
ID, the user name, and the database name. This is the default in most
distributions, and corresponds to this setting in the configuration file:

["source","sh"]
----------------------------
log_line_prefix = '%m [%p] %q%u@%d '
----------------------------

The PostgreSQL server can be configured to log statements and their durations,
and this module can collect this information. To correlate each duration with
its statement, both must be logged on the same line. This happens when the
following options are used:

["source","sh"]
----------------------------
log_duration = 'on'
log_statement = 'none'
log_min_duration_statement = 0
----------------------------

Setting `log_min_duration_statement` to zero logs every statement executed by a
client. On busy servers this can cause contention, so you probably want to set
a higher value to log only slower statements. The value is expressed in
milliseconds.

When `log_statement` and `log_duration` are used together, statements and
durations are logged on different lines and {beatname_uc} cannot correlate
them. For this reason, it is recommended to disable `log_statement`.

NOTE: The PostgreSQL module of Metricbeat can also collect information about
all statements executed in the server. You can choose whichever fits your needs
better. An important difference is that the Metricbeat module collects
aggregated information for statements executed several times, but cannot know
when each individual execution happened. This information can be obtained from
the logs.

Other logging options you may want to enable are:

["source","sh"]
----------------------------
log_checkpoints = 'on'
log_connections = 'on'
log_disconnections = 'on'
log_lock_waits = 'on'
----------------------------

Both `log_connections` and `log_disconnections` can generate a large number of
events if clients don't use persistent connections, so enable them with care.

[float]
=== Using CSV logs

Because the PostgreSQL CSV log is a well-defined format, almost no
configuration is needed in {beatname_uc} apart from the file path.

However, PostgreSQL itself must be configured to emit `.csv` logs. The
recommended parameters are:

["source","sh"]
----------------------------
logging_collector = 'on'
log_destination = 'csvlog'
----------------------------


include::../include/configuring-intro.asciidoc[]

The following example shows how to set paths in the +modules.d/{modulename}.yml+
@@ -64,38 +138,14 @@ The first dashboard is for regular logs.
[role="screenshot"]
image::./images/filebeat-postgresql-overview.png[]

The second one shows the slowlogs of PostgreSQL. If `log_min_duration_statement`
is not used, this dashboard will show incomplete or no data.

[role="screenshot"]
image::./images/filebeat-postgresql-slowlog-overview.png[]

:has-dashboards!:

:fileset_ex!:

:modulename!:
4 changes: 4 additions & 0 deletions metricbeat/docs/modules/postgresql.asciidoc
@@ -86,6 +86,10 @@ metricbeat.modules:
# Stats about every PostgreSQL process
- activity
# Stats about every statement executed in the server. It requires the
# `pg_stat_statements` library to be configured in the server.
#- statement
period: 10s
# The host must be passed as PostgreSQL URL. Example:
4 changes: 4 additions & 0 deletions metricbeat/metricbeat.reference.yml
@@ -734,6 +734,10 @@ metricbeat.modules:
# Stats about every PostgreSQL process
- activity

# Stats about every statement executed in the server. It requires the
# `pg_stat_statements` library to be configured in the server.
#- statement

period: 10s

# The host must be passed as PostgreSQL URL. Example:
4 changes: 4 additions & 0 deletions metricbeat/module/postgresql/_meta/config.reference.yml
@@ -10,6 +10,10 @@
# Stats about every PostgreSQL process
- activity

# Stats about every statement executed in the server. It requires the
# `pg_stat_statements` library to be configured in the server.
#- statement

period: 10s

# The host must be passed as PostgreSQL URL. Example:
40 changes: 40 additions & 0 deletions metricbeat/module/postgresql/statement/_meta/docs.asciidoc
@@ -1 +1,41 @@
This is the `statement` metricset of the PostgreSQL module.

This metricset collects information from the `pg_stat_statements` view, which
tracks planning and execution statistics of all SQL statements executed by the
server.

`pg_stat_statements` is provided by an additional PostgreSQL module. This
module requires additional shared memory and is disabled by default.

You can enable it by adding the module to `shared_preload_libraries` in the
server configuration, for example:

["source"]
-------------------------------------------
shared_preload_libraries = 'pg_stat_statements'
pg_stat_statements.max = 10000
pg_stat_statements.track = all
-------------------------------------------
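
Because `shared_preload_libraries` is a comma-separated list and, in
`postgresql.conf`, the last occurrence of a parameter wins, it can be useful to
check which value is actually in effect. The following Python sketch does this
for a simplified `name = 'value'` syntax; the `effective_settings` and
`preloads_pg_stat_statements` helpers are hypothetical, and they ignore
`include` directives and `postgresql.auto.conf`. On a running server,
`SHOW shared_preload_libraries;` gives the authoritative answer.

```python
def effective_settings(conf_text):
    """Return the last value set for each parameter in simple postgresql.conf text."""
    settings = {}
    for raw_line in conf_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments (naive: ignores '#' in quotes)
        if not line:
            continue
        name, sep, value = line.partition("=")
        if not sep:
            continue  # this sketch only handles "name = value" lines
        # Later entries override earlier ones, as in postgresql.conf.
        settings[name.strip()] = value.strip().strip("'")
    return settings


def preloads_pg_stat_statements(conf_text):
    """Return True if shared_preload_libraries includes pg_stat_statements."""
    value = effective_settings(conf_text).get("shared_preload_libraries", "")
    libraries = [lib.strip() for lib in value.split(",")]
    return "pg_stat_statements" in libraries


CONF = """
shared_preload_libraries = 'auto_explain'
shared_preload_libraries = 'auto_explain,pg_stat_statements'   # last one wins
pg_stat_statements.max = 10000
pg_stat_statements.track = all
"""

print(preloads_pg_stat_statements(CONF))  # True
```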

NOTE: Preloading this library increases the memory usage of your PostgreSQL
server. Use it with care.

Once the server is started with this library loaded, it collects statistics
about all executed statements. To make these statistics available in the
`pg_stat_statements` view, run the following statement in the database that
Metricbeat connects to:

["source","sql"]
-------------------------------------------
CREATE EXTENSION pg_stat_statements;
-------------------------------------------

You can read more about the available options for this module in the
https://www.postgresql.org/docs/13/pgstatstatements.html[official documentation].

NOTE: The PostgreSQL module of Filebeat can also collect information about
statements executed in the server from its logs. You can choose whichever fits
your needs better. An important difference is that the Metricbeat module
collects aggregated information for statements executed several times, but
cannot know when each individual execution happened. This information can be
obtained from the logs.
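
The trade-off described in the note can be sketched in a few lines of Python:
the hypothetical `aggregate` function below folds individual executions (as
they could be extracted from the logs) into per-statement totals, loosely
modeled on the `calls`, `total_exec_time` and `mean_exec_time` columns of
`pg_stat_statements` in PostgreSQL 13. The aggregated form is compact, but the
execution timestamps are no longer available in it.

```python
from collections import defaultdict

# Made-up individual executions, as they could be extracted from the logs:
# (timestamp, normalized query, duration in milliseconds).
executions = [
    ("2021-03-24T10:00:01Z", "SELECT * FROM orders WHERE id = $1", 2.0),
    ("2021-03-24T10:00:05Z", "SELECT * FROM orders WHERE id = $1", 4.0),
    ("2021-03-24T10:07:42Z", "UPDATE orders SET status = $1 WHERE id = $2", 9.5),
]


def aggregate(rows):
    """Fold individual executions into per-query totals (timestamps are discarded)."""
    stats = defaultdict(lambda: {"calls": 0, "total_exec_time": 0.0})
    for _timestamp, query, duration_ms in rows:
        entry = stats[query]
        entry["calls"] += 1
        entry["total_exec_time"] += duration_ms
    for entry in stats.values():
        entry["mean_exec_time"] = entry["total_exec_time"] / entry["calls"]
    return dict(stats)


summary = aggregate(executions)
print(summary["SELECT * FROM orders WHERE id = $1"])
# {'calls': 2, 'total_exec_time': 6.0, 'mean_exec_time': 3.0}
```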
4 changes: 4 additions & 0 deletions x-pack/metricbeat/metricbeat.reference.yml
@@ -1120,6 +1120,10 @@ metricbeat.modules:
# Stats about every PostgreSQL process
- activity

# Stats about every statement executed in the server. It requires the
# `pg_stat_statements` library to be configured in the server.
#- statement

period: 10s

# The host must be passed as PostgreSQL URL. Example: