
getting LOG: could not fork autovacuum worker process: Cannot allocate memory #671

Open
kumarashish071092 opened this issue Jun 5, 2024 · 2 comments
Labels
question Further information is requested

Comments

@kumarashish071092

I have suddenly started getting OOM errors in my Postgres cluster. I have to restart Patroni to be able to log in to the database.

If I manage to log in by killing an existing session, I get this error:

postgres=# select count(1) from pg_stat_activity;
ERROR:  out of memory
LINE 1: select count(1) from pg_stat_activity;
                             ^
DETAIL:  Failed on request of size 25462 in memory context "MessageContext".
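
For context, the DETAIL line names a backend-local memory context. On PostgreSQL 14+ the memory contexts of the current session can be inspected directly; a small diagnostic sketch (it only covers the backend you run it from, not the whole server):

-- Top memory contexts of the current backend, largest first (PostgreSQL 14+)
SELECT name, parent, total_bytes, used_bytes
FROM pg_backend_memory_contexts
ORDER BY total_bytes DESC
LIMIT 10;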

postgresql_parameters:

  • { option: "max_connections", value: "500" }
  • { option: "superuser_reserved_connections", value: "5" }
  • { option: "password_encryption", value: "{{ postgresql_password_encryption_algorithm }}" }
  • { option: "max_locks_per_transaction", value: "512" }
  • { option: "max_prepared_transactions", value: "0" }
  • { option: "huge_pages", value: "try" } # or "on" if you set "vm_nr_hugepages" in kernel parameters
  • { option: "shared_buffers", value: "{{ (ansible_memtotal_mb * 0.25) | int }}MB" } # by default, 25% of RAM
  • { option: "effective_cache_size", value: "{{ (ansible_memtotal_mb * 0.75) | int }}MB" } # by default, 75% of RAM
  • { option: "work_mem", value: "128MB" } # please change this value
  • { option: "maintenance_work_mem", value: "256MB" } # please change this value
  • { option: "checkpoint_timeout", value: "15min" }
  • { option: "checkpoint_completion_target", value: "0.9" }
  • { option: "min_wal_size", value: "1GB" }
  • { option: "max_wal_size", value: "2GB" } # or 16GB/32GB
  • { option: "wal_buffers", value: "32MB" }
  • { option: "default_statistics_target", value: "1000" }
  • { option: "seq_page_cost", value: "1" }
  • { option: "random_page_cost", value: "4" } # "1.1" for SSD storage. Also, if your databases fits in shared_buffers
  • { option: "effective_io_concurrency", value: "2" } # "200" for SSD storage
  • { option: "synchronous_commit", value: "on" } # or 'off' if you can you lose single transactions in case of a crash
  • { option: "autovacuum", value: "on" } # never turn off the autovacuum!
  • { option: "autovacuum_max_workers", value: "5" }
  • { option: "autovacuum_vacuum_scale_factor", value: "0.02" } # or 0.005/0.001
  • { option: "autovacuum_analyze_scale_factor", value: "0.01" }
  • { option: "autovacuum_vacuum_cost_limit", value: "500" } # or 1000/5000
  • { option: "autovacuum_vacuum_cost_delay", value: "2" }
  • { option: "autovacuum_naptime", value: "1s" }
  • { option: "max_files_per_process", value: "4096" }
  • { option: "archive_mode", value: "on" }
  • { option: "archive_timeout", value: "1800s" }
  • { option: "archive_command", value: "cd ." }
    • { option: "wal_level", value: "replica" }
  • { option: "wal_keep_size", value: "2GB" }
  • { option: "max_wal_senders", value: "10" }
  • { option: "max_replication_slots", value: "10" }
  • { option: "hot_standby", value: "on" }
  • { option: "wal_log_hints", value: "on" }
  • { option: "wal_compression", value: "on" }
  • { option: "shared_preload_libraries", value: "pg_stat_statements,auto_explain" }
  • { option: "pg_stat_statements.max", value: "10000" }
  • { option: "pg_stat_statements.track", value: "all" }
  • { option: "pg_stat_statements.track_utility", value: "false" }
  • { option: "pg_stat_statements.save", value: "true" }
  • { option: "auto_explain.log_min_duration", value: "10s" } # enable auto_explain for 10-second logging threshold. Decrease this value if necessary
  • { option: "auto_explain.log_analyze", value: "true" }
  • { option: "auto_explain.log_buffers", value: "true" }
  • { option: "auto_explain.log_timing", value: "false" }
  • { option: "auto_explain.log_triggers", value: "true" }
  • { option: "auto_explain.log_verbose", value: "true" }
  • { option: "auto_explain.log_nested_statements", value: "true" }
  • { option: "auto_explain.sample_rate", value: "0.01" } # enable auto_explain for 1% of queries logging threshold
  • { option: "track_io_timing", value: "on" }
  • { option: "log_lock_waits", value: "on" }
  • { option: "log_temp_files", value: "0" }
  • { option: "listen_addresses", value: "0.0.0.0" }
  • { option: "track_activities", value: "on" }
  • { option: "track_activity_query_size", value: "4096" }
  • { option: "track_counts", value: "on" }
  • { option: "track_functions", value: "all" }
  • { option: "log_checkpoints", value: "on" }
  • { option: "log_connections", value: "on" }
  • { option: "log_disconnections", value: "on" }
  • { option: "log_autovacuum_min_duration", value: "0" }
    • { option: "log_error_verbosity", value: "default" }
  • { option: "log_statement", value: "off" }
  • { option: "log_min_duration_statement", value: "10" }
  • { option: "logging_collector", value: "on" }
  • { option: "log_truncate_on_rotation", value: "on" }
  • { option: "log_rotation_age", value: "1d" }
  • { option: "log_rotation_size", value: "0" }
  • { option: "log_line_prefix", value: "'%t [%p]: db=%d,user=%u,app=%a,client=%h '" }
  • { option: "log_filename", value: "postgresql-%a.log" }
  • { option: "log_directory", value: "{{ postgresql_log_dir }}" }
  • { option: "hot_standby_feedback", value: "on" } # allows feedback from a hot standby to the primary that will avoid query conflicts
  • { option: "max_standby_streaming_delay", value: "30s" }
  • { option: "wal_receiver_status_interval", value: "10s" }
  • { option: "idle_in_transaction_session_timeout", value: "10min" } # reduce this timeout if possible
  • { option: "jit", value: "off" }
  • { option: "max_worker_processes", value: "24" }
  • { option: "max_parallel_workers", value: "8" }
  • { option: "max_parallel_workers_per_gather", value: "2" }
  • { option: "max_parallel_maintenance_workers", value: "2" }
  • { option: "tcp_keepalives_count", value: "10" }
  • { option: "tcp_keepalives_idle", value: "300" }
  • { option: "tcp_keepalives_interval", value: "30" }

I tried reducing work_mem to 64MB and shared_buffers to 1400MB, but the error still occurs. What approach should I take?

@kumarashish071092
Author

Postgres version: 15
OS: Ubuntu 22.04
RAM: 8 GB
Cores: 2
Swap: 2 GB
pgbouncer: enabled

It was working fine earlier; this issue appeared suddenly.

@vitabaks added the question (Further information is requested) label on Jun 5, 2024
@vitabaks
Owner

vitabaks commented Jun 5, 2024

Hi @kumarashish071092

This issue is not related to cluster deployment but to database maintenance. Consider sponsorship; some of the subscription tiers include individual support.

Where to start:

  1. Check the monitoring system to determine the reason for the increased memory consumption, for example more connections or a change in workload, and identify which processes consume the memory.
  2. If query optimization is not an option, try to reduce memory usage (shared_buffers, work_mem, maintenance_work_mem) or increase the resources of the database server.
  3. Monitor database performance metrics when changing parameters; reducing the memory-related settings may degrade performance.
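
As a starting point for item 1, assuming the database is reachable at all, pg_stat_activity shows how many backends exist and what they are doing, which often points at the source of the memory pressure:

-- Count backends by type and state to see where the connections are going
SELECT backend_type, state, count(*) AS backends
FROM pg_stat_activity
GROUP BY backend_type, state
ORDER BY backends DESC;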
