
getting LOG: could not fork autovacuum worker process: Cannot allocate memory #671

Open
kumarashish071092 opened this issue Jun 5, 2024 · 2 comments
Labels
question Further information is requested

Comments

@kumarashish071092

I have suddenly started getting OOM errors in my Postgres cluster. I have to restart Patroni to be able to log in to the database.

If I manage to log in by killing an existing session, I get this error:

postgres=# select count(1) from pg_stat_activity;
ERROR:  out of memory
LINE 1: select count(1) from pg_stat_activity;
                             ^
DETAIL:  Failed on request of size 25462 in memory context "MessageContext".
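
For context, the DETAIL line names a backend-local memory context. On PostgreSQL 14+ the memory contexts of the current session can be inspected directly; a small diagnostic sketch (it only covers the backend you run it from, not the whole server):

-- Top memory contexts of the current backend, largest first (PostgreSQL 14+)
SELECT name, parent, total_bytes, used_bytes
FROM pg_backend_memory_contexts
ORDER BY total_bytes DESC
LIMIT 10;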

postgresql_parameters:

  • { option: "max_connections", value: "500" }
  • { option: "superuser_reserved_connections", value: "5" }
  • { option: "password_encryption", value: "{{ postgresql_password_encryption_algorithm }}" }
  • { option: "max_locks_per_transaction", value: "512" }
  • { option: "max_prepared_transactions", value: "0" }
  • { option: "huge_pages", value: "try" } # or "on" if you set "vm_nr_hugepages" in kernel parameters
  • { option: "shared_buffers", value: "{{ (ansible_memtotal_mb * 0.25) | int }}MB" } # by default, 25% of RAM
  • { option: "effective_cache_size", value: "{{ (ansible_memtotal_mb * 0.75) | int }}MB" } # by default, 75% of RAM
  • { option: "work_mem", value: "128MB" } # please change this value
  • { option: "maintenance_work_mem", value: "256MB" } # please change this value
  • { option: "checkpoint_timeout", value: "15min" }
  • { option: "checkpoint_completion_target", value: "0.9" }
  • { option: "min_wal_size", value: "1GB" }
  • { option: "max_wal_size", value: "2GB" } # or 16GB/32GB
  • { option: "wal_buffers", value: "32MB" }
  • { option: "default_statistics_target", value: "1000" }
  • { option: "seq_page_cost", value: "1" }
  • { option: "random_page_cost", value: "4" } # "1.1" for SSD storage. Also, if your databases fits in shared_buffers
  • { option: "effective_io_concurrency", value: "2" } # "200" for SSD storage
  • { option: "synchronous_commit", value: "on" } # or 'off' if you can you lose single transactions in case of a crash
  • { option: "autovacuum", value: "on" } # never turn off the autovacuum!
  • { option: "autovacuum_max_workers", value: "5" }
  • { option: "autovacuum_vacuum_scale_factor", value: "0.02" } # or 0.005/0.001
  • { option: "autovacuum_analyze_scale_factor", value: "0.01" }
  • { option: "autovacuum_vacuum_cost_limit", value: "500" } # or 1000/5000
  • { option: "autovacuum_vacuum_cost_delay", value: "2" }
  • { option: "autovacuum_naptime", value: "1s" }
  • { option: "max_files_per_process", value: "4096" }
  • { option: "archive_mode", value: "on" }
  • { option: "archive_timeout", value: "1800s" }
  • { option: "archive_command", value: "cd ." }
    • { option: "wal_level", value: "replica" }
  • { option: "wal_keep_size", value: "2GB" }
  • { option: "max_wal_senders", value: "10" }
  • { option: "max_replication_slots", value: "10" }
  • { option: "hot_standby", value: "on" }
  • { option: "wal_log_hints", value: "on" }
  • { option: "wal_compression", value: "on" }
  • { option: "shared_preload_libraries", value: "pg_stat_statements,auto_explain" }
  • { option: "pg_stat_statements.max", value: "10000" }
  • { option: "pg_stat_statements.track", value: "all" }
  • { option: "pg_stat_statements.track_utility", value: "false" }
  • { option: "pg_stat_statements.save", value: "true" }
  • { option: "auto_explain.log_min_duration", value: "10s" } # enable auto_explain for 10-second logging threshold. Decrease this value if necessary
  • { option: "auto_explain.log_analyze", value: "true" }
  • { option: "auto_explain.log_buffers", value: "true" }
  • { option: "auto_explain.log_timing", value: "false" }
  • { option: "auto_explain.log_triggers", value: "true" }
  • { option: "auto_explain.log_verbose", value: "true" }
  • { option: "auto_explain.log_nested_statements", value: "true" }
  • { option: "auto_explain.sample_rate", value: "0.01" } # enable auto_explain for 1% of queries logging threshold
  • { option: "track_io_timing", value: "on" }
  • { option: "log_lock_waits", value: "on" }
  • { option: "log_temp_files", value: "0" }
  • { option: "listen_addresses", value: "0.0.0.0" }
  • { option: "track_activities", value: "on" }
  • { option: "track_activity_query_size", value: "4096" }
  • { option: "track_counts", value: "on" }
  • { option: "track_functions", value: "all" }
  • { option: "log_checkpoints", value: "on" }
  • { option: "log_connections", value: "on" }
  • { option: "log_disconnections", value: "on" }
  • { option: "log_autovacuum_min_duration", value: "0" }
    • { option: "log_error_verbosity", value: "default" }
  • { option: "log_statement", value: "off" }
  • { option: "log_min_duration_statement", value: "10" }
  • { option: "logging_collector", value: "on" }
  • { option: "log_truncate_on_rotation", value: "on" }
  • { option: "log_rotation_age", value: "1d" }
  • { option: "log_rotation_size", value: "0" }
  • { option: "log_line_prefix", value: "'%t [%p]: db=%d,user=%u,app=%a,client=%h '" }
  • { option: "log_filename", value: "postgresql-%a.log" }
  • { option: "log_directory", value: "{{ postgresql_log_dir }}" }
  • { option: "hot_standby_feedback", value: "on" } # allows feedback from a hot standby to the primary that will avoid query conflicts
  • { option: "max_standby_streaming_delay", value: "30s" }
  • { option: "wal_receiver_status_interval", value: "10s" }
  • { option: "idle_in_transaction_session_timeout", value: "10min" } # reduce this timeout if possible
  • { option: "jit", value: "off" }
  • { option: "max_worker_processes", value: "24" }
  • { option: "max_parallel_workers", value: "8" }
  • { option: "max_parallel_workers_per_gather", value: "2" }
  • { option: "max_parallel_maintenance_workers", value: "2" }
  • { option: "tcp_keepalives_count", value: "10" }
  • { option: "tcp_keepalives_idle", value: "300" }
  • { option: "tcp_keepalives_interval", value: "30" }

I tried reducing work_mem to 64MB and shared_buffers to 1400MB, but the error still occurs. What approach should I take?

@kumarashish071092
Author

Postgres version: 15
OS: Ubuntu 22.04
RAM: 8 GB
Cores: 2
Swap: 2 GB
pgbouncer: enabled

It was working fine earlier; this issue appeared suddenly.

@vitabaks added the question (Further information is requested) label on Jun 5, 2024
@vitabaks
Owner

vitabaks commented Jun 5, 2024

Hi @kumarashish071092

This issue is not related to cluster deployment but to database maintenance. Consider sponsorship; some of the subscription tiers include individual support.

Where to start:

  1. Check the monitoring system to determine the reason for the increased memory consumption, for example more connections or a change in workload, and identify which processes consume the memory.
  2. If query optimization is not an option, try to reduce memory usage (shared_buffers, work_mem, maintenance_work_mem) or increase the resources of the database server.
  3. Monitor database performance metrics when changing parameters; reducing the memory-related settings may degrade performance.
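
As a starting point for item 1, assuming the database is reachable at all, pg_stat_activity shows how many backends exist and what they are doing, which often points at the source of the memory pressure:

-- Count backends by type and state to see where the connections are going
SELECT backend_type, state, count(*) AS backends
FROM pg_stat_activity
GROUP BY backend_type, state
ORDER BY backends DESC;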
