Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Postgresql goes down due to "terminating connection due to administrator command" #616

Open
cbartz opened this issue Aug 7, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@cbartz
Copy link

cbartz commented Aug 7, 2024

Steps to reproduce

Deploy postgresql 14/edge on rev 239 with three units.

Model                        Controller                         Cloud/Region              Version  SLA          Timestamp
prod-repo-policy-compliance  juju-controller-35-production-ps6  k8s-prod-general/default  3.5.3    unsupported  08:07:59Z

App                       Version  Status  Scale  Charm                     Channel      Rev  Address        Exposed  Message
postgresql                14.12    active      3  postgresql-k8s            14/edge      239  10.87.6.161    no       Primary
repo-policy-compliance             active      1  repo-policy-compliance    latest/edge   17  10.87.26.0     no       

Unit                         Workload  Agent  Address          Ports  Message
postgresql/0                 active    idle   192.168.102.57          Primary
postgresql/1*                active    idle   192.168.100.212         
postgresql/2                 active    idle   192.168.103.49   
repo-policy-compliance/0*    active    idle   192.168.102.43          

Expected behavior

Postgreql is not going down.

Actual behavior

We received the Postgresql down alert on a production deployment with three units.

[[FIRING:2] (instance="prod-repo-policy-compliance_e3901794-73c7-43f0-85ae-5179a6dc1b16_postgresql_postgresql/1", job="juju_prod-repo-policy-compliance_e3901794_postgresql_prometheus_scrape-1_638a329e43b79c2385aeef8f96e0df8ed4864b1b4128b49c0b1215c36f0adff0", juju_charm="postgresql-k8s", juju_unit="postgresql/1", severity="critical")](http://alertmanager-0.alertmanager-endpoints.prod-cos-k8s-ps6-is-charms.svc.cluster.local:9093/#/alerts?receiver=mattermost-notifications)
Alert: Postgresql down (instance prod-repo-policy-compliance_e3901794-73c7-43f0-85ae-5179a6dc1b16_postgresql_postgresql/1) - critical

Description: Postgresql instance is down
  VALUE = 0
  LABELS = map[name:pg_up instance:prod-repo-policy-compliance_e3901794-73c7-43f0-85ae-5179a6dc1b16_postgresql_postgresql/1 job:juju_prod-repo-policy-compliance_e3901794_postgresql_prometheus_scrape-1_638a329e43b79c2385aeef8f96e0df8ed4864b1b4128b49c0b1215c36f0adff0 juju_application:postgresql juju_charm:postgresql-k8s juju_model:prod-repo-policy-compliance juju_model_uuid:e3901794-73c7-43f0-85ae-5179a6dc1b16 juju_unit:postgresql/1]

Details:
   • alertname: PostgresqlDown
   • instance: prod-repo-policy-compliance_e3901794-73c7-43f0-85ae-5179a6dc1b16_postgresql_postgresql/1
   • job: juju_prod-repo-policy-compliance_e3901794_postgresql_prometheus_scrape-1_638a329e43b79c2385aeef8f96e0df8ed4864b1b4128b49c0b1215c36f0adff0
   • juju_application: postgresql
   • juju_charm: postgresql-k8s
   • juju_model: prod-repo-policy-compliance
   • juju_model_uuid: e3901794-73c7-43f0-85ae-5179a6dc1b16
   • juju_unit: postgresql/1
   • severity: critical
  

Source:  http://prometheus-0.prometheus-endpoints.prod-cos-k8s-ps6-is-charms.svc.cluster.local:9090/prod-cos-k8s-ps6-is-charms-prometheus-0/graph?g0.expr=pg_up%7Bjuju_application%3D%22postgresql%22%2Cjuju_model%3D%22prod-repo-policy-compliance%22%2Cjuju_model_uuid%3D%22e3901794-73c7-43f0-85ae-5179a6dc1b16%22%7D+%3D%3D+0&g0.tab=1
Alert: Postgresql exporter error (instance prod-repo-policy-compliance_e3901794-73c7-43f0-85ae-5179a6dc1b16_postgresql_postgresql/1) - critical

Description: Postgresql exporter is showing errors. A query may be buggy in query.yaml
  VALUE = 1
  LABELS = map[name:pg_exporter_last_scrape_error instance:prod-repo-policy-compliance_e3901794-73c7-43f0-85ae-5179a6dc1b16_postgresql_postgresql/1 job:juju_prod-repo-policy-compliance_e3901794_postgresql_prometheus_scrape-1_638a329e43b79c2385aeef8f96e0df8ed4864b1b4128b49c0b1215c36f0adff0 juju_application:postgresql juju_charm:postgresql-k8s juju_model:prod-repo-policy-compliance juju_model_uuid:e3901794-73c7-43f0-85ae-5179a6dc1b16 juju_unit:postgresql/1]

Details:
   • alertname: PostgresqlExporterError
   • instance: prod-repo-policy-compliance_e3901794-73c7-43f0-85ae-5179a6dc1b16_postgresql_postgresql/1
   • job: juju_prod-repo-policy-compliance_e3901794_postgresql_prometheus_scrape-1_638a329e43b79c2385aeef8f96e0df8ed4864b1b4128b49c0b1215c36f0adff0
   • juju_application: postgresql
   • juju_charm: postgresql-k8s
   • juju_model: prod-repo-policy-compliance
   • juju_model_uuid: e3901794-73c7-43f0-85ae-5179a6dc1b16
   • juju_unit: postgresql/1
   • severity: critical
  

Source:  http://prometheus-0.prometheus-endpoints.prod-cos-k8s-ps6-is-charms.svc.cluster.local:9090/prod-cos-k8s-ps6-is-charms-prometheus-0/graph?g0.expr=pg_exporter_last_scrape_error%7Bjuju_application%3D%22postgresql%22%2Cjuju_model%3D%22prod-repo-policy-compliance%22%2Cjuju_model_uuid%3D%22e3901794-73c7-43f0-85ae-5179a6dc1b16%22%7D+%3E+0&g0.tab=1

Logs indicate that postgresql-1 was shutdown due to

2024-08-07 06:58:22 UTC [8758]: user=,db=,app=,client=,line=12 LOG:  received fast shutdown request
2024-08-07 06:58:22 UTC [8758]: user=,db=,app=,client=,line=13 LOG:  aborting any active transactions
2024-08-07 06:58:22 UTC [8888]: user=monitoring,db=postgres,app=[unknown],client=[local],line=1 FATAL:  terminating connection due to administrator command
2024-08-07 06:58:22 UTC [8867]: user=monitoring,db=postgres,app=[unknown],client=[local],line=1 FATAL:  terminating connection due to administrator command
2024-08-07 06:58:22 UTC [8769]: user=operator,db=postgres,app=Patroni,client=127.0.0.1,line=1 FATAL:  terminating connection due to administrator command
2024-08-07 06:58:22 UTC [8758]: user=,db=,app=,client=,line=14 LOG:  background worker "logical replication launcher" (PID 8783) exited with exit code 1
2024-08-07 06:58:22 UTC [8765]: user=,db=,app=,client=,line=5 LOG:  shutting down

It healed itself quite quickly.

Versions

Operating system:

Juju CLI: 3.5.3

Juju agent: 3.5.3

Charm revision: 239

microk8s:

Log output

juju debug log

unit-postgresql-0: 06:54:06 DEBUG unit.postgresql/0.juju-log ops 2.12.0 up and running.
unit-postgresql-0: 06:54:06 DEBUG unit.postgresql/0.juju-log Emitting Juju event update_status.
unit-postgresql-0: 06:54:09 DEBUG unit.postgresql/0.juju-log Starting new HTTP connection (1): postgresql-0.postgresql-endpoints:8008
unit-postgresql-0: 06:54:09 DEBUG unit.postgresql/0.juju-log http://postgresql-0.postgresql-endpoints:8008 "GET /health HTTP/1.1" 200 None
unit-postgresql-0: 06:54:09 DEBUG unit.postgresql/0.juju-log Starting new HTTP connection (1): postgresql-0.postgresql-endpoints:8008
unit-postgresql-0: 06:54:09 DEBUG unit.postgresql/0.juju-log http://postgresql-0.postgresql-endpoints:8008 "GET /cluster HTTP/1.1" 200 None
unit-postgresql-0: 06:54:09 DEBUG unit.postgresql/0.juju-log Starting new HTTP connection (1): postgresql-0.postgresql-endpoints:8008
unit-postgresql-0: 06:54:09 DEBUG unit.postgresql/0.juju-log http://postgresql-0.postgresql-endpoints:8008 "GET /health HTTP/1.1" 200 None
unit-postgresql-0: 06:54:09 DEBUG unit.postgresql/0.juju-log Starting new HTTP connection (1): postgresql-0.postgresql-endpoints:8008
unit-postgresql-0: 06:54:09 DEBUG unit.postgresql/0.juju-log http://postgresql-0.postgresql-endpoints:8008 "GET /health HTTP/1.1" 200 None
unit-postgresql-0: 06:54:09 DEBUG unit.postgresql/0.juju-log Starting new HTTP connection (1): postgresql-0.postgresql-endpoints:8008
unit-postgresql-0: 06:54:09 DEBUG unit.postgresql/0.juju-log http://postgresql-0.postgresql-endpoints:8008 "GET /cluster HTTP/1.1" 200 None
unit-postgresql-0: 06:54:09 DEBUG unit.postgresql/0.juju-log Starting new HTTP connection (1): postgresql-0.postgresql-endpoints:8008
unit-postgresql-0: 06:54:09 DEBUG unit.postgresql/0.juju-log http://postgresql-0.postgresql-endpoints:8008 "GET /health HTTP/1.1" 200 None
unit-postgresql-0: 06:54:09 INFO unit.postgresql/0.juju-log Kubernetes service 'postgresql' patched successfully
unit-postgresql-0: 06:54:09 INFO juju.worker.uniter.operation ran "update-status" hook (via hook dispatching script: dispatch)
unit-postgresql-1: 06:54:24 DEBUG unit.postgresql/1.juju-log ops 2.12.0 up and running.
unit-postgresql-1: 06:54:24 DEBUG unit.postgresql/1.juju-log Emitting Juju event update_status.
unit-postgresql-1: 06:54:25 DEBUG unit.postgresql/1.juju-log Starting new HTTP connection (1): postgresql-1.postgresql-endpoints:8008
unit-postgresql-1: 06:54:25 DEBUG unit.postgresql/1.juju-log http://postgresql-1.postgresql-endpoints:8008 "GET /health HTTP/1.1" 200 None
unit-postgresql-1: 06:54:25 DEBUG unit.postgresql/1.juju-log Starting new HTTP connection (1): postgresql-1.postgresql-endpoints:8008
unit-postgresql-1: 06:54:25 DEBUG unit.postgresql/1.juju-log http://postgresql-1.postgresql-endpoints:8008 "GET /cluster HTTP/1.1" 200 None
unit-postgresql-1: 06:54:25 DEBUG unit.postgresql/1.juju-log Starting new HTTP connection (1): postgresql-1.postgresql-endpoints:8008
unit-postgresql-1: 06:54:25 DEBUG unit.postgresql/1.juju-log http://postgresql-1.postgresql-endpoints:8008 "GET /cluster HTTP/1.1" 200 None
unit-postgresql-1: 06:54:25 INFO unit.postgresql/1.juju-log Kubernetes service 'postgresql' patched successfully
unit-postgresql-1: 06:54:25 INFO juju.worker.uniter.operation ran "update-status" hook (via hook dispatching script: dispatch)
unit-postgresql-2: 06:56:46 DEBUG unit.postgresql/2.juju-log ops 2.12.0 up and running.
unit-postgresql-2: 06:56:46 DEBUG unit.postgresql/2.juju-log Emitting Juju event update_status.
unit-postgresql-2: 06:56:46 DEBUG unit.postgresql/2.juju-log Starting new HTTP connection (1): postgresql-2.postgresql-endpoints:8008
unit-postgresql-2: 06:56:46 DEBUG unit.postgresql/2.juju-log http://postgresql-2.postgresql-endpoints:8008 "GET /health HTTP/1.1" 200 None
unit-postgresql-2: 06:56:46 DEBUG unit.postgresql/2.juju-log Starting new HTTP connection (1): postgresql-2.postgresql-endpoints:8008
unit-postgresql-2: 06:56:46 DEBUG unit.postgresql/2.juju-log http://postgresql-2.postgresql-endpoints:8008 "GET /cluster HTTP/1.1" 200 None
unit-postgresql-2: 06:56:46 DEBUG unit.postgresql/2.juju-log Starting new HTTP connection (1): postgresql-2.postgresql-endpoints:8008
unit-postgresql-2: 06:56:46 DEBUG unit.postgresql/2.juju-log http://postgresql-2.postgresql-endpoints:8008 "GET /health HTTP/1.1" 200 None
unit-postgresql-2: 06:56:47 DEBUG unit.postgresql/2.juju-log Starting new HTTP connection (1): postgresql-2.postgresql-endpoints:8008
unit-postgresql-2: 06:56:47 DEBUG unit.postgresql/2.juju-log http://postgresql-2.postgresql-endpoints:8008 "GET /health HTTP/1.1" 200 None
unit-postgresql-2: 06:56:47 DEBUG unit.postgresql/2.juju-log Starting new HTTP connection (1): postgresql-2.postgresql-endpoints:8008
unit-postgresql-2: 06:56:47 DEBUG unit.postgresql/2.juju-log http://postgresql-2.postgresql-endpoints:8008 "GET /cluster HTTP/1.1" 200 None
unit-postgresql-2: 06:56:47 DEBUG unit.postgresql/2.juju-log Starting new HTTP connection (1): postgresql-2.postgresql-endpoints:8008
unit-postgresql-2: 06:56:47 DEBUG unit.postgresql/2.juju-log http://postgresql-2.postgresql-endpoints:8008 "GET /health HTTP/1.1" 200 None
unit-postgresql-2: 06:56:47 INFO unit.postgresql/2.juju-log Kubernetes service 'postgresql' patched successfully
unit-postgresql-2: 06:56:47 INFO juju.worker.uniter.operation ran "update-status" hook (via hook dispatching script: dispatch)
unit-postgresql-0: 06:59:40 DEBUG unit.postgresql/0.juju-log ops 2.12.0 up and running.
unit-postgresql-0: 06:59:40 DEBUG unit.postgresql/0.juju-log Emitting Juju event update_status.
unit-postgresql-0: 06:59:42 DEBUG unit.postgresql/0.juju-log Starting new HTTP connection (1): postgresql-0.postgresql-endpoints:8008
unit-postgresql-0: 06:59:42 DEBUG unit.postgresql/0.juju-log http://postgresql-0.postgresql-endpoints:8008 "GET /health HTTP/1.1" 200 None
unit-postgresql-0: 06:59:42 DEBUG unit.postgresql/0.juju-log Starting new HTTP connection (1): postgresql-0.postgresql-endpoints:8008
unit-postgresql-0: 06:59:42 DEBUG unit.postgresql/0.juju-log http://postgresql-0.postgresql-endpoints:8008 "GET /cluster HTTP/1.1" 200 None
unit-postgresql-0: 06:59:42 DEBUG unit.postgresql/0.juju-log Starting new HTTP connection (1): postgresql-0.postgresql-endpoints:8008
unit-postgresql-0: 06:59:42 DEBUG unit.postgresql/0.juju-log http://postgresql-0.postgresql-endpoints:8008 "GET /cluster HTTP/1.1" 200 None
unit-postgresql-0: 06:59:42 INFO unit.postgresql/0.juju-log Kubernetes service 'postgresql' patched successfully
unit-postgresql-0: 06:59:42 INFO juju.worker.uniter.operation ran "update-status" hook (via hook dispatching script: dispatch)
unit-postgresql-1: 07:00:02 DEBUG unit.postgresql/1.juju-log ops 2.12.0 up and running.
unit-postgresql-1: 07:00:02 DEBUG unit.postgresql/1.juju-log Emitting Juju event update_status.
unit-postgresql-1: 07:00:02 DEBUG unit.postgresql/1.juju-log Starting new HTTP connection (1): postgresql-1.postgresql-endpoints:8008
unit-postgresql-1: 07:00:02 DEBUG unit.postgresql/1.juju-log http://postgresql-1.postgresql-endpoints:8008 "GET /health HTTP/1.1" 200 None
unit-postgresql-1: 07:00:02 DEBUG unit.postgresql/1.juju-log Starting new HTTP connection (1): postgresql-1.postgresql-endpoints:8008
unit-postgresql-1: 07:00:02 DEBUG unit.postgresql/1.juju-log http://postgresql-1.postgresql-endpoints:8008 "GET /cluster HTTP/1.1" 200 None
unit-postgresql-1: 07:00:02 DEBUG unit.postgresql/1.juju-log Starting new HTTP connection (1): postgresql-1.postgresql-endpoints:8008
unit-postgresql-1: 07:00:02 DEBUG unit.postgresql/1.juju-log http://postgresql-1.postgresql-endpoints:8008 "GET /health HTTP/1.1" 200 None
unit-postgresql-1: 07:00:02 DEBUG unit.postgresql/1.juju-log Starting new HTTP connection (1): postgresql-1.postgresql-endpoints:8008
unit-postgresql-1: 07:00:02 DEBUG unit.postgresql/1.juju-log http://postgresql-1.postgresql-endpoints:8008 "GET /health HTTP/1.1" 200 None
unit-postgresql-1: 07:00:02 DEBUG unit.postgresql/1.juju-log Starting new HTTP connection (1): postgresql-1.postgresql-endpoints:8008
unit-postgresql-1: 07:00:02 DEBUG unit.postgresql/1.juju-log http://postgresql-1.postgresql-endpoints:8008 "GET /cluster HTTP/1.1" 200 None
unit-postgresql-1: 07:00:02 DEBUG unit.postgresql/1.juju-log Starting new HTTP connection (1): postgresql-1.postgresql-endpoints:8008
unit-postgresql-1: 07:00:02 DEBUG unit.postgresql/1.juju-log http://postgresql-1.postgresql-endpoints:8008 "GET /health HTTP/1.1" 200 None
unit-postgresql-1: 07:00:02 INFO unit.postgresql/1.juju-log Kubernetes service 'postgresql' patched successfully
unit-postgresql-1: 07:00:03 INFO juju.worker.uniter.operation ran "update-status" hook (via hook dispatching script: dispatch)

postgresql log

root@postgresql-1:/# cat /var/log/postgresql/postgresql-3_0658.log 
2024-08-07 06:58:22 UTC [8758]: user=,db=,app=,client=,line=12 LOG:  received fast shutdown request
2024-08-07 06:58:22 UTC [8758]: user=,db=,app=,client=,line=13 LOG:  aborting any active transactions
2024-08-07 06:58:22 UTC [8888]: user=monitoring,db=postgres,app=[unknown],client=[local],line=1 FATAL:  terminating connection due to administrator command
2024-08-07 06:58:22 UTC [8867]: user=monitoring,db=postgres,app=[unknown],client=[local],line=1 FATAL:  terminating connection due to administrator command
2024-08-07 06:58:22 UTC [8769]: user=operator,db=postgres,app=Patroni,client=127.0.0.1,line=1 FATAL:  terminating connection due to administrator command
2024-08-07 06:58:22 UTC [8758]: user=,db=,app=,client=,line=14 LOG:  background worker "logical replication launcher" (PID 8783) exited with exit code 1
2024-08-07 06:58:22 UTC [8765]: user=,db=,app=,client=,line=5 LOG:  shutting down
2024-08-07 06:58:22 UTC [8765]: user=,db=,app=,client=,line=6 LOG:  checkpoint starting: shutdown immediate
2024-08-07 06:58:23 UTC [8765]: user=,db=,app=,client=,line=7 LOG:  checkpoint complete: wrote 0 buffers (0.0%); 0 WAL file(s) added, 0 removed, 2 recycled; write=0.015 s, sync=0.001 s, total=0.242 s; sync files=0, longest=0.000 s, average=0.000 s; distance=16382 kB, estimate=16382 kB
2024-08-07 06:58:23 UTC [17415]: user=replication,db=[unknown],app=[unknown],client=192.168.102.57,line=1 FATAL:  the database system is shutting down
2024-08-07 06:58:33 UTC [17419]: user=operator,db=postgres,app=[unknown],client=127.0.0.1,line=1 FATAL:  the database system is shutting down
2024-08-07 06:58:33 UTC [17421]: user=operator,db=postgres,app=[unknown],client=127.0.0.1,line=1 FATAL:  the database system is shutting down
2024-08-07 06:58:34 UTC [8758]: user=,db=,app=,client=,line=15 LOG:  database system is shut down
2024-08-07 06:58:35 UTC [17577]: user=,db=,app=,client=,line=3 LOG:  starting PostgreSQL 14.12 (Ubuntu 14.12-0ubuntu0.22.04.1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit
2024-08-07 06:58:35 UTC [17577]: user=,db=,app=,client=,line=4 LOG:  listening on IPv4 address "0.0.0.0", port 5432
2024-08-07 06:58:37 UTC [17577]: user=,db=,app=,client=,line=5 LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2024-08-07 06:58:37 UTC [17582]: user=,db=,app=,client=,line=1 LOG:  database system was shut down at 2024-08-07 06:58:23 UTC
2024-08-07 06:58:37 UTC [17582]: user=,db=,app=,client=,line=2 WARNING:  specified neither primary_conninfo nor restore_command
2024-08-07 06:58:37 UTC [17582]: user=,db=,app=,client=,line=3 HINT:  The database server will regularly poll the pg_wal subdirectory to check for files placed there.
2024-08-07 06:58:37 UTC [17582]: user=,db=,app=,client=,line=4 LOG:  entering standby mode
2024-08-07 06:58:37 UTC [17583]: user=rewind,db=postgres,app=[unknown],client=192.168.102.57,line=1 FATAL:  the database system is starting up
2024-08-07 06:58:37 UTC [17584]: user=operator,db=postgres,app=[unknown],client=127.0.0.1,line=1 FATAL:  the database system is starting up
2024-08-07 06:58:37 UTC [17586]: user=operator,db=postgres,app=[unknown],client=127.0.0.1,line=1 FATAL:  the database system is starting up
2024-08-07 06:58:37 UTC [17582]: user=,db=,app=,client=,line=5 LOG:  consistent recovery state reached at 0/2C0000A0
2024-08-07 06:58:37 UTC [17582]: user=,db=,app=,client=,line=6 LOG:  invalid record length at 0/2C0000A0: wanted 24, got 0
2024-08-07 06:58:37 UTC [17577]: user=,db=,app=,client=,line=6 LOG:  database system is ready to accept read-only connections
2024-08-07 06:58:38 UTC [17590]: user=replication,db=[unknown],app=[unknown],client=192.168.102.57,line=1 FATAL:  no pg_hba.conf entry for replication connection from host "192.168.102.57", user "replication", no encryption
2024-08-07 06:58:38 UTC [17590]: user=replication,db=[unknown],app=[unknown],client=192.168.102.57,line=2 DETAIL:  Client IP address resolved to "192-168-102-57.postgresql-replicas.prod-repo-policy-compliance.svc.cluster.local", forward lookup not checked.
2024-08-07 06:58:39 UTC [17593]: user=replication,db=[unknown],app=postgresql-2,client=192.168.103.49,line=1 LOG:  received replication command: IDENTIFY_SYSTEM
2024-08-07 06:58:39 UTC [17593]: user=replication,db=[unknown],app=postgresql-2,client=192.168.103.49,line=2 STATEMENT:  IDENTIFY_SYSTEM
2024-08-07 06:58:39 UTC [17593]: user=replication,db=[unknown],app=postgresql-2,client=192.168.103.49,line=3 LOG:  received replication command: START_REPLICATION SLOT "postgresql_2" 0/2C000000 TIMELINE 65
2024-08-07 06:58:39 UTC [17593]: user=replication,db=[unknown],app=postgresql-2,client=192.168.103.49,line=4 STATEMENT:  START_REPLICATION SLOT "postgresql_2" 0/2C000000 TIMELINE 65
2024-08-07 06:58:43 UTC [17594]: user=replication,db=[unknown],app=postgresql-0,client=192.168.102.57,line=1 LOG:  received replication command: IDENTIFY_SYSTEM
2024-08-07 06:58:43 UTC [17594]: user=replication,db=[unknown],app=postgresql-0,client=192.168.102.57,line=2 STATEMENT:  IDENTIFY_SYSTEM
2024-08-07 06:58:43 UTC [17594]: user=replication,db=[unknown],app=postgresql-0,client=192.168.102.57,line=3 LOG:  received replication command: START_REPLICATION SLOT "postgresql_0" 0/2C000000 TIMELINE 65
2024-08-07 06:58:43 UTC [17594]: user=replication,db=[unknown],app=postgresql-0,client=192.168.102.57,line=4 STATEMENT:  START_REPLICATION SLOT "postgresql_0" 0/2C000000 TIMELINE 65
2024-08-07 06:58:45 UTC [17597]: user=replication,db=[unknown],app=[unknown],client=127.0.0.1,line=1 LOG:  received replication command: IDENTIFY_SYSTEM
2024-08-07 06:58:45 UTC [17597]: user=replication,db=[unknown],app=[unknown],client=127.0.0.1,line=2 STATEMENT:  IDENTIFY_SYSTEM
2024-08-07 06:58:46 UTC [17599]: user=replication,db=[unknown],app=[unknown],client=127.0.0.1,line=1 LOG:  received replication command: IDENTIFY_SYSTEM
2024-08-07 06:58:46 UTC [17599]: user=replication,db=[unknown],app=[unknown],client=127.0.0.1,line=2 STATEMENT:  IDENTIFY_SYSTEM
2024-08-07 06:58:46 UTC [17582]: user=,db=,app=,client=,line=7 LOG:  received promote request
2024-08-07 06:58:46 UTC [17582]: user=,db=,app=,client=,line=8 LOG:  redo is not required
2024-08-07 06:58:46 UTC [17582]: user=,db=,app=,client=,line=9 LOG:  selected new timeline ID: 66
2024-08-07 06:58:46 UTC [17582]: user=,db=,app=,client=,line=10 LOG:  archive recovery complete
2024-08-07 06:58:46 UTC [17587]: user=,db=,app=,client=,line=1 LOG:  checkpoint starting: force
2024-08-07 06:58:46 UTC [17593]: user=replication,db=[unknown],app=postgresql-2,client=192.168.103.49,line=5 LOG:  received replication command: TIMELINE_HISTORY 66
2024-08-07 06:58:46 UTC [17593]: user=replication,db=[unknown],app=postgresql-2,client=192.168.103.49,line=6 STATEMENT:  TIMELINE_HISTORY 66
2024-08-07 06:58:46 UTC [17594]: user=replication,db=[unknown],app=postgresql-0,client=192.168.102.57,line=5 LOG:  received replication command: TIMELINE_HISTORY 66
2024-08-07 06:58:46 UTC [17594]: user=replication,db=[unknown],app=postgresql-0,client=192.168.102.57,line=6 STATEMENT:  TIMELINE_HISTORY 66
2024-08-07 06:58:46 UTC [17577]: user=,db=,app=,client=,line=7 LOG:  database system is ready to accept connections
2024-08-07 06:58:46 UTC [17594]: user=replication,db=[unknown],app=postgresql-0,client=192.168.102.57,line=7 LOG:  received replication command: IDENTIFY_SYSTEM
2024-08-07 06:58:46 UTC [17594]: user=replication,db=[unknown],app=postgresql-0,client=192.168.102.57,line=8 STATEMENT:  IDENTIFY_SYSTEM
2024-08-07 06:58:46 UTC [17593]: user=replication,db=[unknown],app=postgresql-2,client=192.168.103.49,line=7 LOG:  received replication command: IDENTIFY_SYSTEM
2024-08-07 06:58:46 UTC [17593]: user=replication,db=[unknown],app=postgresql-2,client=192.168.103.49,line=8 STATEMENT:  IDENTIFY_SYSTEM
2024-08-07 06:58:46 UTC [17594]: user=replication,db=[unknown],app=postgresql-0,client=192.168.102.57,line=9 LOG:  received replication command: START_REPLICATION SLOT "postgresql_0" 0/2C000000 TIMELINE 66
2024-08-07 06:58:46 UTC [17594]: user=replication,db=[unknown],app=postgresql-0,client=192.168.102.57,line=10 STATEMENT:  START_REPLICATION SLOT "postgresql_0" 0/2C000000 TIMELINE 66
2024-08-07 06:58:46 UTC [17593]: user=replication,db=[unknown],app=postgresql-2,client=192.168.103.49,line=9 LOG:  received replication command: START_REPLICATION SLOT "postgresql_2" 0/2C000000 TIMELINE 66
2024-08-07 06:58:46 UTC [17593]: user=replication,db=[unknown],app=postgresql-2,client=192.168.103.49,line=10 STATEMENT:  START_REPLICATION SLOT "postgresql_2" 0/2C000000 TIMELINE 66
2024-08-07 06:58:46 UTC [17587]: user=,db=,app=,client=,line=2 LOG:  checkpoint complete: wrote 3 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.020 s, sync=0.003 s, total=0.067 s; sync files=2, longest=0.002 s, average=0.002 s; distance=0 kB, estimate=0 kB
2024-08-07 06:58:47 UTC [17577]: user=,db=,app=,client=,line=8 LOG:  received SIGHUP, reloading configuration files
2024-08-07 06:58:47 UTC [17577]: user=,db=,app=,client=,line=9 LOG:  parameter "timezone" changed to "Etc/UTC"
2024-08-07 06:58:47 UTC [17577]: user=,db=,app=,client=,line=10 LOG:  parameter "TimeZone" changed to "UTC"
2024-08-07 06:58:47 UTC [17577]: user=,db=,app=,client=,line=11 LOG:  parameter "synchronous_standby_names" changed to ""postgresql-2""
2024-08-07 06:58:47 UTC [17593]: user=replication,db=[unknown],app=postgresql-2,client=192.168.103.49,line=11 LOG:  standby "postgresql-2" is now a synchronous standby with priority 1
2024-08-07 06:58:47 UTC [17593]: user=replication,db=[unknown],app=postgresql-2,client=192.168.103.49,line=12 STATEMENT:  START_REPLICATION SLOT "postgresql_2" 0/2C000000 TIMELINE 66
2024-08-07 06:58:56 UTC [17577]: user=,db=,app=,client=,line=12 LOG:  received immediate shutdown request
2024-08-07 06:58:56 UTC [17577]: user=,db=,app=,client=,line=13 LOG:  database system is shut down
2024-08-07 06:58:59 UTC [17771]: user=,db=,app=,client=,line=3 LOG:  starting PostgreSQL 14.12 (Ubuntu 14.12-0ubuntu0.22.04.1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit
2024-08-07 06:58:59 UTC [17771]: user=,db=,app=,client=,line=4 LOG:  listening on IPv4 address "0.0.0.0", port 5432
2024-08-07 06:58:59 UTC [17771]: user=,db=,app=,client=,line=5 LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2024-08-07 06:58:59 UTC [17775]: user=,db=,app=,client=,line=1 LOG:  database system was interrupted; last known up at 2024-08-07 06:58:46 UTC
2024-08-07 06:58:59 UTC [17775]: user=,db=,app=,client=,line=2 WARNING:  specified neither primary_conninfo nor restore_command
2024-08-07 06:58:59 UTC [17775]: user=,db=,app=,client=,line=3 HINT:  The database server will regularly poll the pg_wal subdirectory to check for files placed there.
2024-08-07 06:58:59 UTC [17775]: user=,db=,app=,client=,line=4 LOG:  entering standby mode
2024-08-07 06:58:59 UTC [17775]: user=,db=,app=,client=,line=5 LOG:  database system was not properly shut down; automatic recovery in progress
2024-08-07 06:58:59 UTC [17775]: user=,db=,app=,client=,line=6 LOG:  redo starts at 0/2C0000D0
2024-08-07 06:58:59 UTC [17775]: user=,db=,app=,client=,line=7 LOG:  invalid record length at 0/2C000218: wanted 24, got 0
2024-08-07 06:58:59 UTC [17775]: user=,db=,app=,client=,line=8 LOG:  consistent recovery state reached at 0/2C000218
2024-08-07 06:58:59 UTC [17771]: user=,db=,app=,client=,line=6 LOG:  database system is ready to accept read-only connections
2024-08-07 06:58:59 UTC [17784]: user=replication,db=[unknown],app=[unknown],client=127.0.0.1,line=1 LOG:  received replication command: IDENTIFY_SYSTEM
2024-08-07 06:58:59 UTC [17784]: user=replication,db=[unknown],app=[unknown],client=127.0.0.1,line=2 STATEMENT:  IDENTIFY_SYSTEM
2024-08-07 06:59:00 UTC [17771]: user=,db=,app=,client=,line=7 LOG:  received SIGHUP, reloading configuration files
root@postgresql-1:/# cat /var/log/postgresql/postgresql-3_0656.log 
root@postgresql-1:/# cat /var/log/postgresql/postgresql-3_0655.log 
root@postgresql-1:/# cat /var/log/postgresql/postgresql-3_0658.log 
2024-08-07 06:58:22 UTC [8758]: user=,db=,app=,client=,line=12 LOG:  received fast shutdown request
2024-08-07 06:58:22 UTC [8758]: user=,db=,app=,client=,line=13 LOG:  aborting any active transactions
2024-08-07 06:58:22 UTC [8888]: user=monitoring,db=postgres,app=[unknown],client=[local],line=1 FATAL:  terminating connection due to administrator command
2024-08-07 06:58:22 UTC [8867]: user=monitoring,db=postgres,app=[unknown],client=[local],line=1 FATAL:  terminating connection due to administrator command
2024-08-07 06:58:22 UTC [8769]: user=operator,db=postgres,app=Patroni,client=127.0.0.1,line=1 FATAL:  terminating connection due to administrator command
2024-08-07 06:58:22 UTC [8758]: user=,db=,app=,client=,line=14 LOG:  background worker "logical replication launcher" (PID 8783) exited with exit code 1
2024-08-07 06:58:22 UTC [8765]: user=,db=,app=,client=,line=5 LOG:  shutting down
2024-08-07 06:58:22 UTC [8765]: user=,db=,app=,client=,line=6 LOG:  checkpoint starting: shutdown immediate
2024-08-07 06:58:23 UTC [8765]: user=,db=,app=,client=,line=7 LOG:  checkpoint complete: wrote 0 buffers (0.0%); 0 WAL file(s) added, 0 removed, 2 recycled; write=0.015 s, sync=0.001 s, total=0.242 s; sync files=0, longest=0.000 s, average=0.000 s; distance=16382 kB, estimate=16382 kB
2024-08-07 06:58:23 UTC [17415]: user=replication,db=[unknown],app=[unknown],client=192.168.102.57,line=1 FATAL:  the database system is shutting down
2024-08-07 06:58:33 UTC [17419]: user=operator,db=postgres,app=[unknown],client=127.0.0.1,line=1 FATAL:  the database system is shutting down
2024-08-07 06:58:33 UTC [17421]: user=operator,db=postgres,app=[unknown],client=127.0.0.1,line=1 FATAL:  the database system is shutting down
2024-08-07 06:58:34 UTC [8758]: user=,db=,app=,client=,line=15 LOG:  database system is shut down
2024-08-07 06:58:35 UTC [17577]: user=,db=,app=,client=,line=3 LOG:  starting PostgreSQL 14.12 (Ubuntu 14.12-0ubuntu0.22.04.1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit
2024-08-07 06:58:35 UTC [17577]: user=,db=,app=,client=,line=4 LOG:  listening on IPv4 address "0.0.0.0", port 5432
2024-08-07 06:58:37 UTC [17577]: user=,db=,app=,client=,line=5 LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2024-08-07 06:58:37 UTC [17582]: user=,db=,app=,client=,line=1 LOG:  database system was shut down at 2024-08-07 06:58:23 UTC
2024-08-07 06:58:37 UTC [17582]: user=,db=,app=,client=,line=2 WARNING:  specified neither primary_conninfo nor restore_command
2024-08-07 06:58:37 UTC [17582]: user=,db=,app=,client=,line=3 HINT:  The database server will regularly poll the pg_wal subdirectory to check for files placed there.
2024-08-07 06:58:37 UTC [17582]: user=,db=,app=,client=,line=4 LOG:  entering standby mode
2024-08-07 06:58:37 UTC [17583]: user=rewind,db=postgres,app=[unknown],client=192.168.102.57,line=1 FATAL:  the database system is starting up
2024-08-07 06:58:37 UTC [17584]: user=operator,db=postgres,app=[unknown],client=127.0.0.1,line=1 FATAL:  the database system is starting up
2024-08-07 06:58:37 UTC [17586]: user=operator,db=postgres,app=[unknown],client=127.0.0.1,line=1 FATAL:  the database system is starting up
2024-08-07 06:58:37 UTC [17582]: user=,db=,app=,client=,line=5 LOG:  consistent recovery state reached at 0/2C0000A0
2024-08-07 06:58:37 UTC [17582]: user=,db=,app=,client=,line=6 LOG:  invalid record length at 0/2C0000A0: wanted 24, got 0
2024-08-07 06:58:37 UTC [17577]: user=,db=,app=,client=,line=6 LOG:  database system is ready to accept read-only connections
2024-08-07 06:58:38 UTC [17590]: user=replication,db=[unknown],app=[unknown],client=192.168.102.57,line=1 FATAL:  no pg_hba.conf entry for replication connection from host "192.168.102.57", user "replication", no encryption
2024-08-07 06:58:38 UTC [17590]: user=replication,db=[unknown],app=[unknown],client=192.168.102.57,line=2 DETAIL:  Client IP address resolved to "192-168-102-57.postgresql-replicas.prod-repo-policy-compliance.svc.cluster.local", forward lookup not checked.
2024-08-07 06:58:39 UTC [17593]: user=replication,db=[unknown],app=postgresql-2,client=192.168.103.49,line=1 LOG:  received replication command: IDENTIFY_SYSTEM
2024-08-07 06:58:39 UTC [17593]: user=replication,db=[unknown],app=postgresql-2,client=192.168.103.49,line=2 STATEMENT:  IDENTIFY_SYSTEM
2024-08-07 06:58:39 UTC [17593]: user=replication,db=[unknown],app=postgresql-2,client=192.168.103.49,line=3 LOG:  received replication command: START_REPLICATION SLOT "postgresql_2" 0/2C000000 TIMELINE 65
2024-08-07 06:58:39 UTC [17593]: user=replication,db=[unknown],app=postgresql-2,client=192.168.103.49,line=4 STATEMENT:  START_REPLICATION SLOT "postgresql_2" 0/2C000000 TIMELINE 65
2024-08-07 06:58:43 UTC [17594]: user=replication,db=[unknown],app=postgresql-0,client=192.168.102.57,line=1 LOG:  received replication command: IDENTIFY_SYSTEM
2024-08-07 06:58:43 UTC [17594]: user=replication,db=[unknown],app=postgresql-0,client=192.168.102.57,line=2 STATEMENT:  IDENTIFY_SYSTEM
2024-08-07 06:58:43 UTC [17594]: user=replication,db=[unknown],app=postgresql-0,client=192.168.102.57,line=3 LOG:  received replication command: START_REPLICATION SLOT "postgresql_0" 0/2C000000 TIMELINE 65
2024-08-07 06:58:43 UTC [17594]: user=replication,db=[unknown],app=postgresql-0,client=192.168.102.57,line=4 STATEMENT:  START_REPLICATION SLOT "postgresql_0" 0/2C000000 TIMELINE 65
2024-08-07 06:58:45 UTC [17597]: user=replication,db=[unknown],app=[unknown],client=127.0.0.1,line=1 LOG:  received replication command: IDENTIFY_SYSTEM
2024-08-07 06:58:45 UTC [17597]: user=replication,db=[unknown],app=[unknown],client=127.0.0.1,line=2 STATEMENT:  IDENTIFY_SYSTEM
2024-08-07 06:58:46 UTC [17599]: user=replication,db=[unknown],app=[unknown],client=127.0.0.1,line=1 LOG:  received replication command: IDENTIFY_SYSTEM
2024-08-07 06:58:46 UTC [17599]: user=replication,db=[unknown],app=[unknown],client=127.0.0.1,line=2 STATEMENT:  IDENTIFY_SYSTEM
2024-08-07 06:58:46 UTC [17582]: user=,db=,app=,client=,line=7 LOG:  received promote request
2024-08-07 06:58:46 UTC [17582]: user=,db=,app=,client=,line=8 LOG:  redo is not required
2024-08-07 06:58:46 UTC [17582]: user=,db=,app=,client=,line=9 LOG:  selected new timeline ID: 66
2024-08-07 06:58:46 UTC [17582]: user=,db=,app=,client=,line=10 LOG:  archive recovery complete
2024-08-07 06:58:46 UTC [17587]: user=,db=,app=,client=,line=1 LOG:  checkpoint starting: force
2024-08-07 06:58:46 UTC [17593]: user=replication,db=[unknown],app=postgresql-2,client=192.168.103.49,line=5 LOG:  received replication command: TIMELINE_HISTORY 66
2024-08-07 06:58:46 UTC [17593]: user=replication,db=[unknown],app=postgresql-2,client=192.168.103.49,line=6 STATEMENT:  TIMELINE_HISTORY 66
2024-08-07 06:58:46 UTC [17594]: user=replication,db=[unknown],app=postgresql-0,client=192.168.102.57,line=5 LOG:  received replication command: TIMELINE_HISTORY 66
2024-08-07 06:58:46 UTC [17594]: user=replication,db=[unknown],app=postgresql-0,client=192.168.102.57,line=6 STATEMENT:  TIMELINE_HISTORY 66
2024-08-07 06:58:46 UTC [17577]: user=,db=,app=,client=,line=7 LOG:  database system is ready to accept connections
2024-08-07 06:58:46 UTC [17594]: user=replication,db=[unknown],app=postgresql-0,client=192.168.102.57,line=7 LOG:  received replication command: IDENTIFY_SYSTEM
2024-08-07 06:58:46 UTC [17594]: user=replication,db=[unknown],app=postgresql-0,client=192.168.102.57,line=8 STATEMENT:  IDENTIFY_SYSTEM
2024-08-07 06:58:46 UTC [17593]: user=replication,db=[unknown],app=postgresql-2,client=192.168.103.49,line=7 LOG:  received replication command: IDENTIFY_SYSTEM
2024-08-07 06:58:46 UTC [17593]: user=replication,db=[unknown],app=postgresql-2,client=192.168.103.49,line=8 STATEMENT:  IDENTIFY_SYSTEM
2024-08-07 06:58:46 UTC [17594]: user=replication,db=[unknown],app=postgresql-0,client=192.168.102.57,line=9 LOG:  received replication command: START_REPLICATION SLOT "postgresql_0" 0/2C000000 TIMELINE 66
2024-08-07 06:58:46 UTC [17594]: user=replication,db=[unknown],app=postgresql-0,client=192.168.102.57,line=10 STATEMENT:  START_REPLICATION SLOT "postgresql_0" 0/2C000000 TIMELINE 66
2024-08-07 06:58:46 UTC [17593]: user=replication,db=[unknown],app=postgresql-2,client=192.168.103.49,line=9 LOG:  received replication command: START_REPLICATION SLOT "postgresql_2" 0/2C000000 TIMELINE 66
2024-08-07 06:58:46 UTC [17593]: user=replication,db=[unknown],app=postgresql-2,client=192.168.103.49,line=10 STATEMENT:  START_REPLICATION SLOT "postgresql_2" 0/2C000000 TIMELINE 66
2024-08-07 06:58:46 UTC [17587]: user=,db=,app=,client=,line=2 LOG:  checkpoint complete: wrote 3 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.020 s, sync=0.003 s, total=0.067 s; sync files=2, longest=0.002 s, average=0.002 s; distance=0 kB, estimate=0 kB
2024-08-07 06:58:47 UTC [17577]: user=,db=,app=,client=,line=8 LOG:  received SIGHUP, reloading configuration files
2024-08-07 06:58:47 UTC [17577]: user=,db=,app=,client=,line=9 LOG:  parameter "timezone" changed to "Etc/UTC"
2024-08-07 06:58:47 UTC [17577]: user=,db=,app=,client=,line=10 LOG:  parameter "TimeZone" changed to "UTC"
2024-08-07 06:58:47 UTC [17577]: user=,db=,app=,client=,line=11 LOG:  parameter "synchronous_standby_names" changed to ""postgresql-2""
2024-08-07 06:58:47 UTC [17593]: user=replication,db=[unknown],app=postgresql-2,client=192.168.103.49,line=11 LOG:  standby "postgresql-2" is now a synchronous standby with priority 1
2024-08-07 06:58:47 UTC [17593]: user=replication,db=[unknown],app=postgresql-2,client=192.168.103.49,line=12 STATEMENT:  START_REPLICATION SLOT "postgresql_2" 0/2C000000 TIMELINE 66
2024-08-07 06:58:56 UTC [17577]: user=,db=,app=,client=,line=12 LOG:  received immediate shutdown request
2024-08-07 06:58:56 UTC [17577]: user=,db=,app=,client=,line=13 LOG:  database system is shut down
2024-08-07 06:58:59 UTC [17771]: user=,db=,app=,client=,line=3 LOG:  starting PostgreSQL 14.12 (Ubuntu 14.12-0ubuntu0.22.04.1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit
2024-08-07 06:58:59 UTC [17771]: user=,db=,app=,client=,line=4 LOG:  listening on IPv4 address "0.0.0.0", port 5432
2024-08-07 06:58:59 UTC [17771]: user=,db=,app=,client=,line=5 LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2024-08-07 06:58:59 UTC [17775]: user=,db=,app=,client=,line=1 LOG:  database system was interrupted; last known up at 2024-08-07 06:58:46 UTC
2024-08-07 06:58:59 UTC [17775]: user=,db=,app=,client=,line=2 WARNING:  specified neither primary_conninfo nor restore_command
2024-08-07 06:58:59 UTC [17775]: user=,db=,app=,client=,line=3 HINT:  The database server will regularly poll the pg_wal subdirectory to check for files placed there.
2024-08-07 06:58:59 UTC [17775]: user=,db=,app=,client=,line=4 LOG:  entering standby mode
2024-08-07 06:58:59 UTC [17775]: user=,db=,app=,client=,line=5 LOG:  database system was not properly shut down; automatic recovery in progress
2024-08-07 06:58:59 UTC [17775]: user=,db=,app=,client=,line=6 LOG:  redo starts at 0/2C0000D0
2024-08-07 06:58:59 UTC [17775]: user=,db=,app=,client=,line=7 LOG:  invalid record length at 0/2C000218: wanted 24, got 0
2024-08-07 06:58:59 UTC [17775]: user=,db=,app=,client=,line=8 LOG:  consistent recovery state reached at 0/2C000218
2024-08-07 06:58:59 UTC [17771]: user=,db=,app=,client=,line=6 LOG:  database system is ready to accept read-only connections
2024-08-07 06:58:59 UTC [17784]: user=replication,db=[unknown],app=[unknown],client=127.0.0.1,line=1 LOG:  received replication command: IDENTIFY_SYSTEM
2024-08-07 06:58:59 UTC [17784]: user=replication,db=[unknown],app=[unknown],client=127.0.0.1,line=2 STATEMENT:  IDENTIFY_SYSTEM
2024-08-07 06:59:00 UTC [17771]: user=,db=,app=,client=,line=7 LOG:  received SIGHUP, reloading configuration files

patroni log

root@postgresql-1:/# cat /var/log/postgresql/patroni.log.110
2024-08-07 07:02:09 UTC [16]: INFO: no action. I am (postgresql-1), a secondary, and following a leader (postgresql-0) 
2024-08-07 07:02:19 UTC [16]: INFO: no action. I am (postgresql-1), a secondary, and following a leader (postgresql-0) 
2024-08-07 07:02:29 UTC [16]: INFO: no action. I am (postgresql-1), a secondary, and following a leader (postgresql-0) 
2024-08-07 07:02:39 UTC [16]: INFO: no action. I am (postgresql-1), a secondary, and following a leader (postgresql-0) 
root@postgresql-1:/# cat /var/log/postgresql/patroni.log.120
2024-08-07 06:58:58 UTC [16]: WARNING: Retry got exception: connection problems 
2024-08-07 06:58:58 UTC [16]: INFO: Lock owner: postgresql-0; I am postgresql-1 
2024-08-07 06:58:58 UTC [16]: INFO: starting after demotion in progress 
2024-08-07 06:58:58 UTC [16]: INFO: Lock owner: postgresql-0; I am postgresql-1 
2024-08-07 06:58:58 UTC [16]: INFO: starting after demotion in progress 
2024-08-07 06:58:58 UTC [16]: WARNING: Retry got exception: connection problems 
2024-08-07 06:58:59 UTC [16]: INFO: Lock owner: postgresql-0; I am postgresql-1 
root@postgresql-1:/# cat /var/log/postgresql/patroni.log.121
2024-08-07 06:58:56 UTC [16]: ERROR: failed to update leader lock 
2024-08-07 06:58:56 UTC [16]: INFO: Demoting self (immediate-nolock) 
2024-08-07 06:58:56 UTC [16]: INFO: demoted self because failed to update leader lock in DCS 
2024-08-07 06:58:56 UTC [16]: INFO: Lock owner: None; I am postgresql-1 
2024-08-07 06:58:56 UTC [16]: INFO: not healthy enough for leader race 
2024-08-07 06:58:56 UTC [16]: INFO: starting after demotion in progress 
2024-08-07 06:58:56 UTC [16]: INFO: closed patroni connection to the postgresql cluster 
2024-08-07 06:58:57 UTC [16]: INFO: postmaster pid=17771 
root@postgresql-1:/# cat /var/log/postgresql/patroni.log.122
2024-08-07 06:58:56 UTC [16]: ERROR: Unexpected error from Kubernetes API 
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/patroni/dcs/kubernetes.py", line 1159, in _update_leader_with_retry
    return bool(self._patch_or_create(self.leader_path, annotations, resource_version, ips=ips, retry=_retry))
  File "/usr/lib/python3/dist-packages/patroni/dcs/kubernetes.py", line 1090, in _patch_or_create
    ret = retry(func, self._namespace, body) if retry else func(self._namespace, body)
  File "/usr/lib/python3/dist-packages/patroni/dcs/kubernetes.py", line 1156, in _retry
    return retry(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/patroni/utils.py", line 599, in __call__
    return func(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/patroni/dcs/kubernetes.py", line 558, in wrapper
    return getattr(self._core_v1_api, func)(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/patroni/dcs/kubernetes.py", line 479, in wrapper
    return self._api_client.call_api(method, path, headers, body, **kwargs)
  File "/usr/lib/python3/dist-packages/patroni/dcs/kubernetes.py", line 447, in call_api
    return self._handle_server_response(response, _preload_content)
  File "/usr/lib/python3/dist-packages/patroni/dcs/kubernetes.py", line 266, in _handle_server_response
    raise k8s_client.rest.ApiException(http_resp=response)
patroni.dcs.kubernetes.K8sClient.rest.ApiException: (404)
Reason: Not Found
HTTP response headers: HTTPHeaderDict({'Audit-Id': 'da0b6ecb-8d42-4964-9819-41a2d385b2dd', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Kubernetes-Pf-Flowschema-Uid': '37606d0c-8ddf-4cc4-9faa-563142648f97', 'X-Kubernetes-Pf-Prioritylevel-Uid': '296b2e7b-e274-4aed-9e4a-019ddda1c429', 'Date': 'Wed, 07 Aug 2024 06:58:56 GMT', 'Content-Length': '214'})
HTTP response body: b'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"endpoints \\"patroni-postgresql\\" not found","reason":"NotFound","details":{"name":"patroni-postgresql","kind":"endpoints"},"code":404}\n'

root@postgresql-1:/# cat /var/log/postgresql/patroni.log.123
2024-08-07 06:58:56 UTC [16]: INFO: Assigning synchronous standby status to ['postgresql-2'] 
2024-08-07 06:58:56 UTC [16]: INFO: Synchronous standby status assigned to ['postgresql-2'] 
2024-08-07 06:58:56 UTC [16]: INFO: acquired session lock as a leader 
2024-08-07 06:58:56 UTC [16]: INFO: Lock owner: postgresql-1; I am postgresql-1 

Additional context

@cbartz cbartz added the bug Something isn't working label Aug 7, 2024
Copy link
Contributor

github-actions bot commented Aug 7, 2024

@dragomirp
Copy link
Contributor

Looks like Patroni was unable to access the K8s API to update the leader lock. We can potentially enable Patroni's DCS failsafe mode to try to mitigate this.

@taurus-forever
Copy link
Contributor

The active discussion is happening in #669 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants