[bitnami/postgresql-repmgr] After a 3-node cluster, simultaneous restart of all nodes resulted in failure to start properly. #67372

kwenzh · 2024-05-31T14:38:46Z

Name and Version

bitnami/postgresql-repmgr:15.5.0-debian-11-r15

What architecture are you using?

amd64

What steps will reproduce the bug?

Deploy a 3-node PostgreSQL cluster on Kubernetes v1.23, running as static pods.
Verify that it works correctly: there is one primary and two standbys.
Stop PostgreSQL on every node at almost the same time.
Verify that PostgreSQL has stopped on every node.
Start PostgreSQL on every node at almost the same time.
Check the PostgreSQL status on every node.
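
The status checks in the steps above could be scripted roughly as follows. This is only a sketch: the function names `check_nodes` and `pg_probe`, the `PG_PORT` variable, and the host names in the usage note are hypothetical, and it assumes `pg_isready` is installed wherever the script runs.

```shell
#!/usr/bin/env bash
# Hypothetical probe: succeeds when PostgreSQL on host $1 accepts connections.
# pg_isready ships with PostgreSQL; -q suppresses output, -t sets a timeout in seconds.
pg_probe() {
    pg_isready -h "$1" -p "${PG_PORT:-5432}" -q -t 3
}

# Run the probe command named in $1 against every remaining argument
# and print one "<node> up" or "<node> down" line per node.
check_nodes() {
    local probe="$1"
    shift
    local node
    for node in "$@"; do
        if "$probe" "$node"; then
            echo "$node up"
        else
            echo "$node down"
        fi
    done
}
```

For example, `check_nodes pg_probe pg-a pg-b pg-c` (made-up host names) would print one status line per node, both after the stop step and after the start step.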

What is the expected behavior?

The PostgreSQL cluster recovers to a healthy state.

What do you see instead?

Every PostgreSQL pod crashes, and no primary can be elected.

pg pod exit log

[2024-05-31 18:54:06] [ERROR] unable connect to upstream node (ID: 2337678), terminating
[2024-05-31 18:54:06] [HINT] upstream node must be running before repmgrd can start
[2024-05-31 18:54:06] [INFO] repmgrd terminating...

Additional information

wal_log_hints = 'on'

reconnect_attempts='24'
reconnect_interval='5'

node_rejoin_timeout=300

kwenzh · 2024-05-31T14:41:55Z

For example, take three nodes A, B, and C, where C was the primary before they were all shut down. After all three are started at the same time, C is not yet running properly. When B starts, it cannot find the primary, so repmgr starts the postgres process on B directly. C starts within this short window, connects to B's service on port 5432, and learns from it that the primary is C. That address is not filtered out even though it is C itself, so C tries to connect to itself and fails, which leaves B and C in a circular dependency.

kwenzh · 2024-05-31T14:46:41Z

The script only checks whether the primary node is the node itself at https://github.com/bitnami/containers/blob/main/bitnami/postgresql-repmgr/15/debian-12/rootfs/opt/bitnami/scripts/librepmgr.sh#L224

but there is no such self-check at https://github.com/bitnami/containers/blob/main/bitnami/postgresql-repmgr/15/debian-12/rootfs/opt/bitnami/scripts/librepmgr.sh#L240

When repmgr learns from another node's PostgreSQL service that the primary is the node itself, it retries connecting to its own PostgreSQL service, which is not ready yet.
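
The missing self-check described above could look roughly like the sketch below. It is an illustration only, not the code in librepmgr.sh or in the linked pull request: the function names `is_self_node` and `select_upstream`, the `host:port` candidate format, and the node names in the usage note are all hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when the primary reported by a peer is this node.
# Arguments: reported host, reported port, own host, own port.
is_self_node() {
    local reported_host="$1" reported_port="$2"
    local own_host="$3" own_port="$4"
    [ "$reported_host" = "$own_host" ] && [ "$reported_port" = "$own_port" ]
}

# Hypothetical filter: given this node's host and port followed by the
# "host:port" primaries reported by peers, print the first candidate that is
# not this node as "host port". Prints nothing when every candidate is this
# node, so the caller can fall back to starting as primary.
select_upstream() {
    local own_host="$1" own_port="$2"
    shift 2
    local candidate host port
    for candidate in "$@"; do
        host="${candidate%%:*}"
        port="${candidate##*:}"
        if is_self_node "$host" "$port" "$own_host" "$own_port"; then
            continue  # a peer still names this node as primary; skip it
        fi
        printf '%s %s\n' "$host" "$port"
        return 0
    done
    return 0
}
```

With made-up names, `select_upstream pg-c 5432 pg-c:5432` prints nothing, because the only reported primary is the calling node, while `select_upstream pg-b 5432 pg-c:5432` prints `pg-c 5432`. Skipping the self-referencing candidate is what would keep C from waiting on its own not-yet-ready service in the scenario above.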

kwenzh · 2024-05-31T14:58:46Z

Looks similar to #999.

carrodher · 2024-06-02T17:09:38Z

Thank you for bringing this issue to our attention. We appreciate your involvement! If you're interested in contributing a solution, we welcome you to create a pull request. The Bitnami team is excited to review your submission and offer feedback. You can find the contributing guidelines here.

Your contribution will greatly benefit the community. Feel free to reach out if you have any questions or need assistance.

kwenzh · 2024-06-03T01:38:26Z

> Thank you for bringing this issue to our attention. We appreciate your involvement! If you're interested in contributing a solution, we welcome you to create a pull request. The Bitnami team is excited to review your submission and offer feedback. You can find the contributing guidelines here.
>
> Your contribution will greatly benefit the community. Feel free to reach out if you have any questions or need assistance.

Please take a look at #67370.

carrodher · 2024-06-03T06:56:30Z

Thank you for opening this issue and submitting the associated Pull Request. Our team will review and provide feedback. Once the PR is merged, the issue will automatically close.

Your contribution is greatly appreciated!

kwenzh · 2024-06-03T08:07:23Z

> Thank you for opening this issue and submitting the associated Pull Request. Our team will review and provide feedback. Once the PR is merged, the issue will automatically close.
>
> Your contribution is greatly appreciated!
>
> Feel free to reach out if you have any questions or need assistance.

github-actions · 2024-06-19T01:27:03Z

This Issue has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thanks for the feedback.

github-actions · 2024-06-24T01:27:14Z

Due to the lack of activity in the last 5 days since it was marked as "stale", we proceed to close this Issue. Do not hesitate to reopen it later if necessary.

kwenzh added the tech-issues label ("The user has a technical issue about an application") May 31, 2024

kwenzh mentioned this issue May 31, 2024

[bitnami/postgresql-repmgr]Add judge primary node ip port is itslef #67370

Merged

github-actions bot added the triage label ("Triage is needed") May 31, 2024

github-actions bot assigned carrodher May 31, 2024

carrodher added the postgresql-repmgr label Jun 2, 2024

github-actions bot added the stale label ("15 days without activity") Jun 19, 2024

github-actions bot added the solved label Jun 24, 2024

bitnami-bot closed this as not planned Jun 24, 2024

fmulero closed this as completed in #67370 Jul 31, 2024
