deal with running_xacts to hot standby replica #7236

skyzh · 2024-03-25T18:26:58Z

Steps to reproduce

A previous fix that ensured hot standby replicas have running-transactions information caused replicas to fail to start, or to take a long time to start (tracked by #7204). To work around this, we reverted part of #6705 and removed the test_replication_start test via #7209. We need to fix the underlying issue and add the test cases back.



skyzh · 2024-03-26T17:08:24Z

Discussion at #6211 (comment)

skyzh · 2024-04-03T14:46:01Z

maybe part of #6211

kelvich · 2024-06-18T15:41:11Z

PR: #7288

We have one pretty serious MVCC visibility bug with hot standby replicas. We incorrectly treat any transactions that are in progress in the primary, when the standby is started, as aborted. That can break MVCC for queries running concurrently in the standby. It can also lead to hint bits being set incorrectly, and that damage can last until the replica is restarted.

The fundamental bug was that we treated any replica start as starting from a shut down server. The fix for that is straightforward: we need to set 'wasShutdown = false' in InitWalRecovery() (see changes in the postgres repo).

However, that introduces a new problem: with wasShutdown = false, the standby will not open up for queries until it receives a running-xacts WAL record from the primary. That's correct, and that's how Postgres hot standby always works. But it's a problem for Neon, because:

* It changes the historical behavior for existing users. Currently, the standby immediately opens up for queries, so if they now need to wait, we can break existing use cases that were working fine (assuming you don't hit the MVCC issues).

* The problem is much worse for Neon than it is for standalone PostgreSQL, because in Neon, we can start a replica from an arbitrary LSN. In standalone PostgreSQL, the replica always starts WAL replay from a checkpoint record, and the primary arranges things so that there is always a running-xacts record soon after each checkpoint record. You can still hit this issue with PostgreSQL if you have a transaction with lots of subtransactions running in the primary, but it's pretty rare in practice.

To mitigate that, we introduce another way to collect the running-xacts information at startup, without waiting for the running-xacts WAL record: we can scan the CLOG for XIDs that haven't been marked as committed or aborted. It has limitations with subtransactions too, but should mitigate the problem for most users. See #7236.

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
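For readers who want a concrete picture of the CLOG scan described in the commit message, here is a minimal sketch. It is not the code from PR #7288. The function name, the dict-based CLOG model, and the `max_running` cap are hypothetical and only illustrate the idea. The four status values mirror PostgreSQL's 2-bit CLOG states, and XID wraparound is ignored for simplicity.

```python
from enum import IntEnum
from typing import Dict, List, Optional


class XidStatus(IntEnum):
    # Mirrors the four 2-bit transaction states stored in the PostgreSQL CLOG.
    IN_PROGRESS = 0
    COMMITTED = 1
    ABORTED = 2
    SUB_COMMITTED = 3


def running_xids_from_clog(
    clog: Dict[int, XidStatus],
    oldest_active_xid: int,
    next_xid: int,
    max_running: int,
) -> Optional[List[int]]:
    """Collect XIDs in [oldest_active_xid, next_xid) without a final status.

    Hypothetical helper: returns None when more than max_running candidates
    are found, modelling the case (e.g. many subtransactions) where the scan
    cannot produce usable running-xacts information and the standby has to
    wait for a running-xacts WAL record instead.
    """
    running: List[int] = []
    # Assumption: no XID wraparound inside the scanned range.
    for xid in range(oldest_active_xid, next_xid):
        # A missing entry is treated like an all-zero CLOG slot: in progress.
        status = clog.get(xid, XidStatus.IN_PROGRESS)
        if status in (XidStatus.COMMITTED, XidStatus.ABORTED):
            continue
        # IN_PROGRESS and SUB_COMMITTED are both treated as still running.
        running.append(xid)
        if len(running) > max_running:
            return None
    return running


example_clog = {
    100: XidStatus.COMMITTED,
    101: XidStatus.IN_PROGRESS,
    102: XidStatus.ABORTED,
    104: XidStatus.SUB_COMMITTED,
}
print(running_xids_from_clog(example_clog, 100, 105, max_running=64))
```

The cap is there to show why the approach "has limitations with subtransactions too": when too many XIDs look unfinished, the sketch gives up rather than returning a partial list.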

hlinnaka · 2024-07-01T11:17:16Z

PR #7288 was merged. That hopefully fixes most of the issues, although there are still known cases with lots of subtransactions, where the read replica might not immediately start up, or might crash with an error later. We are betting that those cases won't happen very often in practice.

kelvich · 2024-07-01T22:32:21Z

So let's close this one then. We track the overall physical replication re-launch in the parent epics.


skyzh added the labels t/bug (Issue Type: Bug) and c/compute (Component: compute, excluding postgres itself) Mar 25, 2024

skyzh mentioned this issue Mar 25, 2024

Do not set hint bits at replica until running-xacts record is received neondatabase/postgres#403

Closed

skyzh changed the title ~~passing running_xacts to replica~~ passing running_xacts to hot standby replica Mar 26, 2024

kelvich changed the title ~~passing running_xacts to hot standby replica~~ deal with running_xacts to hot standby replica Jun 18, 2024

ololobus assigned knizhnik Jun 20, 2024

This was referenced Jun 20, 2024

Epic: stabilize physical replication #6211

Open

Restore running xacts from CLOG on replica startup #7288

Merged

kelvich closed this as completed Jul 1, 2024
