Fix init of WAL page header at startup #481

Merged
hlinnaka merged 1 commit into REL_16_STABLE_neon from fix-twophase-checkpoint-v16
Sep 21, 2024

hlinnaka
Contributor

@hlinnaka hlinnaka commented Sep 4, 2024

If the primary is started at an LSN within the first page of a 16 MB WAL segment, the "long XLOG page header" at the beginning of the segment was not initialized correctly. That has gone unnoticed because, under normal circumstances, nothing looks at the page header. The WAL that is streamed to the safekeepers starts at the new record's LSN, not at the beginning of the page, so the bogus page header didn't propagate elsewhere, and a primary server doesn't normally read the WAL it has written. That is just as well, because the contents of the page would be bogus anyway: it wouldn't contain any of the records before the LSN where the new record is written.
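To make the affected case concrete, here is a minimal sketch of the LSN arithmetic, assuming the default 16 MB segment size and 8 kB WAL page size. The macro and function names are hypothetical and chosen for illustration; they are not the PostgreSQL macros.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed defaults: 16 MB WAL segments and 8 kB WAL pages. */
#define WAL_SEGMENT_SIZE ((uint64_t) 16 * 1024 * 1024)
#define WAL_PAGE_SIZE    ((uint64_t) 8192)

/* LSN of the first byte of the segment that contains lsn. */
uint64_t segment_start(uint64_t lsn)
{
    return lsn - (lsn % WAL_SEGMENT_SIZE);
}

/* LSN of the first byte of the WAL page that contains lsn. */
uint64_t page_start(uint64_t lsn)
{
    return lsn - (lsn % WAL_PAGE_SIZE);
}

/*
 * True when lsn falls on the first page of its segment, i.e. the page
 * that begins with the long page header.
 */
bool in_first_page_of_segment(uint64_t lsn)
{
    return page_start(lsn) == segment_start(lsn);
}
```

With these assumptions, a start LSN of 0/1000028 lies on the first page of its segment, while 0/1002000 lies on the second page and would not involve the long header.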

Except that in the following cases a primary does read its own WAL:

  1. When there are two-phase transactions in prepared state at checkpoint. The checkpointer reads the two-phase state from the XLOG_XACT_PREPARE record, and writes it to a file in pg_twophase/.

  2. Logical decoding reads the WAL starting from the replication slot's restart LSN.

This PR fixes the problem with two-phase transactions. For that, it's sufficient to initialize the page header correctly. The checkpointer only needs to read XLOG_XACT_PREPARE records that were generated after the server startup, so it's still OK that older WAL is missing / bogus.
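As a rough illustration of what "initialize the page header correctly" involves, the sketch below fills in a simplified long page header on a zeroed page. This is not the code in this PR: the struct is a stand-in modeled on the fields of PostgreSQL's long page header, and every name, the magic value, and the choice of a zero remaining-length field are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE    8192u                /* assumed WAL page size */
#define SKETCH_SEGMENT_SIZE (16u * 1024u * 1024u) /* assumed segment size */
#define SKETCH_PAGE_MAGIC   0xD113               /* placeholder; version-specific in reality */
#define SKETCH_LONG_HEADER  0x0002               /* "this page has a long header" flag */

/* Simplified stand-in for a long WAL page header (hypothetical layout). */
typedef struct
{
    uint16_t magic;     /* identifies the WAL format version */
    uint16_t info;      /* flag bits */
    uint32_t tli;       /* timeline of the first record on the page */
    uint64_t pageaddr;  /* LSN of the start of this page */
    uint32_t rem_len;   /* bytes continued from a record on the previous page */
    uint64_t sysid;     /* system identifier */
    uint32_t seg_size;  /* cross-check: segment size */
    uint32_t blcksz;    /* cross-check: page size */
} SketchLongPageHeader;

/*
 * Zero the page and write a long header for the page containing start_lsn.
 * Assumes start_lsn is on the first page of a segment and that a new record
 * begins there, so nothing is continued from a previous page.
 */
void sketch_init_first_page(unsigned char *page, uint64_t start_lsn,
                            uint32_t tli, uint64_t sysid)
{
    SketchLongPageHeader hdr;
    uint64_t pageaddr = start_lsn - (start_lsn % SKETCH_PAGE_SIZE);

    /* Only the first page of a segment carries the long header. */
    assert(pageaddr % SKETCH_SEGMENT_SIZE == 0);

    memset(page, 0, SKETCH_PAGE_SIZE);
    memset(&hdr, 0, sizeof hdr);
    hdr.magic = SKETCH_PAGE_MAGIC;
    hdr.info = SKETCH_LONG_HEADER;
    hdr.tli = tli;
    hdr.pageaddr = pageaddr;
    hdr.rem_len = 0;
    hdr.sysid = sysid;
    hdr.seg_size = SKETCH_SEGMENT_SIZE;
    hdr.blcksz = SKETCH_PAGE_SIZE;
    memcpy(page, &hdr, sizeof hdr);
}
```

The rest of the page before the start LSN stays zeroed, which matches the point above: a reader that begins at a record written after startup only needs a valid header, not the older records.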

I have not investigated if we have a problem with logical decoding, however. Let's deal with that separately.

@hlinnaka hlinnaka force-pushed the fix-twophase-checkpoint-v16 branch from c125544 to 1d7081a Compare September 20, 2024 17:29
@hlinnaka hlinnaka merged commit 1d7081a into REL_16_STABLE_neon Sep 21, 2024
3 checks passed
@hlinnaka hlinnaka deleted the fix-twophase-checkpoint-v16 branch September 21, 2024 01:00