Fix init of WAL page header at startup
If the primary is started at an LSN within the first page of a 16 MB WAL
segment, the "long XLOG page header" at the beginning of the segment
was not initialized correctly. That has gone unnoticed, because under
normal circumstances nothing looks at the page header. The WAL that
is streamed to the safekeepers starts at the new record's LSN, not at
the beginning of the page, so the bogus page header didn't propagate
elsewhere, and a primary server doesn't normally read the WAL it has
written. That is just as well, because the contents of the page would
be bogus anyway: the page wouldn't contain any of the records before
the LSN where the new record is written.

However, in the following cases a primary does read its own WAL:

1. When there are two-phase transactions in prepared state at
   checkpoint.  The checkpointer reads the two-phase state from the
   XLOG_XACT_PREPARE record, and writes it to a file in pg_twophase/.

2. Logical decoding reads the WAL starting from the replication slot's
   restart LSN.

This PR fixes the problem with two-phase transactions. For that, it's
sufficient to initialize the page header correctly. The checkpointer
only needs to read XLOG_XACT_PREPARE records that were generated after
the server startup, so it's still OK that older WAL is missing or
bogus.

I have not investigated whether we have a problem with logical
decoding, however. Let's deal with that separately.
hlinnaka committed Sep 20, 2024
1 parent 3ec6e24 commit c125544
1 changed file with 19 additions and 7 deletions: src/backend/access/transam/xlogrecovery.c
@@ -1650,25 +1650,37 @@ FinishWalRecovery(void)
 }
 else
 {
-    int offs = endOfLog % XLOG_BLCKSZ;
-    char *page = palloc0(offs);
-    XLogRecPtr pageBeginPtr = endOfLog - offs;
-    int lastPageSize = ((pageBeginPtr % wal_segment_size) == 0) ? SizeOfXLogLongPHD : SizeOfXLogShortPHD;
-
-    XLogPageHeader xlogPageHdr = (XLogPageHeader) (page);
+    int offs = endOfLog % XLOG_BLCKSZ;
+    XLogRecPtr pageBeginPtr = endOfLog - offs;
+    bool isLongHeader = (pageBeginPtr % wal_segment_size) == 0;
+    int lastPageSize = isLongHeader ? SizeOfXLogLongPHD : SizeOfXLogShortPHD;
+    char *page = palloc0(offs);
+    XLogPageHeader xlogPageHdr = (XLogPageHeader) page;
 
     xlogPageHdr->xlp_pageaddr = pageBeginPtr;
     xlogPageHdr->xlp_magic = XLOG_PAGE_MAGIC;
     xlogPageHdr->xlp_tli = recoveryTargetTLI;
+    xlogPageHdr->xlp_info = 0;
     /*
      * If we start writing with offset from page beginning, pretend in
      * page header there is a record ending where actual data will
      * start.
      */
     xlogPageHdr->xlp_rem_len = offs - lastPageSize;
-    xlogPageHdr->xlp_info = (xlogPageHdr->xlp_rem_len > 0) ? XLP_FIRST_IS_CONTRECORD : 0;
+    if (xlogPageHdr->xlp_rem_len > 0)
+        xlogPageHdr->xlp_info |= XLP_FIRST_IS_CONTRECORD;
     readOff = XLogSegmentOffset(pageBeginPtr, wal_segment_size);
 
+    if (isLongHeader)
+    {
+        XLogLongPageHeader longHdr = (XLogLongPageHeader) page;
+
+        longHdr->xlp_sysid = GetSystemIdentifier();
+        longHdr->xlp_seg_size = wal_segment_size;
+        longHdr->xlp_xlog_blcksz = XLOG_BLCKSZ;
+
+        xlogPageHdr->xlp_info |= XLP_LONG_HEADER;
+    }
     result->lastPageBeginPtr = pageBeginPtr;
     result->lastPage = page;
     elog(LOG, "Continue writing WAL at %X/%X", LSN_FORMAT_ARGS(xlogreader->EndRecPtr));