Do not wait running-xacts record from primary if checkoint doesn't contain oldestActiveXid #8484

knizhnik · 2024-07-24T12:47:16Z

Problem

If oldestActiveXid is not set in checkpoint and primary is not running, then replica failed to start.
See https://neondb.slack.com/archives/C07E5U1660Z/p1721804204809549

Changes

If oldestActiveXid is invalid and oldestXid (CLOG low boundary) is valid, then restore runningxacts from the beginning of CLOG. Otherwise just assume that only prepared transaction are active.

Checklist before requesting a review

I have performed a self-review of my code.
If it is a core feature, I have added thorough tests.
Do we need to implement analytics? if so did you add the relevant metrics to the dashboard?
If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.

Checklist before merging

Do not forget to reformat commit message to not include the above checklist

github-actions · 2024-07-24T12:52:50Z

3132 tests run: 3011 passed, 0 failed, 121 skipped (full report)

Flaky tests (2)

Postgres 16

test_sharding_split_compaction[compact-shard-ancestors-enqueued]: debug

Postgres 14

test_location_conf_churn[3]: release

Code coverage* (full report)

functions: 32.9% (7009 of 21310 functions)
lines: 50.1% (55790 of 111313 lines)

* collected from Rust tests only

_{The comment gets automatically updated with the latest test results
6e6b33c at 2024-07-28T12:52:52.069Z :recycle:}

…ntain oldestActiveXid

ololobus · 2024-07-29T16:06:29Z

pgxn/neon/neon.c

+		else
+		{
+			/* Assume that there are no active transactions except prepared */
+			PrescanPreparedTransactions(xids, nxids);


Discussed with @knizhnik that he will try to make a repro test based on test_replica_start_with_too_many_unused_xids to exercise the new path

Right now the problem can not be reproduced with test: oldfstActiveXid is assigned when checkpoint record is replayed (online or shutdown). And checkpoint record is written by initdb. So when we just creates database, we already has checkpoint with valid oldestActiveXid.

So how this problem (with invalid oldestActiveXid) can happen at all at prod?
Shutdown checkpoint contains oldestActiveXid=0 (just because there are no active transactions after shutdown.
Now pageserver contains the following logic:

// A shutdown checkpoint has `oldestActiveXid == InvalidTransactionid`, // because at shutdown, all in-progress transactions will implicitly // end. Postgres startup code knows that, and allows hot standby to start // immediately from a shutdown checkpoint. // // In Neon, Postgres hot standby startup always behaves as if starting from // an online checkpoint. It needs a valid `oldestActiveXid` value, so // instead of overwriting self.checkpoint.oldestActiveXid with // InvalidTransactionid from the checkpoint WAL record, update it to a // proper value, knowing that there are no in-progress transactions at this // point, except for prepared transactions. // // See also the neon code changes in the InitWalRecovery() function. if xlog_checkpoint.oldestActiveXid == pg_constants::INVALID_TRANSACTION_ID && info == pg_constants::XLOG_CHECKPOINT_SHUTDOWN { let mut oldest_active_xid = self.checkpoint.nextXid.value as u32; for xid in modification.tline.list_twophase_files(lsn, ctx).await? { if (xid.wrapping_sub(oldest_active_xid) as i32) < 0 { oldest_active_xid = xid; } } self.checkpoint.oldestActiveXid = oldest_active_xid; } else { self.checkpoint.oldestActiveXid = xlog_checkpoint.oldestActiveXid; }

But this code was added not so long time ago.
May be last time primary was active (and write shutdown checkpoint) before this change was deployed to prod.
In this case last written checkpoint will contain invalid (0) oldestActiveXid.

In any case, if it happen with one client, it may also happen with other clients.
This is why I think it will be useful to add logic which can handle such situation.

ololobus · 2024-07-29T16:09:28Z

pgxn/neon/neon.c

@@ -366,23 +366,38 @@ RestoreRunningXactsFromClog(CheckPoint *checkpoint, TransactionId **xids, int *n
 	if (!TransactionIdIsValid(checkpoint->oldestActiveXid))
 	{
 		elog(LOG, "cannot restore running-xacts from CLOG because oldestActiveXid is not set");


Shouldn't this message be under if (running_xacts_overflow_policy == OP_WAIT) now? Otherwise, now we have a path to write this message, but then still try to restore running xacts (if (TransactionIdIsValid(checkpoint->oldestXid)) path)

I specially move the message before check to register I the go the fact that subxids overflow took place.

ololobus · 2024-07-29T16:12:11Z

pgxn/neon/neon.c

@@ -366,23 +366,38 @@ RestoreRunningXactsFromClog(CheckPoint *checkpoint, TransactionId **xids, int *n
 	if (!TransactionIdIsValid(checkpoint->oldestActiveXid))
 	{
 		elog(LOG, "cannot restore running-xacts from CLOG because oldestActiveXid is not set");
-		goto fail;
-	}
+		if (running_xacts_overflow_policy == OP_WAIT)


Hmm, on the second thought, I don't understand why we check the running_xacts_overflow_policy here at all. At this moment, we do not know yet whether there is an overflow or not.

Maybe that's valid, but then it's worth a comment

Sorry, I didn't want to rename GUC name and variable. And didn't want to introduce yet another GUC.
So I used the same GUC running_xacts_overflow_policy to handle situation when oldestActiveXid is invalid.

hlinnaka · 2024-07-29T22:01:43Z

I have the same issue with this PR as with the running_xacts_overflow_policy='ignore' setting in general: it is merely papering over the bug. You are choosing to start up in a broken mode that can lead to incorrect query results and corrupted cached pages later. That's a dangerous default, which is why it's not the default.

Why would it OK when oldestActiveXid==0 ? It's just as dangerous in that case AFAICS.

(also, no tests..)

knizhnik · 2024-07-30T13:31:24Z

I have the same issue with this PR as with the running_xacts_overflow_policy='ignore' setting in general: it is merely papering over the bug. You are choosing to start up in a broken mode that can lead to incorrect query results and corrupted cached pages later. That's a dangerous default, which is why it's not the default.

Why would it OK when oldestActiveXid==0 ? It's just as dangerous in that case AFAICS.

(also, no tests..)

The motivation of this PR is the same as with original PR introducing running_xacts_overflow_policy GUC.
I completely agree with you that it is dangerous and error prone behaviour. And I do not suggest to make ignore policy the forever default. Rather it is is just temporary workaround used to address previous issues with replication

The problem with overflow was caused by gaps in XID range due to XID alignment.
I do not know precisely - what can cause oldestActiveXid to be invalid. I just have only hypothesis: primary was switched off for most of the time and so have no chance to write a checkpoint record. And want if was active long ago - there was old version of Pagesrver which doesn't property propagate oldestActiveXid .

Concerning test - as I wrote in comment to the issue, right now the problem can not be reproduced.Checkpoint WAL record is created by initdb, which us used to prepare new database. So activeOldestXid is not null from the very beginning.
May be it is cleared later. But most likely it ws not set by last checkpoint, because primary ws inactive fir a long time and primary was offline.

kelvich · 2024-08-05T12:02:08Z

And want if was active long ago - there was old version of Pagesrver which doesn't property propagate oldestActiveXid. Concerning test - as I wrote in comment to the issue, right now the problem can not be reproduced.

WAL for affected databases is there, so should be easy to check what exactly went wrong

knizhnik requested review from a team as code owners July 24, 2024 12:47

knizhnik requested review from ololobus and jcsp July 24, 2024 12:47

knizhnik changed the title ~~Do not wait running-xacts recprd from primary of checkoint doesn't contain oldestActiveXid~~ Do not wait running-xacts record from primary if checkoint doesn't contain oldestActiveXid Jul 26, 2024

Do not wait running-xacts recprd from primary of checkoint doesn't co…

6e6b33c

…ntain oldestActiveXid

knizhnik force-pushed the oldest_active_xid_policy branch from 705438c to 6e6b33c Compare July 26, 2024 09:30

ololobus requested a review from hlinnaka July 29, 2024 09:35

ololobus reviewed Jul 29, 2024

View reviewed changes

ololobus mentioned this pull request Jul 30, 2024

Epic: stabilize physical replication #6211

Open

problame mentioned this pull request Sep 6, 2024

Specialize WalIngest on PostgreSQL version #8904

Merged

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Do not wait running-xacts record from primary if checkoint doesn't contain oldestActiveXid #8484

Do not wait running-xacts record from primary if checkoint doesn't contain oldestActiveXid #8484

knizhnik commented Jul 24, 2024

github-actions bot commented Jul 24, 2024 •

edited

Loading

Postgres 16

Postgres 14

ololobus Jul 29, 2024

knizhnik Jul 29, 2024

ololobus Jul 29, 2024

knizhnik Jul 29, 2024

ololobus Jul 29, 2024 •

edited

Loading

knizhnik Jul 29, 2024

hlinnaka commented Jul 29, 2024

knizhnik commented Jul 30, 2024

kelvich commented Aug 5, 2024

Do not wait running-xacts record from primary if checkoint doesn't contain oldestActiveXid #8484

Are you sure you want to change the base?

Do not wait running-xacts record from primary if checkoint doesn't contain oldestActiveXid #8484

Conversation

knizhnik commented Jul 24, 2024

Problem

Changes

Checklist before requesting a review

Checklist before merging

github-actions bot commented Jul 24, 2024 • edited Loading

3132 tests run: 3011 passed, 0 failed, 121 skipped (full report)

Postgres 16

Postgres 14

Code coverage* (full report)

ololobus Jul 29, 2024

Choose a reason for hiding this comment

knizhnik Jul 29, 2024

Choose a reason for hiding this comment

ololobus Jul 29, 2024

Choose a reason for hiding this comment

knizhnik Jul 29, 2024

Choose a reason for hiding this comment

ololobus Jul 29, 2024 • edited Loading

Choose a reason for hiding this comment

knizhnik Jul 29, 2024

Choose a reason for hiding this comment

hlinnaka commented Jul 29, 2024

knizhnik commented Jul 30, 2024

kelvich commented Aug 5, 2024

github-actions bot commented Jul 24, 2024 •

edited

Loading

ololobus Jul 29, 2024 •

edited

Loading