Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not wait running-xacts record from primary if checkoint doesn't contain oldestActiveXid #8484

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

knizhnik
Copy link
Contributor

Problem

If oldestActiveXid is not set in checkpoint and primary is not running, then replica failed to start.
See https://neondb.slack.com/archives/C07E5U1660Z/p1721804204809549

Changes

If oldestActiveXid is invalid and oldestXid (CLOG low boundary) is valid, then restore runningxacts from the beginning of CLOG. Otherwise just assume that only prepared transaction are active.

Checklist before requesting a review

  • I have performed a self-review of my code.
  • If it is a core feature, I have added thorough tests.
  • Do we need to implement analytics? if so did you add the relevant metrics to the dashboard?
  • If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.

Checklist before merging

  • Do not forget to reformat commit message to not include the above checklist

@knizhnik knizhnik requested review from a team as code owners July 24, 2024 12:47
@knizhnik knizhnik requested review from ololobus and jcsp July 24, 2024 12:47
Copy link

github-actions bot commented Jul 24, 2024

3132 tests run: 3011 passed, 0 failed, 121 skipped (full report)


Flaky tests (2)

Postgres 16

  • test_sharding_split_compaction[compact-shard-ancestors-enqueued]: debug

Postgres 14

  • test_location_conf_churn[3]: release

Code coverage* (full report)

  • functions: 32.9% (7009 of 21310 functions)
  • lines: 50.1% (55790 of 111313 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
6e6b33c at 2024-07-28T12:52:52.069Z :recycle:

@knizhnik knizhnik changed the title Do not wait running-xacts recprd from primary of checkoint doesn't contain oldestActiveXid Do not wait running-xacts record from primary if checkoint doesn't contain oldestActiveXid Jul 26, 2024
@knizhnik knizhnik force-pushed the oldest_active_xid_policy branch from 705438c to 6e6b33c Compare July 26, 2024 09:30
@ololobus ololobus requested a review from hlinnaka July 29, 2024 09:35
else
{
/* Assume that there are no active transactions except prepared */
PrescanPreparedTransactions(xids, nxids);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed with @knizhnik that he will try to make a repro test based on test_replica_start_with_too_many_unused_xids to exercise the new path

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now the problem can not be reproduced with test: oldfstActiveXid is assigned when checkpoint record is replayed (online or shutdown). And checkpoint record is written by initdb. So when we just creates database, we already has checkpoint with valid oldestActiveXid.

So how this problem (with invalid oldestActiveXid) can happen at all at prod?
Shutdown checkpoint contains oldestActiveXid=0 (just because there are no active transactions after shutdown.
Now pageserver contains the following logic:

                   // A shutdown checkpoint has `oldestActiveXid == InvalidTransactionid`,
                    // because at shutdown, all in-progress transactions will implicitly
                    // end. Postgres startup code knows that, and allows hot standby to start
                    // immediately from a shutdown checkpoint.
                    //
                    // In Neon, Postgres hot standby startup always behaves as if starting from
                    // an online checkpoint. It needs a valid `oldestActiveXid` value, so
                    // instead of overwriting self.checkpoint.oldestActiveXid with
                    // InvalidTransactionid from the checkpoint WAL record, update it to a
                    // proper value, knowing that there are no in-progress transactions at this
                    // point, except for prepared transactions.
                    //
                    // See also the neon code changes in the InitWalRecovery() function.
                    if xlog_checkpoint.oldestActiveXid == pg_constants::INVALID_TRANSACTION_ID
                        && info == pg_constants::XLOG_CHECKPOINT_SHUTDOWN
                    {
                        let mut oldest_active_xid = self.checkpoint.nextXid.value as u32;
                        for xid in modification.tline.list_twophase_files(lsn, ctx).await? {
                            if (xid.wrapping_sub(oldest_active_xid) as i32) < 0 {
                                oldest_active_xid = xid;
                            }
                        }
                        self.checkpoint.oldestActiveXid = oldest_active_xid;
                    } else {
                        self.checkpoint.oldestActiveXid = xlog_checkpoint.oldestActiveXid;
                    }

But this code was added not so long time ago.
May be last time primary was active (and write shutdown checkpoint) before this change was deployed to prod.
In this case last written checkpoint will contain invalid (0) oldestActiveXid.

In any case, if it happen with one client, it may also happen with other clients.
This is why I think it will be useful to add logic which can handle such situation.

@@ -366,23 +366,38 @@ RestoreRunningXactsFromClog(CheckPoint *checkpoint, TransactionId **xids, int *n
if (!TransactionIdIsValid(checkpoint->oldestActiveXid))
{
elog(LOG, "cannot restore running-xacts from CLOG because oldestActiveXid is not set");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this message be under if (running_xacts_overflow_policy == OP_WAIT) now? Otherwise, now we have a path to write this message, but then still try to restore running xacts (if (TransactionIdIsValid(checkpoint->oldestXid)) path)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I specially move the message before check to register I the go the fact that subxids overflow took place.

@@ -366,23 +366,38 @@ RestoreRunningXactsFromClog(CheckPoint *checkpoint, TransactionId **xids, int *n
if (!TransactionIdIsValid(checkpoint->oldestActiveXid))
{
elog(LOG, "cannot restore running-xacts from CLOG because oldestActiveXid is not set");
goto fail;
}
if (running_xacts_overflow_policy == OP_WAIT)
Copy link
Member

@ololobus ololobus Jul 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, on the second thought, I don't understand why we check the running_xacts_overflow_policy here at all. At this moment, we do not know yet whether there is an overflow or not.

Maybe that's valid, but then it's worth a comment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I didn't want to rename GUC name and variable. And didn't want to introduce yet another GUC.
So I used the same GUC running_xacts_overflow_policy to handle situation when oldestActiveXid is invalid.

@hlinnaka
Copy link
Contributor

I have the same issue with this PR as with the running_xacts_overflow_policy='ignore' setting in general: it is merely papering over the bug. You are choosing to start up in a broken mode that can lead to incorrect query results and corrupted cached pages later. That's a dangerous default, which is why it's not the default.

Why would it OK when oldestActiveXid==0 ? It's just as dangerous in that case AFAICS.

(also, no tests..)

@knizhnik
Copy link
Contributor Author

I have the same issue with this PR as with the running_xacts_overflow_policy='ignore' setting in general: it is merely papering over the bug. You are choosing to start up in a broken mode that can lead to incorrect query results and corrupted cached pages later. That's a dangerous default, which is why it's not the default.

Why would it OK when oldestActiveXid==0 ? It's just as dangerous in that case AFAICS.

(also, no tests..)

The motivation of this PR is the same as with original PR introducing running_xacts_overflow_policy GUC.
I completely agree with you that it is dangerous and error prone behaviour. And I do not suggest to make ignore policy the forever default. Rather it is is just temporary workaround used to address previous issues with replication

The problem with overflow was caused by gaps in XID range due to XID alignment.
I do not know precisely - what can cause oldestActiveXid to be invalid. I just have only hypothesis: primary was switched off for most of the time and so have no chance to write a checkpoint record. And want if was active long ago - there was old version of Pagesrver which doesn't property propagate oldestActiveXid .

Concerning test - as I wrote in comment to the issue, right now the problem can not be reproduced.Checkpoint WAL record is created by initdb, which us used to prepare new database. So activeOldestXid is not null from the very beginning.
May be it is cleared later. But most likely it ws not set by last checkpoint, because primary ws inactive fir a long time and primary was offline.

@kelvich
Copy link
Contributor

kelvich commented Aug 5, 2024

And want if was active long ago - there was old version of Pagesrver which doesn't property propagate oldestActiveXid. Concerning test - as I wrote in comment to the issue, right now the problem can not be reproduced.

WAL for affected databases is there, so should be easy to check what exactly went wrong

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants