-
Notifications
You must be signed in to change notification settings - Fork 476
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix init of WAL page header at startup #8914
Conversation
I happened to notice this bug while I was writing a new test case for an issue with prepared transactions on the v17 branch. It turned out to be a pre-existing bug. (There were other issues with prepared transactions on the v17 branch too, though) |
b8698da
to
86a7ced
Compare
@hlinnaka is the target branch the one you want this to be merged into? It seems like a fairly arbitrary branch to me. |
4977 tests run: 4813 passed, 0 failed, 164 skipped (full report)Flaky tests (6)Postgres 17
Postgres 16
Postgres 15
Postgres 14
Code coverage* (full report)
* collected from Rust tests only The comment gets automatically updated with the latest test results
caea18d at 2024-09-20T23:57:30.168Z :recycle: |
This doesn't include the new test from that PR, however, because it fails without the corresponding code changes, which we don't have on this branch yet.
86a7ced
to
94d0f00
Compare
Right, it was just because I made those other change first and knew they were going to be merged soon. It's been merged now, so the point is moot. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
Please update the submodules after merging the PG PRs but before merging this one.
0df5cf1
to
07613c3
Compare
This doesn't include the new test from that PR, however, because it fails without the corresponding code changes, which we don't have on this branch yet.
This doesn't include the new test from that PR, however, because it fails without the corresponding code changes, which we don't have on this branch yet.
Leaving in reminder for this here: https://github.com/neondatabase/neon/pull/8916/files#diff-116a0da838509394db2ba0a7c6bec4c72808bb993040feba0bfde1e17fc4a93eR23 |
07613c3
to
8d8c1c7
Compare
Corresponding PRs in the postgres repo: v17: neondatabase/postgres#506 |
Rebased, this is now ready to be merged again. I need approval for the v17 PR: neondatabase/postgres#506 |
86ac9c5
to
a4f35fb
Compare
CI failure on one branch, seems to be flaky https://neon-github-public-dev.s3.amazonaws.com/reports/pr-8914/10957027921/index.html#suites/572721f46de364f1302c76c19e8023c4/c4814c8a998655dd/:
@arssher is this a known issue? |
The test case up to the failure is basically just:
|
I haven't seen exactly this sequence with xlog switch, but it is kind of expected behaviour. CI failure misses first ep logs (it is better to do
So I'd suggest to add sleep/retry until endpoint starts well. Alternatively safekeeper exposes list of walreceivers and we can poll sk until walreceiver on it is gone. |
I was able to reproduce this locally with:
failed after about ~100 iterations
Hmm, isn't this a potential problem in production too? |
Opened #9079 to track that flakiness, since it seems unrelated to this PR. In this PR, to work around this, I'll add a call to wait for the WAL to be flushed before stopping the compute. |
34978e2
to
c42856d
Compare
If the primary is started at an LSN within the first of a 16 MB WAL segment, the "long XLOG page header" at the beginning of the segment was not initialized correctly. That has gone unnnoticed, because under normal circumstances, nothing looks at the page header. The WAL that is streamed to the safekeepers starts at the new record's LSN, not at the beginning of the page, so that bogus page header didn't propagate elsewhere, and a primary server doesn't normally read the WAL its written. Which is good because the contents of the page would be bogus anyway, as it wouldn't contain any of the records before the LSN where the new record is written. Except that in the following cases a primary does read its own WAL: 1. When there are two-phase transactions in prepared state at checkpoint. The checkpointer reads the two-phase state from the XLOG_XACT_PREPARE record, and writes it to a file in pg_twophase/. 2. Logical decoding reads the WAL starting from the replication slot's restart LSN. This PR fixes the problem with two-phase transactions. For that, it's sufficient to initialize the page header correctly. The checkpointer only needs to read XLOG_XACT_PREPARE records that were generated after the server startup, so it's still OK that older WAL is missing / bogus. I have not investigated if we have a problem with logical decoding, however. Let's deal with that separately. Special thanks to @Lzjing-1997, who independently found the same bug and opened a PR to fix it, although I did not use that PR.
c42856d
to
437b09a
Compare
I hope I got them right now...
If the primary is started at an LSN within the first of a 16 MB WAL segment, the "long XLOG page header" at the beginning of the segment was not initialized correctly. That has gone unnnoticed, because under normal circumstances, nothing looks at the page header. The WAL that is streamed to the safekeepers starts at the new record's LSN, not at the beginning of the page, so that bogus page header didn't propagate elsewhere, and a primary server doesn't normally read the WAL its written. Which is good because the contents of the page would be bogus anyway, as it wouldn't contain any of the records before the LSN where the new record is written.
Except that in the following cases a primary does read its own WAL:
When there are two-phase transactions in prepared state at checkpoint. The checkpointer reads the two-phase state from the XLOG_XACT_PREPARE record, and writes it to a file in pg_twophase/.
Logical decoding reads the WAL starting from the replication slot's restart LSN.
This PR fixes the problem with two-phase transactions. For that, it's sufficient to initialize the page header correctly. The checkpointer only needs to read XLOG_XACT_PREPARE records that were generated after the server startup, so it's still OK that older WAL is missing / bogus.
I have not investigated if we have a problem with logical decoding, however. Let's deal with that separately.