Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pageserver: refactor ingest inplace to decouple decoding and handling #9472

Merged
merged 20 commits into from
Oct 24, 2024

Conversation

VladLazar
Copy link
Contributor

@VladLazar VladLazar commented Oct 21, 2024

Problem

WAL ingest couples decoding of special records with their handling (updates to the storage engine mostly).
This is a roadblock for our plan to move WAL filtering (and implicitly decoding) to safekeepers since they cannot
do writes to the storage engine.

Summary of changes

This PR decouples the decoding of the special WAL records from their application. The changes are done in place
and I've done my best to refrain from refactorings and attempted to preserve the original code as much as possible.
The commit messages flag deviations.

Reviewer Note

You will notice that this is a somewhat transient state. This is intentional. I'd like this review to focus on ensuring
that I've not changed the handling of these special records.

Future PRs will:

  • Move all the new types and decoding logic into a crate of its own
  • Create separate instances for these types for each supported pg version
  • Further refactor ingest to take a SerializedBatch and a series of special records for handling (types defined in this PR)

Related: #9335
Epic: #9329

Checklist before requesting a review

  • I have performed a self-review of my code.
  • If it is a core feature, I have added thorough tests.
  • Do we need to implement analytics? if so did you add the relevant metrics to the dashboard?
  • If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.

Checklist before merging

  • Do not forget to reformat commit message to not include the above checklist

Copy link

github-actions bot commented Oct 21, 2024

5264 tests run: 5050 passed, 0 failed, 214 skipped (full report)


Flaky tests (1)

Postgres 15

Code coverage* (full report)

  • functions: 31.4% (7671 of 24445 functions)
  • lines: 48.8% (60374 of 123675 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
02fbac3 at 2024-10-24T11:37:52.072Z :recycle:

This one is a bit less obvious than the previous ones.

I merged some of the logic that was previously in
`WalIngest::ingest_record` to `WalIngest::ingest_xact_record`.
@VladLazar VladLazar force-pushed the vlad/refactor-ingest branch from c3f31fa to 4b155c1 Compare October 22, 2024 09:46
This is an odd one. It requires the current checkpoint value to decide
what to do. That can't trivially be moved to the SK. It's possible with
protocol change, but deferring decision for now. Hence, send the raw
record and let the pageserver figure it out.
This will give us a nice evolution path when we want to add
new actions for a record.
The goal of this commit is to make it clearer when we are ingesting
the whole record versus when we are ingesting an action for the record.

I also merged the VM bits clearing into one function since they were
exactly the same.
@VladLazar VladLazar force-pushed the vlad/refactor-ingest branch from 4b155c1 to 991a4c0 Compare October 22, 2024 10:37
@VladLazar VladLazar changed the title Vlad/refactor ingest pageserver: refactor ingest inplace to decouple decoding and handling Oct 22, 2024
@VladLazar VladLazar marked this pull request as ready for review October 22, 2024 10:47
@VladLazar VladLazar requested a review from a team as a code owner October 22, 2024 10:47
@VladLazar VladLazar requested review from arpad-m and problame and removed request for arpad-m October 22, 2024 10:47
Copy link
Contributor

@problame problame left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No good tooling to post review comments for individual commits (I reviewed commit by commit).
So, dumping comments here


         return Ok(Some(SmgrRecord::Truncate(SmgrTruncate {
                    rel,
                    to: truncate.blkno,
                })));
            }

            if (truncate.flags & pg_constants::SMGR_TRUNCATE_FSM) != 0 {
                let rel = RelTag {
                    spcnode,

In the original code, theoretically SMGR_TRUNCATE_HEAP and SMGR_TRUNCATE_FSM could both be set, so, it's not technically equivalent to return on the first matching bit. I guess in practice they're exclusive though? => should anyhow::ensure! that at the top of the function.


   let (xact_common, is_commit, is_prepared) = match record {
            XactRecord::Prepare(XactPrepare { xl_xid, data }) => {
                let xid: u64 = if modification.tline.pg_version >= 17 {
                    self.adjust_to_full_transaction_id(xl_xid)?
                } else {
                    xl_xid as u64
                };
                return modification.put_twophase_file(xid, data, ctx).await;
            }

Before this PR, this special case was for the twophase PREPARED records, i.e., XLOG_XACT_COMMIT_PREPARED and XLOG_XACT_ABORT_PREPARED. This PR moves that handling to the XactRecord::Prepare, which afaict is the wrong place. Needs to be in the cases for XactRecord::CommitPrepared and XactRecord::AbortPrepared.


Please create a follow-up task to address the .unwrap().unwrap() business that multiple commits added, and link it from the epic & this PR.
If we don't address this, inevitably eventually we'll make stuff fallible and cause a panic.



    fn decode_logical_message_record(
        buf: &mut Bytes,
        decoded: &DecodedWALRecord,
        _pg_version: u32,
    ) -> anyhow::Result<Option<LogicalMessageRecord>> {
        let info = decoded.xl_info & pg_constants::XLR_RMGR_INFO_MASK;
        if info == pg_constants::XLOG_LOGICAL_MESSAGE {
            let xlrec = crate::walrecord::XlLogicalMessage::decode(buf);
            let buf_size = xlrec.prefix_size + xlrec.message_size;
            // TODO change decode function interface to take ownership of buf

I think it would be more natural, robust, and efficient, to filter the prefix during decode instead of during ingest, which is what this PR does right now.
It would also allow us to have an enum for the two variants (neon-test and neon-file:).

Independent of that discussion, did you check the curren tusers of neon-test wrt where the failpoint happens?
Like, probably all the tests that use it just expect that they can hold up pageserver walreceiver via the wal-ingest-logical-message-sleep .
But would make sense to check that / maybe rename the failpoint to pageserver-wal-ingest-... to make things a bit clearer.


Have you thought through what happens if the decoding sides pg_version (=SK's view of the world) gets out of sync with the ingesting side's pg_version (=PS's view of the world?
Do we want to protect against that by including the pg_version in each of the decoded records?
=> ok to do this later


 async fn ingest_heapam_record(
        &mut self,
        clear_vm_bits: ClearVmBits,
        modification: &mut DatadirModification<'_>,
        ctx: &RequestContext,
    ) -> anyhow::Result<()> {
        let ClearVmBits {

IMO this should still take a HeapamRecord to get more type safety.
Even though that enum only has one variant, ClearVmBits.
You can still destructure it painfree.

Same for ingest_neonrmgr_record


Hmm, the last commit that DRYs the clear VM bits code should be a separate PR IMO.
Can't speak to whether it's a wise choice or not to DRY the code.
Matthias would know, I think.

@jcsp jcsp requested a review from skyzh October 23, 2024 14:24
@VladLazar
Copy link
Contributor Author

         return Ok(Some(SmgrRecord::Truncate(SmgrTruncate {
                    rel,
                    to: truncate.blkno,
                })));
            }

            if (truncate.flags & pg_constants::SMGR_TRUNCATE_FSM) != 0 {
                let rel = RelTag {
                    spcnode,

In the original code, theoretically SMGR_TRUNCATE_HEAP and SMGR_TRUNCATE_FSM could both be set, so, it's not technically equivalent to return on the first matching bit. I guess in practice they're exclusive though? => should anyhow::ensure! that at the top of the function.

Thank you for spotting this. I've adjusted the handling to allow for truncation of multiple fork types in the same record in 4580b54.

   let (xact_common, is_commit, is_prepared) = match record {
            XactRecord::Prepare(XactPrepare { xl_xid, data }) => {
                let xid: u64 = if modification.tline.pg_version >= 17 {
                    self.adjust_to_full_transaction_id(xl_xid)?
                } else {
                    xl_xid as u64
                };
                return modification.put_twophase_file(xid, data, ctx).await;
            }

Before this PR, this special case was for the twophase PREPARED records, i.e., XLOG_XACT_COMMIT_PREPARED and XLOG_XACT_ABORT_PREPARED. This PR moves that handling to the XactRecord::Prepare, which afaict is the wrong place. Needs to be in the cases for XactRecord::CommitPrepared and XactRecord::AbortPrepared.

I don't see it, but perhaps I'm missing something here. Let's work through it.

  1. XLOG_XACT_PREPARE
    a. Previous code: It was handled here by: adjusting the transaction id and calling put_twophase_file.
    b. New code: It is now handled here with exactly the same code as previously
  2. XLOG_XACT_ABORT_PREPARED or XLOG_XACT_COMMIT_PREPARED
    a. Previous code: It was handled here by: ingesting the xact record, adjusting xid and, finally, dropping twophase files
    b. New code: It is now handled here with is_prepared=true. It's hard to tell from the diff but the code block linked
    previously merges two pre-existint code blocks: previous body of ingest_xact_record and epilogue of the old handling which does the xid adjustment and dropping of two phase files. In the new code, we do xid adjustment and dropping of two phase files at the end of ingest_xact_record for XLOG_XACT_ABORT_PREPARED or XLOG_XACT_COMMIT_PREPARED.

To me, this looks equiavalent to the old code, but let me know if you're not convinced.

Please create a follow-up task to address the .unwrap().unwrap() business that multiple commits added, and link it from the epic & this PR. If we don't address this, inevitably eventually we'll make stuff fallible and cause a panic.

I already have a draft of a follow-follow-up PR where this is fixed, but sure: #9491.


    fn decode_logical_message_record(
        buf: &mut Bytes,
        decoded: &DecodedWALRecord,
        _pg_version: u32,
    ) -> anyhow::Result<Option<LogicalMessageRecord>> {
        let info = decoded.xl_info & pg_constants::XLR_RMGR_INFO_MASK;
        if info == pg_constants::XLOG_LOGICAL_MESSAGE {
            let xlrec = crate::walrecord::XlLogicalMessage::decode(buf);
            let buf_size = xlrec.prefix_size + xlrec.message_size;
            // TODO change decode function interface to take ownership of buf

I think it would be more natural, robust, and efficient, to filter the prefix during decode instead of during ingest, which is what this PR does right now. It would also allow us to have an enum for the two variants (neon-test and neon-file:).

I like the suggestion. Done in 4f8c733.

Independent of that discussion, did you check the curren tusers of neon-test wrt where the failpoint happens? Like, probably all the tests that use it just expect that they can hold up pageserver walreceiver via the wal-ingest-logical-message-sleep . But would make sense to check that / maybe rename the failpoint to pageserver-wal-ingest-... to make things a bit clearer.

Renamed the failpoint in 9c7baec.

Have you thought through what happens if the decoding sides pg_version (=SK's view of the world) gets out of sync with the ingesting side's pg_version (=PS's view of the world? Do we want to protect against that by including the pg_version in each of the decoded records? => ok to do this later

Subtle breakage. Will work fine for 90% of records until you get to one with PG specific parsing or handling.
If we detect that (SK can include it's own pg version for the timeline), we should kill the wal connection and drop our current in-memory file (like any other error on the ingest path).
This is not really consistent right now, so will have to be thought through more and tested.

 async fn ingest_heapam_record(
        &mut self,
        clear_vm_bits: ClearVmBits,
        modification: &mut DatadirModification<'_>,
        ctx: &RequestContext,
    ) -> anyhow::Result<()> {
        let ClearVmBits {

IMO this should still take a HeapamRecord to get more type safety. Even though that enum only has one variant, ClearVmBits. You can still destructure it painfree.

Same for ingest_neonrmgr_record

I don't see how that's more type safe. Are you alluding to the scenario where we call ingest_heapam_record from the neonrmgr handling? That's why I want to keep the DRI
change for ClearVmBits.

Hmm, the last commit that DRYs the clear VM bits code should be a separate PR IMO. Can't speak to whether it's a wise choice or not to DRY the code. Matthias would know, I think.

See above.

@VladLazar VladLazar requested a review from problame October 23, 2024 17:21
Copy link
Contributor

@problame problame left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me, this looks equiavalent to the old code, but let me know if you're not convinced.

Ok, I hadn't seen the inlining / DRYing of the body of ingest_xact_record.

@VladLazar VladLazar merged commit 5069123 into main Oct 24, 2024
81 checks passed
@VladLazar VladLazar deleted the vlad/refactor-ingest branch October 24, 2024 16:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants