pageserver: refactor ingest inplace to decouple decoding and handling #9472
5264 tests run: 5050 passed, 0 failed, 214 skipped (full report)
Code coverage* (full report)
* collected from Rust tests only
The comment gets automatically updated with the latest test results
02fbac3 at 2024-10-24T11:37:52.072Z
This one is a bit less obvious than the previous ones. I merged some of the logic that was previously in `WalIngest::ingest_record` to `WalIngest::ingest_xact_record`.
c3f31fa to 4b155c1
This is an odd one. It requires the current checkpoint value to decide what to do. That can't trivially be moved to the SK. It's possible with protocol change, but deferring decision for now. Hence, send the raw record and let the pageserver figure it out.
This will give us a nice evolution path when we want to add new actions for a record.
The goal of this commit is to make it clearer when we are ingesting the whole record versus when we are ingesting an action for the record. I also merged the VM bits clearing into one function since they were exactly the same.
4b155c1 to 991a4c0
No good tooling to post review comments for individual commits (I reviewed commit by commit).
So, dumping comments here
        return Ok(Some(SmgrRecord::Truncate(SmgrTruncate {
            rel,
            to: truncate.blkno,
        })));
    }
    if (truncate.flags & pg_constants::SMGR_TRUNCATE_FSM) != 0 {
        let rel = RelTag {
            spcnode,
In the original code, theoretically `SMGR_TRUNCATE_HEAP` and `SMGR_TRUNCATE_FSM` could both be set, so it's not technically equivalent to return on the first matching bit. I guess in practice they're exclusive though? => should `anyhow::ensure!` that at the top of the function.
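One way to avoid returning on the first matching bit is to collect one action per flag that is set. The following is only a sketch: the flag values mirror PostgreSQL's `SMGR_TRUNCATE_*` constants, while `Fork` and `forks_to_truncate` are hypothetical names used for illustration and are not the pageserver's types.

```rust
// Flag values as defined by PostgreSQL for smgr truncate records.
const SMGR_TRUNCATE_HEAP: u32 = 0x0001;
const SMGR_TRUNCATE_VM: u32 = 0x0002;
const SMGR_TRUNCATE_FSM: u32 = 0x0004;

// Hypothetical stand-in for the relation fork being truncated.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Fork {
    Main,
    Fsm,
    Vm,
}

// Returns every fork the record asks to truncate, instead of stopping
// at the first flag that matches.
fn forks_to_truncate(flags: u32) -> Vec<Fork> {
    let mut forks = Vec::new();
    if flags & SMGR_TRUNCATE_HEAP != 0 {
        forks.push(Fork::Main);
    }
    if flags & SMGR_TRUNCATE_FSM != 0 {
        forks.push(Fork::Fsm);
    }
    if flags & SMGR_TRUNCATE_VM != 0 {
        forks.push(Fork::Vm);
    }
    forks
}
```

With this shape, a record carrying several flags yields several actions, and a record with no flags yields none.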
let (xact_common, is_commit, is_prepared) = match record {
    XactRecord::Prepare(XactPrepare { xl_xid, data }) => {
        let xid: u64 = if modification.tline.pg_version >= 17 {
            self.adjust_to_full_transaction_id(xl_xid)?
        } else {
            xl_xid as u64
        };
        return modification.put_twophase_file(xid, data, ctx).await;
    }
Before this PR, this special case was for the twophase PREPARED records, i.e., `XLOG_XACT_COMMIT_PREPARED` and `XLOG_XACT_ABORT_PREPARED`. This PR moves that handling to `XactRecord::Prepare`, which afaict is the wrong place. It needs to be in the cases for `XactRecord::CommitPrepared` and `XactRecord::AbortPrepared`.
Please create a follow-up task to address the `.unwrap().unwrap()` business that multiple commits added, and link it from the epic & this PR.
If we don't address this, we'll inevitably make something fallible down the line and cause a panic.
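A minimal sketch of what such a follow-up could look like, assuming the double unwrap comes from a fallible call that returns an optional value. `IngestError`, `decode_special`, and `decode_required` are hypothetical names, not code from this PR.

```rust
// Hypothetical error type; the real code uses anyhow.
#[derive(Debug, PartialEq)]
enum IngestError {
    Malformed,
    MissingRecord,
}

// Hypothetical decoder: Err on malformed input, Ok(None) when the record
// needs no special handling, Ok(Some(tag)) otherwise.
fn decode_special(raw: &[u8]) -> Result<Option<u8>, IngestError> {
    match raw {
        [] => Err(IngestError::Malformed),
        [0] => Ok(None),
        [tag, ..] => Ok(Some(*tag)),
    }
}

// Replaces `decode_special(raw).unwrap().unwrap()`: both failure modes
// become errors the caller can propagate with `?` instead of panics.
fn decode_required(raw: &[u8]) -> Result<u8, IngestError> {
    decode_special(raw)?.ok_or(IngestError::MissingRecord)
}
```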
fn decode_logical_message_record(
    buf: &mut Bytes,
    decoded: &DecodedWALRecord,
    _pg_version: u32,
) -> anyhow::Result<Option<LogicalMessageRecord>> {
    let info = decoded.xl_info & pg_constants::XLR_RMGR_INFO_MASK;
    if info == pg_constants::XLOG_LOGICAL_MESSAGE {
        let xlrec = crate::walrecord::XlLogicalMessage::decode(buf);
        let buf_size = xlrec.prefix_size + xlrec.message_size;
        // TODO change decode function interface to take ownership of buf
I think it would be more natural, robust, and efficient to filter the prefix during decode instead of during ingest, which is what this PR does right now.
It would also allow us to have an enum for the two variants (`neon-test` and `neon-file:`).
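Roughly, filtering at decode time with an enum could look like the sketch below. `LogicalMessagePrefix` and `classify_prefix` are hypothetical names, and treating the text after `neon-file:` as a path is an assumption made for illustration.

```rust
// Hypothetical enum for the two prefixes mentioned above.
#[derive(Debug, PartialEq)]
enum LogicalMessagePrefix<'a> {
    // Exactly "neon-test".
    Test,
    // "neon-file:" followed by a path (assumed layout).
    File { path: &'a str },
}

// Classifies a logical message prefix during decode. Messages with any
// other prefix map to None and can be dropped before ingest sees them.
fn classify_prefix(prefix: &str) -> Option<LogicalMessagePrefix<'_>> {
    if prefix == "neon-test" {
        Some(LogicalMessagePrefix::Test)
    } else if let Some(path) = prefix.strip_prefix("neon-file:") {
        Some(LogicalMessagePrefix::File { path })
    } else {
        None
    }
}
```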
Independent of that discussion, did you check the current users of `neon-test` wrt where the failpoint happens?
Probably all the tests that use it just expect that they can hold up the pageserver walreceiver via the `wal-ingest-logical-message-sleep` failpoint.
But it would make sense to check that, and maybe rename the failpoint to `pageserver-wal-ingest-...` to make things a bit clearer.
Have you thought through what happens if the decoding side's pg_version (= SK's view of the world) gets out of sync with the ingesting side's pg_version (= PS's view of the world)?
Do we want to protect against that by including the pg_version in each of the decoded records?
=> ok to do this later
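If the version were carried with each decoded record, the ingesting side could reject a mismatch before applying anything. This is only a sketch of that idea; `Versioned`, `VersionMismatch`, and `check_version` are hypothetical names and not part of this PR.

```rust
// Hypothetical wrapper: a decoded record tagged with the Postgres
// version the decoder assumed.
#[derive(Debug, PartialEq)]
struct Versioned<T> {
    pg_version: u32,
    record: T,
}

#[derive(Debug, PartialEq)]
struct VersionMismatch {
    decoded_with: u32,
    ingesting_with: u32,
}

// Unwraps the record only if the decoder and the ingesting side agree
// on the Postgres version; otherwise reports both versions.
fn check_version<T>(
    decoded: Versioned<T>,
    ingest_pg_version: u32,
) -> Result<T, VersionMismatch> {
    if decoded.pg_version == ingest_pg_version {
        Ok(decoded.record)
    } else {
        Err(VersionMismatch {
            decoded_with: decoded.pg_version,
            ingesting_with: ingest_pg_version,
        })
    }
}
```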
async fn ingest_heapam_record(
    &mut self,
    clear_vm_bits: ClearVmBits,
    modification: &mut DatadirModification<'_>,
    ctx: &RequestContext,
) -> anyhow::Result<()> {
    let ClearVmBits {
IMO this should still take a `HeapamRecord` to get more type safety, even though that enum only has one variant, `ClearVmBits`.
You can still destructure it pain-free.
Same for `ingest_neonrmgr_record`.
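For reference, a single-variant enum can be destructured with a plain `let`, because the pattern is irrefutable. The sketch below uses simplified, hypothetical fields and a synchronous function; it is not the signature from this PR.

```rust
// Hypothetical, simplified fields; the real struct carries more.
#[derive(Debug)]
struct ClearVmBits {
    new_heap_blkno: Option<u32>,
    old_heap_blkno: Option<u32>,
    flags: u8,
}

#[derive(Debug)]
enum HeapamRecord {
    ClearVmBits(ClearVmBits),
}

// Takes the enum rather than the inner struct. With one variant the
// `let` pattern is irrefutable; once a second variant is added, this
// line stops compiling and forces the handler to be revisited.
fn ingest_heapam_record(record: HeapamRecord) -> Vec<(u32, u8)> {
    let HeapamRecord::ClearVmBits(ClearVmBits {
        new_heap_blkno,
        old_heap_blkno,
        flags,
    }) = record;

    // Stand-in for the real work: list the (block, flags) pairs touched.
    new_heap_blkno
        .into_iter()
        .chain(old_heap_blkno)
        .map(|blkno| (blkno, flags))
        .collect()
}
```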
Hmm, the last commit that DRYs the clear VM bits code should be a separate PR IMO.
Can't speak to whether it's a wise choice or not to DRY the code.
Matthias would know, I think.
Thank you for spotting this. I've adjusted the handling to allow for truncation of multiple fork types in the same record in 4580b54.
I don't see it, but perhaps I'm missing something here. Let's work through it.
To me, this looks equivalent to the old code, but let me know if you're not convinced.
I already have a draft of a follow-follow-up PR where this is fixed, but sure: #9491.
I like the suggestion. Done in 4f8c733.
Renamed the failpoint in 9c7baec.
Subtle breakage. Will work fine for 90% of records until you get to one with PG specific parsing or handling.
I don't see how that's more type safe. Are you alluding to the scenario where we call
See above.
To me, this looks equivalent to the old code, but let me know if you're not convinced.
Ok, I hadn't seen the inlining / DRYing of the body of ingest_xact_record.
Problem
WAL ingest couples decoding of special records with their handling (updates to the storage engine mostly).
This is a roadblock for our plan to move WAL filtering (and implicitly decoding) to safekeepers since they cannot
do writes to the storage engine.
Summary of changes
This PR decouples the decoding of the special WAL records from their application. The changes are done in place
and I've done my best to refrain from refactorings and attempted to preserve the original code as much as possible.
The commit messages flag deviations.
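As a toy illustration of the split described above (not the actual pageserver types), decoding can be a pure function that produces a typed record, while a separate step applies that record to storage. `SpecialRecord`, `Storage`, `decode`, `apply`, and the byte layout are all hypothetical.

```rust
use std::collections::BTreeMap;

// Hypothetical decoded form of a special record.
#[derive(Debug, PartialEq)]
enum SpecialRecord {
    Truncate { rel: u32, to_blkno: u32 },
    DropRel { rel: u32 },
}

// Stand-in for the storage engine: relation id -> size in blocks.
#[derive(Default)]
struct Storage {
    rel_sizes: BTreeMap<u32, u32>,
}

// Decoding step: pure, never touches storage, so it could run on a
// component that cannot write to the storage engine.
fn decode(raw: &[u8]) -> Option<SpecialRecord> {
    match raw {
        [b'T', rel, blkno] => Some(SpecialRecord::Truncate {
            rel: u32::from(*rel),
            to_blkno: u32::from(*blkno),
        }),
        [b'D', rel] => Some(SpecialRecord::DropRel { rel: u32::from(*rel) }),
        _ => None,
    }
}

// Handling step: the only place that mutates storage.
fn apply(storage: &mut Storage, record: SpecialRecord) {
    match record {
        SpecialRecord::Truncate { rel, to_blkno } => {
            if let Some(size) = storage.rel_sizes.get_mut(&rel) {
                *size = (*size).min(to_blkno);
            }
        }
        SpecialRecord::DropRel { rel } => {
            storage.rel_sizes.remove(&rel);
        }
    }
}
```

Because `decode` returns plain data, its output can be serialized and shipped to whichever component owns the storage engine.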
Reviewer Note
You will notice that this is a somewhat transient state. This is intentional. I'd like this review to focus on ensuring
that I've not changed the handling of these special records.
Future PRs will:
SerializedBatch
and a series of special records for handling (types defined in this PR)
Related: #9335
Epic: #9329