Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WAL filtering in safekeeper for sharding #6345

Closed
2 of 11 tasks
petuhovskiy opened this issue Jan 12, 2024 · 1 comment
Closed
2 of 11 tasks

WAL filtering in safekeeper for sharding #6345

petuhovskiy opened this issue Jan 12, 2024 · 1 comment
Assignees
Labels
c/storage/safekeeper Component: storage: safekeeper t/bug Issue Type: Bug

Comments

@petuhovskiy
Copy link
Member

petuhovskiy commented Jan 12, 2024

Arthur's prototype from January: https://github.com/neondatabase/neon/tree/sk-sharding-stream

Precursors

We may start by refactoring pageserver WAL ingest code to make decoding WAL records more independent of Timeline.

Tasks

Preview Give feedback
  1. t/tech_design_rfc
    VladLazar
  2. a/tech_debt c/storage/pageserver
  3. a/tech_debt c/storage/pageserver
    jcsp

Implement splitting on safekeeper

Tasks

Preview Give feedback
  1. 1 of 3
    c/storage c/storage/pageserver t/bug triaged

Optimizations

Tasks

Preview Give feedback
@petuhovskiy petuhovskiy added t/bug Issue Type: Bug c/storage/safekeeper Component: storage: safekeeper labels Jan 12, 2024
@jcsp jcsp self-assigned this Aug 5, 2024
@problame problame self-assigned this Aug 27, 2024
jcsp added a commit that referenced this issue Sep 3, 2024
…8621)

## Problem

Currently, DatadirModification keeps a key-indexed map of all pending
writes, even though we (almost) never need to read back dirty pages for
anything other than metadata pages (e.g. relation sizes).

Related: #6345

## Summary of changes

- commit() modifications before ingesting database creation wal records,
so that they are guaranteed to be able to get() everything they need
directly from the underlying Timeline.
- Split dirty pages in DatadirModification into pending_metadata_pages
and pending_data_pages. The data ones don't need to be in a
key-addressable format, so they just go in a Vec instead.
- Special case handling of zero-page writes in DatadirModification,
putting them in a map which is flushed on the end of a WAL record. This
handles the case where during ingest, we might first write a zero page,
and then ingest a postgres write to that page. We used to do this via
the key-indexed map of writes, but in this PR we change the data page
write path to not bother indexing these by key.

My least favorite thing about this PR is that I needed to change the
DatadirModification interface to add the on_record_end call. This is not
very invasive because there's really only one place we use it, but it
changes the object's behaviour from being clearly an aggregation of many
records to having some per-record state. I could avoid this by
implicitly doing the work when someone calls set_lsn or commit -- I'm
open to opinions on whether that's cleaner or dirtier.

## Performance

There may be some efficiency improvement here, but the primary
motivation is to enable an earlier stage of ingest to operate without
access to a Timeline. The `pending_data_pages` part is the "fast path"
bulk write data that can in principle be generated without a Timeline,
in parallel with other ingest batches, and ultimately on the safekeeper.

`test_bulk_insert` on AX102 shows approximately the same results as in
the previous PR #8591:

```
------------------------------ Benchmark results -------------------------------
test_bulk_insert[neon-release-pg16].insert: 23.577 s
test_bulk_insert[neon-release-pg16].pageserver_writes: 5,428 MB
test_bulk_insert[neon-release-pg16].peak_mem: 637 MB
test_bulk_insert[neon-release-pg16].size: 0 MB
test_bulk_insert[neon-release-pg16].data_uploaded: 1,922 MB
test_bulk_insert[neon-release-pg16].num_files_uploaded: 8 
test_bulk_insert[neon-release-pg16].wal_written: 1,382 MB
test_bulk_insert[neon-release-pg16].wal_recovery: 18.264 s
test_bulk_insert[neon-release-pg16].compaction: 0.052 s
```
@VladLazar
Copy link
Contributor

(Accidentally) replaced by #9329

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c/storage/safekeeper Component: storage: safekeeper t/bug Issue Type: Bug
Projects
None yet
Development

No branches or pull requests

4 participants