pageserver: make BufferedWriter do double-buffering #9693

Merged: 40 commits, Dec 4, 2024

@yliang412 yliang412 commented Nov 8, 2024

Closes #9387.

Problem

BufferedWriter cannot proceed while the owned buffer is flushing to disk. We want to implement double buffering so that the flush can happen in the background. See #9387.

Summary of changes

  • Maintain two owned buffers in BufferedWriter.
  • The writer copies incoming data into an owned, aligned buffer and, once the buffer is full, submits it to the flush task.
  • The flush background task flushes the owned buffer to disk and returns the buffer to the writer for reuse.
  • The writer and the flush background task communicate through a bi-directional channel.
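
The hand-off described in the list above can be sketched roughly as follows. This is a minimal illustration built on std threads and `std::sync::mpsc` channels, not the pageserver's implementation: `DoubleBufferedWriter`, its fields, and the in-memory `sink` that stands in for the file are hypothetical names, and buffer alignment is ignored.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread::{self, JoinHandle};

/// Hypothetical double-buffered writer (illustration only).
pub struct DoubleBufferedWriter {
    /// Buffer currently being filled by the writer.
    current: Vec<u8>,
    capacity: usize,
    /// Writer -> flush task: full buffers to be flushed.
    to_flush: Sender<Vec<u8>>,
    /// Flush task -> writer: emptied buffers ready for reuse.
    recycled: Receiver<Vec<u8>>,
    flush_task: JoinHandle<Vec<u8>>,
}

impl DoubleBufferedWriter {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0);
        let (to_flush, flush_rx) = channel::<Vec<u8>>();
        let (recycle_tx, recycled) = channel::<Vec<u8>>();
        // The second buffer starts out parked in the return channel.
        recycle_tx.send(Vec::with_capacity(capacity)).expect("receiver alive");
        let flush_task = thread::spawn(move || {
            let mut sink = Vec::new(); // assumption: stands in for the file on disk
            for mut buf in flush_rx {
                sink.extend_from_slice(&buf);
                buf.clear();
                // The writer may already be gone; ignore send errors.
                let _ = recycle_tx.send(buf);
            }
            sink
        });
        Self { current: Vec::with_capacity(capacity), capacity, to_flush, recycled, flush_task }
    }

    pub fn write(&mut self, mut src: &[u8]) {
        while !src.is_empty() {
            let room = self.capacity - self.current.len();
            let n = room.min(src.len());
            self.current.extend_from_slice(&src[..n]);
            src = &src[n..];
            if self.current.len() == self.capacity {
                self.submit();
            }
        }
    }

    fn submit(&mut self) {
        // Blocks only while the flush task still holds the other buffer.
        let spare = self.recycled.recv().expect("flush task alive");
        let full = std::mem::replace(&mut self.current, spare);
        self.to_flush.send(full).expect("flush task alive");
    }

    /// Flushes the remainder and returns everything that reached the sink.
    pub fn finish(self) -> Vec<u8> {
        let DoubleBufferedWriter { current, to_flush, flush_task, .. } = self;
        if !current.is_empty() {
            to_flush.send(current).expect("flush task alive");
        }
        drop(to_flush); // closing the channel ends the flush loop
        flush_task.join().expect("flush task panicked")
    }
}
```

In this sketch the writer keeps filling one buffer while the other is being flushed, and it only waits when both buffers are in flight.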

For the in-memory layer, we also need to be able to read from the buffered writer in get_values_reconstruct_data. To handle this case, we did the following:

  • Replace VirtualFile::write_all with VirtualFile::write_all_at, and use Arc to share the file between the writer and the background task.
  • Leverage IoBufferMut::freeze to get a cheaply clonable IoBuffer: one clone is submitted to the channel, and the other is kept in the writer to serve reads. When we want to reuse the buffer, we invoke IoBuffer::into_mut, which gives us back the mutable aligned buffer.
  • InMemoryLayer reads are now aware of the maybe_flushed part of the buffer.
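
The freeze and reuse cycle can be pictured with a small analogue built on `Arc`. The types `BufMut` and `Buf` below are hypothetical stand-ins, not `IoBufferMut` and `IoBuffer`, and the `Result` returned by `into_mut` is an assumption made for this sketch rather than the real signature.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for an exclusively owned, mutable buffer.
pub struct BufMut(Vec<u8>);

/// Hypothetical stand-in for a frozen buffer; cloning only bumps a refcount.
#[derive(Clone)]
pub struct Buf(Arc<Vec<u8>>);

impl BufMut {
    pub fn with_capacity(cap: usize) -> Self {
        BufMut(Vec::with_capacity(cap))
    }

    pub fn extend_from_slice(&mut self, src: &[u8]) {
        self.0.extend_from_slice(src);
    }

    pub fn as_slice(&self) -> &[u8] {
        &self.0
    }

    /// Gives up mutability in exchange for cheap sharing.
    pub fn freeze(self) -> Buf {
        Buf(Arc::new(self.0))
    }
}

impl Buf {
    pub fn as_slice(&self) -> &[u8] {
        &self.0
    }

    /// Recovers the mutable buffer if this is the last clone;
    /// otherwise hands the frozen buffer back unchanged.
    pub fn into_mut(self) -> Result<BufMut, Buf> {
        Arc::try_unwrap(self.0).map(BufMut).map_err(Buf)
    }
}
```

Here one clone would go to the flush side while the other serves reads; the buffer becomes mutable again only after the flush side has dropped its clone.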

Caveat

  • We removed the owned version of write because that interface does not work well with buffer alignment. As a result, without direct IO enabled, download_object does one more memcpy than before this PR, due to the switch to the _borrowed version of the write.
  • "Bypass aligned part of write" could be implemented later to avoid a large amount of memcpy.

Testing

  • Use a oneshot-channel-based control mechanism to make flush behavior deterministic in tests.
  • Test reading from EphemeralFile when the last submitted buffer is not yet flushed, is in progress, and is done flushing to disk.
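
One way such a control mechanism could look is sketched below. It uses `std::sync::mpsc` channels that are each used exactly once as a stand-in for oneshot channels; `FlushControl`, `spawn_controlled_flush`, and the `disk` vector are hypothetical names and do not come from the PR's test code.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

/// Test-side handle: decides when the flush starts and observes when it ends.
pub struct FlushControl {
    start: Sender<()>,
    done: Receiver<()>,
}

impl FlushControl {
    /// Lets the parked flush proceed.
    pub fn release(&self) {
        self.start.send(()).expect("flush task alive");
    }

    /// Blocks until the flush has written its buffer.
    pub fn wait_done(&self) {
        self.done.recv().expect("flush task alive");
    }
}

/// Spawns a flush of `buf` into `disk` that waits for the test's go-ahead.
pub fn spawn_controlled_flush(
    buf: Vec<u8>,
    disk: Arc<Mutex<Vec<u8>>>,
) -> (FlushControl, JoinHandle<()>) {
    let (start, start_rx) = channel::<()>();
    let (done_tx, done) = channel::<()>();
    let handle = thread::spawn(move || {
        // Park until the test releases the flush.
        start_rx.recv().expect("test side alive");
        disk.lock().unwrap().extend_from_slice(&buf);
        done_tx.send(()).expect("test side alive");
    });
    (FlushControl { start, done }, handle)
}
```

With a handle like this, a test can assert on reads before calling `release` (not yet flushed) and again after `wait_done` (done flushing); observing the in-progress state would need one more gate inside the flush itself.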

Performance

We see a performance improvement for small values and a regression for big values, likely due to being CPU bound plus disk write latency.

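Results: https://www.notion.so/neondatabase/Benchmarking-New-BufferedWriter-11-20-2024-143f189e0047805ba99acda89f984d51?pvs=4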


github-actions bot commented Nov 8, 2024

7018 tests run: 6710 passed, 0 failed, 308 skipped (full report)


Flaky tests (1)

Postgres 14

Code coverage* (full report)

  • functions: 30.8% (8306 of 26946 functions)
  • lines: 47.8% (65399 of 136789 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
1da4028 at 2024-12-03T15:31:43.605Z :recycle:

@yliang412 yliang412 self-assigned this Nov 11, 2024
yliang412 and others added 7 commits November 11, 2024 21:33
@yliang412 yliang412 changed the title [WIP] double buffered writer pageserver: make BufferedWriter do double-buffering Nov 12, 2024
@yliang412 yliang412 requested a review from problame November 12, 2024 17:31
@yliang412 yliang412 marked this pull request as ready for review November 12, 2024 17:32
@yliang412 yliang412 requested a review from a team as a code owner November 12, 2024 17:32
@problame
write_buffered vs write_buffered_borrowed: my gut feeling is that in practice on-demand downloads did benefit from the old behavior where we were able to bypass the buffer (lower CPU usage).

We have that pagebench sub-benchmark for on-demand downloads, you could compare CPU usage before and after this change.

But, might be faster to "just" address this TODO.

Maybe you can be generic over constraints on the buffer type by making the buffer type an associated type of the writer?

@problame problame left a comment

Ok, again on write_buffered / write_buffered_borrowed.
Let's call it "buffer bypass for aligned parts of the write".

I remembered that with O_DIRECT, we save one memcpy so we can spend one and come out net 0 wrt CPU efficiency.

It would be nice to have a CPU efficiency WIN, though I'm ok with net 0.

The only remaining CPU efficiency difference that I can think of right now is that write_buffered issues one giant write for the entire middle of the buffer, whereas write_buffered_borrowed issues TAIL_SZ'd writes.
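
To make the "buffer bypass for aligned parts of the write" idea concrete, here is a rough sketch of how an incoming write could be split. The function `split_for_bypass` and its parameters are hypothetical; it only reasons about lengths in multiples of the buffer capacity and does not check memory-address alignment, which direct IO would also require.

```rust
/// Splits `src` into (head, middle, tail), given that the writer's internal
/// buffer of size `cap` already holds `buffered` bytes:
/// - `head` tops up the partially filled buffer,
/// - `middle` is a whole number of `cap`-sized chunks that could be written
///   directly, bypassing the buffer,
/// - `tail` is the remainder that has to be copied into the buffer.
pub fn split_for_bypass(buffered: usize, cap: usize, src: &[u8]) -> (&[u8], &[u8], &[u8]) {
    assert!(cap > 0 && buffered < cap);
    // An empty buffer needs no topping up, so the bypass can start at once.
    let head_len = if buffered == 0 {
        0
    } else {
        (cap - buffered).min(src.len())
    };
    let (head, rest) = src.split_at(head_len);
    // Round the remaining length down to a multiple of the buffer size.
    let middle_len = rest.len() / cap * cap;
    let (middle, tail) = rest.split_at(middle_len);
    (head, middle, tail)
}
```

Only `head` and `tail` would cost a memcpy in this scheme; `middle` could go out as one large write.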


Left a couple of comments that need addressing. Let's discuss major unclarities on Slack.

Resolved review threads (outdated): pageserver/src/tenant/ephemeral_file.rs (3 threads), pageserver/src/virtual_file/owned_buffers_io/write.rs (1 thread)
yliang412 and others added 2 commits November 25, 2024 15:25
## Problem

The newly added flush task in
#9693 should hold timeline gate
open, to avoid doing local IO after timeline shutdown completes.

## Solution

Pass a timeline gate guard to the flush background task. The flush task does not
need a cancellation token because it shuts down automatically when the
front writer task drops the channel.

- Refactor relevant paths to pass down `&Gate` instead of `GateGuard`.
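
The shutdown behavior described above can be sketched as follows. The `Gate` and `GateGuard` types here are simplified, hypothetical stand-ins that only count live guards (the real types are not reproduced), and `spawn_flush_task` is an assumed name.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::mpsc::{channel, Sender};
use std::sync::Arc;
use std::thread::{self, JoinHandle};

/// Hypothetical gate that counts how many guards are currently alive.
#[derive(Default)]
pub struct Gate {
    open_guards: Arc<AtomicUsize>,
}

/// Keeps the gate open for as long as it lives.
pub struct GateGuard {
    open_guards: Arc<AtomicUsize>,
}

impl Gate {
    pub fn enter(&self) -> GateGuard {
        self.open_guards.fetch_add(1, Ordering::SeqCst);
        GateGuard { open_guards: Arc::clone(&self.open_guards) }
    }

    pub fn guards(&self) -> usize {
        self.open_guards.load(Ordering::SeqCst)
    }
}

impl Drop for GateGuard {
    fn drop(&mut self) {
        self.open_guards.fetch_sub(1, Ordering::SeqCst);
    }
}

/// Takes `&Gate`, enters it, and moves the guard into the flush task.
/// Returns the sending half of the channel and the task handle.
pub fn spawn_flush_task(gate: &Gate) -> (Sender<Vec<u8>>, JoinHandle<usize>) {
    let guard = gate.enter();
    let (tx, rx) = channel::<Vec<u8>>();
    let handle = thread::spawn(move || {
        let _guard = guard; // released only when the task returns
        let mut flushed = 0;
        // The loop ends once every sender has been dropped,
        // so no cancellation token is needed.
        for buf in rx {
            flushed += buf.len();
        }
        flushed
    });
    (tx, handle)
}
```

Dropping the sender is the only shutdown signal in this sketch, and the guard stays alive until the task has drained the channel.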

Signed-off-by: Yuchen Liang <yuchen@neon.tech>
yliang412 and others added 8 commits December 2, 2024 15:58
Panics if IoBufferMut does not have enough capacity left to accommodate the source
buffer.

consider cases where offset != 0

@problame problame left a comment

Let's see how this works in staging. Preprod deployment later this week.

@yliang412 yliang412 added this pull request to the merge queue Dec 4, 2024
Merged via the queue into main with commit e6cd505 Dec 4, 2024
82 checks passed
@yliang412 yliang412 deleted the yuchen/double-buffered-writer branch December 4, 2024 16:55
github-merge-queue bot pushed a commit that referenced this pull request Dec 5, 2024
## Problem

In #9693, we forgot to check
the macOS build. The [CI
run](https://github.com/neondatabase/neon/actions/runs/12164541897/job/33926455468)
on main showed that the macOS build failed with unused variables and dead
code.

## Summary of changes

- add `allow(dead_code)` and `allow(unused_variables)` to the relevant
code that is not used on macos.

Signed-off-by: Yuchen Liang <yuchen@neon.tech>
@jcsp jcsp assigned problame and unassigned yliang412 Dec 19, 2024
Successfully merging this pull request may close these issues.

pageserver: use double buffering in BufferedWriter
2 participants