Fix issues with prefetch ring buffer resize #9847

knizhnik · 2024-11-22T08:47:50Z

Problem

See https://neondb.slack.com/archives/C04DGM6SMTM/p1732110190129479

We observe the following error in the logs:

[XX000] ERROR: [NEON_SMGR] [shard 3] Incorrect prefetch read: status=1 response=0x7fafef335138 my=128 receive=128

It is most likely caused by changing neon.readahead_buffer_size.

Summary of changes

Copy shard state
Do not use prefetch_set_unused in readahead_buffer_resize
Change prefetch buffer overflow criteria

github-actions · 2024-11-22T09:32:59Z

5625 tests run: 5389 passed, 0 failed, 236 skipped (full report)

Flaky tests (1)

Postgres 14

test_pull_timeline[True]: release-arm64

Code coverage* (full report)

functions: 31.0% (7973 of 25720 functions)
lines: 48.8% (63289 of 129701 lines)

* collected from Rust tests only

The comment gets automatically updated with the latest test results.
85bc9b9 at 2024-11-23T09:35:02.523Z

hlinnaka

This patch looks correct to me. See my comment on the "Change prefetch buffer overflow criteria" part though.

readahead_buffer_resize() is called from the GUC's assign hook. If readahead_buffer_resize() throws an error for any reason (OOM, network error, etc.), that could be a problem. The GUC is marked as PGC_USERSET, so its value might need to be changed, e.g. on transaction abort or by RESET ALL, and it would be unpleasant if that failed. But that's not new in this PR.

In general it would feel less error-prone if we'd just throw away all prefetched state, disconnect all connections, and start from scratch.

hlinnaka · 2024-11-25T09:36:07Z

pgxn/neon/pagestore_smgr.c

-		if (MyPState->ring_last + readahead_buffer_size - 1 == MyPState->ring_unused)
+		if (MyPState->ring_last + readahead_buffer_size == MyPState->ring_unused)


This is very subtle. Is this related to resizing, or an unrelated improvement? I'd suggest opening a separate PR.

knizhnik · 2024-11-25

It is actually the main problem fixed by this PR.
This check assumes that we use no more than readahead_buffer_size - 1 elements in the prefetch buffer.
But readahead_buffer_resize can fill the buffer completely (all readahead_buffer_size elements). In that case the check will never fire, and we will overwrite old entries without freeing them.
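
To make the off-by-one concrete, here is a minimal sketch of the two checks on a simplified model. It is not the code from pagestore_smgr.c: RingModel, ring_used, ring_full_old and ring_full_new are hypothetical names, and the meaning given to ring_last and ring_unused (monotonically growing counters for the oldest used entry and the first free slot) is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Simplified, hypothetical model of the prefetch ring. Only the two
 * counters matter here; both are assumed to grow monotonically.
 */
typedef struct
{
	uint64_t	ring_last;		/* oldest entry still in use (assumed meaning) */
	uint64_t	ring_unused;	/* first slot not yet used (assumed meaning) */
} RingModel;

/* Number of entries currently occupying the ring. */
static uint64_t
ring_used(const RingModel *r)
{
	assert(r->ring_unused >= r->ring_last);
	return r->ring_unused - r->ring_last;
}

/* Old criterion: fires only when exactly size - 1 entries are in use. */
static bool
ring_full_old(const RingModel *r, uint64_t size)
{
	return r->ring_last + size - 1 == r->ring_unused;
}

/* New criterion: fires when all size entries are in use. */
static bool
ring_full_new(const RingModel *r, uint64_t size)
{
	return r->ring_last + size == r->ring_unused;
}
```

With size 8 and a ring that a resize has filled completely (ring_last = 0, ring_unused = 8), the old criterion reports "not full" because it only matches 7 used entries, so the next insert would reuse a slot that is still occupied. The new criterion matches exactly this state.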

hlinnaka · 2024-11-25T09:50:02Z

Do not use prefetch_set_unused in readahead_buffer_resize

Looks correct. AFAICS prefetch_set_unused() worked too, though. This is just an optimization to avoid unnecessary work on the old prefetching queue that we are about to throw away anyway, right? Maybe add a comment on that.

knizhnik · 2024-11-25T11:53:02Z

Do not use prefetch_set_unused in readahead_buffer_resize

Looks correct. AFAICS prefetch_set_unused() worked too, though. This is just an optimization to avoid unnecessary work on the old prefetching queue that we are about to throw away anyway, right? Maybe add a comment on that.

I was not 100% sure about the use of prefetch_set_unused in this case.
The problem is that it can perform compaction, i.e. move used elements to the right, collapsing the unused hole.
How will that interfere with the loop through the ring buffer in prefetch_set_unused?
Can it result in memory leaks (when some responses are not deallocated)?
In any case, the only reason for calling prefetch_set_unused() here was to deallocate responses. All other actions performed by prefetch_set_unused() are useless, because we are going to destroy the old ring. Optimization is unlikely to matter here because the ring buffer is very rarely resized. So I would rather consider this change a simplification.
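
The simplification can be pictured roughly as follows. This is only a sketch under stated assumptions, not the actual patch: SlotModel and free_old_responses are hypothetical names, the real slot structure has more fields, and free() stands in for whatever deallocator the extension uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical slot layout; the real prefetch slot has more fields. */
typedef struct
{
	void	   *response;		/* received response, or NULL if none */
} SlotModel;

/*
 * Release every response still held by the old ring and return how many
 * were freed. No compaction or other bookkeeping is done, because the
 * old ring is discarded right afterwards.
 */
static size_t
free_old_responses(SlotModel *slots, size_t nslots)
{
	size_t		freed = 0;

	assert(slots != NULL || nslots == 0);
	for (size_t i = 0; i < nslots; i++)
	{
		if (slots[i].response != NULL)
		{
			free(slots[i].response);	/* stand-in for the real deallocator */
			slots[i].response = NULL;
			freed++;
		}
	}
	return freed;
}
```

Because the loop never moves entries, it visits each slot of the old ring exactly once, which avoids the question of how compaction interacts with the iteration.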

Fix issues with prefetch ring buffer resize

d14b85a

knizhnik requested review from a team as code owners November 22, 2024 08:47

knizhnik requested review from myrrc and VladLazar November 22, 2024 08:47

Konstantin Knizhnik and others added 2 commits November 22, 2024 16:42

Fix assert

085f462

Add test for prefetch buffer resize

3e9927c

knizhnik requested a review from MMeent November 22, 2024 19:54

Disable statement timeout for test_prefetch_buffer_resize

5e4ee81

aswin-devil approved these changes Nov 23, 2024

Increase test timeout

85bc9b9

hlinnaka approved these changes Nov 25, 2024

ololobus mentioned this pull request Nov 26, 2024

Epic: stabilize Postgres prefetch #9893

Open
