
PG16: Fix timeout value used in XLogWaitForReplayOf #544

Merged: 1 commit merged into REL_16_STABLE_neon from MMeent/fix/xl-wait-replay-timeout-v16 on Nov 29, 2024

Conversation


@MMeent commented Nov 29, 2024

The previous value assumed usec precision, while the timeout used is in milliseconds, causing replica backends to wait for many hours for WAL replay without the expected progress reports in logs.

This fixes the issue.

Reported-By: Alexander Lakhin <exclusion@gmail.com>

neondatabase/neon#9937
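As a minimal sketch of the class of bug fixed here (the identifier names below are hypothetical, not the actual patch): PostgreSQL's ConditionVariableTimedSleep() takes its timeout in milliseconds, so a value written with microsecond precision sleeps 1000x longer than intended between progress reports.

```c
/*
 * Hypothetical sketch, not the actual Neon patch. PostgreSQL's
 * condition-variable wait takes its timeout in milliseconds:
 *
 *   bool ConditionVariableTimedSleep(ConditionVariable *cv, long timeout,
 *                                    uint32 wait_event_info);
 */

/* Broken: 10 s written with usec precision, read as 10,000,000 ms */
#define XLOG_WAIT_TIMEOUT_BROKEN (10L * 1000 * 1000)

/* Fixed: 10 s expressed in the milliseconds the API expects */
#define XLOG_WAIT_TIMEOUT_FIXED  (10L * 1000)
```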

@MMeent changed the title from "Fix timeout value used in XLogWaitForReplayOf" to "PG16: Fix timeout value used in XLogWaitForReplayOf" on Nov 29, 2024
@MMeent merged commit 13e9e35 into REL_16_STABLE_neon on Nov 29, 2024
3 checks passed
@MMeent deleted the MMeent/fix/xl-wait-replay-timeout-v16 branch on November 29, 2024 14:30
github-merge-queue bot pushed a commit to neondatabase/neon that referenced this pull request Nov 29, 2024
The previous value assumed usec precision, while the timeout used is in
milliseconds, causing replica backends to wait for (potentially) many
hours for WAL replay without the expected progress reports in logs.

This fixes the issue.

Reported-By: Alexander Lakhin <exclusion@gmail.com>

## Problem

See neondatabase/postgres#279 (comment).

The timeout value was configured on the assumption that it would be interpreted as microseconds, when it is actually milliseconds. That causes the backend to wait far longer than the intended 10 s (2h46m40s, i.e. 10,000,000 ms) before it emits the "I'm waiting for recovery" message. While we do have wait events configured for this, stuck backends without clear logs are not great, so this fixes the timeout value in all our PostgreSQL branches.
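To make the 2h46m40s figure concrete: a 10 s timeout written as 10,000,000 µs, when read as milliseconds, becomes a 10,000 s wait. A self-contained check of the arithmetic:

```c
#include <stdio.h>

int main(void)
{
    long timeout = 10 * 1000000L;   /* 10 s written with usec precision */
    long seconds = timeout / 1000;  /* but interpreted as milliseconds: 10,000 s */

    printf("%ldh%02ldm%02lds\n",    /* prints 2h46m40s */
           seconds / 3600, (seconds % 3600) / 60, seconds % 60);
    return 0;
}
```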

## PG PRs

* PG14: neondatabase/postgres#542
* PG15: neondatabase/postgres#543
* PG16: neondatabase/postgres#544
* PG17: neondatabase/postgres#545
awarus pushed a commit to neondatabase/neon that referenced this pull request Dec 5, 2024