compute_ctl: Use 'fast' shutdown for Postgres termination #8289

Merged
ololobus merged 1 commit into main from alexk/compute-fast-shutdown
Jul 8, 2024


ololobus

@ololobus ololobus commented Jul 5, 2024

Problem

We currently use 'immediate' mode in the most commonly used shutdown path, where the control plane calls a compute_ctl API to terminate Postgres inside the compute without waiting for the actual pod / VM termination. However, an 'immediate' shutdown doesn't create a shutdown checkpoint, so ROs have a hard time figuring out the list of running xacts during the next start.

Summary of changes

Use 'fast' mode, which creates a shutdown checkpoint. The checkpoint is important for ROs because it lets them get the list of running xacts quickly instead of going through the CLOG. On the control plane side, we poll this compute_ctl termination API for 10s, which should be enough because we don't really write any data at checkpoint time. If it times out, we switch to the slow k8s-based termination anyway.
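
The poll-then-fall-back behavior described above can be sketched roughly as follows. This is only an illustration, not the control plane's actual code: `terminate_with_fallback`, `is_terminated`, `fallback`, and the 0.5s polling interval are hypothetical names and values; only the 10s budget comes from the description.

```python
import time
from typing import Callable


def terminate_with_fallback(
    is_terminated: Callable[[], bool],   # hypothetical: one poll of the termination API
    fallback: Callable[[], None],        # hypothetical: the slow k8s-based termination
    timeout_s: float = 10.0,             # 10s budget taken from the PR description
    interval_s: float = 0.5,             # assumed polling interval
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until termination is confirmed or the timeout expires.

    Returns "fast" if a poll succeeded within the budget; otherwise runs
    the fallback once and returns "fallback".
    """
    deadline = clock() + timeout_s
    while True:
        if is_terminated():
            return "fast"
        if clock() >= deadline:
            break
        sleep(interval_s)
    fallback()
    return "fallback"
```

The clock and sleep functions are injectable so the timeout path can be exercised without actually waiting 10 seconds.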

See https://www.postgresql.org/docs/current/server-shutdown.html for the list of modes and signals.
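
For quick reference, the three shutdown modes from the linked Postgres documentation map to signals as in the sketch below. The `ShutdownMode` class and `MODES` table are illustrative names, not part of compute_ctl; the mode-to-signal mapping and checkpoint behavior follow the Postgres docs. `signal.SIGQUIT` is only available on Unix-like systems.

```python
import signal
from dataclasses import dataclass


@dataclass(frozen=True)
class ShutdownMode:
    name: str
    sig: signal.Signals        # signal sent to the postmaster
    shutdown_checkpoint: bool  # whether a clean shutdown checkpoint is written


# Illustrative lookup table of the documented Postgres shutdown modes.
MODES = {
    # Waits for existing sessions to end, then shuts down cleanly.
    "smart": ShutdownMode("smart", signal.SIGTERM, True),
    # Aborts active transactions, disconnects clients, shuts down cleanly.
    "fast": ShutdownMode("fast", signal.SIGINT, True),
    # Aborts without a clean shutdown; recovery runs on the next start.
    "immediate": ShutdownMode("immediate", signal.SIGQUIT, False),
}


def signal_for(mode: str) -> signal.Signals:
    """Return the signal for a shutdown mode name, e.g. 'fast'."""
    return MODES[mode].sig
```

Switching from 'immediate' to 'fast' therefore amounts to sending SIGINT instead of SIGQUIT to the postmaster.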

The default VM shutdown hook already uses fast mode, see [1]

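[1] neon/vm-image-spec.yaml, lines 30 to 31 in c9fd8d7
(https://github.com/neondatabase/neon/blob/c9fd8d76937c2031fd4fea1cdf661d6cf4f00dc3/vm-image-spec.yaml#L30-L31):

shutdownHook: |
  su -p postgres --session-command '/usr/local/bin/pg_ctl stop -D /var/db/postgres/compute/pgdata -m fast --wait -t 10'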
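
As a small illustration of what the hook's command is made of, the sketch below builds the same `pg_ctl stop` argument list. The helper name `pg_ctl_stop_argv` is hypothetical, and this is not how compute_ctl itself stops Postgres; the flags (`-D`, `-m`, `--wait`, `-t`) are standard `pg_ctl` options and the defaults mirror the hook above.

```python
from typing import List

VALID_MODES = ("smart", "fast", "immediate")


def pg_ctl_stop_argv(
    pgdata: str,
    mode: str = "fast",
    timeout_s: int = 10,
    pg_ctl: str = "/usr/local/bin/pg_ctl",  # path as used in the hook
) -> List[str]:
    """Build the argv for `pg_ctl stop` without executing it."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown shutdown mode: {mode!r}")
    return [
        pg_ctl, "stop",
        "-D", pgdata,          # data directory
        "-m", mode,            # shutdown mode
        "--wait",              # wait for the shutdown to complete
        "-t", str(timeout_s),  # maximum seconds to wait
    ]
```

The resulting list could be passed to something like `subprocess.run(argv, check=True)` when running as the postgres user.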

Related to #6211

Checklist before requesting a review

  • I have performed a self-review of my code.
  • If it is a core feature, I have added thorough tests.
  • Do we need to implement analytics? if so did you add the relevant metrics to the dashboard?
  • If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.

Checklist before merging

  • Do not forget to reformat commit message to not include the above checklist

It also creates a shutdown checkpoint, which is important for
ROs to get a list of running xacts faster instead of going through
the CLOG.

See https://www.postgresql.org/docs/current/server-shutdown.html
for the list of modes and signals.

Related to #6211
@ololobus ololobus requested a review from hlinnaka July 5, 2024 16:20
@ololobus ololobus requested review from a team as code owners July 5, 2024 16:20
@ololobus ololobus requested a review from nikitakalyanov July 5, 2024 16:20

github-actions bot commented Jul 5, 2024

3067 tests run: 2952 passed, 0 failed, 115 skipped (full report)


Flaky tests (4)

Postgres 14

  • test_pg_regress[None]: debug
  • test_tenant_creation_fails: debug
  • test_pageserver_lsn_wait_error_start: release
  • test_pageserver_lsn_wait_error_safekeeper_stop: debug

Code coverage* (full report)

  • functions: 32.6% (6933 of 21275 functions)
  • lines: 50.0% (54487 of 108970 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
a245fe5 at 2024-07-08T12:18:29.254Z :recycle:

@ololobus ololobus merged commit 84b039e into main Jul 8, 2024
66 checks passed
@ololobus ololobus deleted the alexk/compute-fast-shutdown branch July 8, 2024 17:54
skyzh pushed a commit that referenced this pull request Jul 15, 2024