Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storcon: track pageserver availability zone #8852

Merged
merged 3 commits into from
Aug 28, 2024
Merged

Conversation

VladLazar
Copy link
Contributor

Problem

In order to build AZ aware scheduling, the storage controller needs to know what AZ pageservers are in.

Related #8848

Summary of changes

This patch set adds a new nullable column to the nodes table: availability_zone_id. The node registration
request is extended to include the AZ id (pageservers already have this in their metadata.json file).

If the node is already registered, then we update the persistent and in-memory state with the provided AZ.
Otherwise, we add the node with the AZ to begin with.

A couple assumptions are made here:

  1. Pageserver AZ ids are stable
  2. AZ ids do not change over time

Once all pageservers have a configured AZ, we can remove the optionals in the code
and make the database column not nullable.

Testing

I've tested this locally for compat:

  • new storcon and new pageserver
  • old storcon and new pageserver
  • new storcon and new pageserver

Checklist before requesting a review

  • I have performed a self-review of my code.
  • If it is a core feature, I have added thorough tests.
  • Do we need to implement analytics? if so did you add the relevant metrics to the dashboard?
  • If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.

Checklist before merging

  • Do not forget to reformat commit message to not include the above checklist

Copy link

github-actions bot commented Aug 28, 2024

3780 tests run: 3674 passed, 0 failed, 106 skipped (full report)


Flaky tests (2)

Postgres 16

Postgres 15

  • test_ondemand_wal_download_in_replication_slot_funcs: release-x86-64

Code coverage* (full report)

  • functions: 32.3% (7310 of 22628 functions)
  • lines: 50.4% (59094 of 117342 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
f4cc045 at 2024-08-28T13:16:09.916Z :recycle:

@VladLazar VladLazar force-pushed the vlad/storcon-track-ps-az branch from 1c911e4 to e7eccb3 Compare August 28, 2024 11:08
When a node reattaches, it will include a registration request.
The registration request may include the AZ id of the node.
If the node is already known, but it's AZ id isn't, then extract
it from the request and persist in the db and memory.
@VladLazar VladLazar force-pushed the vlad/storcon-track-ps-az branch from e7eccb3 to 4825c70 Compare August 28, 2024 12:05
@VladLazar VladLazar force-pushed the vlad/storcon-track-ps-az branch from 4825c70 to f4cc045 Compare August 28, 2024 12:15
@VladLazar VladLazar marked this pull request as ready for review August 28, 2024 13:33
@VladLazar VladLazar requested a review from a team as a code owner August 28, 2024 13:33
@VladLazar VladLazar requested review from arpad-m and jcsp August 28, 2024 13:33
Copy link
Collaborator

@jcsp jcsp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@VladLazar VladLazar merged commit 793b506 into main Aug 28, 2024
68 checks passed
@VladLazar VladLazar deleted the vlad/storcon-track-ps-az branch August 28, 2024 17:23
VladLazar added a commit that referenced this pull request Sep 3, 2024
## Problem
Neon local set-up does not inject an az id in `metadata.json`. See real
change in #8852.

## Summary of changes
We piggyback on the existing `availability_zone` pageserver
configuration in order to avoid making neon local even more complex.
VladLazar added a commit that referenced this pull request Sep 5, 2024
## Problem
#8852 introduced a new nullable
column for the `nodes` table: `availability_zone_id`

## Summary of changes
* Make neon local and the test suite always provide an az id
* Make the az id field in the ps registration request mandatory
* Migrate the column to non-nullable and adjust in memory state
accordingly
* Remove the code that was used to populate the az id for pre-existing
nodes
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants