pageserver: do not read redundant timeline_layers from IndexPart, so that we can remove it later #4972

Merged: jcsp merged 3 commits into main from jcsp/index-part-lite on Aug 21, 2023

jcsp (Collaborator) commented Aug 11, 2023

Problem

IndexPart contains two redundant lists of layer names: a set of the names (timeline_layers), and a map of name to metadata (layer_metadata).

We already require, in initialize_with_current_remote_index_part, that every layer in timeline_layers is also in layer_metadata. Any index_part.json files in the field that relied on these two sets being different would therefore already be broken.
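
The invariant described above can be sketched as a simple subset check. This is a minimal illustration using only the standard library; `LayerMetadata`, `layers_missing_metadata`, and the plain `String` layer names are hypothetical stand-ins, not the pageserver's actual types or the body of `initialize_with_current_remote_index_part`.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical stand-in for the per-layer metadata stored in the index.
#[derive(Debug, Clone, PartialEq)]
struct LayerMetadata {
    file_size: u64,
}

/// Returns the names listed in `timeline_layers` that have no entry in
/// `layer_metadata`, sorted for stable output. An empty result means the
/// set carries no information beyond the keys of the map.
fn layers_missing_metadata(
    timeline_layers: &HashSet<String>,
    layer_metadata: &HashMap<String, LayerMetadata>,
) -> Vec<String> {
    let mut missing: Vec<String> = timeline_layers
        .iter()
        .filter(|name| !layer_metadata.contains_key(*name))
        .cloned()
        .collect();
    missing.sort();
    missing
}
```

If this check already has to come back empty for an index to load, then reading the names from the map's keys alone loses nothing.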

Summary of changes

timeline_layers is made private and no longer read at runtime. It is still serialized, but not deserialized.

disk_consistent_lsn is also made private, as this field only exists for convenience of humans reading the serialized JSON.

This prepares us to entirely remove timeline_layers in a future release, once this change is fully deployed, and therefore no pageservers are trying to read the field.
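
The "still written, never read" arrangement can be sketched as follows. This is an illustrative model only: `IndexSketch`, its methods, and the `u64` metadata value are hypothetical, and the real change works through the serialization layer rather than hand-written functions. In serde-based code a similar effect is commonly achieved with a field attribute such as `#[serde(skip_deserializing)]`; whether this PR uses that attribute is not stated in the thread.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical, simplified stand-in for the index; not the real `IndexPart`.
#[derive(Debug)]
struct IndexSketch {
    /// Source of truth: layer name -> file size in bytes (illustrative metadata).
    layer_metadata: BTreeMap<String, u64>,
}

impl IndexSketch {
    /// Reader path: a stored file may or may not carry the legacy name set.
    /// Either way it is ignored, so files with and without it load the same.
    fn from_stored(
        _stored_timeline_layers: Option<BTreeSet<String>>,
        layer_metadata: BTreeMap<String, u64>,
    ) -> Self {
        Self { layer_metadata }
    }

    /// Runtime view of the layer names: always the keys of the metadata map.
    fn layer_names(&self) -> Vec<&str> {
        self.layer_metadata.keys().map(String::as_str).collect()
    }

    /// Writer path: the legacy set is derived from the map at write time, so
    /// readers that still expect it find it and the two can never diverge.
    fn legacy_timeline_layers(&self) -> BTreeSet<String> {
        self.layer_metadata.keys().cloned().collect()
    }
}
```

Because the reader never consults the stored set, a later release can stop writing it without affecting any pageserver that already runs this code.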

jcsp added 2 commits August 11, 2023 14:38
timeline_layers is no longer used, no need to cross reference it with layer_metadata
jcsp added the c/storage/pageserver (Component: storage: pageserver) and a/tech_debt (Area: related to tech debt) labels Aug 11, 2023
github-actions bot commented Aug 11, 2023

1624 tests run: 1550 passed, 0 failed, 74 skipped (full report)


The comment gets automatically updated with the latest test results
0657362 at 2023-08-21T11:10:48.836Z :recycle:

jcsp marked this pull request as ready for review August 11, 2023 15:42
jcsp requested review from a team as code owners August 11, 2023 15:42
jcsp requested review from knizhnik, hlinnaka, koivunej, LizardWizzard and problame and removed request for a team, knizhnik and hlinnaka August 11, 2023 15:42
arpad-m (Member) left a comment

Yeah, the two lists match; one of them just carries values in addition. The other is thus redundant and can be removed.

koivunej (Member) left a comment

I think we should bump the version (cannot add comment) because this is a meaningful change overall. It'll all contribute to us being able to spot if some old version is surprisingly still lurking around when doing cleanups.

No other issues.

problame (Contributor) commented

Maybe worth noting in this PR the earlier commit that started requiring layer_metadata: dd22c87.

By the way, in theory, there is no guarantee that Control Plane has attached all tenants all the time.
So, in theory, we can't be certain that all index_part.json's have been migrated.

I guess it's fine though.

arpad-m (Member) commented Aug 16, 2023

> there is no guarantee that Control Plane has attached all tenants all the time.
> So, in theory, we can't be certain that all index_part.json's have been migrated.

That's important to keep in mind. I think right now we are still writing out the old format; we just no longer read the field, which in effect allows index_part.json files that lack it. This ensures that we can still migrate tenants from an older pageserver version to a newer one, i.e. when someone uses the migration button during deployment. I think this is quite rare, but it's good that @jcsp is so careful about it.

Commit added to the branch:
This is not strictly necessary as the serialized format remains the same in practice, but will give us visibility of what IndexParts were written with the recently changed code in case of issues.

jcsp (Collaborator, Author) commented Aug 21, 2023

I checked that we didn't have any indices in the field with timeline_layers that didn't match layer_metadata, and defensively bumped the IndexPart version: even though the serialized data should be the same, the version will make it obvious which version wrote the file.
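
As a rough sketch of why the bump helps with visibility: even when the payload is byte-for-byte compatible, a reader can tell which generation of code wrote a file from the version alone. The constants, the `WrittenBy` enum, and `classify` below are hypothetical; the actual version numbers are not given in this thread.

```rust
/// Hypothetical version numbers; the real values are not stated in this thread.
const VERSION_BEFORE_BUMP: usize = 1;
const VERSION_AFTER_BUMP: usize = 2;

#[derive(Debug, PartialEq)]
enum WrittenBy {
    /// Written by code that still treated `timeline_layers` as meaningful.
    OlderCode,
    /// Written by code that no longer reads `timeline_layers`.
    CurrentCode,
    /// A version this sketch does not know about.
    Unknown,
}

/// Maps the version stored in an index file to the generation of code
/// that produced it, without inspecting the rest of the payload.
fn classify(version: usize) -> WrittenBy {
    match version {
        v if v <= VERSION_BEFORE_BUMP => WrittenBy::OlderCode,
        VERSION_AFTER_BUMP => WrittenBy::CurrentCode,
        _ => WrittenBy::Unknown,
    }
}
```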

jcsp enabled auto-merge (squash) August 21, 2023 10:52
jcsp merged commit b95addd into main Aug 21, 2023
jcsp deleted the jcsp/index-part-lite branch August 21, 2023 11:29