Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pageserver: move toward more efficient IndexPart serialization #4939

Closed
wants to merge 2 commits into from

Conversation

jcsp
Copy link
Collaborator

@jcsp jcsp commented Aug 9, 2023

Problem

IndexPart contains two redundant lists of layer names: a set of the names, and then a map of name to metadata.

It also has two places where serialization stuff peeks into the runtime view of the data:

  • metadata is carried as metadata_bytes, we should just serialize/deserialize it like other serialized data, rather than having to do from_bytes/to_bytes separately.
  • disk_consistent_lsn field is written to JSON redundant (it's in the binary metadata) for human eyes, but at runtime we shouldn't need to use that field.

Summary of changes

The IndexPart structure removes timeline_layers and disk_persistent_lsn. metadata_bytes becomes just metadata. Explicit Serialize and Deserialize implementations map from the serialized structured to the cleaned up IndexPart.

The encoding version remains v2, so that existing pageservers are forward compatible to what we write. The timeline_layers in serialized version is synthesized from the keys of layer_metadata. The next release after deploying this, we can bump the version to 3 and remove the field.

Checklist before requesting a review

  • I have performed a self-review of my code.
  • If it is a core feature, I have added thorough tests.
  • Do we need to implement analytics? if so did you add the relevant metrics to the dashboard?
  • If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.

Checklist before merging

  • Do not forget to reformat commit message to not include the above checklist

jcsp added 2 commits August 9, 2023 10:32
This will replace hand encode/decode done in various places.
The v2 format duplicates all layer names in
a set and a map.

Remove the `timeline_layers` from the structure,
and demote it to just being serialized from
`layer_metadata` keys: this prepares for a
v3 format that removes the field entirely, which
can be done after this version is fully deployed.

Also clean up the IndexPart's fields to disentangle
it from serialization:
 - Remove disk_consistent_lsn from IndexPart, as
   it only exists as a convenience to people looking at
   JSON.
 - Replace metadata_bytes with metadata, and do
   the serialization of this along with the struct
   as a whole.

The unit test that tested v1 decode with missing_layers
and inconsistent layer_metadata is removed, because
all production data had already been rewritten to avoid
that.  It was already the case that an index_part with
incomplete layer_metadata would fail to attach.
@jcsp jcsp added c/storage/pageserver Component: storage: pageserver a/tech_debt Area: related to tech debt labels Aug 9, 2023
@jcsp jcsp changed the title Jcsp/index part encoding pageserver: move toward more efficient IndexPart serialization Aug 9, 2023
@github-actions
Copy link

github-actions bot commented Aug 9, 2023

1264 tests run: 1210 passed, 1 failed, 53 skipped (full report)


Failures on Posgres 14

  • test_crafted_wal_end[last_wal_record_xlog_switch_ends_on_page_boundary]: debug
# Run all failed tests locally:
scripts/pytest -vv -n $(nproc) -k "test_crafted_wal_end[debug-pg14-last_wal_record_xlog_switch_ends_on_page_boundary]"
The comment gets automatically updated with the latest test results
7d80f9f at 2023-08-09T09:58:22.429Z :recycle:

@jcsp
Copy link
Collaborator Author

jcsp commented Aug 11, 2023

Doing this more gradually in #4972

@jcsp jcsp closed this Aug 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
a/tech_debt Area: related to tech debt c/storage/pageserver Component: storage: pageserver
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant