Epic: Aux file store v2 #7462
Labels
c/storage/pageserver
Component: storage: pageserver
t/Epic
Issue type: Epic
t/feature
Issue type: feature, for new features or requests
Comments
skyzh
added
t/feature
Issue type: feature, for new features or requests
c/storage/pageserver
Component: storage: pageserver
t/Epic
Issue type: Epic
labels
Apr 22, 2024
This was referenced Apr 23, 2024
skyzh
added a commit
that referenced
this issue
Apr 26, 2024
Extracted from #7468, part of #7462.

In the page server, we use i128 (instead of u128) for the integer representation of the key, which means the highest bit of the key must not be 1. This constrains our keyspace to <= 0x7F.

Also fix the bug in `to_i128` that dropped the highest 4 bits. Now we keep 3 of them, dropping only the sign bit. On top of that, we shrink the metadata keyspace to 0x60-0x7F for now; once we add support for u128, we can have a larger metadata keyspace.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
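To make the sign-bit constraint concrete, here is a minimal sketch. It assumes a hypothetical 16-byte key whose first byte is the keyspace prefix; the pageserver's real key layout and its `to_i128` are different, and the names `key_to_i128` and `is_metadata_prefix` are made up for illustration. Only the 0x7F limit and the 0x60-0x7F metadata range come from the commit message.

```rust
/// Hypothetical 16-byte key: one prefix byte followed by 15 payload bytes.
/// This is NOT the pageserver's actual key layout; it only illustrates why
/// a signed 128-bit representation limits the prefix to 0x00..=0x7F.
fn key_to_i128(prefix: u8, payload: [u8; 15]) -> Option<i128> {
    // A prefix with the top bit set would land in the sign bit of the i128,
    // so the resulting integer would be negative and sort before every
    // other key. Reject it instead.
    if prefix > 0x7F {
        return None;
    }
    let mut bytes = [0u8; 16];
    bytes[0] = prefix;
    bytes[1..].copy_from_slice(&payload);
    // Big-endian keeps byte-wise key order equal to integer order.
    Some(i128::from_be_bytes(bytes))
}

/// Metadata prefix range as stated in the commit message (0x60-0x7F).
fn is_metadata_prefix(prefix: u8) -> bool {
    (0x60..=0x7F).contains(&prefix)
}
```

With this layout, every accepted key maps to a non-negative `i128`, so integer comparison agrees with byte-wise key comparison.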
skyzh
added a commit
that referenced
this issue
Apr 30, 2024
Extracted (and tested) from #7468, part of #7462.

The current codebase assumes the keyspace is dense -- which means that if we have a keyspace of 0x00-0x100, we assume every key (e.g., 0x00, 0x01, 0x02, ...) exists in the storage engine. However, this assumption no longer holds in the metadata keyspace, which is sparse, so a per-key check is impossible there.

Ideally, we should not assume a dense keyspace at all, but removing the assumption would require a lot of refactoring. Therefore, we split the keyspaces into dense and sparse and handle them differently in the code for now. At some point in the future, we should treat all keyspaces as sparse.

## Summary of changes

* Split collect_keyspace to return dense+sparse keyspace.
* Do not allow generating image layers for sparse keyspace (for now -- will fix this next week, we need image layers anyways).
* Generate delta layers for sparse keyspace.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
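The dense/sparse split can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual `collect_keyspace` code: keys are modeled as plain `u128` values, the boundary constant `SPARSE_START` assumes the metadata keyspace begins at prefix byte 0x60, and the `CollectedKeyspace` type and `split_keyspace` function are hypothetical names.

```rust
use std::ops::Range;

/// Hypothetical result type: dense ranges are assumed to contain every key,
/// sparse ranges may contain only a handful of keys.
#[derive(Debug, Default, PartialEq)]
struct CollectedKeyspace {
    dense: Vec<Range<u128>>,
    sparse: Vec<Range<u128>>,
}

/// Assumed boundary: keys whose first byte is >= 0x60 are metadata (sparse).
const SPARSE_START: u128 = 0x60u128 << 120;

/// Partition key ranges into a dense part and a sparse part, cutting any
/// range that straddles the boundary.
fn split_keyspace(ranges: &[Range<u128>]) -> CollectedKeyspace {
    let mut out = CollectedKeyspace::default();
    for r in ranges {
        if r.start >= r.end {
            continue; // skip empty ranges
        }
        if r.end <= SPARSE_START {
            out.dense.push(r.clone());
        } else if r.start >= SPARSE_START {
            out.sparse.push(r.clone());
        } else {
            out.dense.push(r.start..SPARSE_START);
            out.sparse.push(SPARSE_START..r.end);
        }
    }
    out
}
```

Callers can then apply per-key existence checks and image layer generation to `dense` only, and write only delta layers for `sparse`, matching the summary above.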
5 tasks
This week: after #7517 is merged, testing & benchmarking on staging. Improve on known perf bottlenecks:
This was referenced May 7, 2024
skyzh
added a commit
that referenced
this issue
May 15, 2024
This was referenced May 15, 2024
skyzh
added a commit
that referenced
this issue
May 17, 2024
Part of #7462

## Summary of changes

Tenant config is not persisted unless the tenant is attached on the storage controller. In this pull request, we persist the aux file policy flag in `index_part.json`. Admins can set `switch_aux_file_policy` in the storage controller or through the page server API. When the first aux file is written, the write path compares the target aux file policy with the current policy. If the switch is allowed, we perform it; otherwise, the original policy is used. The test cases show what admins can and cannot do.

The `last_aux_file_policy` is stored in `IndexPart`. Updates to the persisted policy are done via `schedule_index_upload_for_aux_file_policy_update`. On the write path, the writer updates the field.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
Co-authored-by: Joonas Koivunen <joonas@neon.tech>
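The decision made on the first aux file write might look roughly like the sketch below. The switch rule used here (a switch is allowed only while no policy has been persisted yet) is an assumption for illustration; the actual set of valid transitions is defined by the pull request and its tests. The enum variants and the function name `resolve_policy_on_first_write` are likewise hypothetical.

```rust
/// Hypothetical policy values; the names are illustrative only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AuxFilePolicy {
    V1,
    V2,
}

/// Decide which policy the write path should use.
///
/// `last` models the persisted `last_aux_file_policy` (None if nothing has
/// been written yet) and `target` models `switch_aux_file_policy`.
/// Returns the effective policy and whether it must be persisted.
///
/// ASSUMPTION: switching is allowed only when no policy is persisted yet.
fn resolve_policy_on_first_write(
    last: Option<AuxFilePolicy>,
    target: AuxFilePolicy,
) -> (AuxFilePolicy, bool) {
    match last {
        // Nothing persisted yet: adopt the target and schedule an upload.
        None => (target, true),
        // A policy is already in effect: keep it, nothing to persist.
        Some(current) => (current, false),
    }
}
```

Returning a "needs persist" flag keeps the index upload off the hot path: it only happens on the one write that actually changes the policy.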
skyzh
added a commit
that referenced
this issue
May 17, 2024
Part of #7462

## Summary of changes

This pull request adds two APIs to the pageserver management API: list_aux_files and ingest_aux_files. The aux file pagebench is intended to be used on an empty timeline because the data do not go through the safekeeper. LSNs are advanced by 8 for each ingestion, to avoid invariant checks inside the pageserver.

For now, I only care about space amplification / read amplification, so the bench is designed in a very simple way: ingest 10000 files, and I will manually dump the layer map to analyze.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
a-masterov
pushed a commit
that referenced
this issue
May 20, 2024
a-masterov
pushed a commit
that referenced
this issue
May 20, 2024
a-masterov
pushed a commit
that referenced
this issue
May 20, 2024
skyzh
added a commit
that referenced
this issue
May 20, 2024
Part of #7462

Sparse keyspace does not generate image layers for now. This pull request adds support for generating image layers for sparse keyspace.

## Summary of changes

* Use the scan interface to generate compaction data for sparse keyspace.
* Track the number of delta layer reads during a scan.
* Read-triggered compaction: when a scan on the keyspace touches too many delta files, generate an image layer. There is one hard-coded threshold for now: the maximum number of delta layers we want to touch for a scan.
* L0 compaction does not need to compute holes for metadata keyspace.

Known issue: the scan interface currently reads past the image layer, which causes `delta_layer_accessed` to keep increasing even if image layers are generated. The pull request to fix that will be separate from, and orthogonal to, this one.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
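The read-triggered compaction idea can be sketched as below. This is a simplified model, not the pageserver's scan interface: each delta layer is a plain map from key to full value (newest layer first), and the threshold value, `ScanResult`, `scan`, and `should_generate_image_layer` are all hypothetical names and numbers chosen for the example.

```rust
use std::collections::BTreeMap;

/// Assumed threshold; the source only says one threshold is hard-coded.
const MAX_DELTA_LAYERS_PER_SCAN: usize = 3;

/// Result of scanning the sparse keyspace.
struct ScanResult {
    values: BTreeMap<u64, String>,
    delta_layers_visited: usize,
}

/// Scan a stack of delta layers, ordered newest first, and count how many
/// layers had to be read to reconstruct the keyspace.
fn scan(delta_layers: &[BTreeMap<u64, String>]) -> ScanResult {
    let mut values = BTreeMap::new();
    for layer in delta_layers {
        for (key, value) in layer {
            // Newest layer wins: keep the first value seen for each key.
            values.entry(*key).or_insert_with(|| value.clone());
        }
    }
    ScanResult {
        values,
        delta_layers_visited: delta_layers.len(),
    }
}

/// Trigger image layer generation when a scan touched too many delta layers.
fn should_generate_image_layer(result: &ScanResult) -> bool {
    result.delta_layers_visited > MAX_DELTA_LAYERS_PER_SCAN
}
```

When the check fires, `values` already holds the materialized keyspace, so the same scan output could in principle be used to write the image layer.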
skyzh
added a commit
that referenced
this issue
May 20, 2024
## Problem

Part of #7462

On the metadata keyspace, vectored get does not stop if a key is not found, and reads past the image layer. However, the semantics are different from single get: if a key does not exist in the image layer, it means that the key did not exist in the past, or has been deleted. This pull request fixes it by recording image layer coverage during the vectored get process and stopping when the full keyspace is covered by an image layer. A corresponding test case is added to ensure that generating an image layer reduces the number of delta layers read.

This optimization (or bug fix) also applies to rel block keyspaces. If a key is missing, we know it is missing once the first image layer is reached. The page server will not attempt to read lower layers, which could otherwise incur layer downloads and evictions.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
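The early-stop behavior can be sketched as follows. This is a simplified model under several assumptions: layers are visited newest first, delta layers hold full values rather than WAL records, an image layer is authoritative for its whole key range, and the `Layer` enum and `vectored_get` signature are hypothetical rather than the pageserver's real types.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Simplified layer model; not the pageserver's real layer types.
enum Layer {
    Delta(BTreeMap<u64, String>),
    /// Covers the key range `start..end`; any key in range that is absent
    /// from `values` is known not to exist at or below this layer.
    Image {
        start: u64,
        end: u64,
        values: BTreeMap<u64, String>,
    },
}

/// Look up `keys` in `layers` (newest first). Returns the values found and
/// the number of layers visited.
fn vectored_get(layers: &[Layer], keys: &[u64]) -> (BTreeMap<u64, String>, usize) {
    let mut pending: BTreeSet<u64> = keys.iter().copied().collect();
    let mut found = BTreeMap::new();
    let mut visited = 0;
    for layer in layers {
        if pending.is_empty() {
            break; // every key is resolved; do not read lower layers
        }
        visited += 1;
        match layer {
            Layer::Delta(values) => pending.retain(|k| match values.get(k) {
                Some(v) => {
                    found.insert(*k, v.clone());
                    false
                }
                None => true,
            }),
            Layer::Image { start, end, values } => pending.retain(|k| {
                if !(*start..*end).contains(k) {
                    return true; // not covered by this image layer
                }
                if let Some(v) = values.get(k) {
                    found.insert(*k, v.clone());
                }
                false // resolved either way: present, or known to be missing
            }),
        }
    }
    (found, visited)
}
```

The key point is the `false` returned for covered-but-absent keys: they are dropped from `pending`, so a missing key no longer forces reads (and possible downloads) of layers below the image layer.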
All major work is done, so I am closing this epic issue for now. I have created issues for all follow-up tasks. I might need one extra day at some point to implement the migration path once we decide how to roll this out to all users.
5 tasks
This was referenced May 22, 2024
skyzh
added a commit
that referenced
this issue
May 22, 2024
## Problem

If an existing user already has some aux v1 files, we don't want to switch them to the global tenant-level config.

Part of #7462

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
skyzh
added a commit
that referenced
this issue
May 22, 2024
Part of #7462

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
skyzh
added a commit
that referenced
this issue
May 22, 2024
For existing users, we want to allow a forced switch of their aux file policy.

Part of #7462

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
Motivation
To store aux files efficiently, we use one key for each aux file. To work around the fixed-size key constraint, we hash the file name into the key. Because the chance of a hash collision is low, each aux file will most likely be stored under its own key.
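A minimal sketch of hashing a file name into a fixed-size key is shown below. Everything specific in it is an assumption: the 9-byte key layout, the prefix byte `AUX_KEY_PREFIX`, and the use of 64-bit FNV-1a as a stand-in hash (chosen only because it is deterministic and fits in a few lines). It does not reflect the hash function or key encoding the pageserver actually uses.

```rust
/// Assumed prefix byte placing aux file keys inside a metadata keyspace.
const AUX_KEY_PREFIX: u8 = 0x62;

/// FNV-1a (64-bit), used here as a stand-in deterministic hash.
fn fnv1a_64(data: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325; // FNV offset basis
    for &byte in data {
        hash ^= byte as u64;
        hash = hash.wrapping_mul(0x100000001b3); // FNV prime
    }
    hash
}

/// Hypothetical fixed-size key: 1 prefix byte followed by 8 hash bytes.
fn aux_file_key(path: &str) -> [u8; 9] {
    let mut key = [0u8; 9];
    key[0] = AUX_KEY_PREFIX;
    key[1..].copy_from_slice(&fnv1a_64(path.as_bytes()).to_be_bytes());
    key
}
```

The same path always maps to the same key, so reads and writes of one aux file touch a single key. Since two different paths can still collide, a design along these lines would presumably need to keep the file name next to the contents in the stored value so that colliding entries can be told apart.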
DoD
Implementation ideas
Tasks
^--- likely done in the week of Apr 22 - Apr 26
separate scan interface to avoid maintaining untouched keyspaces
decide migration method and write migration code
^--- likely done in the week of Apr 29 - May 3
^--- likely done in the week of May 6 - May 10
^--- likely done in the week of May 13 - May 17
Follow-up Works
Other related tasks and Epics
The parent epic: #7290. We will discuss further tasks like storing pg_stats and storing logical size in the new metadata key space in that epic issue.