fix: pageserver_ondisk_layers metric #3775
This allows us to keep the LayerMap clean of these concerns.
Need to check whether we have any test for this metric; if not, fix that, and only then split the metric up.
@SomeoneToIgnore, do you feel this approach of deref-abuse to delegate is acceptable with LayerMap, which doesn't implement PartialEq and friends?
Did a poll internally; no one is horrified.
Been thinking about this for a while; it could move to a different PR.
Force-pushed from 856b187 to e1b9005.
merge-allure-reports failed; debug?
```python
    pageserver_http.get_metrics().query_one("pageserver_ondisk_layers").value
)

# assumption: floats here are small enough to compare with integers safely
```
I did the `value < 2**53 and value.is_integer()` dance on the other PR. Maybe that should move to `Sample`, which is what this `query_one` returns, as something like `def value_as_int(self)`, in a follow-up.
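A follow-up `value_as_int` could look roughly like the sketch below. It rests on assumptions: the `Sample` dataclass here is a hypothetical stand-in for whatever `query_one` actually returns, and only the `value` attribute, the `value_as_int` name and the `2**53` / `is_integer()` check come from the comment above.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Sample:
    # Hypothetical stand-in for the Sample returned by query_one; the real
    # class in the test fixtures may have different fields.
    name: str
    labels: Dict[str, str] = field(default_factory=dict)
    value: float = 0.0

    def value_as_int(self) -> int:
        """Return the value as an int, or raise if it is not an exact integer."""
        v = float(self.value)
        # Integers with magnitude below 2**53 are exactly representable as
        # IEEE-754 doubles; NaN and infinities fail this comparison too.
        if not (abs(v) < 2**53 and v.is_integer()):
            raise ValueError(f"{self.name}: {self.value!r} is not a safe integer")
        return int(v)
```

With a helper like this, a test could compare the metric against integer layer counts directly instead of relying on the "floats are small enough" assumption in a comment.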
```python
assert total_populated_layers == post_eviction_total_layers + 4
# corrected with remotes_after because usually only 3 out of 4 seem to be
# required for layer creation
assert post_compaction_total_layers == total_populated_layers + 1 - remotes_after
```
These asserts are quite awful after all:
- `total_populated_layers == 4` (deltas before creating the image)
- `post_eviction_total_layers == 0` (all layers evicted)
- `post_compaction_total_layers == 4 + 1 - N` (N = remote layers not needed for imaging)

Should these be put in as constants, and we see how long they last? :) That feels like adding work to Konstantin's current PRs, but at the same time I am unsure whether these values will change.
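Pinning the counts as constants, as asked above, might look like this minimal sketch. The constant names and the `check_layer_counts` helper are hypothetical; only the values 4, 0 and `4 + 1 - N` (with `remotes_after` as N) come from the review.

```python
# Hypothetical constants; the values come from the review comment.
DELTA_LAYERS_BEFORE_IMAGE = 4  # total_populated_layers: deltas before the image
LAYERS_AFTER_EVICTION = 0      # post_eviction_total_layers: everything evicted
IMAGE_LAYERS_CREATED = 1       # compaction creates one image layer


def check_layer_counts(
    total_populated_layers: int,
    post_eviction_total_layers: int,
    post_compaction_total_layers: int,
    remotes_after: int,
) -> None:
    """Assert the layer counts observed by the eviction test (hypothetical helper)."""
    assert total_populated_layers == DELTA_LAYERS_BEFORE_IMAGE
    assert post_eviction_total_layers == LAYERS_AFTER_EVICTION
    # remotes_after: layers that stayed remote because imaging did not need them
    assert post_compaction_total_layers == (
        DELTA_LAYERS_BEFORE_IMAGE + IMAGE_LAYERS_CREATED - remotes_after
    )
```

If the layout changes, the failure points at one named constant rather than at a bare `+ 4` in the middle of an assertion.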
Can we move the `resident_physical_size` gauge into the `MeasuredLayerMap` as well?
(If you still want to attempt a Collector-based approach, that should be a separate PR.)
It's non-trivial to move all affected metrics in (I wasn't expecting there to be many others), so I'm marking this as a draft until I have time to look at it.
Mentioned the need for a generation/version number, which would be handy in a number of cases: https://github.com/neondatabase/neon/pull/4005/files#r1165309034
Sad to leave this out because of scope creep.
The current pageserver_ondisk_layers metric is wrong; see the issue. To fix that, we need `trait PersistentLayer` in `LayerMap`. This PR solves that by introducing a wrapper, `MeasuredLayerMap`, where the metrics are updated. Additionally, an existing test which does evictions is modified to check the fixed metric (I couldn't find an existing test case for it, and nothing broke). The PR also contains three unrelated fixes.
Cc: #3705 (separate from adding a metric for remote layers)
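The wrapper idea can be sketched as follows. The PR's `MeasuredLayerMap` is Rust and delegates to the inner `LayerMap` via `Deref`; this is only a Python analogue under stated assumptions: `Layer`, its `is_remote` flag, the `Gauge` class and the method names are all hypothetical, and `__getattr__` stands in for `Deref` on read access.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Layer:
    # Hypothetical: is_remote marks a layer that exists only in remote storage.
    name: str
    is_remote: bool = False


class Gauge:
    """Minimal stand-in for a metrics gauge (not a real metrics library)."""

    def __init__(self) -> None:
        self.value = 0

    def inc(self) -> None:
        self.value += 1

    def dec(self) -> None:
        self.value -= 1


class LayerMap:
    """Hypothetical layer map that knows nothing about metrics."""

    def __init__(self) -> None:
        self.layers: Dict[str, Layer] = {}

    def insert(self, layer: Layer) -> None:
        self.layers[layer.name] = layer

    def remove(self, name: str) -> Layer:
        return self.layers.pop(name)


class MeasuredLayerMap:
    """Wraps LayerMap and keeps the on-disk layer gauge in sync with mutations."""

    def __init__(self, inner: LayerMap, ondisk_layers: Gauge) -> None:
        self._inner = inner
        self._ondisk_layers = ondisk_layers

    def insert(self, layer: Layer) -> None:
        # Replacing a layer (e.g. a remote one with its downloaded copy, or the
        # reverse on eviction) must not double-count, so retire the old entry.
        old = self._inner.layers.get(layer.name)
        if old is not None and not old.is_remote:
            self._ondisk_layers.dec()
        self._inner.insert(layer)
        if not layer.is_remote:
            self._ondisk_layers.inc()

    def remove(self, name: str) -> Layer:
        layer = self._inner.remove(name)
        if not layer.is_remote:
            self._ondisk_layers.dec()
        return layer

    def __getattr__(self, attr: str):
        # Called only for attributes the wrapper lacks, so reads such as
        # .layers fall through to the inner map, loosely like Rust's Deref.
        return getattr(self._inner, attr)
```

Keeping the counting in the wrapper leaves `LayerMap` free of metrics code; the trade-off is that every mutation must go through the wrapper, or the gauge drifts from the real count.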