Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pageserver: avoid incrementing access time when reading layers for compaction #4971

Merged
merged 5 commits into from
Aug 14, 2023

Conversation

jcsp
Copy link
Collaborator

@jcsp jcsp commented Aug 11, 2023

Problem

Currently, image generation reads delta layers before writing out subsequent image layers, which updates the access time of the delta layers and effectively puts them at the back of the queue for eviction. This is the opposite of what we want, because after a delta layer is covered by a later image layer, it's likely that subsequent reads of latest data will hit the image rather than the delta layer, so the delta layer should be quite a good candidate for eviction.

Summary of changes

RequestContext gets a new ATimeBehavior field, and a RequestContextBuilder helper so that we can optionally add the new field without growing RequestContext::new every time we add something like this.

Request context is passed into the record_access function, and the access time is not updated if ATimeBehavior::Skip is set.

The compaction background task constructs its request context with this skip policy.

Closes: #4969

Checklist before requesting a review

  • I have performed a self-review of my code.
  • If it is a core feature, I have added thorough tests.
  • Do we need to implement analytics? if so did you add the relevant metrics to the dashboard?
  • If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.

Checklist before merging

  • Do not forget to reformat commit message to not include the above checklist

jcsp added 2 commits August 11, 2023 13:37
...and implement a RequestContextBuilder so that as
we add such fields, we do not have to keep adding args
to RequestContext::new and updating all call sites.
@jcsp jcsp added c/storage/pageserver Component: storage: pageserver a/tech_debt Area: related to tech debt labels Aug 11, 2023
@github-actions
Copy link

github-actions bot commented Aug 11, 2023

1588 tests run: 1510 passed, 0 failed, 78 skipped (full report)


The comment gets automatically updated with the latest test results
d2bf8f5 at 2023-08-14T08:19:19.339Z :recycle:

@jcsp jcsp force-pushed the jcsp/compaction-skip-atime branch from 00a7a66 to f2d0acb Compare August 11, 2023 14:19
Copy link
Member

@koivunej koivunej left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this could just be an if in DeltaLayer::load in

self.access_stats
.record_access(access_kind, ctx.task_kind());

Like:

if access_kind != AccessKind::CreateImageLayers {
    self.access_stats.record_access(access_kind, ctx.task_kind());
}

After this PR, we could already try 1TB import in staging to see if any needle is moved.

@jcsp
Copy link
Collaborator Author

jcsp commented Aug 11, 2023

I don't think a LayerAccessKind variant works for this. The reads in create_image_layers go via the general purpose get(), and ultimately get_value_reconstruct_data -- it doesn't take LayerAccessKind arguments (but it does carry the context object).

I'd also not want to make this behavior specific to delta layer, because image layer generation will also depend on reading earlier image layers, where we'd also like to avoid updating their access times.

@jcsp jcsp marked this pull request as ready for review August 11, 2023 15:41
@jcsp jcsp requested review from a team as code owners August 11, 2023 15:41
@jcsp jcsp requested review from save-buffer and skyzh and removed request for a team August 11, 2023 15:41
Copy link
Member

@koivunej koivunej left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad for not giving this a more thorough look last friday. Approved with getting rid of "atime" mention just not to confuse with the fs level information which we explicitly ended up not using.

pageserver/src/context.rs Outdated Show resolved Hide resolved
pageserver/src/tenant/timeline.rs Outdated Show resolved Hide resolved
@jcsp
Copy link
Collaborator Author

jcsp commented Aug 14, 2023

Added a commit that tweaks the names to avoid potential confusion around filesystem atime vs out internal access stats.

@jcsp jcsp requested a review from koivunej August 14, 2023 07:58
@jcsp jcsp merged commit d3a97fd into main Aug 14, 2023
@jcsp jcsp deleted the jcsp/compaction-skip-atime branch August 14, 2023 09:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
a/tech_debt Area: related to tech debt c/storage/pageserver Component: storage: pageserver
Projects
None yet
Development

Successfully merging this pull request may close these issues.

create_image_layers records accesses to input layers, making them harder to evict
2 participants