lots of time wasted on count_deltas() #6861

Closed · 4 of 5 tasks
problame opened this issue Feb 21, 2024 · 5 comments
Labels: a/performance Area: relates to performance of the system · c/storage/pageserver Component: storage: pageserver

Comments

problame (Contributor) commented Feb 21, 2024

Problem

original thread: https://neondb.slack.com/archives/C033RQ5SPDH/p1708513450565049

Now that the flamegraphs are fixed, I took one on ps-2 ap-southeast-1 to investigate the elevated CPU usage after enabling tokio-epoll-uring there.
That investigation isn't the subject of this issue, though; what matters here is the general finding of where that pageserver is spending its time.
LayerMap::count_deltas inside time_for_new_image_layer completely dominates the CPU usage there.
AFAICT it is called for every tenant, even if the layer map hasn't changed.

This is wasteful.

[flamegraph: ps-2 ap-southeast-1, tokio-epoll-uring, 60s capture]

Solution

If the layer map and partitioning are the same as in an earlier call, early-exit in time_for_new_image_layer to avoid the call to count_deltas().
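
A minimal sketch of that early-exit idea, assuming the layer map exposes some monotonically increasing version number that is bumped on every mutation; the names (`ImageLayerCheckCache`, `can_skip`) are illustrative, not the actual pageserver API:

```rust
// Hypothetical sketch (not the actual pageserver code): remember the inputs of the
// previous check and skip the expensive count_deltas() walk when nothing changed.
#[derive(Default)]
struct ImageLayerCheckCache {
    // (layer-map version, partitioning LSN) observed at the previous check.
    last_inputs: Option<(u64, u64)>,
}

impl ImageLayerCheckCache {
    /// Returns true if the count_deltas() walk can be skipped because neither the
    /// layer map nor the partitioning changed since the last call.
    fn can_skip(&mut self, layer_map_version: u64, partition_lsn: u64) -> bool {
        let inputs = (layer_map_version, partition_lsn);
        if self.last_inputs == Some(inputs) {
            return true;
        }
        self.last_inputs = Some(inputs);
        false
    }
}

fn main() {
    let mut cache = ImageLayerCheckCache::default();
    assert!(!cache.can_skip(1, 100)); // first call: must run count_deltas()
    assert!(cache.can_skip(1, 100));  // nothing changed: early-exit
    assert!(!cache.can_skip(2, 100)); // layer map changed: recompute
}
```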

problame added the c/storage/pageserver and a/performance labels on Feb 21, 2024
problame added a commit that referenced this issue Feb 21, 2024
…ove backwards

This PR enforces aspects of `Timeline::repartition` that were already
true at runtime:

- it's not called concurrently, so bail out if it is anyway (see the
  comment for why it's not called concurrently)
- the `lsn` should never move backwards over the lifetime of a
  Timeline object, because last_record_lsn() can only move forwards
  over the lifetime of a Timeline object

part of #6861
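
A rough sketch of how those two invariants can be enforced, assuming a simplified `Timeline` with a plain `u64` LSN; the `try_lock`-based bail-out and the monotonicity check are the point here, the names and signatures are not the real ones:

```rust
use std::sync::Mutex;

struct RepartitionState {
    // LSN of the last successful repartition; must only move forwards.
    last_lsn: u64,
}

struct Timeline {
    repartition_state: Mutex<RepartitionState>,
}

#[derive(Debug)]
enum RepartitionError {
    ConcurrentCall,
    LsnMovedBackwards { previous: u64, requested: u64 },
}

impl Timeline {
    fn repartition(&self, lsn: u64) -> Result<(), RepartitionError> {
        // Invariant 1: repartition is never called concurrently; if it is
        // anyway, bail out instead of blocking.
        let mut state = self
            .repartition_state
            .try_lock()
            .map_err(|_| RepartitionError::ConcurrentCall)?;

        // Invariant 2: the requested lsn never moves backwards, because
        // last_record_lsn() is monotonic over the Timeline's lifetime.
        if lsn < state.last_lsn {
            return Err(RepartitionError::LsnMovedBackwards {
                previous: state.last_lsn,
                requested: lsn,
            });
        }
        state.last_lsn = lsn;

        // ... actual repartitioning work would go here ...
        Ok(())
    }
}

fn main() {
    let tl = Timeline {
        repartition_state: Mutex::new(RepartitionState { last_lsn: 0 }),
    };
    assert!(tl.repartition(100).is_ok());
    assert!(matches!(
        tl.repartition(50),
        Err(RepartitionError::LsnMovedBackwards { .. })
    ));
}
```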
problame added a commit that referenced this issue Feb 21, 2024
It's been dead-code-at-runtime for 9 months, let's remove it.
We can always re-introduce it at a later point.

Came across this while working on #6861, which will touch
`time_for_new_image_layer`. This is an opportunity to make that function
simpler.
hlinnaka (Contributor) commented:

The new compaction code in #6830 no longer calls count_deltas. (It needs testing to see whether it introduces other problems, of course.)

problame (Contributor, Author) commented:

Yeah, I'm aware. @arpad-m is going to work on compaction, but it'll be many more weeks until it lands, I think.

problame added a commit that referenced this issue Feb 26, 2024
It's been dead-code-at-runtime for 9 months, let's remove it.
We can always re-introduce it at a later point.

Came across this while working on #6861, which will touch
`time_for_new_image_layer`. This is an opportunity to make that function
simpler.
problame added a commit that referenced this issue Feb 26, 2024
…e backwards (#6862)

This PR enforces aspects of `Timeline::repartition` that were already
true at runtime:

- it's not called concurrently, so bail out if it is anyway (see the
  comment for why it's not called concurrently)
- the `lsn` should never move backwards over the lifetime of a
  Timeline object, because last_record_lsn() can only move forwards
  over the lifetime of a Timeline object

The switch to tokio::sync::Mutex blows up the size of the `partitioning`
field from 40 bytes to 72 bytes on Linux x86_64.
That would be concerning if it was a hot field, but, `partitioning` is
only accessed every 20s by one task, so, there won't be excessive cache
pain on it.
(It still sucks that it's now >1 cache line, but I need the Send-able
MutexGuard in the next PR)

part of #6861
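
To illustrate the Send-able guard point: tokio::sync::MutexGuard is Send when the inner type is, so it can be held across an .await inside a spawned task, which std::sync::MutexGuard does not allow. A hedged, simplified illustration follows; the `Partitioning` type and field names are stand-ins, not the real Timeline definition, and it assumes a tokio dependency with the sync, time, rt, and macros features:

```rust
use std::sync::Arc;
use tokio::sync::Mutex;

#[derive(Default)]
struct Partitioning {
    // stand-in for the real partitioning data
    keyspace_partitions: Vec<u64>,
}

#[tokio::main]
async fn main() {
    let partitioning = Arc::new(Mutex::new(Partitioning::default()));

    let p = Arc::clone(&partitioning);
    // tokio::spawn requires the future to be Send; a tokio MutexGuard is Send,
    // so it can be held across the .await below. With std::sync::Mutex this
    // would not compile.
    let handle = tokio::spawn(async move {
        let mut guard = p.lock().await;
        // Simulate async work (e.g. reading layers) while the lock is held.
        tokio::time::sleep(std::time::Duration::from_millis(10)).await;
        guard.keyspace_partitions.push(42);
    });

    handle.await.unwrap();
    assert_eq!(partitioning.lock().await.keyspace_partitions, vec![42]);
}
```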
VladLazar self-assigned this on Mar 25, 2024
problame (Contributor, Author) commented:

@VladLazar just in case you didn't see it, my PR to avoid count_deltas() is here: #6868

Feel free to take it over

VladLazar (Contributor) commented:

Update:

  • WAL-based solution merged; it will be released this week (2024-04-02). (See the sketch after this list for the general idea.)
  • Need to check that things have improved post-release. Keeping the ticket open until then.
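
A hypothetical sketch of what a "WAL-based" trigger can look like; the actual change is in #7230 and its exact fields and thresholds are not reproduced here. The idea is to track how much WAL has been ingested since the last image layer and make the check O(1), instead of walking the layer map with count_deltas():

```rust
struct WalBasedImageTrigger {
    /// WAL bytes ingested since the last image layer was created (assumed name).
    wal_bytes_since_last_image: u64,
    /// Threshold after which a new image layer is considered worthwhile.
    threshold_bytes: u64,
}

impl WalBasedImageTrigger {
    fn record_ingest(&mut self, bytes: u64) {
        self.wal_bytes_since_last_image += bytes;
    }

    /// O(1) check, independent of layer-map size.
    fn time_for_new_image_layer(&self) -> bool {
        self.wal_bytes_since_last_image >= self.threshold_bytes
    }

    fn on_image_layer_created(&mut self) {
        self.wal_bytes_since_last_image = 0;
    }
}

fn main() {
    let mut trigger = WalBasedImageTrigger {
        wal_bytes_since_last_image: 0,
        threshold_bytes: 512 * 1024 * 1024, // illustrative 512 MiB threshold
    };
    trigger.record_ingest(600 * 1024 * 1024);
    assert!(trigger.time_for_new_image_layer());
    trigger.on_image_layer_created();
    assert!(!trigger.time_for_new_image_layer());
}
```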

VladLazar (Contributor) commented Apr 8, 2024

Looks like #7230 helped here. Generated another flamegraph this morning and it's not exhibiting the original issue:

(ask me if you want the svg - can't add it here for some reason)

[flamegraph: 2024-04-08, ps-2 ap-southeast-1, perf check for count_deltas]
