
release-22.2: kv,sql: integrate row-level TTL reads with CPU limiter #109259

Merged
2 commits merged into cockroachdb:release-22.2 on Aug 22, 2023

Conversation

irfansharif
Contributor

@irfansharif irfansharif commented Aug 22, 2023

Backport 1/2 commits from #108815.

/cc @cockroachdb/release


Part of #98722.

We integrate row-level TTL reads with the elastic CPU limiter at two levels, each appearing prominently in CPU profiles:

  • Down in KV, where we're handling batch requests issued as part of row-level TTL selects. This is gated by kvadmission.low_pri_read_elastic_control.enabled.
  • Up in SQL, when handling KV responses to said batch requests. This is gated by sqladmission.low_pri_read_response_elastic_control.enabled.

Similar to backups, rangefeed initial scans, and changefeed event processing, we've observed latency impact during CPU-intensive scans issued as part of row-level TTL jobs. We know from before that the existing slots-based mechanism for CPU work can result in excessive scheduling latency for high-pri work in the presence of lower-pri work, affecting end-user latencies. This commit then tries to control the total CPU% used by row-level TTL selects through the elastic CPU limiter. For the KV work, this was trivial -- we already have integrations at the batch request level and now we pick out requests with the admissionpb.TTLLowPri bit set.

For the SQL portion of the work we introduce some minimal plumbing. Where previously we sought admission in the SQL-KV response queues after fetching each batch of KVs from KV as part of our volcano operator iteration, we now incrementally acquire CPU nanos. We do this specifically for row-level TTL work. Experimentally the CPU nanos we acquire here map roughly to the CPU utilization due to SQL work for row-level TTL selects.

Release note: None


Release justification: Disabled-by-default AC integration for row-level TTL selects. Comes up in escalations.

@irfansharif irfansharif requested review from a team as code owners August 22, 2023 17:40
@blathers-crl

blathers-crl bot commented Aug 22, 2023

Thanks for opening a backport.

Please check the backport criteria before merging:

  • Patches should only be created for serious issues or test-only changes.
  • Patches should not break backwards-compatibility.
  • Patches should change as little code as possible.
  • Patches should not change on-disk formats or node communication protocols.
  • Patches should not add new functionality.
  • Patches must not add, edit, or otherwise modify cluster versions; or add version gates.
If some of the basic criteria cannot be satisfied, ensure that the exceptional criteria are satisfied within.
  • There is a high priority need for the functionality that cannot wait until the next release and is difficult to address in another way.
  • The new functionality is additive-only and only runs for clusters which have specifically “opted in” to it (e.g. by a cluster setting).
  • New code is protected by a conditional check that is trivial to verify and ensures that it only runs for opt-in clusters.
  • The PM and TL on the team that owns the changed code have signed off that the change obeys the above rules.

Add a brief release justification to the body of your PR to justify this backport.

Some other things to consider:

  • What did we do to ensure that a user that doesn’t know & care about this backport, has no idea that it happened?
  • Will this work in a cluster of mixed patch versions? Did we test that?
  • If a user upgrades a patch version, uses this feature, and then downgrades, what happens?

@blathers-crl

blathers-crl bot commented Aug 22, 2023

It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity
Member

This change is Reviewable

Part of cockroachdb#98722.

We do it at two levels, each appearing prominently in CPU profiles:

- Down in KV, where we're handling batch requests issued as part of
  row-level TTL selects. This is gated by
  kvadmission.low_pri_read_elastic_control.enabled.
- Up in SQL, when handling KV responses to said batch requests. This is
  gated by sqladmission.low_pri_read_response_elastic_control.enabled.

Similar to backups, rangefeed initial scans, and changefeed event
processing, we've observed latency impact during CPU-intensive scans
issued as part of row-level TTL jobs. We know from before that the
existing slots-based mechanism for CPU work can result in excessive
scheduling latency for high-pri work in the presence of lower-pri
work, affecting end-user latencies. This is because the slots mechanism
aims for full utilization of the underlying resource, which is
incompatible with low scheduling latencies. This commit then tries to
control the total CPU% used by row-level TTL selects through the elastic
CPU limiter. For the KV work, this was trivial -- we already have
integrations at the batch request level and now we pick out requests
with priorities less than admissionpb.UserLowPri, which includes
admissionpb.TTLLowPri.

For the SQL portion of the work we introduce some minimal plumbing.
Where previously we sought admission in the SQL-KV response queues after
fetching each batch of KVs from KV as part of our volcano operator
iteration, we now incrementally acquire CPU nanos. We do this
specifically for row-level TTL work. Experimentally the CPU nanos we
acquire here map roughly to the CPU utilization due to SQL work for
row-level TTL selects.

(Note that we apply the elastic CPU limiter to all reads with priorities
less than admissionpb.UserLowPri. These are typically internally submitted
reads, and include row-level TTL selects.)

Release note: None
Leave it switched off by default in an already-released branch.

Release note: None
@yuzefovich yuzefovich changed the title release-22.2: release-23.1: kv,sql: integrate row-level TTL reads with CPU limiter release-22.2: kv,sql: integrate row-level TTL reads with CPU limiter Aug 22, 2023
@irfansharif
Contributor Author

These were clean backports BTW. Only diffs were around pkg/roachpb rename to kv/kvpb and some pkg/server initialization code (things moved around with kvflowcontrol).

Contributor

@bananabrick bananabrick left a comment


:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained

Member

@yuzefovich yuzefovich left a comment


Reviewed 6 of 6 files at r1, 2 of 2 files at r2, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @irfansharif)

@irfansharif irfansharif merged commit 9fd131d into cockroachdb:release-22.2 Aug 22, 2023
2 checks passed
@irfansharif irfansharif deleted the backport22.2-109257 branch August 22, 2023 22:36