admission: additional observability #82743
Comments
This has proven extremely valuable to do in internal AC-related experiments (re: #75066). https://github.com/irfansharif/cockroach/tree/220614.export-tracing is a prototype that grafts together the prometheus-compatible data from https://github.com/prometheus/client_golang/blob/main/prometheus/go_collector_latest.go. Through it we were able to correlate foreground latency spikes to Go scheduler latency spikes.
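As context, here is a minimal sketch (not the prototype linked above, and independent of client_golang) of pulling the same scheduler-latency data straight out of runtime/metrics, which is the source go_collector_latest.go reads from; the p99 walk is purely illustrative:

```go
// Minimal sketch (Go 1.17+): read the runtime's scheduler-latency histogram
// directly and report an approximate p99.
package main

import (
	"fmt"
	"runtime/metrics"
)

func main() {
	samples := []metrics.Sample{{Name: "/sched/latencies:seconds"}}
	metrics.Read(samples)
	if samples[0].Value.Kind() != metrics.KindFloat64Histogram {
		panic("unexpected kind for /sched/latencies:seconds")
	}
	h := samples[0].Value.Float64Histogram()

	var total uint64
	for _, c := range h.Counts {
		total += c
	}
	if total == 0 {
		fmt.Println("no scheduling-latency samples yet")
		return
	}
	// Walk buckets until 99% of observations are covered; that bucket's upper
	// boundary approximates the p99 scheduling latency, in seconds.
	target := uint64(float64(total) * 0.99)
	var cum uint64
	for i, c := range h.Counts {
		cum += c
		if cum >= target {
			fmt.Printf("approx p99 scheduling latency: %.6fs\n", h.Buckets[i+1])
			break
		}
	}
}
```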
From an internal doc, re: "Information needed from Go runtime":
IIUC, this is exactly the total sum of everything captured within.
We need this to make sense of mixed workload behavior (e.g. conversation in https://cockroachlabs.slack.com/archives/C038JEXC5AT/p1658247509643359?thread_ts=1657630075.576439&cid=C038JEXC5AT).
Adding Andrew here too to pick through the list within the next two weeks; it'll be a good way to get our feet wet.
@andrewbaptist: I'm working on exporting Go's /sched/latencies:seconds as a histogram. Want to take on the remaining items?
And record data into CRDB's internal time-series database. Informs cockroachdb#82743 and cockroachdb#87823. To export scheduling latencies to prometheus, we choose an exponential bucketing scheme with base multiple of 1.1, and the output range bounded to [50us, 100ms). This makes for ~70 buckets. It's worth noting that the default histogram buckets used in Go are not fit for our purposes. If we care about improving it, we could consider patching the runtime.

```
bucket[  0] width=0s boundary=[-Inf, 0s)
bucket[  1] width=1ns boundary=[0s, 1ns)
bucket[  2] width=1ns boundary=[1ns, 2ns)
bucket[  3] width=1ns boundary=[2ns, 3ns)
bucket[  4] width=1ns boundary=[3ns, 4ns)
...
bucket[270] width=16.384µs boundary=[737.28µs, 753.664µs)
bucket[271] width=16.384µs boundary=[753.664µs, 770.048µs)
bucket[272] width=278.528µs boundary=[770.048µs, 1.048576ms)
bucket[273] width=32.768µs boundary=[1.048576ms, 1.081344ms)
bucket[274] width=32.768µs boundary=[1.081344ms, 1.114112ms)
...
bucket[717] width=1h13m18.046511104s boundary=[53h45m14.046488576s, 54h58m32.09299968s)
bucket[718] width=1h13m18.046511104s boundary=[54h58m32.09299968s, 56h11m50.139510784s)
bucket[719] width=1h13m18.046511104s boundary=[56h11m50.139510784s, 57h25m8.186021888s)
bucket[720] width=57h25m8.186021888s boundary=[57h25m8.186021888s, +Inf)
```

Release note: None
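A minimal sketch of the bucketing scheme the commit message describes: exponential boundaries with a base multiple of 1.1, clamped to [50us, 100ms). The actual change re-buckets Go's runtime histogram rather than generating boundaries from scratch, so the exact bucket count there may differ from what this prints:

```go
// Minimal sketch: generate exponential bucket boundaries with base 1.1 over
// [50us, 100ms).
package main

import (
	"fmt"
	"time"
)

func main() {
	const base = 1.1
	lo, hi := 50*time.Microsecond, 100*time.Millisecond

	boundaries := []time.Duration{lo}
	for b := float64(lo) * base; time.Duration(b) < hi; b *= base {
		boundaries = append(boundaries, time.Duration(b))
	}
	boundaries = append(boundaries, hi) // final boundary closes the range

	fmt.Printf("%d buckets spanning [%s, %s)\n",
		len(boundaries)-1, boundaries[0], boundaries[len(boundaries)-1])
}
```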
86277: eventpb: add storage event types r=jbowens,sumeerbhola a=nicktrav

Add the `StoreStats` event type, a per-store event emitted to the `TELEMETRY` logging channel. This event type will be computed from the Pebble metrics for each store. Emit a `StoreStats` event periodically, by default once per hour, per store. Touches #85589. Release note: None. Release justification: low risk, high benefit changes to existing functionality.

87142: workload/mixed-version/schemachanger: re-enable mixed version workload r=fqazi a=fqazi

Fixes: #58489 #87477. Previously the mixed version schema changer workload was disabled because of the lack of version gates. These changes will do the following:
- Start reporting errors on this workload again.
- Disable trigrams in a mixed version state.
- Disable the insert part of the workload in a mixed version state (there is an optimizer on 22.1 that can cause some of the queries to fail).
Release justification: low risk, only extends test coverage.

87883: schedulerlatency: export Go scheduling latency metric r=irfansharif a=irfansharif

And record data into CRDB's internal time-series database. Informs #82743 and #87823. To export scheduling latencies to prometheus, we choose an exponential bucketing scheme with base multiple of 1.1, and the output range bounded to [50us, 100ms). This makes for ~70 buckets. It's worth noting that the default histogram buckets used in Go are not fit for our purposes. If we care about improving it, we could consider patching the runtime.

```
bucket[  0] width=0s boundary=[-Inf, 0s)
bucket[  1] width=1ns boundary=[0s, 1ns)
bucket[  2] width=1ns boundary=[1ns, 2ns)
bucket[  3] width=1ns boundary=[2ns, 3ns)
bucket[  4] width=1ns boundary=[3ns, 4ns)
...
bucket[270] width=16.384µs boundary=[737.28µs, 753.664µs)
bucket[271] width=16.384µs boundary=[753.664µs, 770.048µs)
bucket[272] width=278.528µs boundary=[770.048µs, 1.048576ms)
bucket[273] width=32.768µs boundary=[1.048576ms, 1.081344ms)
bucket[274] width=32.768µs boundary=[1.081344ms, 1.114112ms)
...
bucket[717] width=1h13m18.046511104s boundary=[53h45m14.046488576s, 54h58m32.09299968s)
bucket[718] width=1h13m18.046511104s boundary=[54h58m32.09299968s, 56h11m50.139510784s)
bucket[719] width=1h13m18.046511104s boundary=[56h11m50.139510784s, 57h25m8.186021888s)
bucket[720] width=57h25m8.186021888s boundary=[57h25m8.186021888s, +Inf)
```

Release note: None. Release justification: observability-only PR, low-risk high-benefit; would help understand admission control out in the wild.

88179: ui/cluster-ui: fix no most recent stmt for active txns r=xinhaoz a=xinhaoz

Fixes #87738. Previously, active txns could have an empty 'Most Recent Statement' column, even if their executed statement count was non-zero. This was due to the most recent query text being populated by the active stmt, which could be empty at the time of querying. This commit populates the last statement text for a txn even when it is not currently executing a query. This commit also removes the `isFullScan` field from active txn pages, as we cannot fill this field out without all stmts in the txn. Release note (ui change): Full scan field is removed from active txn details page. Release note (bug fix): active txns with non-zero executed statement count now always have populated stmt text, even when no stmt is being executed.

88334: kvserver: align Raft recv/send queue sizes r=erikgrinaker a=pavelkalinnikov

Fixes #87465. Release justification: performance fix. Release note: Made sending and receiving Raft queue sizes match. Previously the receiver could unnecessarily drop messages in situations when the sending queue is bigger than the receiving one.

Co-authored-by: Nick Travers <travers@cockroachlabs.com>
Co-authored-by: Faizan Qazi <faizan@cockroachlabs.com>
Co-authored-by: irfan sharif <irfanmahmoudsharif@gmail.com>
Co-authored-by: Xin Hao Zhang <xzhang@cockroachlabs.com>
Co-authored-by: Pavel Kalinnikov <pavel@cockroachlabs.com>
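For the 86277 change above, a minimal sketch of the periodic emission it describes (a per-store StoreStats event sent hourly by default); Store, StoreStatsEvent, and emit are hypothetical stand-ins, and log.Printf stands in for the TELEMETRY logging channel:

```go
// Minimal sketch of periodic per-store stats emission.
package main

import (
	"context"
	"log"
	"time"
)

type Store struct{ ID int }

type StoreStatsEvent struct {
	StoreID int
	// Fields computed from Pebble metrics would be populated here.
}

func emit(ev StoreStatsEvent) {
	log.Printf("telemetry: store stats for s%d", ev.StoreID)
}

// emitStoreStatsLoop emits one event per store on every tick (hourly by
// default in the change described above) until the context is canceled.
func emitStoreStatsLoop(ctx context.Context, stores []Store, interval time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			for _, s := range stores {
				emit(StoreStatsEvent{StoreID: s.ID})
			}
		}
	}
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 250*time.Millisecond)
	defer cancel()
	emitStoreStatsLoop(ctx, []Store{{ID: 1}, {ID: 2}}, 100*time.Millisecond)
}
```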
This commit splits the WorkQueueMetric stats by priority. Because this splitting could generate a lot of new stats, only the "important" priorities are collected and considered for each queue. Informs cockroachdb#82743. Release note: None
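A minimal sketch of the per-priority segmentation this commit describes; the priority values and the metrics struct below are hypothetical. Only an "important" subset of priorities gets dedicated counters, everything else only moves the queue-wide totals:

```go
// Minimal sketch of per-priority work queue counters.
package main

import "fmt"

type Priority int

const (
	LowPri    Priority = -10
	NormalPri Priority = 0
	HighPri   Priority = 10
)

// Only these priorities get their own metric objects.
var importantPriorities = []Priority{NormalPri, HighPri}

type counters struct{ requested, admitted, errored int64 }

type workQueueMetrics struct {
	total counters
	byPri map[Priority]*counters
}

func newWorkQueueMetrics() *workQueueMetrics {
	m := &workQueueMetrics{byPri: map[Priority]*counters{}}
	for _, p := range importantPriorities {
		m.byPri[p] = &counters{}
	}
	return m
}

func (m *workQueueMetrics) recordAdmitted(p Priority) {
	m.total.admitted++
	if c, ok := m.byPri[p]; ok { // collected only for "important" priorities
		c.admitted++
	}
}

func main() {
	m := newWorkQueueMetrics()
	m.recordAdmitted(NormalPri)
	m.recordAdmitted(LowPri) // only the aggregate moves
	fmt.Println(m.total.admitted, m.byPri[NormalPri].admitted)
}
```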
Part of cockroachdb#82743. We add cluster settings to control:
- smoothing alpha for byte token computations;
- reduction factor for L0 compaction tokens, based on observed compactions.
We've found these to be useful in internal experiments, and also when looking to paper over L0 compaction variability effects up in AC. While here, print out observed smoothed compaction bytes in io_load_listener logging and introduce metrics for:
- l0 compacted bytes;
- generated l0 tokens;
- l0 tokens returned.
Release note: None
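A minimal sketch, with hypothetical names, of how the two settings above could interact: an exponential-smoothing alpha folded over observed L0 compaction bytes, and a reduction factor applied when converting the smoothed value into L0 compaction tokens:

```go
// Minimal sketch of smoothed L0 compaction bytes and derived tokens.
package main

import "fmt"

type ioTokenEstimator struct {
	alpha           float64 // smoothing alpha for byte token computations
	reductionFactor float64 // fraction of smoothed compaction bytes granted as tokens
	smoothedL0Bytes float64
}

// observe folds one interval's L0 compacted bytes into the smoothed estimate
// and returns the tokens to hand out over the next interval.
func (e *ioTokenEstimator) observe(compactedBytes float64) (tokens float64) {
	e.smoothedL0Bytes = e.alpha*compactedBytes + (1-e.alpha)*e.smoothedL0Bytes
	return e.reductionFactor * e.smoothedL0Bytes
}

func main() {
	e := &ioTokenEstimator{alpha: 0.5, reductionFactor: 0.75}
	for _, b := range []float64{100 << 20, 40 << 20, 200 << 20} {
		tokens := e.observe(b)
		fmt.Printf("compacted=%.0f smoothed=%.0f tokens=%.0f\n", b, e.smoothedL0Bytes, tokens)
	}
}
```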
Part of cockroachdb#82743. We introduce metrics for l0 compacted bytes, generated l0 tokens, and l0 tokens returned. Release note: None
We previously did not record anything into {IO,CPU} wait queue histograms when work either bypassed admission control (because of the nature of the work, or when certain admission queues were disabled through cluster settings) or used the fast path (i.e. didn't add itself to the tenant heaps). This meant that our histogram percentiles were not accurate. NB: This problem didn't exist at the flow control level, where work may not be subject to flow control depending on the mode selected ('apply_to_elastic', 'apply_to_all'); we'd still record a measured wait duration (~0ms), so we had accurate waiting-for-flow-tokens histograms. Part of cockroachdb#82743. Release note: None
110060: admission: fix wait queue histograms r=irfansharif a=irfansharif We previously did not record anything into {IO,CPU} wait queue histograms when work either bypassed admission control (because of the nature of the work, or when certain admission queues were disabled through cluster settings). This meant that our histogram percentiles were not accurate. This problem didn't exist at the flow control level where work may not be subject to flow control depending on the mode selected ('apply_to_elastic', 'apply_to_all'). We'd still record a measured wait duration (~0ms), so we had accurate waiting-for-flow-tokens histograms. Part of #82743. Release note: None Co-authored-by: irfan sharif <irfanmahmoudsharif@gmail.com>
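A minimal sketch of the fix described above, with a hypothetical histogram type: work that bypasses admission (or takes the fast path) still records a ~0 wait duration, so the wait-queue percentiles reflect all work rather than only work that actually queued:

```go
// Minimal sketch: record into the wait-duration histogram even for bypassed work.
package main

import (
	"fmt"
	"time"
)

type histogram struct{ observations []time.Duration }

func (h *histogram) record(d time.Duration) { h.observations = append(h.observations, d) }

// admit records into the wait-duration histogram whether or not the work
// actually waited in the queue.
func admit(h *histogram, bypass bool, wait func() time.Duration) {
	if bypass {
		h.record(0) // previously: nothing was recorded, skewing percentiles
		return
	}
	h.record(wait())
}

func main() {
	var h histogram
	admit(&h, true, nil)
	admit(&h, false, func() time.Duration { return 3 * time.Millisecond })
	fmt.Println(h.observations) // [0s 3ms]
}
```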
Part of cockroachdb#82743. We introduce an admission.granter.io_tokens_bypassed.kv metric, that tracks the total number of tokens taken by work bypassing admission control. For example, follower writes without flow control. Aside: cockroachdb#109640 ripped out a tokens-taken-without-permission metric that was supposed to capture some of this, but even for standard admission work we'd routinely exercise that code path. When admitting work, we take 1 token, and later take the remaining without permission. Release note: None
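A minimal sketch, with hypothetical names, of an io_tokens_bypassed-style counter: tokens deducted by work bypassing admission control are tracked alongside the total tokens deducted:

```go
// Minimal sketch of a bypassed-tokens counter (Go 1.19+ for atomic.Int64).
package main

import (
	"fmt"
	"sync/atomic"
)

type granterMetrics struct {
	ioTokensTaken    atomic.Int64 // all tokens deducted
	ioTokensBypassed atomic.Int64 // subset deducted by work bypassing AC
}

func (m *granterMetrics) tookTokens(n int64, bypassed bool) {
	m.ioTokensTaken.Add(n)
	if bypassed {
		m.ioTokensBypassed.Add(n)
	}
}

func main() {
	var m granterMetrics
	m.tookTokens(512, false) // admitted work
	m.tookTokens(4096, true) // e.g. a follower write not subject to flow control
	fmt.Println(m.ioTokensTaken.Load(), m.ioTokensBypassed.Load())
}
```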
109833: kvflowcontrol: annotate/fix perf regressions r=irfansharif a=irfansharif

- Replace the flow controller level mutex-backed kvflowcontrol.Stream => token bucket map with sync.Map. On kv0/enc=false/nodes=3/cpu=96 accessing this map contributed to a high amount of mutex contention. We observe that this bucket is effectively read-only: entries for keys are written once (on creation) and read frequently after. We don't currently GC these buckets, but even if we did, the same access pattern would hold. We'll note that using a sync.Map is slightly more expensive CPU-wise.
- Replace various map accesses with individual variables. We were needlessly using maps to access one of two variables, keyed by work class, for example when maintaining metrics per work class, or tracking token adjustments. The map accesses appeared prominently in CPU profiles and were unnecessary overhead.
- Avoid using log.ExpensiveLogEnabled in hot code paths; it shows up in CPU profiles.
- Slightly reduce the surface area of kvflowhandle.Handle.mu when returning flow tokens.
- We also annotate various other points in the code where peep-hole optimizations exist, as surfaced by kv0/enc=false/nodes=3/cpu=96.
Part of #104154. Release note: None

110088: privilege: automate generation of ByName map r=ecwall a=andyyang890

This patch automates the process of generating the `ByName` map so that any newly added privileges will automatically be included. Epic: None. Release note: None

110110: admission: add metric for bypassed IO admission work r=irfansharif a=irfansharif

Part of #82743. We introduce an admission.granter.io_tokens_bypassed.kv metric that tracks the total number of tokens taken by work bypassing admission control, for example follower writes without flow control. Aside: #109640 ripped out a tokens-taken-without-permission metric that was supposed to capture some of this, but even for standard admission work we'd routinely exercise that code path. When admitting work, we take 1 token, and later take the remaining without permission. Release note: None

Co-authored-by: irfan sharif <irfanmahmoudsharif@gmail.com>
Co-authored-by: Andy Yang <yang@cockroachlabs.com>
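For the 109833 change above, a minimal sketch of the mutex-map to sync.Map swap, with hypothetical stand-ins for kvflowcontrol.Stream and its token bucket; the point is that entries are written once on creation and read frequently afterwards, the read-mostly pattern sync.Map's lock-free read path suits:

```go
// Minimal sketch of a read-mostly stream -> token bucket lookup via sync.Map.
package main

import (
	"fmt"
	"sync"
)

type stream struct{ tenantID, storeID uint64 } // stand-in for kvflowcontrol.Stream

type tokenBucket struct {
	mu     sync.Mutex
	tokens int64
}

var buckets sync.Map // stream -> *tokenBucket

// getBucket returns the stream's bucket, creating it on first use; later
// lookups hit sync.Map's lock-free read path.
func getBucket(s stream) *tokenBucket {
	if b, ok := buckets.Load(s); ok {
		return b.(*tokenBucket)
	}
	b, _ := buckets.LoadOrStore(s, &tokenBucket{tokens: 16 << 20})
	return b.(*tokenBucket)
}

func main() {
	b := getBucket(stream{tenantID: 1, storeID: 2})
	b.mu.Lock()
	b.tokens -= 1024
	b.mu.Unlock()
	fmt.Println(b.tokens)
}
```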
110135: ui: surface flow control metrics in overload dashboard r=irfansharif a=irfansharif

Some of this new flow control machinery changes the game for IO admission control. This commit surfaces relevant metrics to the overload dashboard:
- kvadmission.flow_controller.{regular,elastic}_wait_duration-p75
- kvadmission.flow_controller.{regular,elastic}_requests_waiting
- kvadmission.flow_controller.{regular,elastic}_blocked_stream_count
While here, we replace the storage.l0-{sublevels,num-files} metrics with admission.io.overload instead. The former showed raw counts instead of normalizing them against AC's target thresholds, and the y-axis scales for sublevels vs. files are an order of magnitude apart, so they're slightly more annoying to distinguish. Part of #82743. Release note: None

110924: release: remove SREOPS request, as part of deprecating CC qual clusters r=rail a=celiala

As per RE-462, we plan to deprecate the CC qualification clusters. In this commit, we remove the SREOPS request, which previously requested updates to the CC qualification clusters. Release note: None. Epic: RE-462

Co-authored-by: irfan sharif <irfanmahmoudsharif@gmail.com>
Co-authored-by: Celia La <celia@cockroachlabs.com>
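A minimal sketch of the normalization admission.io.overload conveys, per the 110135 description above: L0 sublevel and file counts expressed as fractions of admission control's target thresholds, with the max taken as the overload score. The threshold values below are illustrative assumptions, not the shipped defaults:

```go
// Minimal sketch of a normalized IO overload score.
package main

import "fmt"

func ioOverloadScore(l0Sublevels, l0Files, sublevelThreshold, fileThreshold int) float64 {
	s := float64(l0Sublevels) / float64(sublevelThreshold)
	f := float64(l0Files) / float64(fileThreshold)
	if s > f {
		return s // a score >= 1.0 means the store is at or past the threshold
	}
	return f
}

func main() {
	// Assumed thresholds: 20 sublevels, 1000 L0 files.
	fmt.Println(ioOverloadScore(10, 400, 20, 1000)) // 0.5
}
```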
Discussed offline with @sumeerbhola; closing this issue, as the changes we still want to make already have separate issues tracked in the backlog with priorities assigned to them. The rest we don't want to invest time in.
In order of importance and/or done-ness:
* WorkQueues: segment errored, requested, and admitted counts by Pri (at least NormalPri) for relevant work queues. Since we have the histograms for the full work queue across all requests, this can tell us how foreground load behaves and how everything else does. Alternatively, add segmented histograms for only NormalPri, everything higher than NormalPri, and everything lower than NormalPri.

Jira issue: CRDB-16641
Epic CRDB-25469