storage: make use of telemetry channel for structured event logging #85589

Closed
nicktrav opened this issue Aug 3, 2022 · 1 comment
Labels
A-storage: Relating to our storage engine (Pebble) on-disk storage.
C-enhancement: Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception).
T-storage: Storage Team

Comments

nicktrav commented Aug 3, 2022

Is your feature request related to a problem? Please describe.

Cockroach supports logging structured events (see the proto definition) to the TELEMETRY logging channel. On Cockroach Cloud, these events are collected and made available elsewhere for reporting purposes.

Describe the solution you'd like

The storage layer should make use of these structured events to log characteristics about clusters that would help us a) identify areas for improvement, and b) track our improvements over time.

Additional context

There is additional documentation in the following:

Jira issue: CRDB-18327

Epic CRDB-17515

nicktrav added the C-enhancement, A-storage, and T-storage labels Aug 3, 2022
nicktrav added a commit to nicktrav/cockroach that referenced this issue Aug 19, 2022
Add the `StoreStats` event type, a per-store event emitted to the
`TELEMETRY` logging channel. This event type will be computed from the
Pebble metrics for each store.

Emit a `StoreStats` event periodically, by default, once per hour, per
store.

Touches cockroachdb#85589.

Release note: None.

Release justification: low risk, high benefit changes to existing
functionality.
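
For readers unfamiliar with the pattern, the periodic per-store emission described in this commit message can be sketched roughly as follows. This is a minimal illustration only, not the code from the PR: the `StoreStats` fields, the `metricsSource` interface, `emitAll`, `runPeriodic`, and `fakeStore` are all hypothetical names, and JSON printed to stdout stands in for the `TELEMETRY` channel.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"time"
)

// StoreStats is a hypothetical stand-in for the per-store telemetry event.
// The field names are illustrative and are not the real eventpb schema.
type StoreStats struct {
	NodeID  int32 `json:"node_id"`
	StoreID int32 `json:"store_id"`
	// Levels is an assumed summary value derived from engine metrics.
	Levels int `json:"levels"`
}

// metricsSource is a hypothetical view of a store that can report stats.
type metricsSource interface {
	Stats() StoreStats
}

// emitAll builds one event per store and hands it to emit.
func emitAll(stores []metricsSource, emit func(StoreStats)) {
	for _, s := range stores {
		emit(s.Stats())
	}
}

// runPeriodic calls emitAll once per interval until ctx is cancelled.
func runPeriodic(ctx context.Context, interval time.Duration, stores []metricsSource, emit func(StoreStats)) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			emitAll(stores, emit)
		}
	}
}

// fakeStore returns fixed stats; a real store would read engine metrics.
type fakeStore struct{ stats StoreStats }

func (f fakeStore) Stats() StoreStats { return f.stats }

func main() {
	stores := []metricsSource{
		fakeStore{StoreStats{NodeID: 1, StoreID: 1, Levels: 7}},
		fakeStore{StoreStats{NodeID: 1, StoreID: 2, Levels: 7}},
	}
	// Print each event as one JSON line, standing in for a structured log sink.
	emit := func(ev StoreStats) {
		b, err := json.Marshal(ev)
		if err != nil {
			fmt.Println("marshal error:", err)
			return
		}
		fmt.Println(string(b))
	}
	// A short interval and timeout keep the demo quick; the commit message
	// above describes a default of once per hour.
	ctx, cancel := context.WithTimeout(context.Background(), 350*time.Millisecond)
	defer cancel()
	runPeriodic(ctx, 100*time.Millisecond, stores, emit)
}
```

Separating `emitAll` from the ticker loop keeps the per-store logic testable without waiting on a timer.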
nicktrav added a commit to nicktrav/cockroach that referenced this issue Aug 22, 2022
nicktrav added a commit to nicktrav/cockroach that referenced this issue Aug 23, 2022
nicktrav added a commit to nicktrav/cockroach that referenced this issue Aug 24, 2022
nicktrav added a commit to nicktrav/cockroach that referenced this issue Sep 19, 2022
craig bot pushed a commit that referenced this issue Sep 21, 2022
86277: eventpb: add storage event types r=jbowens,sumeerbhola a=nicktrav

Add the `StoreStats` event type, a per-store event emitted to the
`TELEMETRY` logging channel. This event type will be computed from the
Pebble metrics for each store.

Emit a `StoreStats` event periodically, by default, once per hour, per
store.

Touches #85589.

Release note: None.

Release justification: low risk, high benefit changes to existing
functionality.

87142: workload/mixed-version/schemachanger: re-enable mixed version workload r=fqazi a=fqazi

Fixes: #58489 #87477

Previously the mixed version schema changer workload was disabled because of the lack of version gates. These changes will do the following:

- Start reporting errors on this workload again.
- Disable trigrams in a mixed version state.
- Disable the insert part of the workload in a mixed version state (there is an optimizer on 22.1 that can cause some of the queries to fail).

Release justification: low risk only extends test coverage

87883: schedulerlatency: export Go scheduling latency metric r=irfansharif a=irfansharif

And record data into CRDB's internal time-series database. Informs
#82743 and #87823. To export scheduling latencies to Prometheus, we
choose an exponential bucketing scheme with base multiple of 1.1, and
the output range bounded to [50us, 100ms). This makes for ~70 buckets.
It's worth noting that the default histogram buckets used in Go are
not fit for our purposes. If we care about improving it, we could
consider patching the runtime.

```
  bucket[  0] width=0s boundary=[-Inf, 0s)
  bucket[  1] width=1ns boundary=[0s, 1ns)
  bucket[  2] width=1ns boundary=[1ns, 2ns)
  bucket[  3] width=1ns boundary=[2ns, 3ns)
  bucket[  4] width=1ns boundary=[3ns, 4ns)
  ...
  bucket[270] width=16.384µs boundary=[737.28µs, 753.664µs)
  bucket[271] width=16.384µs boundary=[753.664µs, 770.048µs)
  bucket[272] width=278.528µs boundary=[770.048µs, 1.048576ms)
  bucket[273] width=32.768µs boundary=[1.048576ms, 1.081344ms)
  bucket[274] width=32.768µs boundary=[1.081344ms, 1.114112ms)
  ...
  bucket[717] width=1h13m18.046511104s boundary=[53h45m14.046488576s, 54h58m32.09299968s)
  bucket[718] width=1h13m18.046511104s boundary=[54h58m32.09299968s, 56h11m50.139510784s)
  bucket[719] width=1h13m18.046511104s boundary=[56h11m50.139510784s, 57h25m8.186021888s)
  bucket[720] width=57h25m8.186021888s boundary=[57h25m8.186021888s, +Inf)
```
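
For contrast with the runtime's default buckets listed above, an exponential scheme with a multiple of 1.1 over [50us, 100ms) could be generated along these lines. This is a rough sketch under stated assumptions, not the PR's implementation: `expBuckets` is a hypothetical helper, and it builds a plain geometric series rather than rebucketing the runtime's histogram.

```go
package main

import (
	"fmt"
	"time"
)

// expBuckets returns bucket boundaries starting at lo and growing by the
// given multiple until hi is reached. The final boundary is clamped to hi,
// so the buckets cover [lo, hi). It returns nil for inputs that would not
// produce a growing series.
func expBuckets(lo, hi time.Duration, multiple float64) []time.Duration {
	if lo <= 0 || hi <= lo || multiple <= 1 {
		return nil
	}
	var bounds []time.Duration
	for b := float64(lo); b < float64(hi); b *= multiple {
		bounds = append(bounds, time.Duration(b))
	}
	return append(bounds, hi)
}

func main() {
	bounds := expBuckets(50*time.Microsecond, 100*time.Millisecond, 1.1)
	// Print the first few buckets to show the geometric growth in width.
	for i := 0; i < 5; i++ {
		fmt.Printf("bucket[%d] width=%s boundary=[%s, %s)\n",
			i, bounds[i+1]-bounds[i], bounds[i], bounds[i+1])
	}
	fmt.Printf("total buckets: %d\n", len(bounds)-1)
}
```

Each bucket is about 10% wider than the previous one, so relative resolution stays constant across the range. This naive series produces 80 buckets, so it need not match the ~70 quoted in the commit message, which comes from the actual implementation.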

Release note: None
Release justification: observability-only PR, low-risk high-benefit; would help understand admission control out in the wild

88179: ui/cluster-ui: fix no most recent stmt for active txns r=xinhaoz a=xinhaoz

Fixes #87738

Previously, active txns could have an empty 'Most Recent Statement' column, even if their executed statement count was non-zero. This was due to the most recent query text being populated by the active stmt, which could be empty at the time of querying. This commit populates the last statement text for a txn even when it is not currently executing a query.

This commit also removes the `isFullScan` field from active txn pages, as we cannot fill this field out without all stmts in the txn.

Release note (ui change): Full scan field is removed from active txn details page.

Release note (bug fix): active txns with non-zero executed statement count now always have populated stmt text, even when no stmt is being executed.

88334: kvserver: align Raft recv/send queue sizes r=erikgrinaker a=pavelkalinnikov

Fixes #87465

Release justification: performance fix
Release note: Made the sending and receiving Raft queue sizes match. Previously, the receiver could unnecessarily drop messages when the sending queue was bigger than the receiving one.
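
The effect of mismatched queue sizes can be illustrated with a toy model. This is a simplified sketch, not the kvserver code: `deliverBurst` is a hypothetical function, the queue sizes are arbitrary, and it assumes the receiver is not draining its queue while the burst arrives.

```go
package main

import "fmt"

// deliverBurst simulates a sender flushing a full send queue into a
// receiver whose queue holds recvCap messages and is not being drained.
// It returns how many messages the receiver accepted and how many it dropped.
func deliverBurst(sendCap, recvCap int) (accepted, dropped int) {
	sendQ := make(chan int, sendCap)
	for i := 0; i < sendCap; i++ {
		sendQ <- i // fill the send queue to capacity
	}
	close(sendQ)

	recvQ := make(chan int, recvCap)
	for msg := range sendQ {
		select {
		case recvQ <- msg:
			accepted++
		default:
			// The receive queue is full, so the message is dropped.
			dropped++
		}
	}
	return accepted, dropped
}

func main() {
	// The sizes here are arbitrary and chosen only for illustration.
	a, d := deliverBurst(10, 4)
	fmt.Printf("mismatched: accepted=%d dropped=%d\n", a, d)
	a, d = deliverBurst(10, 10)
	fmt.Printf("aligned:    accepted=%d dropped=%d\n", a, d)
}
```

In this model a full send queue overflows a smaller receive queue, while equal sizes let the whole burst through.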

Co-authored-by: Nick Travers <travers@cockroachlabs.com>
Co-authored-by: Faizan Qazi <faizan@cockroachlabs.com>
Co-authored-by: irfan sharif <irfanmahmoudsharif@gmail.com>
Co-authored-by: Xin Hao Zhang <xzhang@cockroachlabs.com>
Co-authored-by: Pavel Kalinnikov <pavel@cockroachlabs.com>
blathers-crl bot pushed a commit that referenced this issue Sep 21, 2022
@nicktrav

Marking this as done in #86277.
