telemetry: track hot ranges from a cluster #85415

Closed
thtruo opened this issue Aug 1, 2022 · 4 comments · Fixed by #89511
Labels
A-kv-observability C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)

Comments

@thtruo
Contributor

thtruo commented Aug 1, 2022

Is your feature request related to a problem? Please describe.
The KV Observability team has several data questions that we can't answer today because we lack the insights:

  • Which org or cluster has the hottest range?
  • How hot is that range? Which database, table, index, node, store, SQL pod, etc. is that range associated with?
  • What are the 50th, 70th, and 90th-percentile QPS values for hot ranges?
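
To make the questions above concrete, here is a minimal sketch of how they could be answered once per-range QPS samples are available. Everything in it is an assumption for illustration: the `RangeSample` shape, the sample values, and the choice of the nearest-rank percentile method are hypothetical and not taken from any existing telemetry pipeline.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class RangeSample:
    # Hypothetical record shape: one QPS observation for one range.
    cluster_id: str
    range_id: int
    qps: float


def percentile(values, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def hottest(samples):
    """Return the sample with the highest QPS across all clusters."""
    return max(samples, key=lambda s: s.qps)


# Made-up sample data, for illustration only.
samples = [
    RangeSample("cluster-a", 12, 40.0),
    RangeSample("cluster-a", 57, 910.0),
    RangeSample("cluster-b", 3, 125.0),
    RangeSample("cluster-b", 88, 15.0),
    RangeSample("cluster-c", 41, 300.0),
]

qps = [s.qps for s in samples]
summary = {p: percentile(qps, p) for p in (50, 70, 90)}
top = hottest(samples)
```

With these made-up samples, `top` identifies which cluster owns the hottest range, and `summary` holds the three percentile values keyed by percentile.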

Describe the solution you'd like
Brainstorm: we could emit telemetry logs hourly (or at some other cadence) that snapshot the currently hottest ranges (top X ranges) for each cloud cluster, along with their associated QPS, database, table, index, store, node, SQL pod, etc. We could also include other summary or aggregate stats alongside each telemetry log event. Details are pending in a separate one-pager.
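
As a rough sketch of the snapshot idea (not a proposal for the actual event schema), the following picks the top X ranges by QPS and packages them with one aggregate stat into a single JSON log event. The `HotRange` fields, the `hot_ranges_snapshot` event name, and the `total_qps` aggregate are all hypothetical names chosen for illustration.

```python
import heapq
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class HotRange:
    # Hypothetical per-range fields mirroring the attributes listed above.
    range_id: int
    qps: float
    database: str
    table: str
    index: str
    store_id: int
    node_id: int


def snapshot_hot_ranges(ranges, top_x, taken_at):
    """Build one telemetry event holding the top_x ranges by QPS."""
    top = heapq.nlargest(top_x, ranges, key=lambda r: r.qps)
    return {
        "event_type": "hot_ranges_snapshot",  # hypothetical event name
        "taken_at": taken_at,
        "total_qps": sum(r.qps for r in ranges),  # example aggregate stat
        "ranges": [asdict(r) for r in top],
    }


# Made-up data, for illustration only.
ranges = [
    HotRange(12, 40.0, "bank", "accounts", "primary", 1, 1),
    HotRange(57, 910.0, "bank", "ledger", "primary", 2, 2),
    HotRange(41, 300.0, "shop", "orders", "orders_by_ts", 3, 3),
]

event = snapshot_hot_ranges(ranges, top_x=2, taken_at="2022-08-01T00:00:00Z")
log_line = json.dumps(event)
```

A scheduler running at the chosen cadence would call `snapshot_hot_ranges` and write `log_line` to the telemetry channel. The cadence and the value of X are left open here, as they are in the discussion.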

Additional context
We should backport this to 22.1 and 21.2.

Jira issue: CRDB-18238

@thtruo thtruo added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-kv-observability labels Aug 1, 2022
@thtruo
Contributor Author

thtruo commented Aug 1, 2022

cc @nkodali for awareness

@kevin-v-ngo

This would be great to have collected! We can correlate this information with the schema telemetry snapshot we're collecting. Then, depending on how many (and which) statements are hitting the table/index, we can create insights that suggest potential solutions such as hash-sharded indexes if sequential IDs are used, manual splits, etc.

FYI @devadvocado @postamar

@koorosh
Collaborator

koorosh commented Jan 27, 2023

@thtruo, can you help me answer the following questions:

  1. How often should hot range stats be sent to telemetry?
  2. Should it always be enabled, or gated by a cluster setting (e.g. diagnostics.reporting.enabled)?

@thtruo
Copy link
Contributor Author

thtruo commented Jan 27, 2023

Heads up @koorosh: I noted details and open questions in this doc. Let's review at our next sync and circle back on the details here.


5 participants