kvserver: add storage time-series metrics for level size and score #88504

Merged (1 commit) on Sep 23, 2022

Conversation

nicktrav
Collaborator

@nicktrav nicktrav commented Sep 22, 2022

Currently, the only way to infer the compaction score and heuristics is to use the LSM printout from the logs (emitted once every ten minutes), or to call the /debug/lsm endpoint manually, and track values over time. This makes it difficult to debug issues retroactively.

Add two new sets of per-LSM-level time-series metrics for level size and level score. These new metrics have names of the form storage.$LEVEL-level-{size,score}.

Closes #88415.

Release note (ops change): Adds two new sets of per-LSM-level time-series metrics, one for level size and another for level score. These metrics are of the form storage.$LEVEL-level-{size,score}.
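
To make the naming pattern concrete, here is a minimal Go sketch that enumerates metric names of the form `storage.$LEVEL-level-{size,score}`. It is illustrative only and is not code from this PR: the helper `levelMetricNames`, the expansion of `$LEVEL` to `l0`, `l1`, and so on, and the choice of seven levels are all assumptions.

```go
package main

import "fmt"

// numLevels is an assumption for illustration: an LSM with seven
// levels, L0 through L6.
const numLevels = 7

// levelMetricNames is a hypothetical helper. It returns one size and
// one score metric name per level, following the pattern
// storage.$LEVEL-level-{size,score} from the description above.
// Expanding $LEVEL to "l0", "l1", ... is an assumption.
func levelMetricNames(levels int) []string {
	names := make([]string, 0, 2*levels)
	for i := 0; i < levels; i++ {
		for _, kind := range []string{"size", "score"} {
			names = append(names, fmt.Sprintf("storage.l%d-level-%s", i, kind))
		}
	}
	return names
}

func main() {
	// Print every name so it can be copied into a time-series query.
	for _, name := range levelMetricNames(numLevels) {
		fmt.Println(name)
	}
}
```

Under these assumptions each level contributes two series, so a seven-level LSM yields fourteen metric names in total.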

@nicktrav nicktrav requested review from jbowens, sumeerbhola and a team September 22, 2022 20:20
@nicktrav nicktrav requested a review from a team as a code owner September 22, 2022 20:20
@cockroach-teamcity
Member

This change is Reviewable

@nicktrav
Collaborator Author

Example of how these can be used:

[Image: Screen Shot 2022-09-22 at 12 25 49 PM]

@nicktrav nicktrav changed the title from "kverver: add storage time-series metrics for level size and score" to "kvserver: add storage time-series metrics for level size and score" Sep 22, 2022
Collaborator

@sumeerbhola sumeerbhola left a comment

:lgtm:

Reviewed 2 of 2 files at r1, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @jbowens and @nicktrav)


pkg/util/metric/metric.proto line 54 at r1 (raw file):

  // UNITLESS expresses that the metric's measurement does not have units (e.g.
  // a score).
  UNITLESS = 9;

Is adding an enum value here sufficient, or does it need some plumbing elsewhere too?
btw, COUNT does not have units either, and we seem to use it both for gauge and cumulative.
Is the assumption that COUNT is for integers while this UNITLESS value should be used for floats? I think it's worth clarifying via code comment how one should decide between COUNT and UNITLESS.

Collaborator Author

@nicktrav nicktrav left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @jbowens and @sumeerbhola)


pkg/util/metric/metric.proto line 54 at r1 (raw file):

COUNT does not have units either, and we seem to use it both for gauge and cumulative.

Reverted to just use COUNT.

Collaborator

@jbowens jbowens left a comment

Nice!

Reviewed 3 of 3 files at r2, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @sumeerbhola)

@nicktrav
Collaborator Author

TFTRs!

bors r=sumeerbhola,jbowens

@craig
Contributor

craig bot commented Sep 23, 2022

Build succeeded:

@blathers-crl

blathers-crl bot commented Sep 23, 2022

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


error creating merge commit from d41cce0 to blathers/backport-release-22.1-88504: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 22.1.x failed. See errors above.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

Successfully merging this pull request may close these issues.

storage: time-series metrics for level size and score
4 participants