Tablet throttler: get remote tablets metrics from Realtime Stats, with auto-detection #13034

Closed

shlomi-noach
Contributor

Description

An enhancement of #13018. Per #13018 (comment), this PR modifies that approach so that the newly introduced --feature-throttler-read-realtime-stats command line flag is no longer required; the flag is removed in this PR.

In this PR we track the availability of throttler metrics in RealtimeStats. If a throttler metric was seen in RealtimeStats for a tablet in the past minute, we do not run probes on that tablet. If no metric has been seen for a tablet in the past minute, the throttler runs the usual probes (currently HTTP based) for that tablet.

I'm not sure this approach is better than #13018, and the reason has to do with probe frequency. The PRIMARY tablet runs the standard probes at subsecond intervals. However, it has no control over the probing frequency on other tablets. Thus, if --health_check_interval is high on replica tablets, say 10s, the PRIMARY has low resolution for throttler metrics (in particular, replication lag).

It does make sense when the throttler's threshold accommodates --health_check_interval. For example, a health_check_interval of 5s makes sense if the throttler is configured for the default replication lag metric and the threshold is set to, say, 30s. But if the threshold is 5s, then I'd expect a health_check_interval of 1s-2s.

Related Issue(s)

Checklist

  • "Backport to:" labels have been added if this change should be back-ported
  • Tests were added or are not required
  • Did the new or modified tests pass consistently locally and on the CI
  • Documentation was added or is not required

Deployment Notes

…imeStats

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
…id actively probing for relevant tablet

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
@vitess-bot vitess-bot bot added the NeedsDescriptionUpdate and NeedsWebsiteDocsUpdate labels May 8, 2023

vitess-bot bot commented May 8, 2023

Review Checklist

Hello reviewers! 👋 Please follow this checklist when reviewing this Pull Request.

General

  • Ensure that the Pull Request has a descriptive title.
  • If this is a change that users need to know about, please apply the release notes (needs details) label so that merging is blocked unless the summary release notes document is included.
  • If a test is added or modified, there should be documentation on top of the test explaining the expected behavior and what the test does.

If a new flag is being introduced:

  • Is it really necessary to add this flag?
  • Flag names should be clear and intuitive (as far as possible)
  • Help text should be descriptive.
  • Flag names should use dashes (-) as word separators rather than underscores (_).

If a workflow is added or modified:

  • Each item in Jobs should be named in order to mark it as required.
  • If the workflow should be required, the maintainer team should be notified.

Bug fixes

  • There should be at least one unit or end-to-end test.
  • The Pull Request description should include a link to an issue that describes the bug.

Non-trivial changes

  • There should be some code comments as to why things are implemented the way they are.

New/Existing features

  • Should be documented, either by modifying the existing documentation or creating new documentation.
  • New features should have a link to a feature request issue or an RFC that documents the use cases, corner cases and test cases.

Backward compatibility

  • Protobuf changes should be wire-compatible.
  • Changes to _vt tables and RPCs need to be backward compatible.
  • vtctl command output order should be stable and awk-able.
  • RPC changes should be compatible with vitess-operator
  • If a flag is removed, then it should also be removed from VTop, if used there.

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
@github-actions github-actions bot added this to the v17.0.0 milestone May 8, 2023

github-actions bot commented Jun 8, 2023

This PR is being marked as stale because it has been open for 30 days with no activity. To rectify, you may do any of the following:

  • Push additional commits to the associated branch.
  • Remove the stale label.
  • Add a comment indicating why it is not stale.

If no action is taken within 7 days, this PR will be closed.

@github-actions github-actions bot added the Stale label Jun 8, 2023
@frouioui frouioui modified the milestones: v17.0.0, v18.0.0 Jun 12, 2023
@github-actions github-actions bot removed the Stale label Jun 13, 2023
@github-actions github-actions bot added the Stale label Jul 15, 2023
@shlomi-noach
Contributor Author

We're not going to pursue this path. Instead, we will replace the throttler's HTTP calls with RPC calls.

@shlomi-noach shlomi-noach deleted the throttler-health-metric-auto branch July 16, 2023 06:18
Labels
Component: Query Serving, Component: TabletManager, NeedsDescriptionUpdate, NeedsWebsiteDocsUpdate, Stale, Type: Enhancement, Type: Internal Cleanup