Dynamic tablet throttler config: enable/disable, set metrics query/threshold #11604
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
…ards compatibility) Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
… STATUS Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
LGTM! My last concern has been addressed. Made one minor comment about the vtctld
help output. I'll let the other reviewers double check their respective areas. Thank you for working on this! I think this greatly improves the throttling feature! ❤️
// UpdateThrottlerConfig makes a UpdateThrottlerConfig gRPC call to a vtctld.
UpdateThrottlerConfig = &cobra.Command{
	Use: "UpdateThrottlerConfig [--enable|--disable] [--threshold=<float64>] [--custom-query=<query>] [--check-as-check-self|--check-as-check-shard] <keyspace>",
	Short: "Rebuilds the cell-specific SrvVSchema from the global VSchema objects in the provided cells (or all cells if none provided).",
This implies that you can specify cells but currently you cannot. It also doesn't rebuild/refresh so much as update the config in the topo which is then picked up by the watchers. Looks like we can mostly copy the help output from vtctl: "Update the table throttler configuration for all cells and tablets of a given keyspace"
Good catch. This is just an overlooked copy+paste. Updated the comment.
@ajm188 is this looking good on your side?
@shlomi-noach can you
Rest LGTM
done
Added release notes
ya, approving formally for completeness! great stuff
huh! I used to have the superpower to force merge a PR, I seem to not have it. Will solicit more approvals
Description
A different implementation for dynamic throttler config from the one described in #11316
We have decided to implement dynamic throttler config in the following way:
Control is via the vtctldclient UpdateThrottlerConfig command.
Configuration is stored in topo, not in a backend _vt table.
Control via vtgate is postponed, to be re-evaluated if we want such control.
A show vitess_throttler status query returns per-tablet throttler state.
A new vttablet flag, --throttler-config-via-topo, makes tablets take their throttler configuration from topo.
Discussion & details.
The main deviation from #11316 is that we do not use a _vt backend table to store the throttler's config, and instead store it in topo. There are multiple reasons for that:
Storing the config in _vt means replica tablets are susceptible to replication lag, which introduces a dependency loop.
topo seems a logical place to set this kind of configuration.
We can use topo listeners/callbacks to simplify the propagation of information to the tablets.
We utilize local topos; changes to configuration will apply to all cell-topos of a keyspace.
We do require global topo to be available if you want to make a change to the throttler. This is because we need global topo to tell us where to find per-cell topos.
vtctldclient UpdateThrottlerConfig
You indicate the specific configuration changes you make to the throttler, like so:
Examples:
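A minimal Go sketch of how such invocations could be assembled, assuming only the flags that appear in the command's usage string quoted in the review thread. The throttlerUpdate type, its args method, and the keyspace name commerce are hypothetical and exist only for illustration; they are not part of Vitess.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// throttlerUpdate is a hypothetical description of one requested change.
// It only mirrors the flags from the usage string:
// [--enable|--disable] [--threshold=<float64>] [--custom-query=<query>]
// [--check-as-check-self|--check-as-check-shard] <keyspace>
type throttlerUpdate struct {
	Enable, Disable   bool
	Threshold         *float64 // nil means "leave unchanged"
	CustomQuery       *string  // nil means "leave unchanged"
	CheckAsCheckSelf  bool
	CheckAsCheckShard bool
}

// args builds the argument list to pass to vtctldclient. It rejects the
// flag pairs that the usage string presents as alternatives.
func (u throttlerUpdate) args(keyspace string) ([]string, error) {
	if keyspace == "" {
		return nil, errors.New("keyspace is required")
	}
	if u.Enable && u.Disable {
		return nil, errors.New("--enable and --disable are mutually exclusive")
	}
	if u.CheckAsCheckSelf && u.CheckAsCheckShard {
		return nil, errors.New("--check-as-check-self and --check-as-check-shard are mutually exclusive")
	}
	out := []string{"UpdateThrottlerConfig"}
	if u.Enable {
		out = append(out, "--enable")
	}
	if u.Disable {
		out = append(out, "--disable")
	}
	if u.Threshold != nil {
		out = append(out, "--threshold="+strconv.FormatFloat(*u.Threshold, 'g', -1, 64))
	}
	if u.CustomQuery != nil {
		out = append(out, "--custom-query="+*u.CustomQuery)
	}
	if u.CheckAsCheckSelf {
		out = append(out, "--check-as-check-self")
	}
	if u.CheckAsCheckShard {
		out = append(out, "--check-as-check-shard")
	}
	return append(out, keyspace), nil
}

func main() {
	threshold := 2.5
	argv, err := throttlerUpdate{Enable: true, Threshold: &threshold}.args("commerce")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Print the command line rather than executing it.
	fmt.Println("vtctldclient", strings.Join(argv, " "))
}
```

Only the settings that are explicitly set end up on the command line, which matches the idea that you indicate just the specific changes you want to make.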
Any changes made are sent to all tablets of the given keyspace, in all cells and all shards.
… 10sec at current configuration).
Configuration and backwards compatibility
Today, the throttler is controlled per-tablet via vttablet command line flags:
enable_lag_throttler
throttle_threshold
throttle_metrics_threshold (used when metrics query is defined, overrides the above, and that's confusing)
throttle_metrics_query
throttle_check_as_check_self
The above five flags are consolidated into four in the new ThrottlerConfig proto:
For backwards compatibility, the existing vttablet flags are still accepted, but will be deprecated in the future. A new vttablet flag, --throttler-config-via-topo, indicates that a SrvKeyspace_ThrottlerConfig configuration (i.e. configuration stored in topo) overrides the above flags. The way to transition into the new setup is to first run your Vitess cluster with existing configuration, untouched. Then, populate topo with the new config via vtctldclient UpdateThrottlerConfig as described above. Then, add --throttler-config-via-topo and restart tablets.
show vitess_throttler status
The command show vitess_throttler status retrieves throttler status from tablets in all cells and shards. To clarify, the command does not read anything from topo. The command represents how the tablets are actually running the throttler. Is it enabled? Disabled? What's the threshold?
At this time the query is only sent to PRIMARY tablets, but we will follow up and make it run on all tablets.
Tests
The main test in this PR is the new go/test/endtoend/tabletmanager/throttler_topo/throttler_test.go, which is tested in a new CI workflow/shard called tabletmanager_throttler_topo.
This new test runs the throttler, changes configuration dynamically, enables, disables, changes threshold, changes the metrics query, etc.
This PR also has a minor effect on on-demand heartbeats to ensure they get an initial "kick" upon startup, and both tabletmanager_throttler and tabletmanager_throttler_custom_config are adjusted accordingly. In the future, we will delete those two tests/workflows/shards, and keep only tabletmanager_throttler_topo.
Related Issue(s)
#11316
Checklist
Deployment Notes