Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug Report: Throttler Config Via Topo Is Broken #12523

Closed
mattlord opened this issue Feb 28, 2023 · 0 comments · Fixed by #12520
Closed

Bug Report: Throttler Config Via Topo Is Broken #12523

mattlord opened this issue Feb 28, 2023 · 0 comments · Fixed by #12520

Comments

@mattlord
Copy link
Contributor

mattlord commented Feb 28, 2023

Overview of the Issue

In #11604 we added support for dynamic tablet throttler configuration.

The problem with the implementation we landed on is a general one because we stored the configuration directly in the cell local SrvKeyspace records.

The Keyspace record is the persistent and (mostly) static keyspace configuration stored in the global topo. The "Keyspace [Serving] Graph" is stored in the SrvKeyspace records across the cells and that is dynamic and ephemeral, regularly built and rebuilt from the global topo info and the live state. We build and rebuild it internally per cell and across all cells when certain operations happen, and a user can rebuild them as well using the RebuildKeyspaceGraph client command. Currently in release-16.0 and main, anytime a SrvKeyspace record in a cell is built/rebuilt, it WILL NOT have any previously configured ThrottlerConfig value. So what tablets have what config will be undefined and chaotic (not to mention the Vitess Operator’s SrvKeyspace pruning behavior).

We can instead store the configuration in the global topo Keyspace record and propagate that to the SrvKeyspace records on build/rebuild — leaving the rest of the work largely unchanged (see draft PR for an example).

There were also some more minor issues:

  1. The IsEnabled property is no longer a static property of the throttler via tablet flags but is dynamic. There was no way to observe whether or not the throttler was currently active/enabled on a tablet.
  2. We did not wait for the throttler to become enabled on all relevant tablets in the endtoend tests before proceeding (which was difficult to do because of the first issue).
  3. We did not return the cells we actually updated when we had partial results for updating the topo.

Reproduction Steps

An easy way to demonstrate the general problem:

git checkout main
make build
cd examples/local

./101_initial_cluster.sh

vtctlclient TopoCat -- --cell=zone1 --decode_proto_json keyspaces/commerce/SrvKeyspace | jq '.[0].throttlerConfig'

vtctldclient UpdateThrottlerConfig --enable commerce

vtctlclient TopoCat -- --cell=zone1 --decode_proto_json keyspaces/commerce/SrvKeyspace | jq '.[0].throttlerConfig'

vtctldclient RebuildKeyspaceGraph commerce

vtctlclient TopoCat -- --cell=zone1 --decode_proto_json keyspaces/commerce/SrvKeyspace | jq '.[0].throttlerConfig'

You will see that the throttler config is wiped out after the rebuild:

$ vtctlclient TopoCat -- --cell=zone1 --decode_proto_json keyspaces/commerce/SrvKeyspace | jq '.[0].throttlerConfig'
null

$ vtctldclient UpdateThrottlerConfig --enable commerce

$ vtctlclient TopoCat -- --cell=zone1 --decode_proto_json keyspaces/commerce/SrvKeyspace | jq '.[0].throttlerConfig'
{
  "enabled": true
}

$ vtctldclient RebuildKeyspaceGraph commerce

$ vtctlclient TopoCat -- --cell=zone1 --decode_proto_json keyspaces/commerce/SrvKeyspace | jq '.[0].throttlerConfig'
null

Binary Version

v16.0.0 and v17.0.0-SNAPSHOT

Operating System and Environment details

N/A

Log Fragments

N/A
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant