Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug Report: Transaction throttler does not respect config parameters passed to it in CLI #12549

Open
ejortegau opened this issue Mar 3, 2023 · 2 comments

Comments

@ejortegau
Copy link
Contributor

ejortegau commented Mar 3, 2023

Overview of the Issue

The transaction throttler takes several config parameters. Specifically, the docs indicate the following:

The configuration of the transaction throttler as a text formatted throttlerdata.Configuration protocol buffer message (default "target_replication_lag_sec: 2\nmax_replication_lag_sec: 10\ninitial_rate: 100\nmax_increase: 1\nemergency_decrease: 0.5\nmin_duration_between_increases_sec: 40\nmax_duration_between_increases_sec: 62\nmin_duration_between_decreases_sec: 20\nspread_backlog_across_sec: 20\nage_bad_rate_after_sec: 180\nbad_rate_increase: 0.1\nmax_rate_approach_threshold: 0.9\n")

However, at the very least the initial rate is not being passed to the max replication lag module that is used by the transaction throttler. This can be seen by checking code calls as follows:

TabletServer's TxThrottler is created here using txthrottler.NewTxThrottler(). That ends up instantiating a txThrottlerState which in turn creates a Throttler via throttler.NewThrottler() and the throttler.newThrottler(). The last one of those instantiates a MaxReplicationLagModule and, while doing so, uses NewMaxReplicationLagModuleConfig() which only clones the default configuration and overrides the MaxReplicationLagSec attribute. Other attributes in the configuration are only being passed to the underlying MaxReplicationLagModule later by calling Throttler.UpdateConfiguration(), but this is done after the new MaxReplicationLagModule instance has been created and set to have a rate that matches the one passed in the configuration (which had the default one of 100). Therefore, the MaxReplicationLagoModule always starts with an initial rate of 100 (the default) instead of the one passed by the CLI arguments.

Reproduction Steps

Start vttablet with:

-enable-tx-throttler -tx-throttler-config target_replication_lag_sec: 10 max_replication_lag_sec: 80 initial_rate: 1000 max_increase: 1 emergency_decrease: 0.5 min_duration_between_increases_sec: 2 max_duration_between_increases_sec: 5 min_duration_between_decreases_sec: 1 spread_backlog_across_sec: 1 age_bad_rate_after_sec: 180 bad_rate_increase: 0.1 max_rate_approach_threshold: 0.9 -tx-throttler-healthcheck-cells <cells>

Tail -f the vttablet log. Notice these messages:

I0303 06:57:54.345315   25610 max_replication_lag_module.go:378] rate was: not changed from: 100 to: 100

showing that the rate that it is using is the one coming from default config instead of the one coming from the CLI flag.

Binary Version

Version: 12.0.5 (Jenkins build 765) (Git revision 1c1fea83df branch 'HEAD') built on Fri Sep 30 16:10:04 UTC 2022 by root@8c60c8cf557f using go1.17.12 linux/amd64

Operating System and Environment details

NAME="Ubuntu"
VERSION="18.04.6 LTS (Bionic Beaver)"
Linux 5.4.0-1096-aws
x86_64

Log Fragments

I0303 06:57:54.345315   25610 max_replication_lag_module.go:378] rate was: not changed from: 100 to: 100
@ejortegau ejortegau added Needs Triage This issue needs to be correctly labelled and triaged Type: Bug labels Mar 3, 2023
@ejortegau
Copy link
Contributor Author

Actually, the issue description text is factually incorrect - my bad, I had missed this when I filed it. The config seems to be passed here - though it seems to take a while to be used to increase the rate. I am going to close this.

@ejortegau
Copy link
Contributor Author

Re-opening after realizing that one parameter - Initial Max Rate - is indeed not being applied. Bug description has been updated.

@ejortegau ejortegau reopened this Mar 13, 2023
ejortegau added a commit to ejortegau/vitess that referenced this issue Mar 13, 2023
This addresses the issue reported in vitessio#12549

Signed-off-by: Eduardo J. Ortega U <5791035+ejortegau@users.noreply.github.com>
@shlomi-noach shlomi-noach added Component: TabletManager and removed Needs Triage This issue needs to be correctly labelled and triaged labels Mar 14, 2023
shlomi-noach pushed a commit that referenced this issue Mar 29, 2023
* Fix transaction throttler ignoring the initial rate

This addresses the issue reported in #12549

Signed-off-by: Eduardo J. Ortega U <5791035+ejortegau@users.noreply.github.com>

* Add missing override of max replication lag in `throttler.newThrottler()`

Signed-off-by: Eduardo J. Ortega U <5791035+ejortegau@users.noreply.github.com>

* Reorder functions to make diff easier to read

Signed-off-by: Eduardo J. Ortega U <5791035+ejortegau@users.noreply.github.com>

* Fix check for maxRate in `newThrottlerFromConfig()`

Signed-off-by: Eduardo J. Ortega U <5791035+ejortegau@users.noreply.github.com>

* Fix some CI pipeline issues

Signed-off-by: Eduardo J. Ortega U <5791035+ejortegau@users.noreply.github.com>

* Address PR comment.

Signed-off-by: Eduardo J. Ortega U <5791035+ejortegau@users.noreply.github.com>

* Fix typo

Signed-off-by: Eduardo J. Ortega U <5791035+ejortegau@users.noreply.github.com>

---------

Signed-off-by: Eduardo J. Ortega U <5791035+ejortegau@users.noreply.github.com>
Signed-off-by: Eduardo J. Ortega U. <5791035+ejortegau@users.noreply.github.com>
ejortegau added a commit to slackhq/vitess that referenced this issue Jun 6, 2023
* Fix transaction throttler ignoring the initial rate

This addresses the issue reported in vitessio#12549

Signed-off-by: Eduardo J. Ortega U <5791035+ejortegau@users.noreply.github.com>

* Add missing override of max replication lag in `throttler.newThrottler()`

Signed-off-by: Eduardo J. Ortega U <5791035+ejortegau@users.noreply.github.com>

* Reorder functions to make diff easier to read

Signed-off-by: Eduardo J. Ortega U <5791035+ejortegau@users.noreply.github.com>

* Fix check for maxRate in `newThrottlerFromConfig()`

Signed-off-by: Eduardo J. Ortega U <5791035+ejortegau@users.noreply.github.com>

* Fix some CI pipeline issues

Signed-off-by: Eduardo J. Ortega U <5791035+ejortegau@users.noreply.github.com>

* Address PR comment.

Signed-off-by: Eduardo J. Ortega U <5791035+ejortegau@users.noreply.github.com>

* Fix typo

Signed-off-by: Eduardo J. Ortega U <5791035+ejortegau@users.noreply.github.com>

---------

Signed-off-by: Eduardo J. Ortega U <5791035+ejortegau@users.noreply.github.com>
Signed-off-by: Eduardo J. Ortega U. <5791035+ejortegau@users.noreply.github.com>
timvaillancourt pushed a commit to slackhq/vitess that referenced this issue Apr 16, 2024
* Fix transaction throttler ignoring the initial rate

This addresses the issue reported in vitessio#12549

Signed-off-by: Eduardo J. Ortega U <5791035+ejortegau@users.noreply.github.com>

* Add missing override of max replication lag in `throttler.newThrottler()`

Signed-off-by: Eduardo J. Ortega U <5791035+ejortegau@users.noreply.github.com>

* Reorder functions to make diff easier to read

Signed-off-by: Eduardo J. Ortega U <5791035+ejortegau@users.noreply.github.com>

* Fix check for maxRate in `newThrottlerFromConfig()`

Signed-off-by: Eduardo J. Ortega U <5791035+ejortegau@users.noreply.github.com>

* Fix some CI pipeline issues

Signed-off-by: Eduardo J. Ortega U <5791035+ejortegau@users.noreply.github.com>

* Address PR comment.

Signed-off-by: Eduardo J. Ortega U <5791035+ejortegau@users.noreply.github.com>

* Fix typo

Signed-off-by: Eduardo J. Ortega U <5791035+ejortegau@users.noreply.github.com>

---------

Signed-off-by: Eduardo J. Ortega U <5791035+ejortegau@users.noreply.github.com>
Signed-off-by: Eduardo J. Ortega U. <5791035+ejortegau@users.noreply.github.com>
timvaillancourt pushed a commit to slackhq/vitess that referenced this issue May 7, 2024
* Fix transaction throttler ignoring the initial rate

This addresses the issue reported in vitessio#12549

Signed-off-by: Eduardo J. Ortega U <5791035+ejortegau@users.noreply.github.com>

* Add missing override of max replication lag in `throttler.newThrottler()`

Signed-off-by: Eduardo J. Ortega U <5791035+ejortegau@users.noreply.github.com>

* Reorder functions to make diff easier to read

Signed-off-by: Eduardo J. Ortega U <5791035+ejortegau@users.noreply.github.com>

* Fix check for maxRate in `newThrottlerFromConfig()`

Signed-off-by: Eduardo J. Ortega U <5791035+ejortegau@users.noreply.github.com>

* Fix some CI pipeline issues

Signed-off-by: Eduardo J. Ortega U <5791035+ejortegau@users.noreply.github.com>

* Address PR comment.

Signed-off-by: Eduardo J. Ortega U <5791035+ejortegau@users.noreply.github.com>

* Fix typo

Signed-off-by: Eduardo J. Ortega U <5791035+ejortegau@users.noreply.github.com>

---------

Signed-off-by: Eduardo J. Ortega U <5791035+ejortegau@users.noreply.github.com>
Signed-off-by: Eduardo J. Ortega U. <5791035+ejortegau@users.noreply.github.com>
timvaillancourt added a commit to slackhq/vitess that referenced this issue May 10, 2024
* Add basic metrics to `vttablet` transaction throttler (vitessio#12418)

* Add basic stats to vttablet tx throttler

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* test new metrics

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* reorder

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* short names

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* Add max rate

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* Move NewGaugeFunc to under conditional

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* Use env

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* Remove env from TxThrottler struct

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* Fix tests

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* PR suggestion

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* Fix unit test

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* reorder test vars

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

---------

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* Fix transaction throttler ignoring the initial rate (vitessio#12618)

* Fix transaction throttler ignoring the initial rate

This addresses the issue reported in vitessio#12549

Signed-off-by: Eduardo J. Ortega U <5791035+ejortegau@users.noreply.github.com>

* Add missing override of max replication lag in `throttler.newThrottler()`

Signed-off-by: Eduardo J. Ortega U <5791035+ejortegau@users.noreply.github.com>

* Reorder functions to make diff easier to read

Signed-off-by: Eduardo J. Ortega U <5791035+ejortegau@users.noreply.github.com>

* Fix check for maxRate in `newThrottlerFromConfig()`

Signed-off-by: Eduardo J. Ortega U <5791035+ejortegau@users.noreply.github.com>

* Fix some CI pipeline issues

Signed-off-by: Eduardo J. Ortega U <5791035+ejortegau@users.noreply.github.com>

* Address PR comment.

Signed-off-by: Eduardo J. Ortega U <5791035+ejortegau@users.noreply.github.com>

* Fix typo

Signed-off-by: Eduardo J. Ortega U <5791035+ejortegau@users.noreply.github.com>

---------

Signed-off-by: Eduardo J. Ortega U <5791035+ejortegau@users.noreply.github.com>
Signed-off-by: Eduardo J. Ortega U. <5791035+ejortegau@users.noreply.github.com>

* Cleanup panics in `txthrottler`, reorder for readability (vitessio#12901)

* Cleanup tx_throttler.go

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* Cleanup tx_throttler.go #2

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* Fix throttlerFactoryFunc

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* Undo if-cond consolidation

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* Undo struct shuffling

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* prove that disabled config returns nil error

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* Improve test

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

---------

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* Emit per workload labels for existing per table vttablet metrics (vitessio#12394)

* Emit per workload labels for existing per table vttablet metrics

This adds the possibility to configure vttablet (via CLI flag) to also have a
workload label for existing per table metrics (query counts, query times, query
errors, query rows affected, query rows returned, query error counts). Workload
can be any string that makes sense for the client application. For example, API
endpoint name, controller, batch job name, application name or something else.

This is usefult to be able to gain observability about how the query load is
distributed across different workloads.

This is achieved with two new CLI flags, namely:

* `enable-per-workload-table-metrics`: whether to enable or disable per
  workload metric collection - disabled by default to preserve the current
  behavior, thus making the new feature opt-in only.
* `workload-label`: a string to look for in query comments to identify the
  workload running the current query.

The workload is obtained by parsing query comments of the form:

/* ... <workload_label>=<workload_name>; ... */

For example, if vttablet is started with

`--enable-per-workload-table-metrics --workload-label app_name`

anda query is issued with a comment like

/* ... app_name=shop; ... */

then metrics will look like

```
vttablet_query_counts{plan="Select",table="dual", workload="shop"} 15479
```

instead of

```
vttablet_query_counts{plan="Select",table="dual"} 15479
```

Query comment parsing only takes place if `--enable-per-workload-table-metrics`
is used, as to not incur parsing performance impact if the user does not want
per workload metrics.

Signed-off-by: Eduardo J. Ortega U <5791035+ejortegau@users.noreply.github.com>

* make linter happy

Signed-off-by: Eduardo J. Ortega U <5791035+ejortegau@users.noreply.github.com>

* fix flags e2e test

Signed-off-by: Eduardo J. Ortega U <5791035+ejortegau@users.noreply.github.com>

* Address PR comments:

* Obtain workload information on the vtgate instead of the vttablet, avoiding
  double parsing.
* Treat workload name as a query directive.
* Send workload name from vtgate to vttablet as ExecuteOptions.

Additionally, annotate tabletserver's execution span with the workload name
to also enrich traces with workload name data, in addition to metrics.

Signed-off-by: Eduardo J. Ortega U <5791035+ejortegau@users.noreply.github.com>

* A few fixes:

1. Rebuild some files with `make proto`.
2. Protect against nil ExecuteOptions on the tabletserver.

Signed-off-by: Eduardo J. Ortega U <5791035+ejortegau@users.noreply.github.com>

* Fix flags e2e test

Signed-off-by: Eduardo J. Ortega U <5791035+ejortegau@users.noreply.github.com>

* Address PR comments

Signed-off-by: Eduardo J. Ortega U <5791035+ejortegau@users.noreply.github.com>

* Fixes

Signed-off-by: Eduardo J. Ortega U <5791035+ejortegau@users.noreply.github.com>

* Fix a comment

Signed-off-by: Eduardo J. Ortega U <5791035+ejortegau@users.noreply.github.com>

* Fix e2e flag test

Signed-off-by: Eduardo J. Ortega U <5791035+ejortegau@users.noreply.github.com>

* Update JS code for protobuf changes.

Signed-off-by: Eduardo J. Ortega U <5791035+ejortegau@users.noreply.github.com>

* Fix QueryEngine unit test

Signed-off-by: Eduardo J. Ortega U <5791035+ejortegau@users.noreply.github.com>

* Fix e2e flag test

Signed-off-by: Eduardo J. Ortega U <5791035+ejortegau@users.noreply.github.com>

* Fix spurious tab in comment

Signed-off-by: Eduardo J. Ortega U <5791035+ejortegau@users.noreply.github.com>

* Address PR comment

Don't use dual format flag for new flags - stick with - separated ones.

Signed-off-by: Eduardo J. Ortega U <5791035+ejortegau@users.noreply.github.com>

---------

Signed-off-by: Eduardo J. Ortega U <5791035+ejortegau@users.noreply.github.com>

* remove mistaken git add

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* make vtadmin_web_proto_types

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* test unit_race test on go-version: 1.18.9

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* Revert "test unit_race test on go-version: 1.18.9"

This reverts commit 922e897.

* CI: Misc test improvements to limit failures with various runners (vitessio#13825)

Signed-off-by: Matt Lord <mattalord@gmail.com>

* Fix setup order to avoid races (vitessio#13871)

Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>

---------

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>
Signed-off-by: Eduardo J. Ortega U <5791035+ejortegau@users.noreply.github.com>
Signed-off-by: Eduardo J. Ortega U. <5791035+ejortegau@users.noreply.github.com>
Signed-off-by: Matt Lord <mattalord@gmail.com>
Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
Co-authored-by: Eduardo J. Ortega U <5791035+ejortegau@users.noreply.github.com>
Co-authored-by: Matt Lord <mattalord@gmail.com>
Co-authored-by: Dirkjan Bussink <d.bussink@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants