Bug Report: VReplication SwitchTraffic allowed when reverse workflow not fully usable #16794

Open
mattlord opened this issue Sep 16, 2024 · 0 comments · May be fixed by #16762
mattlord commented Sep 16, 2024

Overview of the Issue

By default, when running a SwitchTraffic command we set up a reverse VReplication workflow to later support the ReverseTraffic command if you need to revert the switch.

The problem, however, is that we do not confirm that the reverse workflow can be fully managed (created, read, updated, deleted) before executing the initial SwitchTraffic command, so there is no guarantee that traffic can later be reverted if necessary or desired. This can lead to a state where the reverse workflow is broken and/or a later attempt to revert the traffic switch fails.

This is particularly relevant when working with "external" or "unmanaged" tablets, which you would often use when first migrating to Vitess. In this case you have to manually set up the necessary privileges for the --db_filtered_user and the other MySQL users that Vitess uses in the mysqld instances that these unmanaged tablets work with.

This happens today because the initial creation of the reverse workflow record and the later starting of it are both done using the VReplicationExec RPC, which uses the DBA user (because we have not completed the work in #12086). VReplication generally, however, now uses the filtered user. So we end up creating a reverse workflow record that we then cannot read or modify through the purpose-built RPCs that have come out of the work done so far in #12086.
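Until such a check exists in Vitess itself, an operator could verify the filtered user's privileges by hand before switching traffic. The following is only a minimal sketch, not part of Vitess: `has_global_crud` is a hypothetical helper name, and it assumes that the four privileges revoked in the reproduction steps of this report (SELECT, INSERT, UPDATE, DELETE), granted globally, are what the filtered user needs to manage the workflow record. It reads `SHOW GRANTS` output on stdin and only considers `ON *.*` grants.

```shell
#!/bin/sh
# Hypothetical helper (not a Vitess command): reads `SHOW GRANTS` output on
# stdin and succeeds only if SELECT, INSERT, UPDATE and DELETE are all
# granted globally (ON *.*). Grants scoped to a database or table are ignored.
has_global_crud() {
  # Keep only the global grant lines; an empty result is not an error here.
  hgc_grants=$(grep -i ' ON \*\.\* TO ' || true)

  # ALL PRIVILEGES implies the four privileges we care about.
  case "$hgc_grants" in
    *"ALL PRIVILEGES"*) return 0 ;;
  esac

  # Assumption: these four privileges are the ones required.
  for hgc_priv in SELECT INSERT UPDATE DELETE; do
    printf '%s\n' "$hgc_grants" | grep -qiw "$hgc_priv" || return 1
  done
  return 0
}
```

It could be fed from the mysql client, for example `mysql ... -e "show grants for vt_filtered@localhost" | has_global_crud || echo "filtered user is missing privileges"`, with the connection options left to your environment.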

Reproduction Steps

git checkout main && make build

cd examples/local

./101_initial_cluster.sh; mysql < ../common/insert_commerce_data.sql; ./201_customer_tablets.sh; ./202_move_tables.sh

alias vtctldclient='command vtctldclient --server=localhost:15999'

command mysql -u root --socket ${VTDATAROOT}/vt_0000000$(vtctldclient GetTablets --keyspace commerce --tablet-type primary --shard "0" | awk '{print $1}' | cut -d- -f2 | bc)/mysql.sock vt_commerce -e "revoke select,insert,update,delete on *.* from vt_filtered@localhost"

vtctldclient MoveTables --workflow commerce2customer --target-keyspace customer switchtraffic --tablet-types primary

❯ vtctldclient MoveTables --workflow commerce2customer --target-keyspace customer reversetraffic --tablet-types primary
E0916 12:55:29.819421   15182 main.go:56] rpc error: code = Unknown desc = TabletManager.ReadVReplicationWorkflow on zone1-0000000100: SELECT command denied to user 'vt_filtered'@'localhost' for table 'vreplication' (errno 1142) (sqlstate 42000) during query: select id, source, pos, stop_pos, max_tps, max_replication_lag, cell, tablet_types, time_updated, transaction_timestamp, state, message, db_name, rows_copied, tags, time_heartbeat, workflow_type, time_throttled, component_throttled, workflow_sub_type, defer_secondary_keys, options from _vt.vreplication where workflow = 'commerce2customer_reverse' and db_name = 'vt_commerce'

❯ vtctldclient MoveTables --workflow commerce2customer_reverse --target-keyspace commerce show
{
  "workflows": []
}
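The socket path in the revoke step above is built by a dense pipeline: it takes the primary tablet's alias from GetTablets, keeps the UID after the dash, and strips the leading zeros with bc so the value can be appended to a hard-coded `vt_0000000` prefix. As a rough sketch of the same derivation, the hypothetical helper below (`tablet_dir_name` is not a Vitess command) assumes the directory name is `vt_` followed by the UID zero-padded to ten digits, which matches the path used in this report.

```shell
#!/bin/sh
# Hypothetical helper: map a tablet alias such as zone1-0000000100 to the
# per-tablet directory name under ${VTDATAROOT}.
tablet_dir_name() {
  # Keep only the text after the last '-' (the zero-padded UID).
  td_uid=${1##*-}
  # Strip leading zeros so printf does not read the value as octal.
  td_uid=$(printf '%s' "$td_uid" | sed 's/^0*//')
  # Re-pad to ten digits; an all-zero UID falls back to 0.
  printf 'vt_%010d\n' "${td_uid:-0}"
}

tablet_dir_name zone1-0000000100   # prints vt_0000000100
```

The socket would then be `${VTDATAROOT}/$(tablet_dir_name "$alias")/mysql.sock`, where `$alias` is the first column of the GetTablets output.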

Binary Version

❯ vtgate --version
vtgate version Version: 21.0.0-SNAPSHOT (Git revision 646bfd41c1fdfd52c79bc6ce31f6d42c94069e36 branch 'main') built on Mon Sep 16 12:51:11 EDT 2024 by matt@pslord.local using go1.23.1 darwin/arm64

Operating System and Environment details

N/A

Log Fragments

N/A
@mattlord mattlord self-assigned this Sep 16, 2024
@mattlord mattlord changed the title from "Bug Report: VReplication SwitchTraffic allowed when reverse vreplication workflow not usable" to "Bug Report: VReplication SwitchTraffic allowed when reverse workflow not fully usable" Sep 16, 2024
Projects
Status: In progress