[xCluster] Keep min_safe_time calculation periodically refreshed on the master #11202
Labels
area/docdb
YugabyteDB core features
kind/bug
This issue is a bug
priority/medium
Medium priority issue
Comments
yugabyte-ci added the kind/bug and priority/medium labels on Jun 8, 2022
hari90 added a commit that referenced this issue on Sep 9, 2022
Summary:
Compute the xCluster min safe read time for each namespace (DB in YSQL) and propagate it to all tservers.

Each CDC producer on the source cluster sends its safe time (last replicated operation ht, or leader safe time) to the consumers. On the consumer cluster, cdc_poller keeps track of the min safe time it received from the producer. Periodically, cdc_consumer collects the safe time from all pollers and writes it to the xcluster_safe_time table. XClusterSafeTimeService (which currently runs on the master) runs a periodic task that reads the entries in the table and computes the min safe time per consumer namespace (DB in YSQL). This information is stored in a new catalog entity, XCLUSTER_SAFE_TIME, and propagated back to all tservers via the heartbeat.

XClusterSafeTimeService split brain protection:
During master failovers it is possible that the XClusterSafeTime task on the old node has not completed before the one on the new node starts, or that it still has network requests in flight. The task modifies two pieces of on-disk data, both of which remain protected in these cases:
- XClusterSafeTime Sys CatalogEntity: The task gets the master leader term at the start of the work and uses it to commit the entity change. This ensures that there has been no leader change between the start of the work and the commit of the new entity.
- Stale rows from the XClusterSafeTime table: Removing them is an idempotent operation that can be run multiple times, even in parallel from multiple nodes. If the replication stream was destroyed and recreated, removal would just delay the new safe time computation by one round. There are no correctness issues in removing rows from the table, as the Sys CatalogEntity stores the actual safe time and it is guaranteed never to move backwards.

xcluster_safe_time table schema:
universe_id string(HASH), tablet_id string(HASH), safe_time int64

Reduced the scope of `CDCConsumer::should_run_mutex_`. The background `RunThread` was holding this mutex for the entire run, causing Shutdown to block on it. With the reduced scope, Shutdown can clear the in-memory structures and call the Client Shutdown, which causes `RunThread` to fail and exit early.

Dump ClusterConfig in yb-admin dump_masters_state.

Test Plan:
xcluster_safe_time_service-test
xcluster_safe_time-itest

Manual test:
Set up the clusters with replication:
./bin/yugabyted destroy
./bin/yb-ctl destroy
./bin/yb-ctl wipe_restart --data_dir ~/yugabyte-data1 --ip_start 1 --tserver_flags "yb_system_namespace_readonly=false,vmodule=xcluster_safe_time_service=1"
./bin/yb-ctl wipe_restart --data_dir ~/yugabyte-data2 --ip_start 10 --tserver_flags "yb_system_namespace_readonly=false,vmodule=xcluster_safe_time_service=1"
./bin/ysqlsh -h 127.0.0.1 -c "create table tbl1(a int);"
./bin/ysqlsh -h 127.0.0.10 -c "create table tbl1(a int);"
#./build/latest/bin/yb-admin -master_addresses 127.0.0.1:7100 list_tables include_table_id | grep tbl1
ybadmin get_universe_config
./build/latest/bin/yb-admin -master_addresses 127.0.0.10:7100 setup_universe_replication e2ff1315-9811-4211-b8b8-386f5083049a 127.0.0.1:7100 000033e8000030008000000000004000

Get the safe time and make sure it moves up:
./build/latest/bin/yb-admin -master_addresses 127.0.0.10:7100 dump_masters_state console | grep XCluster
XCluster Safe Time: safe_time_map { key: "000033e8000030008000000000000000" value: 6805014956461838336 }
./build/latest/bin/yb-admin -master_addresses 127.0.0.10:7100 dump_masters_state console | grep XCluster
XCluster Safe Time: safe_time_map { key: "000033e8000030008000000000000000" value: 6805014997570662400 }

Reviewers: slingam, rahuldesirazu
Reviewed By: rahuldesirazu
Subscribers: jenkins-bot, yugaware, ybase, bogdan
Differential Revision: https://phabricator.dev.yugabyte.com/D18579
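
The periodic task and its split brain protection, as described in the commit summary, can be sketched roughly as follows. This is an illustrative Python model, not the actual C++ implementation: `SafeTimeCatalog`, `compute_namespace_safe_times`, and the `tablet_to_namespace` lookup are hypothetical names introduced here, and the leader term is modeled as a plain integer. Rows without a known namespace are simply skipped, standing in for the stale rows the real task removes.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Tuple


@dataclass
class SafeTimeCatalog:
    """Hypothetical stand-in for the XCLUSTER_SAFE_TIME catalog entity."""
    leader_term: int = 1
    safe_time_map: Dict[str, int] = field(default_factory=dict)

    def commit(self, term_at_start: int, new_map: Dict[str, int]) -> bool:
        # Reject the write if leadership changed since the task started.
        if term_at_start != self.leader_term:
            return False
        for namespace_id, safe_time in new_map.items():
            # The stored safe time must never move backwards.
            current = self.safe_time_map.get(namespace_id, 0)
            self.safe_time_map[namespace_id] = max(current, safe_time)
        return True


def compute_namespace_safe_times(
    rows: Iterable[Tuple[str, str, int]],
    tablet_to_namespace: Dict[Tuple[str, str], str],
) -> Dict[str, int]:
    """Take the min safe_time per namespace over (universe_id, tablet_id, safe_time) rows."""
    result: Dict[str, int] = {}
    for universe_id, tablet_id, safe_time in rows:
        namespace_id = tablet_to_namespace.get((universe_id, tablet_id))
        if namespace_id is None:
            continue  # assumed stale row: no longer part of any replication stream
        result[namespace_id] = min(result.get(namespace_id, safe_time), safe_time)
    return result


# Usage: one round of the task, followed by a simulated master failover.
rows = [("u1", "t1", 100), ("u1", "t2", 90), ("u1", "t3", 120), ("u1", "gone", 5)]
mapping = {("u1", "t1"): "db1", ("u1", "t2"): "db1", ("u1", "t3"): "db2"}

catalog = SafeTimeCatalog(leader_term=3)
term = catalog.leader_term                      # captured at the start of the work
new_map = compute_namespace_safe_times(rows, mapping)
first_commit_ok = catalog.commit(term, new_map)

catalog.leader_term = 4                         # failover happens
stale_commit_ok = catalog.commit(term, {"db1": 200})  # old leader's late write
```

In this model the late write from the old leader is rejected because its captured term no longer matches, which mirrors the guarantee described above for the catalog entity.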
Jira Link: DB-1297
Description
Given we know each tablet's safe time, each target server needs to calculate the minimum safe time across all tablets and report this to the master on heartbeat. The master will keep a mapping of tserver -> min_safe_time, update this structure, and calculate a new global min to be sent back as part of the response.
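
A minimal sketch of the bookkeeping this description proposes, assuming safe times are plain integers and that each heartbeat carries exactly one per-tserver minimum. `MinSafeTimeTracker` and `on_heartbeat` are hypothetical names used only for illustration; they are not taken from the YugabyteDB code base.

```python
from typing import Dict


class MinSafeTimeTracker:
    """Hypothetical master-side map of tserver -> min_safe_time."""

    def __init__(self) -> None:
        self._by_tserver: Dict[str, int] = {}

    def on_heartbeat(self, tserver_id: str, min_safe_time: int) -> int:
        # Record the latest minimum reported by this tserver.
        self._by_tserver[tserver_id] = min_safe_time
        # The global min across all reporting tservers goes back in the response.
        return min(self._by_tserver.values())


tracker = MinSafeTimeTracker()
responses = [
    tracker.on_heartbeat("ts1", 100),
    tracker.on_heartbeat("ts2", 80),
    tracker.on_heartbeat("ts1", 150),
    tracker.on_heartbeat("ts2", 160),
]
```

Note that in this simplified model a tserver that has never sent a heartbeat does not hold the global minimum back; a real implementation would need to account for tservers that have not reported yet.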