
[xCluster] Create API to Wait for Replication Drain #10978

Closed

rahuldesirazu opened this issue Jan 3, 2022 · 2 comments
Labels
area/docdb YugabyteDB core features kind/enhancement This is an enhancement of an existing feature priority/medium Medium priority issue

Comments

@rahuldesirazu
Contributor

rahuldesirazu commented Jan 3, 2022

Jira Link: DB-1155

Description

For XCluster, we don't have an explicit API to verify when the Producer and Consumer are in sync. Because XCluster replication is asynchronous, some lag is expected in normal operation. However, there are some use cases where we want to know whether the Producer has completely sent all operations to the Consumer (a drain).

  1. Cluster Migration. Stop writes on the old cluster (Producer). Don't use the new cluster (Consumer) until all transactions have been sent over.
  2. Schema Alters. Since we don't have full DDL support yet, it can be safer to stop writes and change both sides before continuing. When doing this, we need to ensure there are no outstanding ops with the old schema.
  3. Testing. The customer is using synthetic data and wants to verify that XCluster works and has drained before starting the next round of test data.
  4. Truncates and PITR. (@rahuldesirazu for more info)

Additionally, if we extended this "drain" API to accept an OpID or Timestamp, we could use it to wait for a particular catch-up window when bootstrapping XCluster.


Implementation Notes:

  1. Currently, the user can look at per-tablet metrics through Prometheus or Platform: "async_replication_committed_lag_micros", "last_checkpoint_opid_index".
  2. We need to aggregate these into a per-table (or more user-friendly) abstraction and verify that all tablets in that table are caught up.
  3. To ensure that we don't have any new writes, we need to check the "current_term" or "last_appended" on each tablet. Some clarification is here.
  4. We need some heuristic that verifies a drain rather than just the absence of additional writes. Maybe query all tablets that are not caught up, then do one final query after we think we're caught up to verify? A rough sketch of this polling loop follows the list.
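
A minimal sketch of that polling heuristic, purely for illustration: `TabletProgress`, the fetcher callback, and `WaitForReplicationDrain` below are hypothetical names rather than YugabyteDB APIs, and the eventual implementation landed in CatalogManager/CDCService (see the commits referenced later in this issue).

```
// Sketch only: hypothetical polling loop for a replication-drain check.
// The per-tablet progress fetcher is injected as a callback; in practice it
// would be an RPC asking each tablet's TServer for its replication progress.
#include <chrono>
#include <cstdint>
#include <functional>
#include <string>
#include <thread>
#include <vector>

struct TabletProgress {
  std::string tablet_id;
  int64_t last_caught_up_micros;  // latest physical time the consumer is known to have replicated to
};

using FetchProgressFn =
    std::function<std::vector<TabletProgress>(const std::string& table_id)>;

// Returns once every tablet of `table_id` has replicated past `target_time_micros`.
void WaitForReplicationDrain(const std::string& table_id,
                             int64_t target_time_micros,
                             const FetchProgressFn& fetch,
                             std::chrono::milliseconds poll_interval) {
  auto pending_tablets = [&]() {
    std::vector<std::string> out;
    for (const auto& t : fetch(table_id)) {
      if (t.last_caught_up_micros < target_time_micros) {
        out.push_back(t.tablet_id);
      }
    }
    return out;
  };

  do {
    // Poll until no tablet is behind the target time.
    while (!pending_tablets().empty()) {
      std::this_thread::sleep_for(poll_interval);
    }
    // One final full query after we appear caught up, to confirm nothing
    // slipped in between polls.
  } while (!pending_tablets().empty());
}
```
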
@rahuldesirazu rahuldesirazu added area/docdb YugabyteDB core features priority/high High Priority labels Jan 3, 2022
@nspiegelberg
Contributor

This API would need to disable splitting while waiting for the drain or take that into account.

@nspiegelberg nspiegelberg assigned hari90 and unassigned Rerenah Mar 29, 2022
@nspiegelberg nspiegelberg changed the title [xCluster] Create API WaitForReplicationLag == 0 [xCluster] Create API to Wait for Replication Drain Mar 30, 2022
@bmatican
Contributor

@hari90 taking this one back for now, as it might be a bit more work than initially thought.

I'll pass you a separate one as an intro to producer side code first, instead.

@bmatican bmatican removed the priority/high High Priority label Mar 30, 2022
@yugabyte-ci yugabyte-ci added kind/bug This issue is a bug priority/medium Medium priority issue labels Jun 8, 2022
LambdaWL added a commit that referenced this issue Jul 13, 2022
…logManager and CDCService

Summary:
Issue: #10978
Design Doc: https://docs.google.com/document/d/1HB6zT2MX3NmKnlhhhJYbuCisVV-JKvgSO9gJAhul9OE/edit

As a first step, implemented the API logic on CatalogManager and CDCService.
- CatalogManager: repeatedly sends RPCs to the relevant TServers through the API on CDCService
- CDCService: checks `cdc_metrics` for tablets to determine whether replication on a tablet is caught up (to some user-specified point in time)

**Notes on how the notion of caught-up is decided:**
A new `cdc_metric` named `last_caughtup_physicaltime` is added. This metric records a point in time up to which we can safely assume the consumer has caught up with the producer. In other words, it tracks the progress the consumer has made in the replication.
The metric is updated as follows:
- If the consumer currently has the latest record on the producer, i.e. the lag is zero, update the metric to `GetCurrentTimeMicros()`
- Otherwise, update the metric to the larger of its own value and `last_checkpoint_physicaltime`. This ensures that the metric value is always non-decreasing.
The update happens in `GetChanges()`, since it is the only place where the consumer can make progress in the replication.
With `last_caughtup_physicaltime`, the logic of the API is greatly simplified: it merely compares the user-specified timestamp (defaulting to the current time if not provided) against this new metric; a rough sketch of this update-and-compare rule follows below.
**Since the API logic now relies entirely on the new cdc metric, if producer leadership changes and the consumer never sends GetChanges from that point on, the metric would be empty and the API would view the tablet as not drained. For this reason, the API only works while replication is still on.**
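
A minimal sketch of the update-and-compare rule described above, with hypothetical struct and function names: only `last_caughtup_physicaltime`, `last_checkpoint_physicaltime`, `GetChanges()`, and `GetCurrentTimeMicros()` come from the summary; everything else is illustrative and not the actual `cdc_service.cc` code.

```
// Sketch only: the last_caughtup_physicaltime update rule and drain check.
#include <algorithm>
#include <chrono>
#include <cstdint>

struct TabletCdcMetrics {
  int64_t last_caughtup_physicaltime = 0;    // consumer progress; kept non-decreasing
  int64_t last_checkpoint_physicaltime = 0;  // physical time of the last checkpointed record
};

int64_t GetCurrentTimeMicros() {
  using namespace std::chrono;
  return duration_cast<microseconds>(system_clock::now().time_since_epoch()).count();
}

// Invoked on the GetChanges() path, the only place the consumer makes progress.
void UpdateLastCaughtUpTime(TabletCdcMetrics* m, bool lag_is_zero) {
  if (lag_is_zero) {
    // Consumer already has the producer's latest record: caught up as of "now".
    m->last_caughtup_physicaltime = GetCurrentTimeMicros();
  } else {
    // Keep the metric non-decreasing: take the larger of its own value and
    // the last checkpointed physical time.
    m->last_caughtup_physicaltime =
        std::max(m->last_caughtup_physicaltime, m->last_checkpoint_physicaltime);
  }
}

// The drain API then reduces to a comparison against the target timestamp.
bool IsDrainedAsOf(const TabletCdcMetrics& m, int64_t target_time_micros) {
  return m.last_caughtup_physicaltime >= target_time_micros;
}
```
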

Test Plan:
Created three unit tests, command:
```
./yb_build.sh --cxx-test integration-tests_twodc-test --test-timeout-sec 1200 --gtest_filter "*TwoDCTestWaitForReplicationDrain*" -n 20
```
The tests cover three scenarios:
- Consumer is unable to catch up because GetChanges is blocked (added a test flag `block_get_changes` in `cdc_service.cc`)
- Consumer is unable to catch up because tservers are shut down
- User specifies a point in time to check for replication drain

Reviewers: rahuldesirazu, nicolas, jhe

Reviewed By: jhe

Subscribers: ybase, bogdan

Differential Revision: https://phabricator.dev.yugabyte.com/D17806
LambdaWL added a commit that referenced this issue Jul 15, 2022
…YB-Admin

Summary:
Issue: #10978
Design Doc: https://docs.google.com/document/d/1HB6zT2MX3NmKnlhhhJYbuCisVV-JKvgSO9gJAhul9OE/edit

As the second step, implemented the API on yb-admin. The CLI command is:
```
yb-admin wait_for_replication_drain <comma_separated_list_of_stream_ids> [<timestamp> | minus <interval>]
```
where `minus <interval>` has the same format as in PITR (documentation [[ https://docs.yugabyte.com/preview/explore/cluster-management/point-in-time-recovery-ycql/ | here ]], or see `restore_snapshot_schedule` in `yb-admin_cli_ent.cc`).

If all streams are caught up, the API prints `All replications are caught-up.` to the console. Otherwise, it prints the non-caught-up streams in the following format:
```
Found undrained replications:
- Under Stream <stream_id>:
  - Tablet: <tablet_id>
  - Tablet: <tablet_id>
  // ......
// ......
```

Test Plan:
```
./yb_build.sh --cxx-test yb-admin-test_ent --gtest_filter XClusterAdminCliTest.TestWaitForReplicationDrain -n 20
```

Reviewers: rahuldesirazu, nicolas, jhe

Reviewed By: jhe

Subscribers: slingam, ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D18363
@yugabyte-ci yugabyte-ci added kind/enhancement This is an enhancement of an existing feature and removed kind/bug This issue is a bug labels Jul 30, 2022