Synchronized Cluster Validation
This guide covers how to use the Replayer to replay captured traffic from a source cluster to a target cluster during the migration process. The Replayer allows users to verify that the target cluster can handle requests in the same way as the source cluster and catch up to real-time traffic for a smooth migration.
Replayer settings are configured during the deployment of the Migration Assistant. Make sure to set the authentication mode for the Replayer so it can properly communicate with the target cluster. Refer to the Limitations section below for details on how different traffic types are handled.
The --speedup-factor option, passed via trafficReplayerExtraArgs, adjusts the wait times between requests. For example:
- A speedup factor of 2 sends requests at twice the original speed (e.g., a request originally sent every minute will now be sent every 30 seconds).
- A speedup factor of 0.5 will space requests further apart (e.g., requests every 2 minutes instead of every minute).
This setting can be used to stress test the target cluster or to catch up to real-time traffic, ensuring the target cluster is ready for production client switchover.
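The arithmetic behind the speedup factor can be sketched as follows. This is a minimal illustration that assumes each recorded gap between requests is simply divided by the speedup factor, as described above; `scaled_gap` is a hypothetical helper, not part of the Replayer.

```python
from datetime import timedelta


def scaled_gap(original_gap: timedelta, speedup_factor: float) -> timedelta:
    """Return the replay-side wait between two requests (hypothetical helper).

    Assumes the Replayer divides the originally recorded gap by the
    speedup factor, per the description of --speedup-factor.
    """
    if speedup_factor <= 0:
        raise ValueError("speedup factor must be positive")
    return original_gap / speedup_factor


minute = timedelta(minutes=1)
print(scaled_gap(minute, 2))    # 0:00:30  (twice as fast)
print(scaled_gap(minute, 0.5))  # 0:02:00  (half as fast)
```

Factors above 1 shrink the idle periods between requests, which is what lets the Replayer catch up to real-time traffic.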
After deploying the Migration Assistant, the Replayer is not running by default. It should be started only after all metadata and documents have been migrated to ensure that recent changes to the source cluster are properly reflected in the target cluster.
For example, if a document was deleted after a snapshot was taken, starting the Replayer before the document migration is complete may cause the deletion request to execute before the document is even added to the target. Running the Replayer after all other migration processes ensures that the target cluster will be consistent with the source cluster.
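The ordering hazard in this example can be illustrated with a toy in-memory "target". The operation tuples and the `apply` helper are hypothetical and only model the effect of applying a replayed delete before or after the backfilled document arrives.

```python
def apply(target: dict, op: tuple) -> None:
    """Apply a hypothetical ("index", id, body) or ("delete", id) operation."""
    kind, doc_id, *rest = op
    if kind == "index":
        target[doc_id] = rest[0]
    elif kind == "delete":
        target.pop(doc_id, None)  # deleting a missing doc is a no-op in this toy model


backfill = [("index", "doc-1", {"title": "old"})]  # document from the snapshot
replayed = [("delete", "doc-1")]                    # delete captured after the snapshot

# Replayer started too early: the delete runs before the document exists.
too_early = {}
for op in replayed + backfill:
    apply(too_early, op)

# Replayer started after the backfill: the target ends up matching the source.
in_order = {}
for op in backfill + replayed:
    apply(in_order, op)

print(too_early)  # {'doc-1': {'title': 'old'}}  (deleted document reappears)
print(in_order)   # {}
```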
To manage the Replayer, use the console replay command:
- Start the Replayer:
  console replay start
  This starts the Replayer with the options specified at deployment.
- Check the Replayer's status:
  console replay status
  This command reports the Replayer's state along with three instance counts: "Running" is how many container instances are actively running, "Pending" is how many are being provisioned, and "Desired" is the total number of instances that should be running.
- Stop the Replayer:
  console replay stop
Example Interactions
Check the status of the Replayer:
root@ip-10-0-2-66:~# console replay status
(<ReplayStatus.STOPPED: 4>, 'Running=0\nPending=0\nDesired=0')
Start the Replayer:
root@ip-10-0-2-66:~# console replay start
Replayer started successfully.
Service migration-dev-traffic-replayer-default set to 1 desired count. Currently 0 running and 0 pending.
Stop the Replayer:
root@ip-10-0-2-66:~# console replay stop
Replayer stopped successfully.
Service migration-dev-traffic-replayer-default set to 0 desired count. Currently 0 running and 0 pending.
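When scripting around the CLI, it can help to parse the status output. The sketch below assumes the output has exactly the shape shown in the status example above; `parse_replay_status` is a hypothetical helper, and the format may differ between Migration Assistant versions.

```python
import re


def parse_replay_status(output: str) -> dict:
    r"""Extract the state and Running/Pending/Desired counts (hypothetical helper).

    Assumes output shaped like the example above, e.g.
    (<ReplayStatus.STOPPED: 4>, 'Running=0\nPending=0\nDesired=0')
    """
    state = re.search(r"ReplayStatus\.(\w+)", output)
    counts = {
        key: int(value)
        for key, value in re.findall(r"(Running|Pending|Desired)=(\d+)", output)
    }
    return {"state": state.group(1) if state else None, **counts}


sample = r"(<ReplayStatus.STOPPED: 4>, 'Running=0\nPending=0\nDesired=0')"
print(parse_replay_status(sample))
# {'state': 'STOPPED', 'Running': 0, 'Pending': 0, 'Desired': 0}
```

Comparing "Running" against "Desired" is one way a script could wait until a `console replay start` has fully taken effect.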
The Replayer pulls traffic from Kafka and advances its commit cursor after requests have been sent to the target cluster. This provides an "at least once" delivery guarantee: requests will be replayed, but their success is not guaranteed. You will need to monitor metrics, tuple outputs, or external validation to ensure the target cluster is performing as expected.
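Committing the cursor only after sending is what yields "at least once" rather than "exactly once". The toy model below is not the Replayer's code; it uses a plain list as a stand-in for a Kafka partition and shows how a crash between sending a request and committing its offset causes that request to be replayed again after a restart.

```python
def replay(log, committed, crash_after_send=None):
    """Send records from `committed` onward, committing each offset after sending.

    Returns (sent, committed). If crash_after_send is set, the loop stops right
    after sending that offset but before committing it, mimicking a crash.
    """
    sent = []
    for offset in range(committed, len(log)):
        sent.append(log[offset])        # request goes to the target
        if offset == crash_after_send:
            return sent, committed      # crash: cursor was not advanced
        committed = offset + 1          # advance the cursor after sending
    return sent, committed


log = ["PUT a", "PUT b", "PUT c"]
first, cursor = replay(log, 0, crash_after_send=1)  # sends a and b, commits only a
second, cursor = replay(log, cursor)                 # restart resumes at b
print(first + second)  # ['PUT a', 'PUT b', 'PUT b', 'PUT c']
```

"PUT b" reaches the target twice, which is why replayed requests should be validated rather than assumed to have been applied exactly once.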
The Replayer sends requests in the same order they were received on each connection to the source. However, relative timing between different connections is not guaranteed. For example:
- Scenario: There are two connections; one sends a PUT request every minute, and the other sends a GET request every second.
- Behavior: The Replayer will maintain the sequence within each connection, but the relative timing between the connections (PUTs and GETs) is not preserved.
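A simplified model of this behavior: each connection is replayed sequentially, while connections proceed independently, so the global interleaving depends on how quickly the target answers each connection. The sketch deliberately ignores recorded timing and uses hypothetical per-connection target latencies in milliseconds; `replay_schedule` is illustrative only.

```python
def replay_schedule(connections, latency_ms):
    """Return the global order in which requests complete at the target.

    connections: {name: [request, ...]} as recorded per source connection.
    latency_ms: {name: target response time per request} (hypothetical values).
    Requests on one connection run back to back; connections run in parallel.
    """
    events = []
    for name, requests in connections.items():
        t = 0
        for req in requests:
            t += latency_ms[name]  # next request waits for the previous one
            events.append((t, name, req))
    return [(name, req) for _, name, req in sorted(events)]


conns = {"conn-A": ["PUT 1", "PUT 2"], "conn-B": ["GET 1", "GET 2", "GET 3"]}
print(replay_schedule(conns, {"conn-A": 10, "conn-B": 50}))   # conn-A finishes first
print(replay_schedule(conns, {"conn-A": 100, "conn-B": 10}))  # conn-B finishes first
```

In both runs the PUTs stay in order and the GETs stay in order; only the interleaving across connections changes.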
Assume a source cluster responds to requests (GETs and PUTs) within 100ms:
- With a speedup factor of 1, the target will experience the same request rates and idle periods as the source.
- With a speedup factor of 2, requests will be sent twice as fast, with GETs sent every 500ms and PUTs every 30 seconds.
- At a speedup factor of 10, requests will be sent 10x faster (GETs every 100ms and PUTs every 6 seconds), and as long as the target responds quickly enough, the Replayer can keep pace.
If the target cannot respond fast enough, the Replayer will wait for the previous request to complete before sending the next one. This may cause delays and affect global relative ordering.
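The effect of a slow target on a single connection can be modeled as "send at the scheduled time, or when the previous request finishes, whichever is later". This is a hedged sketch of that rule, not the Replayer's scheduler; the latencies are made-up values.

```python
def actual_send_times(scheduled_ms, response_ms):
    """Compute when each request on one connection is actually sent.

    scheduled_ms: send times after applying the speedup factor.
    response_ms: how long the target takes to answer each request.
    A request cannot go out before the previous one has completed, so it is
    sent at max(scheduled time, previous send + previous response time).
    """
    sent = []
    ready_at = 0
    for scheduled, response in zip(scheduled_ms, response_ms):
        send = max(scheduled, ready_at)
        sent.append(send)
        ready_at = send + response
    return sent


# GETs recorded every 1000 ms, replayed at speedup 10: scheduled every 100 ms.
scheduled = [0, 100, 200, 300]
print(actual_send_times(scheduled, [50, 50, 50, 50]))      # [0, 100, 200, 300]
print(actual_send_times(scheduled, [150, 150, 150, 150]))  # [0, 150, 300, 450]
```

With a 150 ms target response time the connection falls further behind schedule on every request, which is the delay described above.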
During migrations, some requests may need to be transformed between versions. For example, Elasticsearch supported multiple type mappings in indices, but this is no longer the case in OpenSearch. Clients may need to adjust accordingly by splitting documents into multiple indices or transforming request data.
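As one illustration of splitting typed documents into multiple indices, a legacy `/{index}/{type}/{id}` path could be rewritten to a per-type index. The `{index}_{type}` naming convention and the `split_typed_path` helper are hypothetical choices for this sketch; in practice such rewrites would be expressed as Replayer transformation rules.

```python
def split_typed_path(path: str) -> str:
    """Rewrite a legacy typed document path to a per-type index (hypothetical).

    /{index}/{type}/{id}  ->  /{index}_{type}/_doc/{id}
    Paths without exactly three segments, or whose second segment is an
    endpoint such as _doc or _update, are returned unchanged.
    """
    parts = path.strip("/").split("/")
    if len(parts) != 3 or parts[1].startswith("_"):
        return path
    index, doc_type, doc_id = parts
    return f"/{index}_{doc_type}/_doc/{doc_id}"


print(split_typed_path("/library/book/1"))   # /library_book/_doc/1
print(split_typed_path("/library/_doc/1"))   # unchanged
print(split_typed_path("/library/_search"))  # unchanged
```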
The Replayer automatically rewrites host and authentication headers, but for more complex transformations, custom transformation rules can be passed via the --transformer-config option (as described in the Traffic Replayer README).
Suppose a source request includes a "tagToExcise" element that needs to be removed and its children promoted, and the URI path includes "extraThingToRemove" which should also be removed. The following Jolt script handles this transformation:
[{ "JsonJoltTransformerProvider":
  [
    {
      "script": {
        "operation": "shift",
        "spec": {
          "payload": {
            "inlinedJsonBody": {
              "top": {
                "tagToExcise": {
                  "*": "payload.inlinedJsonBody.top.&"
                },
                "*": "payload.inlinedJsonBody.top.&"
              },
              "*": "payload.inlinedJsonBody.&"
            },
            "*": "payload.&"
          },
          "*": "&"
        }
      }
    },
    {
      "script": {
        "operation": "modify-overwrite-beta",
        "spec": {
          "URI": "=split('/extraThingToRemove',@(1,&))"
        }
      }
    },
    {
      "script": {
        "operation": "modify-overwrite-beta",
        "spec": {
          "URI": "=join('',@(1,&))"
        }
      }
    }
  ]
}]
The resulting request to the target will look like this:
PUT /oldStyleIndex/moreStuff HTTP/1.0
host: testhostname
{"top":{"properties":{"field1":{"type":"text"},"field2":{"type":"keyword"}}}}
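To make the Jolt script easier to follow, here is a plain-Python rendering of the same two steps applied to a dict-shaped request. It is an illustration only, not the Replayer's transformer API. The `URI` and `payload.inlinedJsonBody` keys come from the script above; the other keys and the source request itself are an assumed reconstruction consistent with the result shown.

```python
import json


def transform(request: dict) -> dict:
    """Mirror the Jolt script above on a dict-shaped request (illustrative only).

    1. Promote the children of payload.inlinedJsonBody.top.tagToExcise one level up.
    2. Remove every occurrence of '/extraThingToRemove' from the URI.
    Assumes the promoted keys do not collide with existing keys under "top".
    """
    out = json.loads(json.dumps(request))  # deep copy of plain JSON data
    top = out.get("payload", {}).get("inlinedJsonBody", {}).get("top")
    if isinstance(top, dict) and "tagToExcise" in top:
        excised = top.pop("tagToExcise")
        if isinstance(excised, dict):
            top.update(excised)
    # Same effect as the split('/extraThingToRemove') followed by join('').
    out["URI"] = "".join(out["URI"].split("/extraThingToRemove"))
    return out


source = {  # hypothetical source request
    "method": "PUT",
    "URI": "/oldStyleIndex/extraThingToRemove/moreStuff",
    "protocol": "HTTP/1.0",
    "headers": {"host": "testhostname"},
    "payload": {"inlinedJsonBody": {"top": {"tagToExcise": {
        "properties": {"field1": {"type": "text"}, "field2": {"type": "keyword"}}}}}},
}
result = transform(source)
print(result["URI"])  # /oldStyleIndex/moreStuff
print(json.dumps(result["payload"]["inlinedJsonBody"], separators=(",", ":")))
# {"top":{"properties":{"field1":{"type":"text"},"field2":{"type":"keyword"}}}}
```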
You can pass Base64-encoded transformation scripts via --transformer-config-base64 for convenience.
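A small helper for producing that Base64 value might look like the following. It assumes the option expects standard (not URL-safe) Base64 of the UTF-8 JSON text; `encode_transformer_config` is a hypothetical name, and the config shown is just the URI-rewriting part of the example above.

```python
import base64
import json


def encode_transformer_config(config) -> str:
    """Serialize a transformer config to compact JSON and Base64-encode it.

    Assumes --transformer-config-base64 takes standard Base64 of UTF-8 JSON.
    """
    text = json.dumps(config, separators=(",", ":"))
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


config = [{"JsonJoltTransformerProvider": [
    {"script": {"operation": "modify-overwrite-beta",
                "spec": {"URI": "=split('/extraThingToRemove',@(1,&))"}}},
    {"script": {"operation": "modify-overwrite-beta",
                "spec": {"URI": "=join('',@(1,&))"}}},
]}]
encoded = encode_transformer_config(config)

# Round-trip check: decoding yields the original structure.
assert json.loads(base64.b64decode(encoded)) == config
print(f"--transformer-config-base64 {encoded}")
```

Encoding sidesteps shell-quoting problems with the quotes, parentheses, and ampersands that Jolt specs tend to contain.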
See Required Client Changes for more information on how clients will need to be updated.
The Replayer provides an "at least once" delivery guarantee but does not ensure request success when a replayed request arrives at the target cluster.
Encountering a compatibility issue or missing feature?
- Search existing issues to see if it’s already reported. If it is, feel free to upvote and comment.
- Can’t find it? Create a new issue to let us know.