🎯 Goal: ensure that historical data, written before the ZDM Proxy was introduced, is present on the Target database.
To complete the migration to Target, you must move the entire contents of the database, not just the rows written through the proxy. To this end, you will now download, build and launch DSBulk Migrator (a tool that, in turn, leverages the capabilities of DSBulk).
Note: since the data set in this exercise is rather small, and the data migration itself is not the main topic, we are not using "Cassandra Data Migrator" here. If you need advanced data reconciliation features, or your database exceeds a few tens of GB, that tool might be your best option.
First, verify that the entries inserted before the switch to the ZDM Proxy are not yet found on Target.
To do so, if you went through the Astra CLI path, launch this command (editing the database name if different from zdmtarget):
### host
astra db cqlsh zdmtarget \
-k zdmapp \
-e "SELECT * FROM zdmapp.user_status WHERE user='eva' limit 30;"
or, if you used the Astra UI, go to the Web CQL Console and run the statement:
### {"execute": false}
SELECT * FROM zdmapp.user_status WHERE user='eva' limit 30;
You should see only the few rows written after you restarted the API to take advantage of the ZDM Proxy.
Start the migration process by going to the migration directory and obtaining the source code for DSBulk Migrator:
### {"terminalId": "host", "backgroundColor": "#C5DDD2"}
cd /workspace/zdm-scenario-katapod/migration/
git clone https://github.com/datastax/dsbulk-migrator.git
cd dsbulk-migrator/
# we pin a commit just to make sure the (versioned) jar name matches later on:
git checkout 9b8a3759d3b59bcbcea191164d791ec8adc83ce9
Build the project (this may take 1-2 minutes):
### {"terminalId": "host", "backgroundColor": "#C5DDD2"}
cd /workspace/zdm-scenario-katapod/migration/dsbulk-migrator/
mvn clean package
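Before moving on, it can be worth confirming that the build produced the jar the migration step expects. The following is a minimal sketch, assuming you are still in the dsbulk-migrator directory; `check_jar` is a hypothetical helper written for this guide, not part of DSBulk Migrator, and the jar path is the one used by the migration command in this guide.

```shell
# Jar name expected by the migration command in this guide.
JAR_PATH="target/dsbulk-migrator-1.0.0-SNAPSHOT-embedded-dsbulk.jar"

# check_jar (hypothetical helper): reports whether the given file exists
# and returns non-zero when it does not.
check_jar() {
  if [ -f "$1" ]; then
    echo "found: $1"
  else
    echo "missing: $1"
    return 1
  fi
}

check_jar "$JAR_PATH" || echo "Re-run 'mvn clean package' and check the build output."
```

If the jar is reported as missing, the most likely causes are a failed Maven build or a checkout of a different commit, which would change the versioned jar name.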
You can now start the migration, providing the necessary connection and schema information (the "export cluster" will be Origin and the "import cluster" will be Astra DB, that is, Target). To make this process easier, the following commands also read the required connection settings from the dot-env file you already set up for the client application:
### {"terminalId": "host", "backgroundColor": "#C5DDD2"}
cd /workspace/zdm-scenario-katapod/migration/dsbulk-migrator/
. /workspace/zdm-scenario-katapod/scenario_scripts/find_addresses.sh
. /workspace/zdm-scenario-katapod/client_application/.env
java -jar target/dsbulk-migrator-1.0.0-SNAPSHOT-embedded-dsbulk.jar \
migrate-live \
-e \
--keyspaces=zdmapp \
--export-host=${CASSANDRA_SEED_IP} \
--export-username=cassandra \
--export-password=cassandra \
--import-username=${ASTRA_DB_CLIENT_ID} \
--import-password=${ASTRA_DB_CLIENT_SECRET} \
--import-bundle=${ASTRA_DB_SECURE_BUNDLE_PATH}
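The migration command above depends on four variables set by the two sourced files. If any of them is empty, the command receives an empty option value and fails in a way that can be hard to read. As a sketch of a pre-flight check you could run after sourcing the files and before launching the jar, the snippet below uses a hypothetical helper, `require_vars`, which is an illustration and not part of the scenario scripts:

```shell
# require_vars (hypothetical helper): prints every named variable that is
# unset or empty, and returns non-zero if at least one is missing.
require_vars() {
  missing=0
  for name in "$@"; do
    # Indirect variable lookup that works in POSIX sh as well as bash.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name"
      missing=1
    fi
  done
  return "$missing"
}

# Variable names are the ones referenced by the migration command.
require_vars CASSANDRA_SEED_IP ASTRA_DB_CLIENT_ID \
  ASTRA_DB_CLIENT_SECRET ASTRA_DB_SECURE_BUNDLE_PATH \
  || echo "Source find_addresses.sh and the .env file again before migrating."
```

The check only tests that each variable is non-empty; it does not validate the credentials or confirm that the secure bundle file exists.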
Once the migration command has completed, all rows will be on Target as well, including those written prior to setting up the ZDM Proxy.
To verify this, if you went through the Astra CLI path, launch this command (editing the database name if different from zdmtarget):
### host
astra db cqlsh zdmtarget \
-k zdmapp \
-e "SELECT * FROM zdmapp.user_status WHERE user='eva' limit 30;"
or, if you used the Astra UI, go to the Web CQL Console and run the statement:
### {"execute": false}
SELECT * FROM zdmapp.user_status WHERE user='eva' limit 30;
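If you prefer to compare numbers rather than eyeball the rows, you can extract the row count from the query output. This is a rough sketch that assumes the output ends with the usual cqlsh footer of the form "(N rows)"; `count_rows` is a hypothetical helper introduced here for illustration. Because the query uses limit 30, the count can never exceed 30.

```shell
# count_rows (hypothetical helper): reads cqlsh output on stdin and prints
# the number found in the "(N rows)" footer; prints nothing if no footer
# is present.
count_rows() {
  sed -n 's/^[[:space:]]*(\([0-9][0-9]*\) rows*)[[:space:]]*$/\1/p' | tail -n 1
}

# Usage (requires the Astra CLI, as in the command above):
#   astra db cqlsh zdmtarget -k zdmapp \
#     -e "SELECT * FROM zdmapp.user_status WHERE user='eva' limit 30;" | count_rows
```

Running it before and after the migration should show a small number first and a larger one afterwards, provided the user has more historical rows than were written through the proxy.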
From this moment on, the data on Target will not diverge from Origin until you decide to cut over and leave Origin behind altogether.
🗒️ At this point, you might wonder whether Target is actually capable of sustaining the read workload your applications demand. Well, the perfect way to address this concern is to have the proxy perform asynchronous dual reads on it. Read on to find out.
Since the data migrator connects directly to Origin and Target, bypassing the ZDM Proxy, the migration workload will not be reflected in the monitoring. You can confirm this by looking at the proxy instance graphs, which will show no read activity and only the usual background write activity. In other words, the data migration occurs outside the proxy's scope, so it is not part of the metrics collected in the Grafana dashboard.