
Read Free Replication

Yoshinori Matsunobu edited this page May 31, 2024 · 9 revisions

MyRocks has a feature called "Read Free Replication", which significantly speeds up replication. It is most useful when replicas keep lagging because of massive writes on primary instances. Replica lag often causes major reliability events such as availability loss and stale reads, and Read Free Replication is an effective way to recover from large replica lag even under very high write workloads on the primary. It has a notable downside: replication loses almost all of its real-time consistency checks. The feature is a trade-off between write throughput and consistency checking, and the choice is yours.

Read Free Replication skips reading rows from the database and instead takes them from the binary log, via the Before Row Image. This works because the binary log contains all the row images needed to compose RocksDB write API calls. It improves the performance of INSERT (Write_rows), UPDATE (Update_rows), and DELETE (Delete_rows) events, because all of them normally call GetForUpdate() to read the row by primary key, and Read Free Replication skips those calls. On a cache miss, GetForUpdate() reads a RocksDB data block from storage and decompresses it, which costs substantial I/O and CPU. Write calls (Put(), Delete() and SingleDelete()) only write to the MemTable, the WAL and the binary log, which rarely triggers foreground I/O or compression. If your database working set is much larger than the RocksDB block cache, the time spent in GetForUpdate() is relatively high, and Read Free Replication significantly improves replication catch-up speed, often severalfold.
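The inputs this explanation depends on can be inspected from a SQL client. The following is a minimal sketch, not a required procedure: `binlog_format` and `binlog_row_image` are standard MySQL variables, while the `rocksdb_block_cache_*` names are assumed to be exposed by the MyRocks build in use. This page does not state which values Read Free Replication requires, so the comments describe things to check, not rules.

```sql
-- On the primary: how row events are written to the binary log.
-- Read Free Replication takes rows from the Before Row Image, so row-based
-- logging is presumably needed (assumption; not stated on this page).
SHOW GLOBAL VARIABLES LIKE 'binlog_format';
SHOW GLOBAL VARIABLES LIKE 'binlog_row_image';

-- On the replica: configured block cache size and data-block counters.
-- Many misses relative to hits suggests the working set exceeds the cache,
-- which is the case where skipping GetForUpdate() is described as helping most.
SHOW GLOBAL VARIABLES LIKE 'rocksdb_block_cache_size';
SHOW GLOBAL STATUS LIKE 'rocksdb_block_cache_data_%';
```

The counters are cumulative since server start, so comparing two samples taken a few minutes apart gives a better picture than a single reading.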

Read Free Replication has the following control variables.

  • rocksdb_read_free_rpl -- OFF (default), PK_ONLY, or PK_SK. PK_ONLY enables Read Free Replication only for tables without secondary keys; tables with secondary keys behave the same as under traditional replication. PK_SK enables Read Free Replication for all tables. In all cases, Read Free Replication requires tables to have primary keys, because without a primary key it cannot compose RocksDB API calls from binary log images.

  • rocksdb-read-free-rpl-tables=<table_name_regex> -- Enables Read Free Replication only for tables whose names match the given regular expression. By default, all tables match.
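As a hedged illustration of the two variables above, the statements below show one way they might be set and verified on a replica. This assumes the server build allows both variables to be changed at runtime with `SET GLOBAL`; if it does not, set them in the server configuration instead. The pattern `orders_.*` is hypothetical. The last query uses only standard `information_schema` tables to list RocksDB tables without a primary key, since those cannot use Read Free Replication.

```sql
-- Enable Read Free Replication only for tables without secondary keys.
SET GLOBAL rocksdb_read_free_rpl = 'PK_ONLY';

-- Optionally restrict it to tables matching a regex.
-- 'orders_.*' is a hypothetical pattern; how the pattern is matched against
-- table names (anchoring, schema prefix) is not specified on this page.
SET GLOBAL rocksdb_read_free_rpl_tables = 'orders_.*';

-- Confirm the current values.
SHOW GLOBAL VARIABLES LIKE 'rocksdb_read_free_rpl%';

-- List RocksDB base tables that have no primary key.
SELECT t.table_schema, t.table_name
FROM information_schema.tables AS t
LEFT JOIN information_schema.table_constraints AS c
       ON c.table_schema = t.table_schema
      AND c.table_name = t.table_name
      AND c.constraint_type = 'PRIMARY KEY'
WHERE t.engine = 'ROCKSDB'
  AND t.table_type = 'BASE TABLE'
  AND c.constraint_name IS NULL;
```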

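Read Free Replication has some limitations and must be used carefully; otherwise some indexes may get corrupted. As a rule of thumb, do not run INSERT, UPDATE, or DELETE directly on replicas, outside of replication. Here are two examples of how secondary indexes get corrupted if you modify replicas directly. In the listings, M marks statements run on the primary and S marks statements run on the replica. Because the example table has secondary keys, both cases apply only when rocksdb_read_free_rpl is PK_SK.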

  1. Secondary keys lose some rows
M:
create table t (id int primary key, i1 int, i2 int, value int, index (i1), index (i2)) engine=rocksdb;
insert into t values (1,1,1,1),(2,2,2,2),(3,3,3,3);

S:
delete from t where id <= 2;

M:
update t set i2=100, value=100 where id=1;

S:
mysql> select count(*) from t force index(primary);
+----------+
| count(*) |
+----------+
|        2 |
+----------+
1 row in set (0.00 sec)

mysql> select count(*) from t force index(i1);
+----------+
| count(*) |
+----------+
|        1 |
+----------+
1 row in set (0.00 sec)

mysql> select count(*) from t force index(i2);
+----------+
| count(*) |
+----------+
|        2 |
+----------+
1 row in set (0.00 sec)

mysql> select * from t where id=1;
+----+------+------+-------+
| id | i1   | i2   | value |
+----+------+------+-------+
|  1 |    1 |  100 |   100 |
+----+------+------+-------+
1 row in set (0.00 sec)

mysql> select i1 from t where i1=1;
Empty set (0.00 sec)

mysql> select i2 from t where i2=100;
+------+
| i2   |
+------+
|  100 |
+------+
1 row in set (0.00 sec)

  2. Secondary keys have extra rows
M:
create table t (id int primary key, i1 int, i2 int, value int, index (i1), index (i2)) engine=rocksdb;
insert into t values (1,1,1,1),(2,2,2,2),(3,3,3,3);

S:
update t set i1=100 where id=1;

M:
delete from t where id=1;

S:
mysql> select count(*) from t force index(primary);
+----------+
| count(*) |
+----------+
|        2 |
+----------+
1 row in set (0.00 sec)

mysql> select count(*) from t force index(i1);
+----------+
| count(*) |
+----------+
|        3 |
+----------+
1 row in set (0.00 sec)

mysql> select count(*) from t force index(i2);
+----------+
| count(*) |
+----------+
|        2 |
+----------+
1 row in set (0.00 sec)

mysql> select i1 from t where i1=100;
+------+
| i1   |
+------+
|  100 |
+------+
1 row in set (0.00 sec)
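
In both listings the visible symptom is that row counts differ depending on which index is scanned. The per-index counts can be collected in one statement, sketched below for the example table `t`. This is only a coarse check built from the queries already shown: matching counts do not prove that the indexes agree row by row, and each subquery is a full index scan, which may be expensive on large tables.

```sql
-- Run on the replica. Each subquery counts rows through a different index of
-- the example table t; on a healthy table all three numbers are equal.
SELECT
  (SELECT COUNT(*) FROM t FORCE INDEX (PRIMARY)) AS pk_rows,
  (SELECT COUNT(*) FROM t FORCE INDEX (i1))      AS i1_rows,
  (SELECT COUNT(*) FROM t FORCE INDEX (i2))      AS i2_rows;
```

Going by the outputs above, the end state of the first listing should give 2, 1, 2 and the second 2, 3, 2.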

Read Free Replication has the downside that real-time consistency checks during replication are no longer effective. With traditional replication, the SQL thread stops when it hits a duplicate key error or a row not found error. With Read Free Replication, the SQL thread just overwrites the rows instead. The behavior is similar to replica_exec_mode=IDEMPOTENT.
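
To make the contrast concrete, the statements below show where a stopped SQL thread would normally be visible on a replica. This is a sketch using standard MySQL statements; the names assume a server version that uses `replica_exec_mode` and `SHOW REPLICA STATUS` (older versions use `slave_exec_mode` and `SHOW SLAVE STATUS`).

```sql
-- Run on the replica.
-- Shows whether the applier runs in STRICT or IDEMPOTENT mode.
SHOW GLOBAL VARIABLES LIKE 'replica_exec_mode';

-- With traditional replication in STRICT mode, a stopped applier shows up as:
--   Replica_SQL_Running: No
--   Last_SQL_Errno: 1062 (duplicate key) or 1032 (row not found)
-- With Read Free Replication the rows are overwritten instead, so a clean
-- status here does not show that the replica's data matches the primary.
SHOW REPLICA STATUS\G
```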

At a high level, you should not use Read Free Replication if you care about real-time consistency checks during replication and your write workload is low enough that replicas rarely lag. Read Free Replication is very effective when the write workload is so high that replicas lag very often.

Note that Read Free Replication changes behavior on the replica, not on the primary. The primary still checks uniqueness, and you still get unique key errors if you try to insert duplicate unique keys.
