If the autoid service node is not leader, save the cached ids back would cause potential 'Duplicate entry' error #46444

tiancaiamao · 2023-08-28T07:22:00Z

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

By read the code logic ... not easy to reproduce without special hack.

Imagine that tidb5 is autoid service leader.
It allocates id to 30000
Then tidb1 become leader, it continue to provide service.
tidb1 allocates to 34000
Then tidb5 exit, it doesn't know it's not leader anymore, and store 30000 the current max autoid
Later, there would be duplicated entry in range [30000, 34000)

2. What did you expect to see? (Required)

If the autoid service is not leader, it should not store the max autoid back when it exit.

3. What did you see instead (Required)

This line has bug!

tidb/autoid_service/autoid.go

Line 328 in 5ecf980

if s.leaderShip != nil {

4. What is your TiDB version? (Required)

bb7133 · 2023-08-29T09:59:23Z

I would like to add more details on how 'Duplicate entry' can be encountered due to this bug(which is not easy actually).

Suppose that we have 3 TiDB servers: A, B, and C. A is the auto id owner and allocate the auto ID, say that it has been allocated ID 1:

tidb> insert into t values ();
Query OK, 1 row affected (0.01 sec)

tidb> select * from t;
+---+
| a |
+---+
| 1 |
+---+
1 row in set (0.00 sec)

Suppose that the auto id owner is changed from A to B due to some reason(for example, unable to renew its lease in etcd when the network is bad). When B serve allocating IDs, it skip the 'old range' reserved by A and starts with a larger offset, say, allocates 4001, 4002, 4003:

# auto id owner is changed from A to B
tidb> insert into t values (), (), ();
Query OK, 3 row affected (0.21 sec)

tidb> select * from t;
+------+
| a    |
+------+
|    1 |
| 4001 |
| 4002 |
| 4003 |
+------+
4 rows in set (0.01 sec)

TiDB A shuts down gracefully and calls forceRebase() to rebase the auto_increment to a old value('1' in this example).
auto id owner changes from B to C and triggers the reload of the new base of auto_increment, and then the allocation starts from 2:

tidb> insert into t values ();
Query OK, 1 row affected (0.21 sec)

tidb> select * from t;
+------+
| a    |
+------+
|    1 |
|    2 |
| 4001 |
| 4002 |
| 4003 |
+------+
5 rows in set (0.01 sec)

And when the INSERT(allocation) goes on, it increments to 4001 again and we can see Duplicate key error:

# Keep running `insert into t values ()`
...
ERROR 1062 (23000) at line 1: Duplicate entry '4001' for key 't.PRIMARY'
ERROR 1062 (23000) at line 1: Duplicate entry '4002' for key 't.PRIMARY'
ERROR 1062 (23000) at line 1: Duplicate entry '4003' for key 't.PRIMARY'

Step 4 is a MUST to trigger this bug because if TiDB B keeps acting as the auto id owner, the range of ID cached in it is always correct, and it will overwrite the correct value for 'base' of auto id when reserving a new range of ID.

… for AUTO_ID_CACHE=1 tables (#46445) close #46444

ToOnlyGaurav · 2023-08-30T08:27:32Z

Few more queries along the same line:

Is it possible that the cluster will heal itself automatically after sometime from this situation? If yes then how? Because our cluster got recovered from this situation after few hours.
Do we have any admin commands to recover from this sate by forcing rebalancing of range of cached ID or something like this. One of the alternate we were thinking to set auto_increment to higher number.
When system is generating duplicate ID, seems like we don't have any strategy/intelligence backed into the system to discard that duplicate ID and move ahead. Since system is generating the ID it should be smart enough to not repeat the same mistake of assigning the duplicate ID again and again.

… for AUTO_ID_CACHE=1 tables (pingcap#46445) close pingcap#46444

ref #46444

tiancaiamao · 2023-08-31T02:55:19Z

Is it possible that the cluster will heal itself automatically after sometime from this situation? If yes then how? Because our cluster got recovered from this situation after few hours.

Yes, when the insert operation adds more records continuously, the autoid could grow and finally reach a new value that is greater than all the previous allocated ones. Then the duplicate entry error is gone.

Do we have any admin commands to recover from this sate by forcing rebalancing of range of cached ID or something like this. One of the alternate we were thinking to set auto_increment to higher number.

alter table t auto_increment = xxx

xxx can be get from 'show table t next_row_id'
When the duplicate entry error happen, xxx can be get from 'select max({{auto_increment_column}}) from t', it may be larger than the value from 'show table t next_row_id', and this means a bug.

When system is generating duplicate ID, seems like we don't have any strategy/intelligence backed into the system to discard that duplicate ID and move ahead. Since system is generating the ID it should be smart enough to not repeat the same mistake of assigning the duplicate ID again and again.

alter table t auto_increment = xxx can be used to rebase the value to avoid the the duplicate id.

… for AUTO_ID_CACHE=1 tables (pingcap#46445) close pingcap#46444

… for AUTO_ID_CACHE=1 tables #46445 (#46541) close #46444

… for AUTO_ID_CACHE=1 tables (#46445) (#46500) close #46444

…) (#46530) ref #46444

… for AUTO_ID_CACHE=1 tables (#46445) (#46501) close #46444

…) (#46531) ref #46444

tiancaiamao added type/bug The issue is confirmed as a bug. severity/major affects-6.4 affects-6.5 This bug affects the 6.5.x(LTS) versions. affects-6.6 affects-7.0 affects-7.1 This bug affects the 7.1.x(LTS) versions. affects-7.2 labels Aug 28, 2023

ti-chi-bot bot added may-affects-5.3 This bug maybe affects 5.3.x versions. may-affects-5.4 This bug maybe affects 5.4.x versions. may-affects-6.1 labels Aug 28, 2023

tiancaiamao mentioned this issue Aug 28, 2023

autoid_service: fix potential 'duplicated entry' error when tidb exit for AUTO_ID_CACHE=1 tables #46445

Merged

12 tasks

tiancaiamao removed may-affects-5.3 This bug maybe affects 5.3.x versions. may-affects-5.4 This bug maybe affects 5.4.x versions. may-affects-6.1 labels Aug 28, 2023

ti-chi-bot bot closed this as completed in #46445 Aug 30, 2023

ti-chi-bot bot pushed a commit that referenced this issue Aug 30, 2023

autoid_service: fix potential 'duplicated entry' error when tidb exit…

f516603

… for AUTO_ID_CACHE=1 tables (#46445) close #46444

This was referenced Aug 30, 2023

autoid_service: fix potential 'duplicated entry' error when tidb exit for AUTO_ID_CACHE=1 tables (#46445) #46500

Merged

autoid_service: fix potential 'duplicated entry' error when tidb exit for AUTO_ID_CACHE=1 tables (#46445) #46501

Merged

YangKeao pushed a commit to YangKeao/tidb that referenced this issue Aug 30, 2023

autoid_service: fix potential 'duplicated entry' error when tidb exit…

57ef9f2

… for AUTO_ID_CACHE=1 tables (pingcap#46445) close pingcap#46444

YangKeao mentioned this issue Aug 30, 2023

autoid_service: fix potential 'duplicated entry' error when tidb exit… #46507

Closed

12 tasks

tiancaiamao added a commit to tiancaiamao/tidb that referenced this issue Aug 30, 2023

autoid_service: fix potential 'duplicated entry' error when tidb exit…

b6a04d1

… for AUTO_ID_CACHE=1 tables (pingcap#46445) close pingcap#46444

This was referenced Aug 30, 2023

autoid_service: fix potential 'duplicated entry' error when tidb exit for AUTO_ID_CACHE=1 tables (#46445) #46511

Closed

autoid_service: add some debug log for rebase related operation #46526

Merged

ti-chi-bot bot pushed a commit that referenced this issue Aug 30, 2023

autoid_service: add some debug log for rebase related operation (#46526)

4f2f5e1

ref #46444

This was referenced Aug 30, 2023

autoid_service: add some debug log for rebase related operation (#46526) #46530

Merged

autoid_service: add some debug log for rebase related operation (#46526) #46531

Merged

tiancaiamao added a commit to tiancaiamao/tidb that referenced this issue Aug 31, 2023

autoid_service: fix potential 'duplicated entry' error when tidb exit…

e63b053

… for AUTO_ID_CACHE=1 tables (pingcap#46445) close pingcap#46444

tiancaiamao mentioned this issue Aug 31, 2023

autoid_service: fix potential 'duplicated entry' error when tidb exit for AUTO_ID_CACHE=1 tables #46445 #46541

Merged

12 tasks

ti-chi-bot bot pushed a commit that referenced this issue Aug 31, 2023

autoid_service: fix potential 'duplicated entry' error when tidb exit…

d48d236

… for AUTO_ID_CACHE=1 tables #46445 (#46541) close #46444

jebter added the sig/sql-infra SIG: SQL Infra label Sep 11, 2023

ti-chi-bot bot pushed a commit that referenced this issue Oct 11, 2023

autoid_service: fix potential 'duplicated entry' error when tidb exit…

ef54d6c

… for AUTO_ID_CACHE=1 tables (#46445) (#46500) close #46444

ti-chi-bot bot pushed a commit that referenced this issue Oct 11, 2023

autoid_service: add some debug log for rebase related operation (#46526…

9fdf362

…) (#46530) ref #46444

ti-chi-bot bot pushed a commit that referenced this issue Nov 27, 2023

autoid_service: fix potential 'duplicated entry' error when tidb exit…

a2aeb3b

… for AUTO_ID_CACHE=1 tables (#46445) (#46501) close #46444

ti-chi-bot bot pushed a commit that referenced this issue Nov 27, 2023

autoid_service: add some debug log for rebase related operation (#46526…

a221f8d

…) (#46531) ref #46444

bb7133 mentioned this issue Dec 14, 2023

release: add v7.1.3 release notes pingcap/docs-cn#15672

Merged

18 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

If the autoid service node is not leader, save the cached ids back would cause potential 'Duplicate entry' error #46444

If the autoid service node is not leader, save the cached ids back would cause potential 'Duplicate entry' error #46444

tiancaiamao commented Aug 28, 2023

bb7133 commented Aug 29, 2023

ToOnlyGaurav commented Aug 30, 2023

tiancaiamao commented Aug 31, 2023

If the autoid service node is not leader, save the cached ids back would cause potential 'Duplicate entry' error #46444

If the autoid service node is not leader, save the cached ids back would cause potential 'Duplicate entry' error #46444

Comments

tiancaiamao commented Aug 28, 2023

Bug Report

1. Minimal reproduce step (Required)

2. What did you expect to see? (Required)

3. What did you see instead (Required)

4. What is your TiDB version? (Required)

bb7133 commented Aug 29, 2023

ToOnlyGaurav commented Aug 30, 2023

tiancaiamao commented Aug 31, 2023