Mis-reuse of StoragePool::max_data_page_id causes data corruption after upgrade #9039
The root cause is:
tiflash/dbms/src/Storages/DeltaMerge/StoragePool.cpp Lines 518 to 520 in eb6272c
Related logs from a case that located this bug:
|
A minimal unit test that reproduces the bug:
|
This bug does not occur when the TiFlash instance is newly deployed with version >= 6.5.0 and has no existing data.
Workaround after the crash has happened: set the tables' TiFlash replica count to zero, scale in the TiFlash instances with corrupted data, then scale out new TiFlash instances.
When upgrading a cluster from < v6.5.0 to these LTS versions, TiFlash may hit this bug:
Upgrading to at least v6.5.10/v7.1.6/v7.5.2/v8.1.0 is recommended. If you have to upgrade from < v6.5.0 to one of the affected versions, there is a way to avoid this bug:
|
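For the post-crash workaround described above (set the replica count to zero, scale in, scale out), the SQL side can be sketched as follows. This is only an illustration: `test.t` is a hypothetical table name, the replica count to restore depends on what the table had before, and the scale-in/scale-out steps happen outside SQL with whatever deployment tool manages the cluster.

```sql
-- Hypothetical table name; repeat for every table that has a TiFlash replica.
-- Step 1: remove the TiFlash replicas of the table.
ALTER TABLE test.t SET TIFLASH REPLICA 0;

-- Step 2 (outside SQL): scale in the TiFlash instances that hold corrupted
-- data, then scale out new TiFlash instances.

-- Step 3: once the new instances are up, add the replica back.
-- The count 1 is an assumption; use the count the table had before.
ALTER TABLE test.t SET TIFLASH REPLICA 1;

-- Check replication progress of the re-created replica.
SELECT TABLE_SCHEMA, TABLE_NAME, REPLICA_COUNT, AVAILABLE, PROGRESS
FROM information_schema.tiflash_replica
WHERE TABLE_SCHEMA = 'test' AND TABLE_NAME = 't';
```

Queries that rely on TiFlash for these tables fall back to TiKV or fail while the replica count is zero, so plan the window accordingly.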
Steps to reproduce another error:
|
Bug Report
Please answer these questions before submitting your issue. Thanks!
1. Minimal reproduce step (Required)
alter table ... compact tiflash replica
so that the table has no data in the delta layer (this raises the possibility of the bug happening)
2. What did you expect to see? (Required)
TiFlash instances restart successfully
3. What did you see instead (Required)
TiFlash fails to restart with an error message like
A similar issue: #8695
4. What is your TiFlash version? (Required)
v7.1.5