Removing regions may be blocked for a long time when dropping a table with large amount of data #9437
Comments
When a table is dropped in TiDB and exceeds the gc_safepoint, TiFlash will generate an
However, as more raft messages arrive at the TiFlash instance, memory usage grows and causes OOM kills. After a restart, the TiFlash instance runs into the same blocking again. Eventually, all the segments (around 30,000 in total) are removed from TiFlash, and TiFlash begins to catch up on the raft messages.
Affected versions
For the older affected versions before v7.5.x, we can apply the same logic used to fix #8710: ensure all the regions are removed before physically dropping the data from the TiFlash instance. In this way, the
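The ordering described in this comment (remove every region first, then physically drop the data) can be illustrated with a minimal sketch. This is not TiFlash's actual implementation, which is written in C++; the names `DroppedTable`, `remove_region`, and `try_physical_drop` are hypothetical and exist only for this example.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class DroppedTable:
    """Hypothetical bookkeeping for a dropped table past the GC safepoint."""
    table_id: int
    region_ids: Set[int] = field(default_factory=set)
    data_removed: bool = False


def remove_region(table: DroppedTable, region_id: int) -> None:
    """Detach one region from the table while its data is still intact."""
    table.region_ids.discard(region_id)


def try_physical_drop(table: DroppedTable) -> bool:
    """Drop the table's data only once no region still references it.

    Returns True if the data was dropped, False if the drop was deferred.
    """
    if table.region_ids:
        # Regions remain: defer the physical drop and retry on a later
        # GC round, so region removal does not wait behind data cleanup.
        return False
    table.data_removed = True
    return True


# Usage: the physical drop is deferred until the last region is gone.
table = DroppedTable(table_id=1, region_ids={10, 11})
first_attempt = try_physical_drop(table)   # deferred, regions remain
remove_region(table, 10)
remove_region(table, 11)
second_attempt = try_physical_drop(table)  # proceeds, no regions left
```

Under this assumed design, the GC routine simply retries the drop on a later round instead of holding up region removal.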
Marked as major because it may block the raft threads and cause OOM when dropping a table with a large volume of data, but it can self-recover.
Closing, as #9442 is fixed in the release-7.1 branch.
Bug Report
Please answer these questions before submitting your issue. Thanks!
1. Minimal reproduce step (Required)
2. What did you expect to see? (Required)
The table is dropped smoothly, without affecting TiFlash's raft-log syncing or causing failed queries.
3. What did you see instead (Required)
The raft-log syncing is blocked for tens of minutes, and incoming raft messages cause TiFlash to OOM.
The blocked raft-log syncing also causes queries to fail, because learner reads time out while waiting for the raft-log index to sync.
4. What is your TiFlash version? (Required)
v7.1.3