Feature Request: VTOrc should change tablet type of tablets that have errant GTIDs on them #13872
Comments
The risk is with exhausting the entire replica fleet, so that you end up with no
|
@shlomi-noach I don't know, I have mixed feelings about that too...
Is there a good way to handle these situations? |
Unless you take a backup from this server and use it to seed the rest of the tablets. I think that the suggested approach is super opinionated and that different OSS users will have different opinions. If you can make this configurable - that's good. I'd tell you that in a production environment, I'd prefer having proper alerting on errant GTID, along with tooling to fix the errant GTID, rather than have some automation purge replicas from my cluster to the point of leaving the |
Alright, I think that can be done. I'll put this functionality of changing tablet type of tablets with errant GTIDs behind a flag. As far as alerting goes, we already have that, so I think just making this feature optional should be a good addition. |
Great feature request! I'm expecting mixed feedback based on use-cases here, but adding my perspective below
I (personally) agree with this statement 👍
Replication being broken is very bad, but having no
I feel this approach (keep at least N x |
@timvaillancourt: updating that in #13873, there's a new configurable behavior (default: false), |
Feature Description
Description
We should get VTOrc to change the tablet type of tablets that have errant GTIDs on them and get the type converted to drained. This way we prevent these tablets from getting promoted down the line and causing a load of problems.
Use Case(s)
If a tablet ends up with errant GTIDs (by whatever means) and we don't remove it from the topology, there is a slight chance that it ends up getting promoted. When that happens, it breaks replication on all the other tablets, leading to downtime. Having VTOrc demote a tablet with errant GTIDs would fix this problem.
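To make the proposal concrete, here is a minimal sketch of the decision it describes. It is not VTOrc's code, and it is written in Python purely for illustration. It assumes a simplified GTID set (a mapping from source UUID to a set of transaction numbers) and treats a transaction as errant when it was executed on a replica but not on the primary. The names `Tablet`, `errant_gtids`, `tablets_to_drain`, `enabled` and `min_serving` are all hypothetical. The `enabled` switch stands in for the opt-in flag discussed in the comments, and `min_serving` is one possible guard against the replica-exhaustion concern raised there.

```python
from dataclasses import dataclass

# Simplified GTID set: {source_uuid: {transaction numbers}}.
# Real GTID sets use intervals; plain sets keep the sketch short.
GTIDSet = dict[str, set[int]]


def errant_gtids(replica: GTIDSet, primary: GTIDSet) -> GTIDSet:
    """Return transactions executed on the replica but not on the primary."""
    errant: GTIDSet = {}
    for uuid, txns in replica.items():
        extra = txns - primary.get(uuid, set())
        if extra:
            errant[uuid] = extra
    return errant


@dataclass
class Tablet:
    alias: str          # hypothetical tablet alias
    tablet_type: str    # "replica" or "drained" in this sketch
    executed: GTIDSet   # GTIDs this tablet has executed


def tablets_to_drain(primary_executed: GTIDSet, replicas: list[Tablet],
                     enabled: bool, min_serving: int = 1) -> list[str]:
    """Pick aliases of replicas to convert to drained.

    `enabled` models the opt-in flag (assumed off by default).
    `min_serving` is an assumed safeguard: never drain so many tablets
    that fewer than this number of replicas remain.
    """
    if not enabled:
        return []
    serving = [t for t in replicas if t.tablet_type == "replica"]
    bad = [t for t in serving if errant_gtids(t.executed, primary_executed)]
    allowed = max(0, len(serving) - min_serving)
    return [t.alias for t in bad[:allowed]]
```

Note that a replica that merely lags behind the primary has no errant GTIDs under this definition, so only tablets holding transactions the primary never saw become candidates. With `min_serving` at its default, a shard whose only replica is errant keeps that replica, which leaves the problem to alerting and manual repair instead of automation.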