This repository has been archived by the owner on Sep 30, 2024. It is now read-only.

Support for gtid-errant-reset-master command; formalized GTID sets #617

Merged
merged 7 commits into from
Sep 16, 2018

Conversation

shlomi-noach
Collaborator

This PR introduces the gtid-errant-reset-master command, applied on an instance:

Then this command "fixes" errant GTID transactions by way of RESET MASTER; SET GLOBAL gtid_purged....
This command is, of course, destructive to the server's binary logs. If any app is tailing the logs (such as https://github.com/github/gh-ost or another tool), or if the binary logs are relied on for incremental restore, then this command is dangerous.
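
As a rough illustration of the statement sequence described above, here is a minimal sketch. It assumes the target gtid_purged value is the server's executed GTID set minus the errant set, computed with MySQL's built-in GTID_SUBTRACT() function. The helper names (subtractQuery, resetStatements) and the loose character check are hypothetical and are not taken from this PR's code.

```go
package main

import (
	"fmt"
	"regexp"
)

// gtidSetChars is a loose character whitelist (hex digits, ':', ',', '-',
// whitespace). It is not a full GTID set parser; it only keeps quotes and
// other SQL syntax out of the generated statements.
var gtidSetChars = regexp.MustCompile(`^[0-9a-fA-F:,\s-]*$`)

// subtractQuery builds a query asking the server for "executed minus errant".
// Hypothetical helper, for illustration only.
func subtractQuery(executed, errant string) (string, error) {
	if !gtidSetChars.MatchString(executed) || !gtidSetChars.MatchString(errant) {
		return "", fmt.Errorf("unexpected characters in GTID set")
	}
	return fmt.Sprintf("SELECT GTID_SUBTRACT('%s', '%s')", executed, errant), nil
}

// resetStatements returns the two destructive statements, in order:
// RESET MASTER drops the binary logs and empties gtid_executed, then
// gtid_purged is set to the GTID set that should remain.
func resetStatements(purged string) ([]string, error) {
	if !gtidSetChars.MatchString(purged) {
		return nil, fmt.Errorf("unexpected characters in GTID set")
	}
	return []string{
		"RESET MASTER",
		fmt.Sprintf("SET GLOBAL gtid_purged = '%s'", purged),
	}, nil
}
```

A caller would run the query from subtractQuery first, then pass its result to resetStatements and execute the two returned statements on the same instance.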


Also, GTID sets are more formally parsed. An old unused OracleGtidSet piece of code is revived.

@shlomi-noach shlomi-noach temporarily deployed to production/mysql_cluster=conductor September 12, 2018 08:21 Inactive
@shlomi-noach
Collaborator Author

Web interface adds a "fix" button:

[Screenshot taken 2018-09-12 at 11:22:15]

Command line looks like this:

orchestrator-client -c gtid-errant-reset-master -i myserver:3306

@shlomi-noach shlomi-noach temporarily deployed to production/mysql_cluster=conductor September 12, 2018 08:25 Inactive
@shlomi-noach
Collaborator Author

Noting that errant GTID detection is incorrect for 2nd, 3rd, ... tier replicas, and is only correct for 1st tier replicas.

This is because we check masters and replicas asynchronously, and heuristically discard the master's UUID from the GTID set. However, we discard neither the master's master UUID nor that of the 3rd-degree master, and so on.
I need to think this over.

@shlomi-noach shlomi-noach temporarily deployed to production/mysql_cluster=conductor September 13, 2018 05:01 Inactive
@shlomi-noach shlomi-noach temporarily deployed to production/mysql_cluster=conductor September 13, 2018 09:37 Inactive
@shlomi-noach shlomi-noach temporarily deployed to production/mysql_cluster=conductor September 13, 2018 09:39 Inactive
@shlomi-noach shlomi-noach temporarily deployed to production/mysql_cluster=conductor September 16, 2018 07:10 Inactive
@shlomi-noach
Collaborator Author

Errant GTID detection for 2nd, 3rd, ..., nth tier replicas is now resolved correctly via the newly introduced ancestry_uuid.
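
To make the idea concrete, here is a minimal sketch of discarding UUIDs from a GTID set before looking for errant entries. It assumes ancestry_uuid holds the UUIDs of every master up the replication chain; the function name removeUUIDs is hypothetical, and the actual implementation in this PR may differ.

```go
package main

import "strings"

// removeUUIDs returns gtidSet without the entries whose source UUID is
// listed in uuids. Interval lists of the remaining entries are kept as-is.
// Hypothetical helper, for illustration only.
func removeUUIDs(gtidSet string, uuids []string) string {
	skip := make(map[string]bool, len(uuids))
	for _, u := range uuids {
		skip[strings.ToLower(strings.TrimSpace(u))] = true
	}
	var kept []string
	// MySQL may print a newline after each comma, so trim every entry.
	for _, entry := range strings.Split(gtidSet, ",") {
		entry = strings.TrimSpace(entry)
		if entry == "" {
			continue
		}
		// The source UUID is everything before the first ':'.
		uuid, _, _ := strings.Cut(entry, ":")
		if !skip[strings.ToLower(uuid)] {
			kept = append(kept, entry)
		}
	}
	return strings.Join(kept, ",")
}
```

With a chain of masters whose (shortened, made-up) UUIDs are aaaa, bbbb and cccc, discarding only the direct master's UUID leaves the entries of aaaa and bbbb in the set, where they could be mistaken for errant transactions. Discarding the whole ancestry leaves only entries that did not originate from any master up the chain.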

Retries on reset master and set global gtid_purged reduce the risk of this dangerous operation breaking halfway through. Clear error messages indicate what the intended operation was and what the value of gtid_purged was expected to be, so that if something bad does happen, the Web UI, the CLI and the logs all provide sufficient info for a human to set things right.
