This repository has been archived by the owner on Nov 24, 2023. It is now read-only.

relay can't continue after upstream turn off GTID #1460

Closed
Tammyxia opened this issue Feb 25, 2021 · 3 comments
Labels
affected-v2.0.2 this issue/BUG affects v2.0.2 affected-v2.0.3 this issue/BUG affects v2.0.3 affected-v2.0.4 this issue/BUG affects v2.0.4 affected-v2.0.5 this issue/BUG affects v2.0.5 affected-v2.0.6 this issue/BUG affects v2.0.6 affected-v2.0.7 this issue/BUG affects v2.0.7 severity/minor type/bug This issue is a bug report

Comments


Tammyxia commented Feb 25, 2021

Bug Report

Please answer these questions before submitting your issue. Thanks!

  1. What did you do? If possible, provide a recipe for reproducing the error.
    I. Create an upstream MySQL with GTID enabled, and create database "enableall_21_1".
    II. Create an upstream source (enable-gtid: true) and start a task; data migration finishes successfully.
    III. Change the MySQL configuration to disable GTID.
    IV. Create a new upstream source (enable-gtid: false, enable-relay: true) and a new database "relay_31"; start a task to migrate the new database.
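The second source created in step IV might look like the sketch below. Only the enable-gtid and enable-relay flags come from the report (the source-id matches the later query-status output); the connection fields are placeholders.

```yaml
# Hypothetical source config for step IV; connection details are placeholders.
source-id: "mysql-replica31"
enable-gtid: false
enable-relay: true
# Note: relay-dir is left at its default here, which is what triggers the bug --
# the new source reuses the relay dir written by the earlier GTID-enabled source.
from:
  host: "172.16.x.x"
  port: 3306
  user: "root"
  password: "******"
```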

  2. What did you expect to see?
    The new migration task finishes successfully.

  3. What did you see instead?
    Downstream TiDB QPS is 0.
    tiup dmctl --master-addr 172.16.4.214:8261 query-status relay-sync-31
    Starting component dmctl: /root/.tiup/components/dmctl/v2.0.1/dmctl/dmctl --master-addr 172.16.4.214:8261 query-status relay-sync-31
    {
      "result": true,
      "msg": "",
      "sources": [
        {
          "result": true,
          "msg": "",
          "sourceStatus": {
            "source": "mysql-replica31",
            "worker": "dm-172.16.5.108-8262",
            "result": null,
            "relayStatus": {
              "masterBinlog": "(mysql-bin.000136, 1034613034)",
              "masterBinlogGtid": "c67e83ef-75a0-11eb-9d7e-0242ac110002:1-253095075",
              "relaySubDir": "c67e83ef-75a0-11eb-9d7e-0242ac110002.000001",
              "relayBinlog": "(mysql-bin.000085, 147762084)",
              "relayBinlogGtid": "c67e83ef-75a0-11eb-9d7e-0242ac110002:1-153065071",
              "relayCatchUpMaster": false,
              "stage": "Paused",
              "result": {
                "isCanceled": false,
                "errors": [
                  {
                    "ErrCode": 30015,
                    "ErrClass": "relay-unit",
                    "ErrScope": "upstream",
                    "ErrLevel": "high",
                    "Message": "TCPReader get relay event with error",
                    "RawCause": "ERROR 1236 (HY000): Cannot replicate GTID-transaction when @@GLOBAL.GTID_MODE = OFF, at file ./mysql-bin.000085, position 147762084.; the first event 'mysql-bin.000085' at 147762084, the last event read from './mysql-bin.000085' at 147762149, the last byte read from './mysql-bin.000085' at 147762149.",
                    "Workaround": ""
                  }
                ],
                "detail": null
              }
            }
          },
          "subTaskStatus": [
            {
              "name": "relay-sync-31",
              "stage": "Running",
              "unit": "Load",
              "result": null,
              "unresolvedDDLLockID": "",
              "load": {
                "finishedBytes": "2207738142",
                "totalBytes": "2210916342",
                "progress": "99.86 %",
                "metaBinlog": "(mysql-bin.000135, 776111422)",
                "metaBinlogGTID": "c67e83ef-75a0-11eb-9d7e-0242ac110002:1-253095075"
              }
            }
          ]
        }
      ]
    }

MySQL [mysql]> show binlog events in 'mysql-bin.000085' limit 10;
+------------------+-----+----------------+-----------+-------------+---------------------------------------------------------------------------+
| Log_name | Pos | Event_type | Server_id | End_log_pos | Info |
+------------------+-----+----------------+-----------+-------------+---------------------------------------------------------------------------+
| mysql-bin.000085 | 4 | Format_desc | 1 | 123 | Server ver: 5.7.31-log, Binlog ver: 4 |
| mysql-bin.000085 | 123 | Previous_gtids | 1 | 194 | c67e83ef-75a0-11eb-9d7e-0242ac110002:1-152750684 |
| mysql-bin.000085 | 194 | Gtid | 1 | 259 | SET @@SESSION.GTID_NEXT= 'c67e83ef-75a0-11eb-9d7e-0242ac110002:152750685' |
| mysql-bin.000085 | 259 | Query | 1 | 341 | BEGIN |
| mysql-bin.000085 | 341 | Table_map | 1 | 408 | table_id: 1083 (enableall_21_1.sbtest4) |
| mysql-bin.000085 | 408 | Write_rows | 1 | 633 | table_id: 1083 flags: STMT_END_F |
| mysql-bin.000085 | 633 | Xid | 1 | 664 | COMMIT /* xid=152796446 */ |
| mysql-bin.000085 | 664 | Gtid | 1 | 729 | SET @@SESSION.GTID_NEXT= 'c67e83ef-75a0-11eb-9d7e-0242ac110002:152750686' |
| mysql-bin.000085 | 729 | Query | 1 | 811 | BEGIN |
| mysql-bin.000085 | 811 | Table_map | 1 | 878 | table_id: 1080 (enableall_21_1.sbtest3) |
+------------------+-----+----------------+-----------+-------------+---------------------------------------------------------------------------+
10 rows in set (0.00 sec)

The new task fails to migrate db "relay_31", while the error log is about another db, "enableall_21_1".
According to the developers, the reason is that relay-dir is unchanged (the default directory) in both source configuration files, so the new task still reads the previous relay dir.
4. Versions of the cluster

- DM version (run `dmctl -V` or `dm-worker -V` or `dm-master -V`):

    ```
    Release Version: v2.0.0-beta.2-274-g2fa864f3-dev
    Git Commit Hash: 2fa864f
    Git Branch: master
    UTC Build Time: 2021-02-20 09:51:04
    Go Version: go version go1.15.5 linux/amd64
    ```

- Upstream MySQL/MariaDB server version:

    ```
    (paste upstream MySQL/MariaDB server version here)
    ```

- Downstream TiDB cluster version (execute `SELECT tidb_version();` in a MySQL client):

    ```
    (paste TiDB cluster version here)
    ```

- How did you deploy DM: DM-Ansible or manually?

    ```
    (leave DM-Ansible or manually here)
    ```

- Other interesting information (system version, hardware config, etc):

  1. current status of DM cluster (execute query-status in dmctl)

  2. Operation logs

    • Please upload dm-worker.log for every DM-worker instance if possible
    • Please upload dm-master.log if possible
    • Other interesting logs
    • Output of dmctl's commands with problems
  3. Configuration of the cluster and the task

    • dm-worker.toml for every DM-worker instance if possible
    • dm-master.toml for DM-master if possible
    • task config, like task.yaml if possible
    • inventory.ini if deployed by DM-Ansible
  4. Screenshot/exported-PDF of Grafana dashboard or metrics' graph in Prometheus for DM if possible

@Tammyxia (Author)

For now, the workaround is to change the relay log directory: add `relay-dir: ./relay_log3` to the source configuration file.
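In config form, the workaround amounts to pointing the new source at a fresh relay directory so it does not pick up relay logs written while GTID was enabled. Only `relay-dir`, `enable-gtid`, and `enable-relay` come from this thread; everything else is a placeholder.

```yaml
# Workaround sketch (fields other than the relay/GTID keys are placeholders).
source-id: "mysql-replica31"
enable-gtid: false
enable-relay: true
relay-dir: ./relay_log3   # a new, previously unused directory
from:
  host: "172.16.x.x"
  port: 3306
  user: "root"
  password: "******"
```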

@lance6716 (Collaborator)

We can reproduce the problem as follows:

  1. start a MySQL upstream with GTID enabled and start syncing
  2. insert data-1
  3. stop the task, stop the source
  4. insert data-2
  5. disable GTID in the upstream as described in https://dev.mysql.com/doc/refman/5.7/en/replication-mode-change-online-disable-gtids.html
  6. disable GTID in the source config file
  7. create the source, start the task

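Step 5 follows the online procedure from the linked MySQL 5.7 manual page; roughly, the statements run on the upstream look like this (with any replicas caught up between steps):

```sql
-- Sketch of the online GTID-disable procedure from the linked MySQL doc.
SET @@GLOBAL.GTID_MODE = ON_PERMISSIVE;
SET @@GLOBAL.GTID_MODE = OFF_PERMISSIVE;
-- Wait until no GTID transactions remain in flight on any server:
-- SELECT @@GLOBAL.GTID_OWNED;  -- should be empty
SET @@GLOBAL.GTID_MODE = OFF;
SET @@GLOBAL.ENFORCE_GTID_CONSISTENCY = OFF;
-- Also remove gtid-mode / enforce-gtid-consistency from my.cnf so the
-- change survives a restart.
```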
The task then continues from data-2. Since data-2 was written to the binlog with GTID metadata but the current GTID setting is off, MySQL throws an error explaining "Cannot replicate GTID-transaction when @@GLOBAL.GTID_MODE = OFF".

Both the relay unit and the sync unit (with relay disabled) of DM hit this problem. Users can solve it by starting sync from a new location (remove the checkpoint, specify task.meta, ...) or by enabling GTID in the upstream again. The relay unit keeps an additional checkpoint in the relay.meta file, so the docs need to tell users to purge the relay log manually if they want to recover.

@lance6716 lance6716 added type/feature-request This issue is a feature request and removed type/feature-request This issue is a feature request labels Apr 7, 2021
@lance6716 lance6716 changed the title Failed to start DM new task after change GTID from enable to disable relay can't continue after changing GTID from enable to disable Apr 7, 2021
@lance6716 lance6716 added severity/minor type/bug This issue is a bug report affected-v2.0.2 this issue/BUG affects v2.0.2 and removed type/feature-request This issue is a feature request labels Apr 7, 2021
@lance6716 lance6716 changed the title relay can't continue after changing GTID from enable to disable relay can't continue after upstream turn off GTID Apr 27, 2021
@lance6716 lance6716 added the affected-v2.0.3 this issue/BUG affects v2.0.3 label May 8, 2021
@lance6716 lance6716 added the affected-v2.0.4 this issue/BUG affects v2.0.4 label Jun 18, 2021
@GMHDBJD GMHDBJD added the affected-v2.0.5 this issue/BUG affects v2.0.5 label Jul 30, 2021
@GMHDBJD GMHDBJD added the affected-v2.0.6 this issue/BUG affects v2.0.6 label Aug 13, 2021
@lichunzhu lichunzhu added the affected-v2.0.7 this issue/BUG affects v2.0.7 label Sep 27, 2021
@lance6716 (Collaborator)

lance6716 commented Dec 6, 2021

This is a rare case, and we indeed can't handle GTID being turned on/off dynamically. Users can manually remove and then re-add the source.
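The remove-then-re-add recovery can be outlined with dmctl as below. This is a sketch only: the master address and task name reuse the ones from the report, while the config file names are assumptions.

```
# Outline of the remove-then-re-add recovery; file names are hypothetical.
tiup dmctl --master-addr 172.16.4.214:8261 stop-task relay-sync-31
tiup dmctl --master-addr 172.16.4.214:8261 operate-source stop source31.yaml
# On the dm-worker host, remove the stale relay dir (including relay.meta),
# then re-create the source and restart the task:
tiup dmctl --master-addr 172.16.4.214:8261 operate-source create source31.yaml
tiup dmctl --master-addr 172.16.4.214:8261 start-task task31.yaml
```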
