Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make mysqldump work with gitbase #361

Closed
ajnavarro opened this issue Jul 2, 2018 · 15 comments
Closed

Make mysqldump work with gitbase #361

ajnavarro opened this issue Jul 2, 2018 · 15 comments
Assignees
Labels

Comments

@ajnavarro
Copy link
Contributor

Right now if you try to do a mysqldump, you will have the next error:

mysqldump --all-databases --port=3306 --host=localhost --protocol=tcp --user=root

mysqldump: Couldn't execute '/*!40100 SET @@SQL_MODE='' */': unknown error: syntax error at position 30 (1105)
@erizocosmico erizocosmico self-assigned this Jul 2, 2018
@erizocosmico
Copy link
Contributor

erizocosmico commented Jul 2, 2018

We need to support the following things:

  • Ignore comment lines, for example, /*!40101 SET @OLD_CHARACTER_SET_CLIENT=@@CHARACTER_SET_CLIENT */;
  • Support SHOW VARIABLES (vitess sqlparser tells you it's a SHOW VARIABLES, but it does not give you anything else, so it would have to be a handcrafted parser).
  • SET statements with @@SESSION and @@GLOBAL, which ideally should be able to access global and session scoped configurations. We do not have such things right now. We could just ignore them, though.
  • SELECT @@GLOBAL..., as the previous point, we need some way of accessing @@GLOBAL and @@SESSION as tables. These are a bit more complicated to ignore, though, as just returning nothing in those case does not work.
  • Probably more, that's how far I got just ignoring stuff.

I remember the discussion around supporting MySQL Workbench and we ended up closing it as it was not prioritary. Should we keep working on this, then, given it's the same constraints we had in that issue?

@smola
Copy link
Contributor

smola commented Jul 2, 2018

@erizocosmico gitbase interoperability is a mid-priority (P1) objective for Q3, but the tools that should be initially included in this interoperability are still not defined.

gitbase interoperability with 3rd party tools

@erizocosmico
Copy link
Contributor

erizocosmico commented Jul 3, 2018

Supported so far:

  • Remove comments.
  • Empty queries do not crash (see the /* SOME QUERY INSIDE COMMENT*/).
  • SHOW VARIABLES, which returns a dummy result.
  • SET, which returns an dummy result.
  • @@GLOBAL.FOO and @@SESSION.FOO always return nil and can be resolved.
  • SHOW DATABASES.
  • LOCK, which returns a dummy result.
  • USE, which returns a dummy result

However, these are way harder to "hack" (right now I just have a special case for them to get past them, but that should not go into the code):

SELECT LOGFILE_GROUP_NAME, FILE_NAME, TOTAL_EXTENTS, INITIAL_SIZE, ENGINE, EXTRA FROM INFORMATION_SCHEMA.FILES WHERE FILE_TYPE = 'UNDO LOG' AND FILE_NAME IS NOT NULL AND LOGFILE_GROUP_NAME IS NOT NULL GROUP BY LOGFILE_GROUP_NAME, FILE_NAME, ENGINE, TOTAL_EXTENTS, INITIAL_SIZE ORDER BY LOGFILE_GROUP_NAME```
SELECT DISTINCT TABLESPACE_NAME, FILE_NAME, LOGFILE_GROUP_NAME, EXTENT_SIZE, INITIAL_SIZE, ENGINE FROM INFORMATION_SCHEMA.FILES WHERE FILE_TYPE = 'DATAFILE' ORDER BY TABLESPACE_NAME, LOGFILE_GROUP_NAME

These are all the queries being run:

/*!40100 SET @@SQL_MODE='' */
/*!40103 SET TIME_ZONE='+00:00' */
SHOW VARIABLES LIKE 'gtid\_mode'
SELECT LOGFILE_GROUP_NAME, FILE_NAME, TOTAL_EXTENTS, INITIAL_SIZE, ENGINE, EXTRA FROM INFORMATION_SCHEMA.FILES WHERE FILE_TYPE = 'UNDO LOG' AND FILE_NAME IS NOT NULL AND LOGFILE_GROUP_NAME IS NOT NULL GROUP BY LOGFILE_GROUP_NAME, FILE_NAME, ENGINE, TOTAL_EXTENTS, INITIAL_SIZE ORDER BY LOGFILE_GROUP_NAME
SELECT DISTINCT TABLESPACE_NAME, FILE_NAME, LOGFILE_GROUP_NAME, EXTENT_SIZE, INITIAL_SIZE, ENGINE FROM INFORMATION_SCHEMA.FILES WHERE FILE_TYPE = 'DATAFILE' ORDER BY TABLESPACE_NAME, LOGFILE_GROUP_NAME
SHOW DATABASES
SHOW VARIABLES LIKE 'ndbinfo\_version'
SHOW CREATE DATABASE IF NOT EXISTS ``
show tables
LOCK TABLES `blobs` READ /*!32311 LOCAL */,`commit_blobs` READ /*!32311 LOCAL */,`commit_files` READ /*!32311 LOCAL */,`commit_trees` READ /*!32311 LOCAL */,`commits` READ /*!32311 LOCAL */,`files` READ /*!32311 LOCAL */,`ref_commits` READ /*!32311 LOCAL */,`refs` READ /*!32311 LOCAL */,`remotes` READ /*!32311 LOCAL */,`repositories` READ /*!32311 LOCAL */,`tree_entries` READ /*!32311 LOCAL */

And then mysqldump exits with

mysqldump: Couldn't execute 'show table status like 'blobs'': Commands out of sync; you can't run this command now (2014)

Although that query has never been executed in the server (at least it never got to the engine).

According to the manual, this is the reason this error could happen, which is very very helpful https://dev.mysql.com/doc/refman/8.0/en/commands-out-of-sync.html

So, I'm kind of in a dead end here.

Any thoughts? @smola @ajnavarro

@smola
Copy link
Contributor

smola commented Jul 3, 2018

I suggest using a more dumb thing such as mysqldump --skip-opt --no-create-db --force

@erizocosmico
Copy link
Contributor

I will try with that. I tried with --ignore-errors=2014 with no luck so far

@erizocosmico
Copy link
Contributor

erizocosmico commented Jul 3, 2018

UPDATE: same errors. Dump is completed, but the result is riddled with stuff like this:

mysqldump: Couldn't execute 'show table status like 'tree\_entries'': Commands out of sync; you can't run this command now (2014)
mysqldump: Couldn't execute 'SET SQL_QUOTE_SHOW_CREATE=1/*!40102 ,SQL_MODE=concat(@@sql_mode, _utf8 ',NO_KEY_OPTIONS,NO_TABLE_OPTIONS,NO_FIELD_OPTIONS') */': Commands out of sync; you can't run this command now (2014)
mysqldump: Couldn't execute 'SELECT `COLUMN_NAME` AS `Field`, `COLUMN_TYPE` AS `Type`, `IS_NULLABLE` AS `Null`, `COLUMN_KEY` AS `Key`, `COLUMN_DEFAULT` AS `Default`, `EXTRA` AS `Extra`, `COLUMN_COMMENT` AS `Comment` FROM `INFORMATION_SCHEMA`.`COLUMNS` WHERE TABLE_SCHEMA = 'foo' AND TABLE_NAME = 'tree_entries'': Commands out of sync; you can't run this command now (2014)
/*!40103 SET TIME_ZONE=@OLD_TIME_ZONE */;

@mcuadros
Copy link
Contributor

@ajnavarro just a friendly ping that this is an OKR for this Q.

@ajnavarro
Copy link
Contributor Author

Actually, the description is really broad:

Ensure gitbase interoperability with 3rd party tools

We discovered that each tool is doing totally different queries, We focused to be compatible with MariaDB JDBC driver, to make it compatible with JVM applications like Spark.

Anyways, we'll have a look to see if we can make it work without implementing a lot of new statements.

@erizocosmico
Copy link
Contributor

Ran a mysql server with the log on just to get all the queries we will need to support for mysqldump to work. This is the full list of queries executed by mysqldump:

/*!40100 SET @@SQL_MODE='' */
/*!40103 SET TIME_ZONE='+00:00' */
/*!80000 SET SESSION information_schema_stats_expiry=0 */
SET SESSION NET_READ_TIMEOUT= 700, SESSION NET_WRITE_TIMEOUT= 700
SHOW VARIABLES LIKE 'gtid\_mode'
SELECT LOGFILE_GROUP_NAME, FILE_NAME, TOTAL_EXTENTS, INITIAL_SIZE, ENGINE, EXTRA FROM INFORMATION_SCHEMA.FILES WHERE FILE_TYPE = 'UNDO LOG' AND FILE_NAME IS NOT NULL AND LOGFILE_GROUP_NAME IS NOT NULL GROUP BY LOGFILE_GROUP_NAME, FILE_NAME, ENGINE, TOTAL_EXTENTS, INITIAL_SIZE ORDER BY LOGFILE_GROUP_NAME
SELECT DISTINCT TABLESPACE_NAME, FILE_NAME, LOGFILE_GROUP_NAME, EXTENT_SIZE, INITIAL_SIZE, ENGINE FROM INFORMATION_SCHEMA.FILES WHERE FILE_TYPE = 'DATAFILE' ORDER BY TABLESPACE_NAME, LOGFILE_GROUP_NAME
SHOW DATABASES
SHOW VARIABLES LIKE 'ndbinfo\_version'
SHOW CREATE DATABASE IF NOT EXISTS `foo`
show tables
LOCK TABLES `bar` READ /*!32311 LOCAL */
show table status like 'bar'
SET SQL_QUOTE_SHOW_CREATE=1
SET SESSION character_set_results = 'binary'
show create table `bar`
SET SESSION character_set_results = 'utf8mb4'
show fields from `bar`
show fields from `bar`
SELECT /*!40001 SQL_NO_CACHE */ * FROM `bar`
SET SESSION character_set_results = 'binary'
use `foo`
select @@collation_database
SHOW TRIGGERS LIKE 'bar'
SET SESSION character_set_results = 'utf8mb4'
SET SESSION character_set_results = 'binary'
SELECT COLUMN_NAME,                       JSON_EXTRACT(HISTOGRAM, '$."number-of-buckets-specified"')                FROM information_schema.COLUMN_STATISTICS                WHERE SCHEMA_NAME = 'foo' AND TABLE_NAME = 'bar'
SET SESSION character_set_results = 'utf8mb4'
UNLOCK TABLES

Maybe we can reduce those queries with some flags.

@ajnavarro
Copy link
Contributor Author

would be great to find some flags that reduce the number of executed queries.

@erizocosmico
Copy link
Contributor

The most I've been able to reduce it is by using --skip-triggers to skip getting triggers. What the other flags reduce is the output file.

@erizocosmico
Copy link
Contributor

erizocosmico commented Oct 5, 2018

Queries performed by mysqldump with outputs

These are the queries mysqldump performs and the outputs a real mysql server would output.

/*!40100 SET @@SQL_MODE='' */

Output: no rows

/*!40103 SET TIME_ZONE='+00:00' */

Output: no rows

/*!80000 SET SESSION information_schema_stats_expiry=0 */

Output: no rows

SET SESSION NET_READ_TIMEOUT= 700, SESSION NET_WRITE_TIMEOUT= 700

Output: no rows

SHOW VARIABLES LIKE 'gtid\_mode'

Output:

+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| gtid_mode     | OFF   |
+---------------+-------+
SELECT LOGFILE_GROUP_NAME, FILE_NAME, TOTAL_EXTENTS, INITIAL_SIZE, ENGINE, EXTRA FROM INFORMATION_SCHEMA.FILES WHERE FILE_TYPE = 'UNDO LOG' AND FILE_NAME IS NOT NULL AND LOGFILE_GROUP_NAME IS NOT NULL GROUP BY LOGFILE_GROUP_NAME, FILE_NAME, ENGINE, TOTAL_EXTENTS, INITIAL_SIZE ORDER BY LOGFILE_GROUP_NAME

Output: no rows

SELECT DISTINCT TABLESPACE_NAME, FILE_NAME, LOGFILE_GROUP_NAME, EXTENT_SIZE, INITIAL_SIZE, ENGINE FROM INFORMATION_SCHEMA.FILES WHERE FILE_TYPE = 'DATAFILE' ORDER BY TABLESPACE_NAME, LOGFILE_GROUP_NAME

Output: no rows

SHOW DATABASES

Output:

+--------------------+
| Database           |
+--------------------+
| foo                |
| information_schema |
| mysql              |
| performance_schema |
| sys                |
+--------------------+
SHOW VARIABLES LIKE 'ndbinfo\_version'

Output: no rows

SHOW CREATE DATABASE IF NOT EXISTS `foo`

Output:

+----------+---------------------------------------------------------------------------------------------------------------------+
| Database | Create Database                                                                                                     |
+----------+---------------------------------------------------------------------------------------------------------------------+
| foo      | CREATE DATABASE /*!32312 IF NOT EXISTS*/ `foo` /*!40100 DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci */ |
+----------+---------------------------------------------------------------------------------------------------------------------+
show tables

Output: tables

LOCK TABLES `bar` READ /*!32311 LOCAL */

Output: 0 rows

show table status like 'bar'

Output:

+------+--------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+------------+--------------------+----------+----------------+---------+
| Name | Engine | Version | Row_format | Rows | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time         | Update_time         | Check_time | Collation          | Checksum | Create_options | Comment |
+------+--------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+------------+--------------------+----------+----------------+---------+
| bar  | InnoDB |      10 | Dynamic    |    3 |           5461 |       16384 |               0 |            0 |         0 |              4 | 2018-10-05 08:05:02 | 2018-10-05 08:05:20 | NULL       | utf8mb4_0900_ai_ci |     NULL |                |         |
+------+--------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+------------+--------------------+----------+----------------+---------+
SET SQL_QUOTE_SHOW_CREATE=1

Output: no rows

SET SESSION character_set_results = 'binary'

Output: no rows

show create table `bar`

Output:

+-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                                                                                      |
+-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| bar   | CREATE TABLE `bar` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `a` text,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=4 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci |
+-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
SET SESSION character_set_results = 'utf8mb4'

Output: no rows

show fields from `bar`

Output:

+-------+---------+------+-----+---------+----------------+
| Field | Type    | Null | Key | Default | Extra          |
+-------+---------+------+-----+---------+----------------+
| id    | int(11) | NO   | PRI | NULL    | auto_increment |
| a     | text    | YES  |     | NULL    |                |
+-------+---------+------+-----+---------+----------------+
SELECT /*!40001 SQL_NO_CACHE */ * FROM `bar`

Output: everything in the table

SET SESSION character_set_results = 'binary'

Output: 0 rows

use `foo`

Output: 0 rows

select @@collation_database

Output:

+----------------------+
| @@collation_database |
+----------------------+
| utf8mb4_0900_ai_ci   |
+----------------------+
SET SESSION character_set_results = 'utf8mb4'

Output: no rows

SET SESSION character_set_results = 'binary'

Output: no rows

SELECT COLUMN_NAME,                       JSON_EXTRACT(HISTOGRAM, '$."number-of-buckets-specified"')                FROM information_schema.COLUMN_STATISTICS                WHERE SCHEMA_NAME = 'foo' AND TABLE_NAME = 'bar'

Output: no rows

SET SESSION character_set_results = 'utf8mb4'

Output: no rows

UNLOCK TABLES

Output: no rows

Things that need to be implemented

  • Remove comments from queries (vitess parser does not do this)
  • Output empty result set if the query becomes empty after removing comments
  • Show databases
  • Implemente use to change database
  • Implement LOCK/UNLOCK tables
  • Implement show variables and make sure gtid_mode is returned
  • Implement INFORMATION_SCHEMA.FILES table, even if empty
  • Implement SHOW CREATE DATABASE
  • Implement SHOW CREATE TABLE
  • Implement INFORMATION_SCHEMA.COLUMN_STATISTICS table, even if empty
  • Implement JSON_EXTRACT UDF
  • Implement SHOW FIELDS FROM <table>
  • Implement show table status like <table>
  • Implement resolution of tables with database qualifier

@ajnavarro this is the full list of queries with their outputs and from them all the things we would need to implement for this to, in theory, be able to work correctly. Should we move forward with this, then?

@ajnavarro
Copy link
Contributor Author

@erizocosmico totally. Could you open several issues to be able to parallelize work? (some of that issues can be marked as hacktoberfest too)

@erizocosmico
Copy link
Contributor

Sure

@erizocosmico
Copy link
Contributor

Closing, this was already merged

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants