WIP testing enum as part of PK #277

shlomi-noach · 2016-10-19T09:58:16Z

reference: #273

enum values are intentionally not alphabetically monotonic

reference: #273 enum values are intentionally not alphabetically monotonic

shlomi-noach · 2016-10-19T09:59:08Z

At this time the test fails, suggesting we suffer from the enum problem:

$ ./localtests/test.sh enum-pk
Building
Testing: enum-pk......ERROR enum-pk: checksum mismatch
---
1       11      red
1       11      green
2       13      green
3       17      orange
4       11      red
4       11      green
5       13      green
6       17      orange
7       11      red
7       11      green
8       13      green
9       17      orange
10      11      red
10      11      green
11      13      green
12      17      orange
13      11      red
13      11      green
14      13      green
15      17      orange
16      11      red
16      11      green
17      13      green
18      17      orange
19      11      red
19      11      green
20      13      green
21      17      orange
---
1       11      red
2       13      green
3       17      orange
4       11      red
4       11      green
5       13      green
6       17      orange
7       11      red
7       11      green
8       13      green
9       17      orange
10      11      red
10      11      green
11      13      green
12      17      orange
13      11      red
13      11      green
14      13      green
15      17      orange
16      11      red
16      11      green
17      13      green
18      17      orange
19      11      red
19      11      green
20      13      green
21      17      orange
+ FAIL

Note the very first two rows.

shlomi-noach · 2016-10-19T13:25:54Z

Fixed, via casting enum value to string as in concat(e).
This abuses the datatype and reduces chance of using the index. That is, consider e enum(...) that is part of primary key:

if e is first column in PK, we completely negate use of PK in row iteration. Bad migration performance. Even terrible.
if e is 2nd or 3rd column in PK, we're reducing use of PK in row iteration. Likely bad migration performance.

shlomi-noach · 2016-10-20T09:38:25Z

DO NOT MERGE at this time

shlomi-noach · 2016-10-20T13:25:59Z

all tests pass at this time

shlomi-noach · 2016-10-24T07:41:50Z

sent to testing in prod

druud62 · 2017-10-19T15:16:32Z

Or use 0+enum_col.

shlomi-noach · 2017-10-19T16:57:22Z

Or use 0+enum_col.

Sorry, can you please explain?

druud62 · 2017-10-19T17:33:23Z

I meant it like this:

CREATE TABLE test (
  tmain enum ('X','A','0','Test','42') NOT NULL
, tsub enum ('9','2','M','S') NOT NULL
, tname varchar(24)
, PRIMARY KEY (tmain,tsub)
, UNIQUE (tname)
);
Query OK, 0 rows affected (0.02 sec)

INSERT INTO test (tmain,tsub,tname) 
VALUES
  ('0','M','0-M')
, ('X','M','X-M')
, ('Test','9','Test-9');
Query OK, 3 rows affected (0.01 sec)
Records: 3  Duplicates: 0  Warnings: 0

SELECT * FROM test;
+-------+------+--------+
| tmain | tsub | tname  |
+-------+------+--------+
| 0     | M    | 0-M    |
| Test  | 9    | Test-9 |
| X     | M    | X-M    |
+-------+------+--------+
3 rows in set (0.00 sec)

SELECT * FROM test FORCE INDEX (PRIMARY);
+-------+------+--------+
| tmain | tsub | tname  |
+-------+------+--------+
| X     | M    | X-M    |
| 0     | M    | 0-M    |
| Test  | 9    | Test-9 |
+-------+------+--------+
3 rows in set (0.00 sec)

SELECT 0+tmain, 0+tsub, tname FROM test;
+---------+--------+--------+
| 0+tmain | 0+tsub | tname  |
+---------+--------+--------+
|       3 |      3 | 0-M    |
|       4 |      1 | Test-9 |
|       1 |      3 | X-M    |
+---------+--------+--------+
3 rows in set (0.00 sec)

SELECT 0+tmain, 0+tsub, tname FROM test FORCE INDEX (PRIMARY);
+---------+--------+--------+
| 0+tmain | 0+tsub | tname  |
+---------+--------+--------+
|       1 |      3 | X-M    |
|       3 |      3 | 0-M    |
|       4 |      1 | Test-9 |
+---------+--------+--------+
3 rows in set (0.00 sec)

shlomi-noach · 2017-10-20T04:02:08Z

Sorry, I still don't follow: is this a suggestion to the user on how to use an enum in a numeric way? If so, wonderful, but this isn't the right place. This place is for commenting on the changes made to the code to support enum as part of PRIMARY KEY. gh-ost should (hopefully) support arbitrary use cases.
I'd suggest turning your comment into a blog post.

druud62 · 2017-10-24T20:37:26Z

I presumed that the numeric enum-casting is needed for writing the new table in perfect PK-order.

shlomi-noach · 2017-10-25T05:19:52Z

@druud62 got it. The problem this actually reverse. We need to intentionally convert the enum to string, not to int. The numeric value is implicit -- no need to 0+val, but the correct handling is using the textual value. This is unfortunate because it leads to degraded performance.

See more on https://bugs.launchpad.net/percona-toolkit/+bug/1613915/comments/1

druud62 · 2017-10-25T09:11:39Z

In the nibbler of our internal table-clone tool, we do something like this:

CREATE TABLE `ruud_t1` (
  `e_grp` enum('E','B','U','G') NOT NULL,
  `e_id` int(10) unsigned NOT NULL DEFAULT '1',
  PRIMARY KEY (`e_grp`,`e_id`),
  KEY `e_id` (`e_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

INSERT INTO ruud_t1 (e_grp) VALUES ('E'),('B'),('U'),('G');

# then run this several times, until you have a zillion rows
INSERT INTO ruud_t1 SELECT e_grp, e_id+(SELECT MAX(e_id) FROM ruud_t1) FROM ruud_t1;

SELECT * FROM ruud_t1 FORCE INDEX (PRIMARY) 
WHERE (e_grp,e_id) >= (2,(SELECT MAX(e_id)-2 FROM ruud_t1)) 
LIMIT 0,6
;
+-------+--------+
| e_grp | e_id   |
+-------+--------+
| B     | 524286 |
| B     | 524287 |
| B     | 524288 |
| U     |      1 |
| U     |      2 |
| U     |      3 |
+-------+--------+
6 rows in set (0.47 sec)

shlomi-noach · 2017-10-25T09:32:34Z

I see. One difference between your implementation and gh-ost's is that you are able to do one time full table scan, hence use FORCE INDEX(PRIMARY).

gh-ost takes it upon itself to iterate N rows at a time. To do that, it needs to compute the first tuple and last tuple according to the migration key (which is typically the PRIMARY KEY but not necessarily), and than take many steps, each time computing the range-end of the next N rows.

Or use 0+enum_col.

At the end of the day, whether we continue doing this as implemented by this PR (concat(enum_col)) or by your suggestion (0+enum_col), we lose index optimization since we run a function on the column value.

There may be a lead in your previous comment that's worth investigating. I'd need a clear time to switch context into this to give it more thought.

druud62 · 2017-10-25T10:14:18Z

The "FORCE INDEX (PRIMARY)" is mainly there to retrieve the right rows in the right order.
An "ORDER BY 1,2" would achieve the same.

Yes, with enums in the PK this approach needs a table scan. We store the starting-PK of each chunk, and use that list to process the chunks. No need to store the last tuple too.

We also tried this https://dev.mysql.com/doc/refman/5.7/en/handler.html
for nibbling, but the profit was zero.

The real PK of a table like this, would in many cases be an id++, with either a uniq (e_id, e_grp), or with a uniq (e_grp,e_id) and a key (e_id).
(tables with a secondary key, generally shouldn't have a multi-column PK)
So the example is quite academic.

ggunson · 2017-10-26T21:43:47Z

The "FORCE INDEX (PRIMARY)" is mainly there to retrieve the right rows in the right order.
An "ORDER BY 1,2" would achieve the same.

gh-ost doesn't require a primary key (it can use a unique key), and FORCE INDEX() in that case wouldn't guarantee an ordering on that index ascending, because it's still possible for the mysql optimizer to ignore that directive. It might decide to ignore the FORCE INDEX(PRIMARY) too, though I'm not sure. And MySQL docs say to specify the ORDER BY because it can't promise any particular/default ordering of results otherwise (even with GROUP BY after MySQL 8).

druud62 · 2017-10-27T09:19:05Z

I don't remember ever having seen it ignore a FORCE INDEX(). Idem IGNORE INDEX().

It does ignore 'USE INDEX()' hints, which generally works out good, providing that the table statistics aren't too old.

(Makes me remember that I have to re-instate the ANALYZE LOCAL TABLE cronjob, that ran locally on all our mysql/maria/etc. boxes, as we recently found more tables with old table stats. The job touches each table once or twice per week, which really helps.)

ggunson · 2017-10-27T18:35:21Z

From the docs:

The FORCE INDEX hint acts like USE INDEX (index_list), with the addition that a table scan is assumed to be very expensive. In other words, a table scan is used only if there is no way to use one of the named indexes to find rows in the table.

Less of an issue here, but I would still be explicit re: ordering in SQL statements, including for the human readability aspect.

I presumed that the numeric enum-casting is needed for writing the new table in perfect PK-order.

I'm not sure how concerned we are about that, @druud62 -- though I don't recall talking to @shlomi-noach about it. We're copying rows while also parsing and playing real time writes to the table, so it wouldn't be in perfect PK order. And if there's a desire for parallel copy that wouldn't be in perfect order either.

Unless there's a problem with the current implementation, I think we should close this and work on fixes that have more impact or performance improvement.

zmoazeni · 2017-10-27T18:53:21Z

writing the new table in perfect PK-order

I'm a bit naive on this, but what does this mean? What are the pros/cons in terms of performance or data consistency/reliablity where that applies?

As far as I understand, it doesn't matter what order data is written, MySQL will still cluster the data on disk based on the choice and value of the Primary Key (something I've had to change on large tables in order to cluster data more effectively for reads). Also known as a clustered index.

ggunson · 2017-10-27T18:57:23Z

@zmoazeni https://blog.jcole.us/2014/10/02/visualizing-the-impact-of-ordered-vs-random-index-insertion-in-innodb/

testing enum as part of PK

3986696

reference: #273 enum values are intentionally not alphabetically monotonic

solving the enum-as-part-of-pk bug

25166e3

Shlomi Noach added 2 commits October 20, 2016 10:07

added enum limitation documentation

adad4b6

Merge branch 'master' into enum-sort-test

82a76ae

shlomi-noach changed the title ~~testing enum as part of PK~~ WIP testing enum as part of PK Oct 20, 2016

fixing bug introduced for charset and timezone tests

b696106

merged master, resolved conflicts

ac61597

shlomi-noach merged commit d5e78e4 into master Oct 26, 2016

shlomi-noach deleted the enum-sort-test branch October 26, 2016 07:58

shlomi-noach mentioned this pull request Oct 26, 2016

Investigate how gh-ost (via go-lang) handles non-alphabetical enums in primary keys #273

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

WIP testing enum as part of PK #277

WIP testing enum as part of PK #277

shlomi-noach commented Oct 19, 2016

shlomi-noach commented Oct 19, 2016

shlomi-noach commented Oct 19, 2016

shlomi-noach commented Oct 20, 2016

shlomi-noach commented Oct 20, 2016

shlomi-noach commented Oct 24, 2016 •

edited

Loading

druud62 commented Oct 19, 2017

shlomi-noach commented Oct 19, 2017

druud62 commented Oct 19, 2017 •

edited

Loading

shlomi-noach commented Oct 20, 2017

druud62 commented Oct 24, 2017

shlomi-noach commented Oct 25, 2017

druud62 commented Oct 25, 2017 •

edited

Loading

shlomi-noach commented Oct 25, 2017 •

edited

Loading

druud62 commented Oct 25, 2017

ggunson commented Oct 26, 2017

druud62 commented Oct 27, 2017

ggunson commented Oct 27, 2017

zmoazeni commented Oct 27, 2017

ggunson commented Oct 27, 2017

WIP testing enum as part of PK #277

WIP testing enum as part of PK #277

Conversation

shlomi-noach commented Oct 19, 2016

shlomi-noach commented Oct 19, 2016

shlomi-noach commented Oct 19, 2016

shlomi-noach commented Oct 20, 2016

shlomi-noach commented Oct 20, 2016

shlomi-noach commented Oct 24, 2016 • edited Loading

druud62 commented Oct 19, 2017

shlomi-noach commented Oct 19, 2017

druud62 commented Oct 19, 2017 • edited Loading

shlomi-noach commented Oct 20, 2017

druud62 commented Oct 24, 2017

shlomi-noach commented Oct 25, 2017

druud62 commented Oct 25, 2017 • edited Loading

shlomi-noach commented Oct 25, 2017 • edited Loading

druud62 commented Oct 25, 2017

ggunson commented Oct 26, 2017

druud62 commented Oct 27, 2017

ggunson commented Oct 27, 2017

zmoazeni commented Oct 27, 2017

ggunson commented Oct 27, 2017

shlomi-noach commented Oct 24, 2016 •

edited

Loading

druud62 commented Oct 19, 2017 •

edited

Loading

druud62 commented Oct 25, 2017 •

edited

Loading

shlomi-noach commented Oct 25, 2017 •

edited

Loading