
diff: can use multiple columns to split chunks #130

Merged (54 commits) on Jan 25, 2019


@WangXiangUSTC (Contributor) commented Nov 20, 2018

What problem does this PR solve?

  1. Use TiDB's statistics to split chunks.
  2. If TiDB statistics are unavailable, use a random function to split chunks, and allow multiple columns to be used for splitting.

What is changed and how it works?

  1. To use TiDB's statistics, get the histogram bucket information by executing `SHOW STATS_BUCKETS`. For example:
mysql> SHOW STATS_BUCKETS WHERE db_name = "test" AND table_name = "testa";
+---------+------------+----------------+-------------+----------+-----------+-------+---------+---------------------+---------------------+
| Db_name | Table_name | Partition_name | Column_name | Is_index | Bucket_id | Count | Repeats | Lower_Bound         | Upper_Bound         |
+---------+------------+----------------+-------------+----------+-----------+-------+---------+---------------------+---------------------+
| test    | testa      |                | PRIMARY     |        1 |         0 |    64 |       1 | 1846693550524203008 | 1846838686059069440 |
| test    | testa      |                | PRIMARY     |        1 |         1 |   128 |       1 | 1846840885082324992 | 1847056389361369088 |
+---------+------------+----------------+-------------+----------+-----------+-------+---------+---------------------+---------------------+

The `Lower_Bound` and `Upper_Bound` of each bucket can then be used as a chunk's range.
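A minimal Go sketch of turning bucket bounds into chunk conditions. `Bucket`, `Chunk`, and `bucketsToChunks` are hypothetical names, not the PR's API, and the column name `id` is assumed (the output only shows the index name `PRIMARY`). The two buckets above are not contiguous (the first ends at 1846838686059069440, the second starts at 1846840885082324992), so this sketch uses consecutive upper bounds as split points and adds open-ended chunks at both ends, so rows that fall between buckets or outside the histogram are not skipped.

```go
package main

import "fmt"

// Bucket holds the bounds of one SHOW STATS_BUCKETS row (hypothetical type).
type Bucket struct {
	LowerBound string
	UpperBound string
}

// Chunk is a WHERE condition plus the values bound to its placeholders.
type Chunk struct {
	Where string
	Args  []string
}

// bucketsToChunks uses each bucket's upper bound as a split point, giving
// contiguous ranges: (-inf, u0], (u0, u1], ..., (u_last, +inf).
// A gap between one bucket's upper bound and the next bucket's lower bound
// is folded into the following chunk instead of being skipped.
// NULLs match none of these conditions and would need their own chunk.
func bucketsToChunks(col string, buckets []Bucket) []Chunk {
	if len(buckets) == 0 {
		return []Chunk{{Where: "TRUE"}}
	}
	q := "`" + col + "`"
	chunks := make([]Chunk, 0, len(buckets)+1)
	for i, b := range buckets {
		if i == 0 {
			chunks = append(chunks, Chunk{Where: q + " <= ?", Args: []string{b.UpperBound}})
			continue
		}
		prev := buckets[i-1].UpperBound
		chunks = append(chunks, Chunk{
			Where: q + " > ? AND " + q + " <= ?",
			Args:  []string{prev, b.UpperBound},
		})
	}
	last := buckets[len(buckets)-1].UpperBound
	return append(chunks, Chunk{Where: q + " > ?", Args: []string{last}})
}

func main() {
	buckets := []Bucket{
		{"1846693550524203008", "1846838686059069440"},
		{"1846840885082324992", "1847056389361369088"},
	}
	for _, c := range bucketsToChunks("id", buckets) {
		fmt.Println(c.Where, c.Args)
	}
}
```

Keeping the bounds in `Args` lets them be passed as query parameters rather than spliced into the SQL text.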

  2. For the random function, for example, suppose MySQL contains this data:
+------+------+
| id   | name |
+------+------+
|    1 | a    |
|    1 | b    |
|    1 | c    |
|    2 | c    |
|    2 | d    |
|    2 | c    |
+------+------+

These rows are split into the following chunks when the chunk size is 1:

`id` = 1 AND `name` = "a"
`id` = 1 AND `name` > "a" AND `name` < "b"
`id` = 1 AND `name` >= "b" AND `name` < "c" 
`id` = 1 AND `name` = "c" 
`id` = 1 AND `name` > "c" 
`id` = 1 AND `name` < "a"
`id` > 1 AND `id` < 2
`id` = 2 AND `name` = "c" 
`id` = 2 AND `name` > "c" AND `name` <= "d" 
`id` = 2 AND `name` > "d" 
`id` = 2 AND `name` < "c"
`id` > 2 
`id` < 1 
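The nested split in this example can be sketched in Go as follows. This is a simplified illustration, not the PR's implementation: it assumes the split values for each column are already chosen (in the PR they come from random sampling) and are passed in pre-formatted as SQL literals. `rangeConds` and `splitTwoColumns` are hypothetical names. Unlike the list above, which mixes `>=`, `<=`, and `<` at some boundaries, every boundary here is either `=` or an open interval, so each non-NULL (`id`, `name`) pair falls into exactly one chunk.

```go
package main

import "fmt"

// rangeConds splits one column around sorted, distinct split values that are
// already formatted as SQL literals: < first, = each value, the open interval
// between each pair of neighbours, and > last. Requires len(vals) > 0.
func rangeConds(col string, vals []string) []string {
	q := "`" + col + "`"
	conds := []string{fmt.Sprintf("%s < %s", q, vals[0])}
	for i, v := range vals {
		conds = append(conds, fmt.Sprintf("%s = %s", q, v))
		if i+1 < len(vals) {
			conds = append(conds, fmt.Sprintf("%s > %s AND %s < %s", q, v, q, vals[i+1]))
		}
	}
	return append(conds, fmt.Sprintf("%s > %s", q, vals[len(vals)-1]))
}

// splitTwoColumns splits on col1 first; the "= v" chunk for a col1 value v is
// refined further on col2 when sub[v] lists split values for it.
func splitTwoColumns(col1 string, vals1 []string, col2 string, sub map[string][]string) []string {
	q1 := "`" + col1 + "`"
	out := []string{fmt.Sprintf("%s < %s", q1, vals1[0])}
	for i, v := range vals1 {
		eq := fmt.Sprintf("%s = %s", q1, v)
		if inner := sub[v]; len(inner) > 0 {
			// Refine this single col1 value on col2.
			for _, c := range rangeConds(col2, inner) {
				out = append(out, eq+" AND "+c)
			}
		} else {
			out = append(out, eq)
		}
		if i+1 < len(vals1) {
			out = append(out, fmt.Sprintf("%s > %s AND %s < %s", q1, v, q1, vals1[i+1]))
		}
	}
	return append(out, fmt.Sprintf("%s > %s", q1, vals1[len(vals1)-1]))
}

func main() {
	chunks := splitTwoColumns("id", []string{"1", "2"}, "name", map[string][]string{
		"1": {`"a"`, `"b"`, `"c"`},
		"2": {`"c"`, `"d"`},
	})
	for _, c := range chunks {
		fmt.Println(c)
	}
}
```

Formatting literals directly into the condition is for readability only; real queries should bind values as parameters.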

Check List

Tests

  • Manual test

Related changes

  • Need to update the documentation
  • Need to be included in the release note

@IANTHEREAL (Collaborator)

CI was broken

@IANTHEREAL (Collaborator)

good job!

@IANTHEREAL (Collaborator) commented Nov 30, 2018

@kennytm PTAL

pkg/dbutil/common.go (outdated, resolved)
pkg/dbutil/common.go (outdated, resolved)
@kennytm self-assigned this Nov 30, 2018
pkg/dbutil/common.go (outdated, resolved)
pkg/diff/chunk.go (outdated, resolved)
pkg/diff/chunk.go (outdated, resolved)
pkg/diff/chunk_test.go (resolved)
@IANTHEREAL (Collaborator)

LGTM

pkg/diff/chunk.go (outdated, resolved)
pkg/diff/chunk.go (resolved)
pkg/diff/chunk.go (outdated, resolved)
pkg/diff/chunk.go (resolved)
pkg/diff/chunk.go (outdated, resolved)
pkg/diff/chunk.go (resolved)
pkg/diff/chunk.go (outdated, resolved)
    for _, indexColumn := range index.Columns {
        for _, column := range tableInfo.Columns {
            if column.Name.O == indexColumn.Name.O {
                indexColumns = append(indexColumns, column)
            }
        }
    }
Member

Is it possible for one column to be appended multiple times?

Contributor Author

No, an index can only contain unique columns.
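The matching loop in the excerpt can also be written with a name-keyed map, which keeps the index's column order and avoids rescanning the table's columns for every index column. This is a hedged sketch using simplified, hypothetical stand-in types (`Column`, `Table`, `Index`) rather than the TiDB metadata types used in the excerpt. Because an index cannot list the same column twice, as noted in the reply, each table column is appended at most once.

```go
package main

import "fmt"

// Simplified, hypothetical stand-ins for the table/index metadata in the excerpt.
type Column struct{ Name string }

type Table struct{ Columns []Column }

type Index struct{ Columns []string } // column names, in index order

// indexColumns returns the table columns referenced by idx, in index order.
// Names are matched exactly, like the excerpt's comparison of Name.O.
func indexColumns(t Table, idx Index) []Column {
	byName := make(map[string]Column, len(t.Columns))
	for _, c := range t.Columns {
		byName[c.Name] = c
	}
	cols := make([]Column, 0, len(idx.Columns))
	for _, name := range idx.Columns {
		if c, ok := byName[name]; ok {
			cols = append(cols, c)
		}
	}
	return cols
}

func main() {
	t := Table{Columns: []Column{{"id"}, {"name"}, {"age"}}}
	fmt.Println(indexColumns(t, Index{Columns: []string{"name", "id"}}))
}
```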

pkg/table-router/router_test.go (outdated, resolved)
pkg/table-router/router_test.go (outdated, resolved)
sync_diff_inspector/diff.go (resolved)
@WangXiangUSTC (Contributor Author)

@csuzhangxc PTAL again

@csuzhangxc (Member)

LGTM, but you need to resolve the conflict now.

@WangXiangUSTC merged commit 559809b into pingcap:master on Jan 25, 2019
@WangXiangUSTC (Contributor Author)

thanks @csuzhangxc

WangXiangUSTC added a commit to WangXiangUSTC/tidb-tools that referenced this pull request Feb 11, 2019
WangXiangUSTC added a commit that referenced this pull request Feb 13, 2019
* sync_diff_inspector: fix a misleading regex example in config.toml (#141)

* update pkg about database (#142)

* diff: add database name and table name router (#172)

* simplify the if/else logic (#179)

refactor to simplify the if/else logic

* diff: can use multiple columns to split chunks (#130)

* simplify the if/else logic (#191)

4 participants