Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Manage index hinting for sharding data iterator query #380

Merged
merged 1 commit into from
Feb 13, 2025

Conversation

mtaner
Copy link
Contributor

@mtaner mtaner commented Feb 12, 2025

Recently we have discovered that USE INDEX can slow the data iterator batching queries down as MySQL can decide to do a table scan instead, where as not using it allows the optimizer to select the correct index & avoid table scan. FORCE INDEX prevents MySQL from doing a table scan also but it removes the flexibility of selecting another index for the sharding key that might be more suitable. Another theory is that query optimizer is better in MySQL 8 and therefore we might not need to use "USE INDEX" to ensure MySQL is using the correct index with this version (index hinting was originally introduced for optimizer selecting the wrong index with 5.7 for some tables.)

This is why I am introducing a configuration to manage the index hinting for the sharded data iterator.

IndexHint // "use" by default. Options "none", "use", "force"

IndexHintingPerTable // has higher specificity then higher level IndexHint if specified

example:
IndexHintingPerTable: { 
   "blog": { // schema name
      "users": { // table name
	 "IndexHint": "force" // has higher level specificity then higher level IndexHint
	 "IndexName": "ix_users_some_id" // defaults to empty string, will override the index chosen by Ghostferry
	},
      "accounts": {
          "IndexName": "ix_accounts_some_id" // indexName will be ignored if it doesn't exist on the table
          // will inherit the higher level IndexHint as not specified
     }
 }

@mtaner mtaner requested a review from a team February 12, 2025 16:52
if indexHint == "force" {
return "FORCE INDEX (`" + indexName + "`)"
}

return "USE INDEX (`" + indexName + "`)"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

by this point, the validation should have caught "invalid" IndexHint options but for some reason it doesn't we always default to the existing behaviour with "USE INDEX"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggested to convert indexHint to lowercase before comparing, so it provides good flexibility for input, like use, Use, USE, ForCe .etc., which will be super convenient and less error-prone and easy to maintain

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JoinedTables: config.JoinedTables,
DisableIndexHint: config.ShardedCopyFilterConfig.DisableIndexHint,
UseForceIndex: config.ShardedCopyFilterConfig.UseForceIndex,
IndexConfigPerTable: config.ShardedCopyFilterConfig.IndexConfigForTables(config.SourceDB),

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be IndexHintingPerTable here?

Suggested change
IndexConfigPerTable: config.ShardedCopyFilterConfig.IndexConfigForTables(config.SourceDB),
IndexConfigPerTable: config.ShardedCopyFilterConfig. IndexHintingPerTable(config.SourceDB),

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a method I created on ShardedCopyFilterConfig, so it is correct

Comment on lines 86 to 91
if config.ShardedCopyFilterConfig == nil {
config.ShardedCopyFilterConfig = &ShardedCopyFilterConfig{
IndexHintingPerTable: map[string]map[string]IndexConfigPerTable{},
}
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So by default this is not configured? then where will this be configured?
And I didn't see there are unit tests for ShardedCopyFilterConfig?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can be not specified yes - we need to make sure we are backwards compatible. I didn't have to add unit tests before I was only generating an empty hash if it didn't exists but now I am specifically checking for "IndexHint" values so there is a test for this config validation.

sharding/config.go Outdated Show resolved Hide resolved
sharding/config.go Outdated Show resolved Hide resolved
sharding/config.go Outdated Show resolved Hide resolved
@mtaner mtaner force-pushed the mt/config-for-index-use-sharded-data-iterator branch 2 times, most recently from 60d1c61 to 82064d1 Compare February 13, 2025 14:50
selectPaginationKeys += " " + f.shardingKeyIndexHint(table, indexHint)
}

selectPaginationKeys += " WHERE " + quotedShardingKey + " = ? AND " + quotedPaginationKey + " > ?" +
" ORDER BY " + quotedPaginationKey + " LIMIT " + strconv.Itoa(int(batchSize))

return sq.Select(columns...).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happily surprised to see Squirrel here.

@mtaner mtaner force-pushed the mt/config-for-index-use-sharded-data-iterator branch from 82064d1 to f01145b Compare February 13, 2025 16:21
@mtaner mtaner merged commit b9d6cac into main Feb 13, 2025
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants