-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MySql case sensitive collation varchar push down support #18140
MySql case sensitive collation varchar push down support #18140
Conversation
91cca67
to
0c366ad
Compare
0c366ad
to
ad066a2
Compare
c320ef3
to
120f91e
Compare
@@ -426,14 +487,14 @@ public Optional<ColumnMapping> toColumnMapping(ConnectorSession session, Connect | |||
return Optional.of(decimalColumnMapping(createDecimalType(precision, max(decimalDigits, 0)))); | |||
|
|||
case Types.CHAR: | |||
return Optional.of(defaultCharColumnMapping(typeHandle.getRequiredColumnSize(), false)); | |||
return Optional.of(mySqlDefaultCharColumnMapping(typeHandle.getRequiredColumnSize(), typeHandle.getCaseSensitivity().orElse(CASE_INSENSITIVE) == CASE_SENSITIVE)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do you know that it is case sensitive by default? Is it possible to change the default collation by session or connection properties in MySQL if it is not defined in DDL?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typeHandle.getCaseSensitivity().orElse(CASE_INSENSITIVE)
It's case insensitive by default, not to allow push downs.
In MySql there is connection/session property collation_connection
but it's not counted, when we deal with text columns:
For comparisons of strings with column values, collation_connection does not matter because columns have their own collation, which has a higher collation precedence
plugin/trino-mysql/src/test/java/io/trino/plugin/mysql/BaseMySqlConnectorTest.java
Outdated
Show resolved
Hide resolved
|
||
// varchar inequality | ||
assertThat(query("SELECT regionkey, nationkey, name FROM nation_collate WHERE name != 'ROMANIA' AND name != 'ALGERIA'")) | ||
.isFullyPushedDown(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assert results
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we have hasCorrectResultsRegardlessOfPushdown();
inside so it checks that results are the same with and without pushdown
@@ -337,6 +342,87 @@ public void testPredicatePushdown() | |||
.isFullyPushedDown(); | |||
} | |||
|
|||
@Test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this test copied?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's taking partially (for string columns) from BaseMySqlConnector.testPredicatePushdown and SqlServerConnectorTest (In SqlServer default column collation is case sensitive so we modified original test).
In MySql - no, so we need additional test with explicit collation.
@Test | ||
public void testPredicatePushdownWithCollation() | ||
{ | ||
onRemoteDatabase().execute("CREATE TABLE tpch.nation_collate AS SELECT regionkey, nationkey, CONVERT(name USING utf8) COLLATE utf8_bin AS name FROM tpch.nation"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you need nation like structure? Why not to have int, varchar
only columns?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took nation table to have the same test cases for better readability and comparison
plugin/trino-mysql/src/test/java/io/trino/plugin/mysql/BaseMySqlConnectorTest.java
Outdated
Show resolved
Hide resolved
plugin/trino-mysql/src/test/java/io/trino/plugin/mysql/BaseMySqlConnectorTest.java
Outdated
Show resolved
Hide resolved
120f91e
to
8079c7f
Compare
plugin/trino-mysql/src/main/java/io/trino/plugin/mysql/MySqlClient.java
Outdated
Show resolved
Hide resolved
8079c7f
to
7ce25f1
Compare
@@ -197,6 +213,26 @@ public class MySqlClient | |||
private final ConnectorExpressionRewriter<ParameterizedExpression> connectorExpressionRewriter; | |||
private final AggregateFunctionRewriter<JdbcExpression, ?> aggregateFunctionRewriter; | |||
|
|||
private static final PredicatePushdownController MYSQL_CHARACTER_PUSHDOWN = (session, domain) -> { | |||
if (domain.isNullableSingleValue()) { | |||
return FULL_PUSHDOWN.apply(session, domain); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we will be applying the filter both by the engine and the connector right ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we verify the data type here ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's only applied to text types :
case Types.CHAR:
return Optional.of(mySqlDefaultCharColumnMapping(typeHandle.getRequiredColumnSize(), typeHandle.getCaseSensitivity()));
// TODO not all these type constants are necessarily used by the JDBC driver
case Types.VARCHAR:
case Types.NVARCHAR:
case Types.LONGVARCHAR:
case Types.LONGNVARCHAR:
return Optional.of(mySqlDefaultVarcharColumnMapping(typeHandle.getRequiredColumnSize(), typeHandle.getCaseSensitivity()));
plugin/trino-mysql/src/main/java/io/trino/plugin/mysql/MySqlClient.java
Outdated
Show resolved
Hide resolved
.isFullyPushedDown(); | ||
|
||
// varchar like | ||
assertThat(query(format("SELECT regionkey, nationkey, name FROM %s WHERE name LIKE 'ROMANIA'", objectName))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LIKE
with wild-card characters ? LIKE 'ROMANIA
is equivalent to = 'ROMANIA
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are correct in case of LIKE with wild-card - it's transformed to >, < case and we don't support such cases
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added support for like, please take a look
@DataProvider | ||
public static Object[][] collation() | ||
{ | ||
return new Object[][] {{"latin1", "latin1_general_cs"}, {"utf8", "utf8_bin"}}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are the only collation options or ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a lot of them, I just took 2 of them as an example - ending with _cs and _bin, which are case sensitive.
0be5375
to
5df8fdf
Compare
@@ -77,6 +77,19 @@ public JdbcConnectorExpressionRewriterBuilder to(String rewritePattern) | |||
}; | |||
} | |||
|
|||
public ExpressionMapping<JdbcConnectorExpressionRewriterBuilder> mapWithCase(String expressionPattern) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we name it caseSensitive
to be specific ?
@@ -76,6 +87,13 @@ else if (value instanceof ConnectorExpression) { | |||
if (rewritten.isEmpty()) { | |||
return Optional.empty(); | |||
} | |||
if (checkCase && value instanceof Variable variable) { | |||
JdbcColumnHandle columnHandle = (JdbcColumnHandle) context.getAssignment(variable.getName()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we verify they should be applied for varchar columns ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be honest don't know from one point of view introducing this checkCase
- means, that we just checking case, and we don't care about type.
From other side - check case make sense only for Varchar columns.
non varchar columns also have attribute case sensitivity, but it always - case insensitive
I think it depends on decision on #18140 (comment)
@@ -37,12 +40,20 @@ public class GenericRewrite | |||
|
|||
private final ExpressionPattern expressionPattern; | |||
private final String rewritePattern; | |||
// skip in case of any Case insensitive column presents in pattern | |||
private final boolean checkCase; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this PR is splitted,
separate PR for LIKE predicate support #18441
5df8fdf
to
97698a9
Compare
97698a9
to
8fd2cac
Compare
plugin/trino-mysql/src/test/java/io/trino/plugin/mysql/BaseMySqlConnectorTest.java
Outdated
Show resolved
Hide resolved
8fd2cac
to
2b004a1
Compare
plugin/trino-mysql/src/main/java/io/trino/plugin/mysql/MySqlClient.java
Outdated
Show resolved
Hide resolved
@@ -337,6 +343,103 @@ public void testPredicatePushdown() | |||
.isFullyPushedDown(); | |||
} | |||
|
|||
@Test(dataProvider = "collation") | |||
public void testPredicatePushdownWithCollationView(String charset, String collation) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why separate test with view? To see if JDBC driver can return case-sensitivity for columns of a view?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, to be sure that it works. Sometimes if you not able to change collation on table column directly, but want push down. You can make workaround, like create view and change collation on view column and push down will work.
plugin/trino-mysql/src/test/java/io/trino/plugin/mysql/BaseMySqlConnectorTest.java
Outdated
Show resolved
Hide resolved
When the column uses a case-sensitive collation in MySQL we can pushdown the equality/inequality predicates on text columns without affecting correctness. Range predicates on text columns are never pushed down and equality/inequality predicates on text columns are not pushed down if the column uses a case-insensitive collation.
2b004a1
to
38d2ac7
Compare
Thanks for working on this. |
Description
Enable Equal/Not Equal and IN predicate pushdown for varchar/nvarchar columns.
It requires, that column should have case sensitive collation.
#15714
splited for like predicate #18441
Additional context and related issues
Release notes
( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
(x) Release notes are required, with the following suggested text:
Trino is able to perform predicate pushdown only for case sensitive varchar/nvarchar columns.
Supported operators:
User can manually change collation for specific column in MySql to enable such pushdowns.
Collation should be case sensitive (like _CS or _BIN) .