Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MySql case sensitive collation varchar push down support #18140

Conversation

vlad-lyutenko
Copy link
Contributor

@vlad-lyutenko vlad-lyutenko commented Jul 5, 2023

Description

Enable Equal/Not Equal and IN predicate pushdown for varchar/nvarchar columns.
It requires, that column should have case sensitive collation.

#15714

splited for like predicate #18441

Additional context and related issues

Release notes

( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
(x) Release notes are required, with the following suggested text:

Trino is able to perform predicate pushdown only for case sensitive varchar/nvarchar columns.
Supported operators:

  • equal
  • not equal
  • IN
  • NOT IN
    User can manually change collation for specific column in MySql to enable such pushdowns.
    Collation should be case sensitive (like _CS or _BIN) .
# Section
* Fix some things. ({issue}`issuenumber`)

@cla-bot cla-bot bot added the cla-signed label Jul 5, 2023
@vlad-lyutenko vlad-lyutenko marked this pull request as draft July 5, 2023 14:26
@vlad-lyutenko vlad-lyutenko force-pushed the vlad-lyutenko/mysql-collation-support branch from 91cca67 to 0c366ad Compare July 5, 2023 14:28
@vlad-lyutenko vlad-lyutenko force-pushed the vlad-lyutenko/mysql-collation-support branch from 0c366ad to ad066a2 Compare July 16, 2023 11:42
@vlad-lyutenko vlad-lyutenko marked this pull request as ready for review July 16, 2023 11:43
@vlad-lyutenko vlad-lyutenko force-pushed the vlad-lyutenko/mysql-collation-support branch 4 times, most recently from c320ef3 to 120f91e Compare July 16, 2023 12:07
@kokosing kokosing requested a review from wendigo July 17, 2023 10:59
@ebyhr ebyhr removed their request for review July 18, 2023 05:33
@@ -426,14 +487,14 @@ public Optional<ColumnMapping> toColumnMapping(ConnectorSession session, Connect
return Optional.of(decimalColumnMapping(createDecimalType(precision, max(decimalDigits, 0))));

case Types.CHAR:
return Optional.of(defaultCharColumnMapping(typeHandle.getRequiredColumnSize(), false));
return Optional.of(mySqlDefaultCharColumnMapping(typeHandle.getRequiredColumnSize(), typeHandle.getCaseSensitivity().orElse(CASE_INSENSITIVE) == CASE_SENSITIVE));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do you know that it is case sensitive by default? Is it possible to change the default collation by session or connection properties in MySQL if it is not defined in DDL?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typeHandle.getCaseSensitivity().orElse(CASE_INSENSITIVE)
It's case insensitive by default, not to allow push downs.

In MySql there is connection/session property collation_connection but it's not counted, when we deal with text columns:
For comparisons of strings with column values, collation_connection does not matter because columns have their own collation, which has a higher collation precedence


// varchar inequality
assertThat(query("SELECT regionkey, nationkey, name FROM nation_collate WHERE name != 'ROMANIA' AND name != 'ALGERIA'"))
.isFullyPushedDown();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assert results

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we have hasCorrectResultsRegardlessOfPushdown(); inside so it checks that results are the same with and without pushdown

@@ -337,6 +342,87 @@ public void testPredicatePushdown()
.isFullyPushedDown();
}

@Test
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this test copied?

Copy link
Contributor Author

@vlad-lyutenko vlad-lyutenko Jul 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's taking partially (for string columns) from BaseMySqlConnector.testPredicatePushdown and SqlServerConnectorTest (In SqlServer default column collation is case sensitive so we modified original test).
In MySql - no, so we need additional test with explicit collation.

@Test
public void testPredicatePushdownWithCollation()
{
onRemoteDatabase().execute("CREATE TABLE tpch.nation_collate AS SELECT regionkey, nationkey, CONVERT(name USING utf8) COLLATE utf8_bin AS name FROM tpch.nation");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you need nation like structure? Why not to have int, varchar only columns?

Copy link
Contributor Author

@vlad-lyutenko vlad-lyutenko Jul 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took nation table to have the same test cases for better readability and comparison

@vlad-lyutenko vlad-lyutenko force-pushed the vlad-lyutenko/mysql-collation-support branch from 120f91e to 8079c7f Compare July 18, 2023 22:01
@vlad-lyutenko vlad-lyutenko requested a review from atanasenko July 21, 2023 09:23
@vlad-lyutenko vlad-lyutenko force-pushed the vlad-lyutenko/mysql-collation-support branch from 8079c7f to 7ce25f1 Compare July 23, 2023 13:43
@@ -197,6 +213,26 @@ public class MySqlClient
private final ConnectorExpressionRewriter<ParameterizedExpression> connectorExpressionRewriter;
private final AggregateFunctionRewriter<JdbcExpression, ?> aggregateFunctionRewriter;

private static final PredicatePushdownController MYSQL_CHARACTER_PUSHDOWN = (session, domain) -> {
if (domain.isNullableSingleValue()) {
return FULL_PUSHDOWN.apply(session, domain);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we will be applying the filter both by the engine and the connector right ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we verify the data type here ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's only applied to text types :

case Types.CHAR:
                return Optional.of(mySqlDefaultCharColumnMapping(typeHandle.getRequiredColumnSize(), typeHandle.getCaseSensitivity()));

            // TODO not all these type constants are necessarily used by the JDBC driver
            case Types.VARCHAR:
            case Types.NVARCHAR:
            case Types.LONGVARCHAR:
            case Types.LONGNVARCHAR:
                return Optional.of(mySqlDefaultVarcharColumnMapping(typeHandle.getRequiredColumnSize(), typeHandle.getCaseSensitivity()));

.isFullyPushedDown();

// varchar like
assertThat(query(format("SELECT regionkey, nationkey, name FROM %s WHERE name LIKE 'ROMANIA'", objectName)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LIKE with wild-card characters ? LIKE 'ROMANIA is equivalent to = 'ROMANIA

Copy link
Contributor Author

@vlad-lyutenko vlad-lyutenko Jul 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are correct in case of LIKE with wild-card - it's transformed to >, < case and we don't support such cases

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added support for like, please take a look

@DataProvider
public static Object[][] collation()
{
return new Object[][] {{"latin1", "latin1_general_cs"}, {"utf8", "utf8_bin"}};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are the only collation options or ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a lot of them, I just took 2 of them as an example - ending with _cs and _bin, which are case sensitive.

@@ -77,6 +77,19 @@ public JdbcConnectorExpressionRewriterBuilder to(String rewritePattern)
};
}

public ExpressionMapping<JdbcConnectorExpressionRewriterBuilder> mapWithCase(String expressionPattern)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we name it caseSensitive to be specific ?

@@ -76,6 +87,13 @@ else if (value instanceof ConnectorExpression) {
if (rewritten.isEmpty()) {
return Optional.empty();
}
if (checkCase && value instanceof Variable variable) {
JdbcColumnHandle columnHandle = (JdbcColumnHandle) context.getAssignment(variable.getName());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we verify they should be applied for varchar columns ?

Copy link
Contributor Author

@vlad-lyutenko vlad-lyutenko Jul 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest don't know from one point of view introducing this checkCase - means, that we just checking case, and we don't care about type.
From other side - check case make sense only for Varchar columns.
non varchar columns also have attribute case sensitivity, but it always - case insensitive

I think it depends on decision on #18140 (comment)

@@ -37,12 +40,20 @@ public class GenericRewrite

private final ExpressionPattern expressionPattern;
private final String rewritePattern;
// skip in case of any Case insensitive column presents in pattern
private final boolean checkCase;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if GenericRewrite should handle a feature specific to varchar columns ?

cc: @hashhar / @wendigo / @findepi your thoughts ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this PR is splitted,
separate PR for LIKE predicate support #18441

@vlad-lyutenko vlad-lyutenko force-pushed the vlad-lyutenko/mysql-collation-support branch from 5df8fdf to 97698a9 Compare July 27, 2023 14:05
@vlad-lyutenko vlad-lyutenko force-pushed the vlad-lyutenko/mysql-collation-support branch from 8fd2cac to 2b004a1 Compare July 29, 2023 11:10
@@ -337,6 +343,103 @@ public void testPredicatePushdown()
.isFullyPushedDown();
}

@Test(dataProvider = "collation")
public void testPredicatePushdownWithCollationView(String charset, String collation)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why separate test with view? To see if JDBC driver can return case-sensitivity for columns of a view?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, to be sure that it works. Sometimes if you not able to change collation on table column directly, but want push down. You can make workaround, like create view and change collation on view column and push down will work.

When the column uses a case-sensitive collation in MySQL we can
pushdown the equality/inequality predicates on text columns without
affecting correctness.

Range predicates on text columns are never pushed down and
equality/inequality predicates on text columns are not pushed down if
the column uses a case-insensitive collation.
@vlad-lyutenko vlad-lyutenko force-pushed the vlad-lyutenko/mysql-collation-support branch from 2b004a1 to 38d2ac7 Compare August 1, 2023 08:35
@Praveen2112 Praveen2112 merged commit 7e9513e into trinodb:master Aug 1, 2023
@Praveen2112
Copy link
Member

Thanks for working on this.

@github-actions github-actions bot added this to the 423 milestone Aug 1, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

4 participants