Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SqlServer Equal/Not Equal/In predicate support for case sensitive varchar columns #15714

Merged

Conversation

vlad-lyutenko
Copy link
Contributor

@vlad-lyutenko vlad-lyutenko commented Jan 13, 2023

Description

Enable Equal/Not Equal and IN predicate pushdown for varchar, nvarchar columns.
It requires, that column should have case sensitive collation.

Additional context and related issues

Release notes

( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
(x) Release notes are required, with the following suggested text:

Trino is able to perform predicate pushdown only for case sensitive varchar/nvarchar columns.
Supported operators:

  • equal
  • not equal
  • IN
  • NOT IN
    User can manually change collation for specific column in SqlServer to enable such pushdowns.
    Collation should be case sensitive (like _CS or _BIN) .
# Section
* Fix some things. ({issue}`issuenumber`)

@cla-bot cla-bot bot added the cla-signed label Jan 13, 2023
@vlad-lyutenko vlad-lyutenko marked this pull request as draft January 13, 2023 16:48
@vlad-lyutenko vlad-lyutenko force-pushed the huberty89/sqlserver-collation branch from 4c4443c to 4814f24 Compare January 16, 2023 10:21
@vlad-lyutenko vlad-lyutenko force-pushed the huberty89/sqlserver-collation branch 3 times, most recently from 2de35e3 to dc821bb Compare January 30, 2023 16:43
@vlad-lyutenko vlad-lyutenko force-pushed the huberty89/sqlserver-collation branch 3 times, most recently from 59b9195 to 84be1f1 Compare January 31, 2023 15:49
@vlad-lyutenko vlad-lyutenko changed the title Sqlserver collation SqlServer Equal/In predicate support for case sensitive columns Jan 31, 2023
@vlad-lyutenko vlad-lyutenko self-assigned this Jan 31, 2023
@vlad-lyutenko vlad-lyutenko marked this pull request as ready for review January 31, 2023 15:55
@vlad-lyutenko vlad-lyutenko changed the title SqlServer Equal/In predicate support for case sensitive columns SqlServer Equal/In predicate support for case sensitive varchar columns Jan 31, 2023
@vlad-lyutenko vlad-lyutenko force-pushed the huberty89/sqlserver-collation branch 2 times, most recently from f84f793 to 5d14ca6 Compare February 1, 2023 17:16
@vlad-lyutenko vlad-lyutenko force-pushed the huberty89/sqlserver-collation branch 2 times, most recently from a2225e4 to 99af0bb Compare February 1, 2023 17:42
@vlad-lyutenko vlad-lyutenko force-pushed the huberty89/sqlserver-collation branch 3 times, most recently from b28c844 to c3f619f Compare February 2, 2023 10:11
Copy link
Contributor

@ssheikin ssheikin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice! lgtm

@vlad-lyutenko vlad-lyutenko force-pushed the huberty89/sqlserver-collation branch from a049ea1 to 0225a17 Compare February 9, 2023 13:03
Copy link
Member

@kokosing kokosing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks nice

@@ -59,6 +53,8 @@ protected boolean hasBehavior(TestingConnectorBehavior connectorBehavior)
{
switch (connectorBehavior) {
case SUPPORTS_PREDICATE_PUSHDOWN_WITH_VARCHAR_EQUALITY:
return true;
case SUPPORTS_JOIN_PUSHDOWN_WITH_VARCHAR_EQUALITY:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why this one is disabled?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SUPPORTS_JOIN_PUSHDOWN_WITH_VARCHAR_EQUALITY(fallback -> fallback.test(SUPPORTS_JOIN_PUSHDOWN) && fallback.test(SUPPORTS_PREDICATE_PUSHDOWN_WITH_VARCHAR_EQUALITY)),

It was disabled before, because SUPPORTS_PREDICATE_PUSHDOWN_WITH_VARCHAR_EQUALITY was disabled.
Now we enabled it, but SUPPORTS_JOIN_PUSHDOWN_WITH_VARCHAR_EQUALITY should be disabled, because we don't support join on varchar in SQLServer:

    @Override
    protected boolean isSupportedJoinCondition(ConnectorSession session, JdbcJoinCondition joinCondition)
    {
        if (joinCondition.getOperator() == JoinCondition.Operator.IS_DISTINCT_FROM) {
            // Not supported in SQL Server
            return false;
        }

        // Remote database can be case insensitive.
        return Stream.of(joinCondition.getLeftColumn(), joinCondition.getRightColumn())
                .map(JdbcColumnHandle::getColumnType)
                .noneMatch(type -> type instanceof CharType || type instanceof VarcharType);
    }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But actually maybe now we can enable it on case sensitive columns?

@vlad-lyutenko vlad-lyutenko force-pushed the huberty89/sqlserver-collation branch 2 times, most recently from 6ca2576 to ad99fca Compare February 12, 2023 20:48
@vlad-lyutenko vlad-lyutenko force-pushed the huberty89/sqlserver-collation branch from ad99fca to b2b4a2a Compare February 13, 2023 11:03
Copy link
Member

@hashhar hashhar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM % comments.

@vlad-lyutenko vlad-lyutenko force-pushed the huberty89/sqlserver-collation branch 3 times, most recently from 3bc934e to 300a7c4 Compare February 13, 2023 13:55
Comment on lines 222 to 240
private static final PredicatePushdownController SQLSERVER_CHARACTER_PUSHDOWN = (session, domain) -> {
if (domain.isNullableSingleValue()) {
return FULL_PUSHDOWN.apply(session, domain);
}

if (!domain.getValues().isDiscreteSet()) {
// Push down inequality predicate
ValueSet complement = domain.getValues().complement();
if (complement.isDiscreteSet()) {
return FULL_PUSHDOWN.apply(session, domain);
}
// Push down of range predicate for varchar/char types could lead to incorrect results
// when the remote database is case insensitive
return DISABLE_PUSHDOWN.apply(session, domain);
}

Domain simplifiedDomain = domain.simplify(getDomainCompactionThreshold(session));
if (!simplifiedDomain.getValues().isDiscreteSet()) {
// Domain#simplify can turn a discrete set into a range predicate
// Push down of range predicate for varchar/char types could lead to incorrect results
// when the remote database is case insensitive
return DISABLE_PUSHDOWN.apply(session, domain);
}
return FULL_PUSHDOWN.apply(session, simplifiedDomain);
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Praveen2112 Can you please take a look if the Domain handling appears correct?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we run getDomainCompactionThreshold check first and then check if the complement of given ValueSet is discrete ? What if we complement of a ValueSet is discrete but it has too many values so - we might compact it and then domain.getValues().complement().isDiscreteSet might not be valid.

It would be nice if we have a test coverage for this edge case also.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thx, looks like I disable compaction threshold functionality by performing this check.
Refactored and test case added.


// varchar range
assertThat(query("SELECT regionkey, nationkey, name FROM nation WHERE name BETWEEN 'POLAND' AND 'RPA'"))
.matches("VALUES (BIGINT '3', BIGINT '19', CAST('ROMANIA' AS varchar(25)))")
// SQL Server is case insensitive by default
// We are not supporting range predicate pushdown for varchars
.isNotFullyPushedDown(FilterNode.class);

// varchar IN without domain compaction
assertThat(query("SELECT regionkey, nationkey, name FROM nation WHERE name IN ('POLAND', 'ROMANIA', 'VIETNAM')"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have additional test with NOT IN

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test case added

@vlad-lyutenko vlad-lyutenko force-pushed the huberty89/sqlserver-collation branch from 300a7c4 to 46b6e85 Compare February 16, 2023 11:56
Copy link
Member

@hashhar hashhar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vlad-lyutenko reminder to update commit message with something like below, right now it just talks about equality.

Pushdown equality/inequality predicates on text columns in SQL Server

When the column uses a case-sensitive collation in SQL Server we can
pushdown the equality/inequality predicates on text columns without
affecting correctness.

Range predicates on text columns are never pushed down and
equality/inequality predicates on text columns are not pushed down if
the column uses a case-insensitive collation.

LGTM otherwise.

When the column uses a case-sensitive collation in SQL Server we can
pushdown the equality/inequality predicates on text columns without
affecting correctness.

Range predicates on text columns are never pushed down and
equality/inequality predicates on text columns are not pushed down if
the column uses a case-insensitive collation.
@vlad-lyutenko vlad-lyutenko force-pushed the huberty89/sqlserver-collation branch from 46b6e85 to 1b23ddf Compare February 20, 2023 10:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

5 participants