Improve IS NULL pushdown for ClickHouse complex expression #23459

@ssheikin ssheikin commented Sep 17, 2024

@ssheikin
Contributor Author

@Praveen2112 @hashhar @ebyhr please review.

@ssheikin ssheikin force-pushed the ssheikin/54/trino/clickhouse-pushdown-predicate-not-null branch 2 times, most recently from 5982fe3 to a8c8d0c Compare October 1, 2024 14:00
Contributor

@krvikash krvikash left a comment

Overall lgtm.

    {
        checkState(!testCases.isEmpty(), "No test cases");
        for (int specialColumn = 0; specialColumn < SPECIAL_COLUMNS; specialColumn++) {
            checkArgument(!"NULL".equalsIgnoreCase(testCases.get(specialColumn).inputLiteral()));

Contributor

What are we actually verifying here? This only checks that the inputLiteral of the first test case is not NULL.
It seems that anyone adding a round trip test case has to know in advance that the first test case must not use NULL as its input literal.

Contributor

After going through the code, I understand why we need SPECIAL_COLUMNS. Maybe NON_NULL_COLUMNS would make more sense here. WDYT?

Contributor Author

I had TestClickHouseConnectorTest#testTextualPredicatePushdown in mind, where SPECIAL_COLUMNS are

                        unsupported_1 Point,
                        unsupported_2 Point,
                        some_column String,

Member

What if we handled these special columns as part of this framework?

Contributor Author

I think it's possible; however, I'm not sure how to make it look better than it does right now, because the type of the special columns may differ between cases:

                .addTestCase("String", "'z'", VARCHAR, "CAST('z' AS varchar)") // special, non null column
                .addTestCase("String", "'z'", VARBINARY, "CAST('z' AS varbinary)") // special, non null column
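
A minimal sketch of the convention discussed in this thread, not the PR's actual code: the leading special columns are non-NULL helpers, and only the remaining NULL-valued columns may appear in a predicate. The class name `NullPredicateSketch`, the use of a plain list of input literals, and the value `SPECIAL_COLUMNS = 1` are assumptions made for illustration; Guava's `checkArgument` is replaced with plain exceptions so the example stays self-contained.

```java
import java.util.List;

public class NullPredicateSketch
{
    // Assumed value for illustration; the thread does not state the real one
    static final int SPECIAL_COLUMNS = 1;

    // The leading helper columns must hold a non-NULL literal
    static void validateSpecialColumns(List<String> inputLiterals)
    {
        if (inputLiterals.size() < SPECIAL_COLUMNS) {
            throw new IllegalStateException("Not enough test cases");
        }
        for (int column = 0; column < SPECIAL_COLUMNS; column++) {
            if ("NULL".equalsIgnoreCase(inputLiterals.get(column))) {
                throw new IllegalArgumentException("Special column " + column + " must not be NULL");
            }
        }
    }

    // Only non-special columns holding NULL take part in predicates
    static String getPredicate(List<String> inputLiterals, int column, boolean isNull)
    {
        if (column < SPECIAL_COLUMNS) {
            throw new IllegalArgumentException("Special columns should not be part of a predicate");
        }
        if (!"NULL".equalsIgnoreCase(inputLiterals.get(column))) {
            throw new IllegalArgumentException("Column " + column + " must hold NULL");
        }
        return "col_" + column + (isNull ? " IS NULL" : " IS NOT NULL");
    }
}
```

With literals `'z'`, `NULL`, `NULL`, column 0 is the helper whose value the `IS NULL` query is expected to return, and columns 1 and 2 supply the predicates.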

@ssheikin ssheikin force-pushed the ssheikin/54/trino/clickhouse-pushdown-predicate-not-null branch 2 times, most recently from 9d6bf97 to 48e2843 Compare October 3, 2024 10:35
@ssheikin ssheikin force-pushed the ssheikin/54/trino/clickhouse-pushdown-predicate-not-null branch from 48e2843 to 6c17091 Compare October 3, 2024 12:21
+ withConnectorExpression;

// Closing QueryAssertions would close the QueryRunner
QueryAssertions queryAssertions = new QueryAssertions(queryRunner);

Member

Can we inject QueryAssertions instead of the QueryRunner?

Contributor Author

This matches the API of SqlDataTypeTest, which helps if we later want to unify these assertions.

Contributor Author

Also, I'm not sure it would bring any benefit.

@ssheikin ssheikin force-pushed the ssheikin/54/trino/clickhouse-pushdown-predicate-not-null branch from 6c17091 to 15d804f Compare October 16, 2024 08:57
@ssheikin
Contributor Author

@Praveen2112 please approve and merge.

Comment on lines 101 to 123
            try {
                assertPushdown(expectPushdown,
                        assertResult(isNull ? Optional.of(firstCase.expectedLiteral()) : Optional.empty(),
                                assertThat(queryAssertions.query(session, queryWithAll))));
            }
            catch (AssertionError e) {
                // if failed - identify exact column which caused the failure
                for (int column = SPECIAL_COLUMNS; column < testCases.size(); column++) {
                    String queryWithSingleColumnPredicate = "SELECT " + firstColumnName + " FROM " + temporaryRelation.getName() + " WHERE " + getPredicate(column, isNull) + withConnectorExpression;
                    assertPushdown(expectPushdown,
                            assertResult(isNull ? Optional.of(firstCase.expectedLiteral()) : Optional.empty(),
                                    assertThat(queryAssertions.query(session, queryWithSingleColumnPredicate))));
                }
                throw new IllegalStateException("Single column assertion should fail for at least one column, if query of all column failed", e);
            }
        }

Member

Instead of asserting twice (once with all column predicates combined with AND, then again per column), can we assert one column at a time? Like:

            for (int column = SPECIAL_COLUMNS; column < testCases.size(); column++) {
                String queryWithSingleColumnPredicate = "SELECT " + firstColumnName + " FROM " + temporaryRelation.getName() + " WHERE " + getPredicate(column, isNull) + withConnectorExpression;
                assertPushdown(expectPushdown,
                        assertResult(isNull ? Optional.of(firstCase.expectedLiteral()) : Optional.empty(),
                                assertThat(queryAssertions.query(session, queryWithSingleColumnPredicate))));
            }

This would let us identify the specific columns that cause a failure in the future.

Contributor Author

I'm not sure I follow your suggestion. We already assert one column at a time when the combined check fails.

Member

Instead of handling it twice, can we assert only once per column, so that failures are easier to debug?

Contributor Author

The first check covers all columns in one query, so it is roughly N times faster than running N single-column checks. That makes it an improvement for the sunny-day scenario.
In case of failure, the overhead is that 1 additional combined check, followed by the N single-column checks.
This was copied from SqlDataTypeTest.
Please let me know if you want me to simplify the code.
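
The trade-off described above can be sketched roughly as follows. This is an illustration only, not the code from the PR: the class `CombinedThenSingleCheck`, the `queryPasses` callback, and joining predicates with `AND` are all assumptions, and unlike the real test (which fails on the first bad column) this version collects every failing predicate.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class CombinedThenSingleCheck
{
    // Runs one combined check first; only if it fails, re-checks each predicate
    // on its own to find the ones responsible. Returns the failing predicates.
    static List<String> findFailures(List<String> predicates, Predicate<String> queryPasses)
    {
        String combined = String.join(" AND ", predicates);
        if (queryPasses.test(combined)) {
            // Sunny day: a single query covered every column
            return List.of();
        }
        List<String> failing = new ArrayList<>();
        for (String predicate : predicates) {
            if (!queryPasses.test(predicate)) {
                failing.add(predicate);
            }
        }
        if (failing.isEmpty()) {
            throw new IllegalStateException("Single column check should fail for at least one column if the combined check failed");
        }
        return failing;
    }
}
```

With N predicates, the passing case costs one query and the failing case costs N + 1.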

Member

Makes sense! Let's keep it like this.

Comment on lines 118 to 137
    private String getPredicate(int column, boolean isNull)
    {
        checkArgument(column >= SPECIAL_COLUMNS, "Special columns should not be a part of a predicate, as they are helpers and do not participate in the assertions");
        String columnName = "col_" + column;
        checkArgument("NULL".equalsIgnoreCase(testCases.get(column).inputLiteral()));
        return isNull
                ? columnName + " IS NULL"
                : columnName + " IS NOT NULL";
    }

    private static QueryAssert assertResult(Optional<String> value, QueryAssert assertion)
    {
        return value.isPresent()
                ? assertion.matches("VALUES %s".formatted(value.get()))
                : assertion.returnsEmptyResult();
    }

Member

If we inlined IS NULL and IS NOT NULL, the assertion could be inlined too, right? For this test data, IS NULL returns a value and IS NOT NULL returns an empty result. Do we need a dedicated method for each assertion?

Contributor Author

I'm not sure I follow. I'm OK with any solution, as long as we can merge this PR.
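
For reference, the expectation described in the review comment above (a row for IS NULL, nothing for IS NOT NULL) can be written as one small mapping. This is only a sketch under assumptions: `NullExpectationSketch` and `expectedResult` are hypothetical names, and the real test passes the result to Trino's `QueryAssert` rather than returning a string.

```java
import java.util.Optional;

public class NullExpectationSketch
{
    // IS NULL is expected to return the helper column's value as a single row;
    // IS NOT NULL is expected to return no rows at all.
    static Optional<String> expectedResult(boolean isNull, String expectedLiteral)
    {
        return isNull
                ? Optional.of("VALUES " + expectedLiteral)
                : Optional.empty();
    }
}
```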

@ssheikin ssheikin force-pushed the ssheikin/54/trino/clickhouse-pushdown-predicate-not-null branch from 15d804f to 2360265 Compare October 16, 2024 14:05
@ssheikin
Contributor Author

ssheikin commented Oct 16, 2024

@Praveen2112 do you think we are good to merge this PR?
It has a nice score of modified lines:
[image not preserved]

Member

@Praveen2112 Praveen2112 left a comment

LGTM

@ssheikin ssheikin force-pushed the ssheikin/54/trino/clickhouse-pushdown-predicate-not-null branch from 2360265 to 2ed946e Compare October 17, 2024 13:16
@ssheikin
Contributor Author

@Praveen2112 ptal

Member

@Praveen2112 Praveen2112 left a comment

Thanks for working on this

@Praveen2112
Member

@ssheikin Can we update the PR description?

@ssheikin ssheikin force-pushed the ssheikin/54/trino/clickhouse-pushdown-predicate-not-null branch 3 times, most recently from b8c37ac to 0a81434 Compare October 17, 2024 14:37
@ssheikin ssheikin force-pushed the ssheikin/54/trino/clickhouse-pushdown-predicate-not-null branch from 0a81434 to cd7ddc4 Compare October 17, 2024 14:42
@Praveen2112 Praveen2112 merged commit 5936d81 into trinodb:master Oct 17, 2024
16 checks passed
@github-actions github-actions bot added this to the 463 milestone Oct 17, 2024
3 participants