Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DSL for JDBC expression rewrites #11125

Merged
merged 3 commits into from
Mar 15, 2022

Conversation

findepi
Copy link
Member

@findepi findepi commented Feb 21, 2022

Introduce DSL for building function pushdown rules in JDBC.

As we are building up a registry of allowed function pushdowns, the
declaration verbosity matters. This commit adds support for defining
rewrite rules tersely.

As a usage example, a previously hand-written RewriteLike rule is now
replaced with a mere one-liner:

.map("$like_pattern(value: varchar, pattern: varchar): boolean").to("value LIKE pattern")

Relates to #18

@cla-bot cla-bot bot added the cla-signed label Feb 21, 2022
@findepi
Copy link
Member Author

findepi commented Feb 21, 2022

Currently depends on #11086

cc @wendigo @hashhar @losipiuk @martint @assaf2 @ebyhr @grantatspothero

@findepi
Copy link
Member Author

findepi commented Feb 21, 2022

cc @slhmy

@@ -317,7 +316,7 @@ public PostgreSqlClient(
.addDefaultRules()
// TODO allow all comparison operators for numeric types
.add(new RewriteComparison(RewriteComparison.ComparisonOperator.EQUAL, RewriteComparison.ComparisonOperator.NOT_EQUAL))
.add(new RewriteLike())
.map("$like_pattern(value: varchar, pattern: varchar): boolean").to("value LIKE pattern")
Copy link
Contributor

@grantatspothero grantatspothero Feb 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we get away with writing the AST manually in java instead of defining an antlr grammar/using a visitor to create the AST?

In my opinion, this doesn't seem too bad to manually write. But maybe others feel differently 🤷‍♀️

new CallPattern(
                        "$like_pattern",
                        List.of(
                                new ExpressionCapture("value", type("varchar")),
                                new ExpressionCapture("pattern", type("varchar"))),
                        Optional.of(type("boolean"))

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe adding a test case for a more complex example would show the benefits of the DSL?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The DSL syntax here is similar to SQL, so hopefully feels familiar, with notable differences:

  • call parameters have explicit types, call result has optional explicit type (this is central for function pattern matching)
  • $ sign is allowed in identifiers, so that we don't need quotes

In my view, your example already proves the value.
The had to use CallPattern, List, ExpressionCapture, type, Optional for the same effect, without adding any actual value for readability (quite contrary, IMO).

@findepi findepi force-pushed the findepi/sql-pushdown-dsl branch from 09ebbe7 to 05b99d5 Compare March 4, 2022 12:38
@findepi
Copy link
Member Author

findepi commented Mar 4, 2022

(rebased)

@findepi findepi force-pushed the findepi/sql-pushdown-dsl branch from 05b99d5 to 4d1d095 Compare March 9, 2022 16:11
@findepi
Copy link
Member Author

findepi commented Mar 9, 2022

Rebased on current top of #11086.
@wendigo 's #11308 helped me to identify a bug in the generic rewrite.

@findepi findepi marked this pull request as ready for review March 9, 2022 16:14
@findepi findepi force-pushed the findepi/sql-pushdown-dsl branch from 4d1d095 to d05294f Compare March 10, 2022 13:01
@findepi
Copy link
Member Author

findepi commented Mar 10, 2022

rebased and fixed codestyle issues.

@findepi
Copy link
Member Author

findepi commented Mar 10, 2022

When reviewing, you can start from here:
image

Copy link
Member

@hashhar hashhar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything except GenericRewrite#rewrite looks good to me % comments.

* limitations under the License.
*/

grammar ConnectorExpressionPattern;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I won't pretend I'm very familiar with ANTLR - I'll review this in detail later.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i guess @martint @kasiafi and maybe @electrum are.

return (TypePattern) invokeParser(typePattern, ConnectorExpressionPatternParser::standaloneType);
}

public Object invokeParser(String input, Function<ConnectorExpressionPatternParser, ParserRuleContext> parseFunction)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me + Antlr usage matches existing code in other places.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i.e. Maybe someone with Antlr API usage knowledge can take a look - maybe @martint / @kasiafi .

@findepi findepi force-pushed the findepi/sql-pushdown-dsl branch 2 times, most recently from 3c10495 to 7a5f232 Compare March 11, 2022 12:28
Introduce DSL for building function pushdown rules in JDBC.

As we are building up a registry of allowed function pushdowns, the
declaration verbosity matters. This commit adds support for defining
rewrite rules tersely.

As a usage example, a previously hand-written `RewriteLike` rule is now
replaced with a mere one-liner:

```
.map("$like_pattern(value: varchar, pattern: varchar): boolean").to("value LIKE pattern")
```
@findepi findepi force-pushed the findepi/sql-pushdown-dsl branch from 7a5f232 to a5d3deb Compare March 11, 2022 15:14
;

IDENTIFIER
: (LETTER | '_' | '$') (LETTER | DIGIT | '_' | '$')*
Copy link
Member

@martint martint Mar 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that these are not SQL identifiers (no delimited vs non-delimited), so they should be treated as already-canonicalized. Also, that implies that the character set is potentially anything that can go inside a delimited SQL identifier (i.e., any UTF-8 character). This is going to be hard to do without introducing some kind of escaping for whitespace or other characters.

Concretely, if a SQL query does something like:

SELECT "123 some function"(a, b, c) FROM t

(i.e., it calls a function named 123 some function)

That name needs to be representable in this grammar. The following pattern would obviously not work due to 1) whitespace 2) leading numbers in the function name:

123 some function(a : varchar, b : varchar, c : varchar) : bigint

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that these are not SQL identifiers (no delimited vs non-delimited)

yes, they aren't

. so they should be treated as already-canonicalized.

in a sense of lower/upper-case?
yes, it should be case-sensitive

Also, that implies that the character set is potentially anything that can go inside a delimited SQL identifier (i.e., any UTF-8 character). This is going to be hard to do without introducing some kind of escaping for whitespace or other characters.

For such a case we will introduce quotes to allow a name be anything.
We don't have such need yet, but once we have, we'll do it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note that this is convenience syntax to simplify most of the cases. It doesn't need to support all the cases. For example, it doesn't support varargs today. It can be made to support varargs, and we may or may not choose to add that, depending on demand. "99%" cases are function names that are ordinary words, with fixed number of arguments, potentially with overloads for different types.

@findepi findepi merged commit b44824b into trinodb:master Mar 15, 2022
@findepi findepi deleted the findepi/sql-pushdown-dsl branch March 15, 2022 09:41
@github-actions github-actions bot added this to the 374 milestone Mar 15, 2022
@findepi findepi added the no-release-notes This pull request does not require release notes entry label Mar 16, 2022
msafonov added a commit to joomcode/trino that referenced this pull request Jul 4, 2022
msafonov added a commit to joomcode/trino that referenced this pull request Jul 4, 2022
msafonov added a commit to joomcode/trino that referenced this pull request Jul 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla-signed no-release-notes This pull request does not require release notes entry
Development

Successfully merging this pull request may close these issues.

4 participants