Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite LIKE patterns to string matching #4576

Closed
wants to merge 2 commits into from

Conversation

takezoe
Copy link
Member

@takezoe takezoe commented Jul 25, 2020

Improve LIKE predicates performance by rewriting to UDFs which don't use a regular expression.

  • Add ENDS_WITH and CONTAINS functions
  • Rewrite LIKE predicates to STARTS_WITH, ENDS_WITH or CONTAINS if possible

@cla-bot cla-bot bot added the cla-signed label Jul 25, 2020
@takezoe takezoe force-pushed the like_optimization branch 2 times, most recently from 477293d to 6d4fd1a Compare July 25, 2020 08:10
@findepi
Copy link
Member

findepi commented Jul 25, 2020

LIKE has special treatment in DomainTranslator.
Currently we do not have to add same treatment for starts_with
because DesugarLikeRewriter is called pretty much at the end
of the optimization process. However, it doesn't have to be always
so. @martint should we have similar support for starts_with in
DomainTranslator?

@takezoe
Copy link
Member Author

takezoe commented Aug 2, 2020

I didn't know what DomainTranslator is doing. However, after reading it, I think there is no reason to not apply the same domain translation to STARTS_WITH. I created another pull request for that: #4669

By the way, if Presto has CONTAINS (or MATCHES function which can compare the specified part of string with another string), we will be able to rewrite more patterns of LIKE predicates to not use regular expressions.

@takezoe takezoe force-pushed the like_optimization branch from 6d4fd1a to e19f549 Compare August 3, 2020 01:15
@martint
Copy link
Member

martint commented Aug 3, 2020

@martint should we have similar support for starts_with in DomainTranslator?

Yes, that makes sense (for now -- eventually we should be able to get rid of DomainTranslator or drastically reduce the scope of what it does and where it's used).

@takezoe
Copy link
Member Author

takezoe commented Aug 9, 2020

Is there a possibility to adopt this kind of optimization? If this optimization has a possibility to be adopted but adding new UDFs is not an ideal solution, I would propose two options:

  1. Optimize cases where can be rewritten to STARTS_WITH
  2. Introduce a hidden UDF for this optimization

@takezoe takezoe changed the title Rewrite LIKE to STARTS_WITH or ENDS_WITH Rewrite LIKE patterns to string matching to avoid regular expression overhead Aug 16, 2020
@takezoe takezoe changed the title Rewrite LIKE patterns to string matching to avoid regular expression overhead Rewrite LIKE patterns to string matching Aug 16, 2020
@ubyyj
Copy link
Contributor

ubyyj commented Sep 27, 2022

hi, what's the reason we haven't applied the optimzation in DesugarLikeRewriter? I think it would help a lot.

@findepi findepi requested a review from martint September 27, 2022 10:03
@ubyyj
Copy link
Contributor

ubyyj commented Sep 30, 2022

the rewrite could also happen in ExpressionInterpreter.visitLikePredicate()

@martint
Copy link
Member

martint commented Oct 3, 2022

FYI, as part of #13184, we're trying to remove usage of LikePredicate within the optimizer, as well as all desugaring rules. Those will be part of the initial translation of SQL query AST -> Plan IR.

@colebow
Copy link
Member

colebow commented Oct 19, 2022

@martint is it safe to say we can close this PR in favor of the work being done on #13184?

@martint
Copy link
Member

martint commented Nov 18, 2022

It's possible some of the rewrites proposed here are still relevant, but they will need to be reimplemented to match the function calls for the like matching functions instead of the Like expression node, as that's going away in the planner very soon (as part of #13961)

@takezoe
Copy link
Member Author

takezoe commented Nov 19, 2022

Sounds like it's better to raise a fresh pull request after #13961. I would close this pull request for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

5 participants