feat: scalar regex match physical expr #12270
Thank you for this PR @zhuliquan. Have you run any benchmarks that show this approach is noticeably faster than the existing approach? It makes sense that it would be faster, as it does not recompile the regular expression for each batch, but I think it would help to quantify the difference.
Yeah, I'll add benchmarks.
Hello @alamb, I have compared my approach to the original.
I wonder if we can see improvements on queries in benchmarks with scalar regexes, e.g. ClickBench?
Hmm, that means we should first add some regex-matching queries to the benchmarks.
Hello @Dandandan, I have written a benchmark for testing scalar regex match in PR #13789. I got the diff below (before: without pre-compiled pattern; after: with pre-compiled pattern).
I'm very confused that some cases have improved while others have regressed.
Which issue does this PR close?
Closes #11146.
Rationale for this change
This PR is the successor of PR #11455.
`BinaryExpr` compiles a literal regex pattern every time it evaluates a `RecordBatch`, and compiling a regex pattern can be expensive. In this approach, a literal regex pattern is compiled once and cached for reuse during execution. This saves the compile time on each evaluation and speeds up execution.
What changes are included in this PR?
- Add `ScalarRegexMatchExpr` to handle regexp match with a literal regex pattern.
- Add `PhysicalScalarRegexMatchExprNode` in proto to handle `ScalarRegexMatchExpr`, and add an arm in the functions `parse_physical_expr` and `serialize_physical_expr`.
- The `BinaryExpr` arm in `create_physical_expr` now creates a `ScalarRegexMatchExpr` instead of a `BinaryExpr` when the right-hand side is a string literal expr and `op` is `RegexMatch | RegexIMatch | RegexNotMatch | RegexNotIMatch`.
Are these changes tested?
Yes, see the test mod in `scalar_regex_match.rs`.
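The compile-once idea from the rationale and the dispatch rule in `create_physical_expr` can be sketched with the standard library alone. This is a simplified illustration under stated assumptions, not the code in this PR: `CompiledPattern` is a hypothetical stand-in that does substring matching in place of a real regex engine, `plan`, `Planned`, `ScalarRegexMatch`, and `COMPILE_COUNT` are names invented for the sketch, and a batch is a plain slice rather than an Arrow array.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Illustration only: counts how often a pattern gets "compiled".
static COMPILE_COUNT: AtomicUsize = AtomicUsize::new(0);

// Hypothetical stand-in for a compiled regex; it only does substring matching.
struct CompiledPattern {
    needle: String,
    case_insensitive: bool,
}

impl CompiledPattern {
    // Stands in for the expensive step that should run once per expression.
    fn compile(pattern: &str, case_insensitive: bool) -> Self {
        COMPILE_COUNT.fetch_add(1, Ordering::SeqCst);
        let needle = if case_insensitive {
            pattern.to_lowercase()
        } else {
            pattern.to_string()
        };
        Self { needle, case_insensitive }
    }

    fn is_match(&self, value: &str) -> bool {
        if self.case_insensitive {
            value.to_lowercase().contains(&self.needle)
        } else {
            value.contains(&self.needle)
        }
    }
}

#[derive(Clone, Copy, Debug, PartialEq)]
enum Operator {
    RegexMatch,
    RegexIMatch,
    RegexNotMatch,
    RegexNotIMatch,
    Eq, // represents any non-regex operator
}

// Holds the pattern compiled at planning time, so evaluation never recompiles.
struct ScalarRegexMatch {
    pattern: CompiledPattern,
    negated: bool,
}

impl ScalarRegexMatch {
    // Evaluates one batch; nulls (None) stay null in the output.
    fn evaluate(&self, batch: &[Option<&str>]) -> Vec<Option<bool>> {
        batch
            .iter()
            .map(|v| v.map(|s| self.pattern.is_match(s) != self.negated))
            .collect()
    }
}

enum Planned {
    ScalarRegexMatch(ScalarRegexMatch),
    Binary, // fall back to the generic binary expression
}

// Picks the specialised expression only for a regex operator with a literal
// right-hand side; everything else stays a generic binary expression.
fn plan(op: Operator, rhs_literal: Option<&str>) -> Planned {
    let (case_insensitive, negated) = match op {
        Operator::RegexMatch => (false, false),
        Operator::RegexIMatch => (true, false),
        Operator::RegexNotMatch => (false, true),
        Operator::RegexNotIMatch => (true, true),
        Operator::Eq => return Planned::Binary,
    };
    match rhs_literal {
        Some(p) => Planned::ScalarRegexMatch(ScalarRegexMatch {
            pattern: CompiledPattern::compile(p, case_insensitive),
            negated,
        }),
        None => Planned::Binary,
    }
}
```

Calling `plan` once and then `evaluate` on several batches increments `COMPILE_COUNT` only once, which is the saving the rationale describes. How much that saves in practice depends on the pattern and the batch contents, so it still needs to be measured with benchmarks.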
Are there any user-facing changes?