
Translate PPL-builtin functions to Spark-builtin functions #448

Merged

Conversation

@LantaoJin (Member) commented Jul 24, 2024

Description

Add a BuiltinFunctionTranslator to translate PPL builtin functions to Spark builtin functions in the parser.

This PR only allows PPL builtin functions to be translated. In other words, using a non-PPL-builtin function in a PPL query will throw an exception, even if it is a native Spark builtin function. To support more functions in PPL, we should extend the set of PPL builtin functions later.
For example, the following PPL builtin functions will be translated to Spark builtin functions:

String functions:

concat
length
like
lower
ltrim
position
rtrim
substring
trim
upper
replace

Math Functions:

ceil
floor
sqrt
ln
pow
mod
abs

Time Functions:

from_unixtime
unix_timestamp
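Conceptually, the translation is a name mapping from each PPL builtin function to its Spark counterpart, with unknown names rejected. The sketch below is hypothetical (object and method names are not the actual BuiltinFunctionTranslator API in opensearch-spark, and the `position` -> `locate` entry is an assumption for illustration):

```scala
// Hypothetical sketch of a PPL-to-Spark builtin-function name mapping.
// Not the actual BuiltinFunctionTranslator; names are illustrative only.
object FunctionNameTranslator {
  private val pplToSpark: Map[String, String] = Map(
    // String functions: most names are identical in Spark
    "concat" -> "concat",
    "length" -> "length",
    "lower"  -> "lower",
    "upper"  -> "upper",
    "trim"   -> "trim",
    "position" -> "locate", // assumption: PPL position maps to Spark locate
    // Math functions
    "ceil" -> "ceil",
    "sqrt" -> "sqrt",
    "pow"  -> "pow",
    // Time functions
    "from_unixtime"  -> "from_unixtime",
    "unix_timestamp" -> "unix_timestamp"
  )

  /** Returns the Spark builtin name; throws for non-PPL-builtin functions. */
  def translate(pplName: String): String =
    pplToSpark.getOrElse(
      pplName.toLowerCase,
      throw new IllegalArgumentException(s"Unsupported PPL builtin function: $pplName"))
}
```

A name outside the map falls through to the exception branch, matching the behavior described above: non-PPL-builtin functions are rejected even if Spark itself supports them.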

Issues Resolved

Resolves #419, #420, #422, #423, #424, #425, #426, #427, #428, #491, #492, #480, #481, #469, #466, #459, #457, #456, #455

Check List

  • [-] Updated documentation
  • Implemented unit tests
  • Implemented tests for combination with other commands
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Lantao Jin <ltjin@amazon.com>
@@ -263,17 +263,20 @@ logicalExpression

comparisonExpression
: left = valueExpression comparisonOperator right = valueExpression # compareExpr
| valueExpression IN valueList # inExpr
Member Author

Added inExpr back since it is required by the position function.
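For context, the `IN` token appears both in membership tests (the `inExpr` alternative) and inside the `position` function's argument syntax. Illustrative PPL (not taken from the PR):

```
source = table | where country in ('USA', 'Canada')
source = table | eval pos = position('wor' in 'hello world')
```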

Signed-off-by: Lantao Jin <ltjin@amazon.com>
@LantaoJin LantaoJin changed the title Translate PPL string and math builtin functions to Spark builtin functions Translate PPL-builtin functions to Spark-builtin functions Jul 24, 2024
@YANG-DB (Member) left a comment

please add additional documentation (markdown reference page) for each new command

assertEquals(expectedPlan, logPlan)
}

test("test arithmetic: + - * / %") {
Member

please add a few more complex tests for such operators

Member Author

done

import org.apache.spark.sql.catalyst.expressions.{EqualTo, Literal}
import org.apache.spark.sql.catalyst.plans.logical.{Filter, Project}

class PPLLogicalPlanMathFunctionsTranslatorTestSuite
Member

please add additional IT tests that use these commands in the more complex context of full-fledged query use cases

Member Author

done

import org.apache.spark.sql.catalyst.expressions.{EqualTo, Like, Literal, NamedExpression}
import org.apache.spark.sql.catalyst.plans.logical.{Filter, Project}

class PPLLogicalPlanStringFunctionsTranslatorTestSuite
Member

please add additional IT tests that use these commands in the more complex context of full-fledged query use cases

Member Author

done

import org.apache.spark.sql.catalyst.expressions.{Alias, ExprId}
import org.apache.spark.sql.catalyst.plans.logical.{Aggregate, LogicalPlan, Project}

/**
* general utility functions for ppl to spark transformation test
*/
trait LogicalPlanTestUtils {
trait LogicalPlanTestUtils extends AnalysisTest {
Member

Where will this trait be used?

Member Author

Oops, I tried to checkAnalysis in the unit tests I added but it failed with "table not found". The trait is never used, so I will remove it.

@LantaoJin (Member Author) commented Jul 25, 2024

please add additional documentation (markdown reference page) for each new command

Which file, and where?
AFAIK, https://github.com/opensearch-project/opensearch-spark/tree/main/ppl-spark-integration#roadmap lists all commands, but this change adds function support. I'm not sure whether I should list all supported functions in that doc. @YANG-DB

Signed-off-by: Lantao Jin <ltjin@amazon.com>
@LantaoJin (Member Author)

Part of the expressions removed by #478 are added back in order to support builtin function translation.

@penghuo (Collaborator) left a comment

High level question.
What is the relationship between OpenSearch PPL and Spark PPL? How do we ensure the syntax remains consistent with OpenSearch PPL? Should the function specifications be aligned?

@LantaoJin (Member Author)

High level question. What is the relationship between OpenSearch PPL and Spark PPL?

There is no "Spark PPL" at all as far as I'm concerned; the source of truth for PPL should be "OpenSearch PPL". I don't know why opensearch-spark copied the PPL syntax out of the SQL repository. In the long term, they should be generated from a unified grammar.

How do we ensure the syntax remains consistent with OpenSearch PPL? Should the function specifications be aligned?

This PR only translates PPL builtin functions to Spark builtin functions, so it is required by the opensearch-spark repository only. It won't introduce any syntax split between the SQL and opensearch-spark repositories as long as the builtin functions in the two repositories stay the same. If we want to add new builtin functions for any reason, in the short term we should submit PRs to this repository as well as to SQL's.

@penghuo (Collaborator) commented Jul 29, 2024

In the long term, they should be generated from a unified grammar.

+1, we should do it ASAP; otherwise the grammar will diverge.

@YANG-DB (Member) left a comment

great work, thanks!

@YANG-DB (Member) commented Jul 30, 2024

@penghuo Part of the original PPL Spark issue's suggestion was to extract the grammar away from the SQL plugin into an independent repository, which will become the single point of truth for the PPL syntax, documentation, grammar and such

The issue

@YANG-DB YANG-DB merged commit 1f81ddf into opensearch-project:main Jul 30, 2024
4 checks passed
@dai-chen (Collaborator) commented Jul 30, 2024

@penghuo Part of the original PPL Spark issue's suggestion was to extract the grammar away from the SQL plugin into an independent repository, which will become the single point of truth for the PPL syntax, documentation, grammar and such

The issue

FYI, I think this repo is for that purpose: https://github.com/opensearch-project/piped-processing-language

For the long term, probably something like https://github.com/partiql/partiql-scribe is an option to consider.

@YANG-DB YANG-DB added Lang:PPL Pipe Processing Language support backport 0.5 labels Aug 14, 2024
@opensearch-trigger-bot

The backport to 0.5 failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/opensearch-spark/backport-0.5 0.5
# Navigate to the new working tree
pushd ../.worktrees/opensearch-spark/backport-0.5
# Create a new branch
git switch --create backport/backport-448-to-0.5
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 1f81ddf88ac96cd0130c9bd8f32abc4454e90699
# Push it to GitHub
git push --set-upstream origin backport/backport-448-to-0.5
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/opensearch-spark/backport-0.5

Then, create a pull request where the base branch is 0.5 and the compare/head branch is backport/backport-448-to-0.5.

Labels: backport 0.5, Lang:PPL (Pipe Processing Language support)
Linked issue: [FEATURE] Add PPL Concat command