Qualification tool: Parse expressions in ProjectExec #5952

Merged: 3 commits into NVIDIA:branch-22.08 on Jul 7, 2022

@nartal1 nartal1 commented Jul 6, 2022

This contributes to #5617.
This PR parses the expressions in ProjectExec. A Project may contain aliases, so the parser takes them into account. I tested several expressions locally but added only one test each for the supported and unsupported cases. The supported case covers most of the cases where the ProjectExec string needs to be parsed.
There was also a bug in the FilterExec parser: it did not account for the case where the predicate was +. This PR fixes that as well.

Signed-off-by: Niranjan Artal <nartal@nvidia.com>
@nartal1 nartal1 added the tools label Jul 6, 2022
@nartal1 nartal1 added this to the Jun 20 - Jul 8 milestone Jul 6, 2022
@nartal1 nartal1 requested a review from tgravescs July 6, 2022 00:09
@nartal1 nartal1 self-assigned this Jul 6, 2022
nartal1 commented Jul 6, 2022

build

@tgravescs

If you haven't yet, we should test against the NDS event logs.

@@ -268,6 +268,25 @@ object SQLPlanParser extends Logging {
    funcName
  }

  def parseProjectExpressions(exprStr: String): Array[String] = {
    val parsedExpressions = ArrayBuffer[String]()
    // remove the alias names before parsing

Can you add an example in the comment?

    val pattern = """(AS) ([(\w# )]+)""".r
    // This is to split multiple column names in Project. Project may have a function on a column.
    // This will contain array of columns names specified in ProjectExec.
    val paranRemoved = pattern.replaceAllIn(exprStr.replace("),", "::"), "")

Let's start putting more description on these so that we can understand them when we come back to them. Put an example in as well.
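A minimal sketch of what the alias-stripping line above does, using the regex and the two calls as they appear in the diff. The object name `AliasStripSketch`, the helper `stripAliases`, and the input string are illustrative assumptions and are not part of the PR.

```scala
object AliasStripSketch {
  // Same regex as in the diff: "AS " followed by a run of word characters,
  // '#', spaces, or parentheses. It stops at ',' or ']'.
  private val aliasPattern = """(AS) ([(\w# )]+)""".r

  // Mirrors the two operations on the reviewed line: replace every ")," with
  // "::", then delete every "AS <alias>" run.
  def stripAliases(exprStr: String): String =
    aliasPattern.replaceAllIn(exprStr.replace("),", "::"), "")

  def main(args: Array[String]): Unit = {
    // Illustrative input, not taken from a real event log. It contains no "),"
    // so the replace step leaves it unchanged and only the aliases are removed.
    val aliased =
      "[cast(value#136 as string) AS value#219, CEIL(value#136) AS CEIL(value)#218L]"
    println(stripAliases(aliased))
    // prints: [cast(value#136 as string) , CEIL(value#136) ]
  }
}
```

The lowercase `as` inside `cast(... as string)` is left alone because the pattern matches uppercase `AS` only.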

    "ProjectExprsSupported") { spark =>
      import spark.implicits._
      val df1 = Seq(9.9, 10.2, 11.6, 12.5).toDF("value")
      df1.write.parquet(s"$outputLoc/testtext")

Put "parquet" in the name of the directory.

    "ProjectExprsNotSupported") { spark =>
      import spark.implicits._
      val df1 = spark.sparkContext.parallelize(List(10, 20, 30, 40)).toDF
      df1.write.parquet(s"$outputLoc/testtext")

Same here: put "parquet" in the directory name under outputLoc.

nartal1 commented Jul 6, 2022

> If you haven't yet we should test against the NDS event logs

I tested this against the NDS event logs and, surprisingly, none of the ProjectExec nodes contain unsupported expressions. I also created some custom event logs with unsupported expressions to be sure that the testing is done correctly.
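For readers unfamiliar with this kind of check, here is a rough sketch of flagging unsupported expressions in a projection string. Everything in it is hypothetical: the object `UnsupportedExprSketch`, the function-name regex, the `supported` set, and the sample strings are assumptions for illustration and do not reflect how the qualification tool actually decides support.

```scala
object UnsupportedExprSketch {
  // Assumption: a function call looks like a word immediately followed by '('.
  private val funcPattern = """(\w+)\(""".r

  // Collect the distinct, lower-cased function names found in an expression string.
  def functionNames(expr: String): List[String] =
    funcPattern.findAllMatchIn(expr).map(_.group(1).toLowerCase).toList.distinct

  // Return the function names that are missing from a caller-supplied supported set.
  def unsupported(expr: String, supported: Set[String]): List[String] =
    functionNames(expr).filterNot(supported.contains)

  def main(args: Array[String]): Unit = {
    // Hypothetical supported set and input, chosen only to exercise both paths.
    val supported = Set("cast")
    println(unsupported("hex(cast(value#1 as bigint))", supported))
    // prints: List(hex)
  }
}
```

A custom event log like the ones mentioned above would serve the same purpose as the hypothetical `hex` call here: it guarantees that at least one input takes the unsupported path.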

nartal1 commented Jul 6, 2022

build

@tgravescs tgravescs merged commit b16f295 into NVIDIA:branch-22.08 Jul 7, 2022