Qualification tool: Parse expressions in ProjectExec #5952
Conversation
Signed-off-by: Niranjan Artal <nartal@nvidia.com>
build
If you haven't yet, we should test against the NDS event logs.
@@ -268,6 +268,25 @@ object SQLPlanParser extends Logging {
    funcName
  }

  def parseProjectExpressions(exprStr: String): Array[String] = {
    val parsedExpressions = ArrayBuffer[String]()
    // remove the alias names before parsing
Can you add an example in the comment?
    val pattern = """(AS) ([(\w# )]+)""".r
    // This is to split multiple column names in Project. Project may have a function on a column.
    // This will contain the array of column names specified in ProjectExec.
    val paranRemoved = pattern.replaceAllIn(exprStr.replace("),", "::"), "")
Let's start putting more description on these so that when we come back to them we can understand them; please add an example as well.
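As the review comments above ask, an example makes this much easier to follow. Below is a rough, self-contained sketch (not part of the PR) of what the alias-removal step does on a hypothetical ProjectExec expression string; the exact plan-string format varies across Spark versions, and the later splitting and function-name extraction happen outside this hunk.

import scala.util.matching.Regex

object ProjectExprAliasExample {
  def main(args: Array[String]): Unit = {
    // Hypothetical expression string from a ProjectExec plan node (format is illustrative only).
    val exprStr = "round(value#92, 2), upper(name#93) AS upper(name)#94"

    // Same pattern as in the diff: matches an "AS <alias>" fragment so aliases can be stripped.
    val pattern: Regex = """(AS) ([(\w# )]+)""".r

    // Replace ")," with "::" first so expressions ending in ")" get a distinct separator,
    // then drop the "AS <alias>" fragments.
    val paranRemoved = pattern.replaceAllIn(exprStr.replace("),", "::"), "")

    println(paranRemoved)
    // Prints roughly: "round(value#92, 2:: upper(name#93) "
    // Each "::"-separated piece can then be scanned for a function name
    // (round, upper, ...) to check against the supported-expressions list.
  }
}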
"ProjectExprsSupported") { spark => | ||
import spark.implicits._ | ||
val df1 = Seq(9.9, 10.2, 11.6, 12.5).toDF("value") | ||
df1.write.parquet(s"$outputLoc/testtext") |
Put "parquet" in the name of the directory.
"ProjectExprsNotSupported") { spark => | ||
import spark.implicits._ | ||
val df1 = spark.sparkContext.parallelize(List(10, 20, 30, 40)).toDF | ||
df1.write.parquet(s"$outputLoc/testtext") |
Same here: put "parquet" in the name used for outputLoc.
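For reference, here is a rough sketch (not the actual suite code) of how this data-setup step might look with the reviewer's naming suggestion applied, plus a read-back that puts an expression into a Project. hex is only a placeholder for "an expression the plugin does not support", and the suite's helpers for capturing the event log and invoking the parser are not shown.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.hex

object ProjectExprsNotSupportedSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("sketch").getOrCreate()
    import spark.implicits._
    // Placeholder for the temp directory the test suite would provide.
    val outputLoc = "/tmp/project-exprs-sketch"

    val df1 = spark.sparkContext.parallelize(List(10, 20, 30, 40)).toDF
    // Reviewer suggestion applied: the directory name mentions parquet.
    df1.write.mode("overwrite").parquet(s"$outputLoc/testparquet")

    // Reading back and projecting with an expression the plugin cannot handle
    // puts that expression into a ProjectExec node, which is what the
    // qualification parser should flag when it reads the resulting event log.
    val df2 = spark.read.parquet(s"$outputLoc/testparquet")
    df2.select(hex($"value")).collect()

    spark.stop()
  }
}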
Signed-off-by: Niranjan Artal <nartal@nvidia.com>
I tested this against the NDS event logs and, surprisingly, we don't have any unsupported exprs in any of the ProjectExecs. I created some custom event logs with unsupported exprs to be sure that the testing is done correctly.
build
This contributes to #5617.
In this PR we parse the expressions that appear in ProjectExec. A Project may have aliases, so we made sure to take that into account. Tested with several expressions locally, but added only one test each for the supported and unsupported cases. The supported case pretty much covers all the situations where the ProjectExec string needs to be parsed.
There was a bug in the FilterExec parser where we were not taking into account the case where the predicate was "+". Updated that as well.
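To make the expected behavior concrete, here is a hypothetical call to the new helper; only its signature, def parseProjectExpressions(exprStr: String): Array[String], and the enclosing SQLPlanParser object are taken from the diff above, while the input string and expected result are illustrative rather than from the PR.

// Hypothetical ProjectExec expression string; real plan strings vary by Spark version.
val exprStr = "value#91, round(value#92, 2) AS rounded#93"
val exprs = SQLPlanParser.parseProjectExpressions(exprStr)
// Expected, roughly: the alias ("AS rounded#93") is ignored and the function name
// used in the projection (here "round") is returned so it can be checked against
// the plugin's supported-expressions list.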