-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-30764][SQL] Improve the readability of EXPLAIN FORMATTED style #27509
Conversation
ok to test |
Thanks for your work, @Eric5553 ! |
Test build #118102 has finished for PR 27509 at commit
|
9d30fad
to
4d0fcff
Compare
Also cc @maryannxue @hvanhovell |
Test build #118110 has finished for PR 27509 at commit
|
Test build #118138 has finished for PR 27509 at commit
|
|Left keys : ${leftKeys} | ||
|Right keys: ${rightKeys} | ||
|Join condition : ${joinCondStr} | ||
|${ExplainUtils.generateFieldString("Left keys", leftKeys)} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might not be related to this PR, but can we do the same thing as https://github.com/apache/spark/pull/27368/files#diff-ddb517fe44ae649ddda3c733c2adcb76R70 for joins? Just for symmetry and future handiness.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make HashJoin
extend BinaryExecNode
, and ShuffledHashJoinExec/BroadcastHashJoinExec
extend HashJoin
, right? Yea, I can make it here together :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. I meant creating a trait for all physical joins. It'll make pattern matching easier although we don't have this requirement right now. We could do it in a follow-up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, yea. I just recall the conversation, thanks for your explanation :-)
I'll submit a follow-up PR for joins accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PR #27595 opened for this follow-up.
s""" | ||
|($operatorId) $nodeName $codegenIdStr | ||
|Arguments: ${if (argStr != null && !argStr.isEmpty) argStr else "None"} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to mute "Arguments" if no arguments instead of printing "None"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, thanks for the suggestion!
4d0fcff
to
9d52f92
Compare
Also trying to address the improvement #27368 (comment) here in this PR. I tried with adding new format string function for |
Test build #118441 has finished for PR 27509 at commit
|
Test build #118458 has finished for PR 27509 at commit
|
retest this please |
Test build #118480 has finished for PR 27509 at commit
|
9d52f92
to
f0bcf12
Compare
Test build #118507 has finished for PR 27509 at commit
|
|($operatorId) $nodeName $codegenIdStr | ||
""".stripMargin | ||
if (argumentString != null && !argumentString.isEmpty) { | ||
s"""${result} |Arguments: $argumentString\n""".stripMargin |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is too hard to read. How about
if (argumentString.nonEmpty) {
result + s"Arguments: $argumentString\n"
} else {
result
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I just remember why I tried so complicated writing here.
result + s"Arguments: $argumentString\n"
will lead to unexpected padding before Arguments.
Thus to avoid the padding and not using improper multiline string maybe we can use
s"${result} |Arguments: $argumentString\n".stripMargin
?
Any other suggestions? Thanks @cloud-fan
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about result.trim + s"Arguments: $argumentString\n"
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems trim will break the first line padding of result
.
result.trim + s"\nArguments: $argumentString\n"
Shows
(3) ObjectHashAggregate
Input [2]: [key#x, val#x]
Keys [1]: [key#x]
Functions [1]: [partial_collect_set(val#x, 0, 0)]
Aggregate Attributes [1]: [buf#x]
Results [2]: [key#x, buf#x]
(4) Exchange
Input [2]: [key#x, buf#x]
Arguments: hashpartitioning(key#x, 4), true, [id=#x]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about
val baseStr = s"($operatorId) $nodeName $codegenIdStr"
if (argumentString.nonEmpty) {
s"""
|$baseStr
|Arguments: $argumentString
""". stripMargin
} else {
s"""
|$baseStr
""". stripMargin
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, it works with better code style. Thanks a lot!
case iter: Iterable[_] => s"${fieldName} [${iter.size}]: ${iter.mkString("[", ", ", "]")}" | ||
case str: String if (str == null || str.isEmpty) => s"${fieldName}: None" | ||
case str: String => s"${fieldName}: ${str}" | ||
case _ => s"${fieldName}: Unknown" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not expected. Shall we just throw exception here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure
|${ExplainUtils.generateFieldString("Output", producedAttributes)} | ||
""".stripMargin | ||
if (argumentString != null && !argumentString.isEmpty) { | ||
s"""${result} |Arguments: $argumentString\n""".stripMargin |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|${ExplainUtils.generateFieldString("Input", child.output)} | ||
""".stripMargin | ||
if (argumentString != null && !argumentString.isEmpty) { | ||
s"""${result} |Arguments: $argumentString\n""".stripMargin |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's not use multiline string if it's not multiline
@Eric5553 can you fix the pyspark test? you should update
|
@cloud-fan Sorry for missed the failed unit test(just found where to get the jenkins error log...). I'll fix the pyspark test and address comments today. And I got an implementation of removing useless |
Let's open a new PR for that. |
LGTM |
Test build #118745 has finished for PR 27509 at commit
|
retest this please |
1 similar comment
retest this please |
Test build #118765 has finished for PR 27509 at commit
|
thanks, merging to master/3.0! |
### What changes were proposed in this pull request? The style of `EXPLAIN FORMATTED` output needs to be improved. We’ve already got some observations/ideas in #27368 (comment) #27368 (comment) Observations/Ideas: 1. Using comma as the separator is not clear, especially commas are used inside the expressions too. 2. Show the column counts first? For example, `Results [4]: …` 3. Currently the attribute names are automatically generated, this need to refined. 4. Add arguments field in common implementations as `EXPLAIN EXTENDED` did by calling `argString` in `TreeNode.simpleString`. This will eliminate most existing minor differences between `EXPLAIN EXTENDED` and `EXPLAIN FORMATTED`. 5. Another improvement we can do is: the generated alias shouldn't include attribute id. collect_set(val, 0, 0)#123 looks clearer than collect_set(val#456, 0, 0)#123 This PR is currently addressing comments 2 & 4, and open for more discussions on improving readability. ### Why are the changes needed? The readability of `EXPLAIN FORMATTED` need to be improved, which will help user better understand the query plan. ### Does this PR introduce any user-facing change? Yes, `EXPLAIN FORMATTED` output style changed. ### How was this patch tested? Update expect results of test cases in explain.sql Closes #27509 from Eric5553/ExplainFormattedRefine. Authored-by: Eric Wu <492960551@qq.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com> (cherry picked from commit 1f0300f) Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Thanks so much! @cloud-fan @maryannxue @maropu @gatorsmile |
### What changes were proposed in this pull request? Currently the join operators are not well abstracted, since there are lot of common logic. A trait can be created for easier pattern matching and other future handiness. This is a follow-up PR based on comment #27509 (comment) . This PR refined from the following aspects: 1. Refined structure of all physical join operators 2. Add missing joinType field for CartesianProductExec operator 3. Refined codes related to Explain Formatted The EXPLAIN FORMATTED changes are 1. Converge all join operator `verboseStringWithOperatorId` implementations to `BaseJoinExec`. Join condition displayed, and join keys displayed if it’s not empty. 2. `#1` will add Join condition to `BroadcastNestedLoopJoinExec`. 3. `#1` will **NOT** affect `CartesianProductExec`,`SortMergeJoin` and `HashJoin`s, since they already got there override implementation before. 4. Converge all join operator `simpleStringWithNodeId` to `BaseJoinExec`, which will enhance the one line description for `CartesianProductExec` with `JoinType` added. 5. Override `simpleStringWithNodeId` in `BroadcastNestedLoopJoinExec` to show `BuildSide`, which was only done for `HashJoin`s before. ### Why are the changes needed? Make the code consistent with other operators and for future handiness of join operators. ### Does this PR introduce any user-facing change? No ### How was this patch tested? Existing tests Closes #27595 from Eric5553/RefineJoin. Authored-by: Eric Wu <492960551@qq.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request? The style of `EXPLAIN FORMATTED` output needs to be improved. We’ve already got some observations/ideas in apache#27368 (comment) apache#27368 (comment) Observations/Ideas: 1. Using comma as the separator is not clear, especially commas are used inside the expressions too. 2. Show the column counts first? For example, `Results [4]: …` 3. Currently the attribute names are automatically generated, this need to refined. 4. Add arguments field in common implementations as `EXPLAIN EXTENDED` did by calling `argString` in `TreeNode.simpleString`. This will eliminate most existing minor differences between `EXPLAIN EXTENDED` and `EXPLAIN FORMATTED`. 5. Another improvement we can do is: the generated alias shouldn't include attribute id. collect_set(val, 0, 0)apache#123 looks clearer than collect_set(val#456, 0, 0)apache#123 This PR is currently addressing comments 2 & 4, and open for more discussions on improving readability. ### Why are the changes needed? The readability of `EXPLAIN FORMATTED` need to be improved, which will help user better understand the query plan. ### Does this PR introduce any user-facing change? Yes, `EXPLAIN FORMATTED` output style changed. ### How was this patch tested? Update expect results of test cases in explain.sql Closes apache#27509 from Eric5553/ExplainFormattedRefine. Authored-by: Eric Wu <492960551@qq.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request? Currently the join operators are not well abstracted, since there are lot of common logic. A trait can be created for easier pattern matching and other future handiness. This is a follow-up PR based on comment apache#27509 (comment) . This PR refined from the following aspects: 1. Refined structure of all physical join operators 2. Add missing joinType field for CartesianProductExec operator 3. Refined codes related to Explain Formatted The EXPLAIN FORMATTED changes are 1. Converge all join operator `verboseStringWithOperatorId` implementations to `BaseJoinExec`. Join condition displayed, and join keys displayed if it’s not empty. 2. `apache#1` will add Join condition to `BroadcastNestedLoopJoinExec`. 3. `apache#1` will **NOT** affect `CartesianProductExec`,`SortMergeJoin` and `HashJoin`s, since they already got there override implementation before. 4. Converge all join operator `simpleStringWithNodeId` to `BaseJoinExec`, which will enhance the one line description for `CartesianProductExec` with `JoinType` added. 5. Override `simpleStringWithNodeId` in `BroadcastNestedLoopJoinExec` to show `BuildSide`, which was only done for `HashJoin`s before. ### Why are the changes needed? Make the code consistent with other operators and for future handiness of join operators. ### Does this PR introduce any user-facing change? No ### How was this patch tested? Existing tests Closes apache#27595 from Eric5553/RefineJoin. Authored-by: Eric Wu <492960551@qq.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@@ -243,7 +243,7 @@ case class FilterExec(condition: Expression, child: SparkPlan) | |||
override def verboseStringWithOperatorId(): String = { | |||
s""" | |||
|(${ExplainUtils.getOpId(this)}) $nodeName ${ExplainUtils.getCodegenId(this)} | |||
|Input : ${child.output.mkString("[", ", ", "]")} | |||
|${ExplainUtils.generateFieldString("Input", child.output)} | |||
|Condition : ${condition} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we remove the space before ":"?
(3) Filter [codegen id : 1]
Input [1]: [col.dots#22]
Condition : (isnotnull(col.dots#22) AND (col.dots#22 = 500))
@@ -76,7 +76,7 @@ trait DataSourceScanExec extends LeafExecNode { | |||
|
|||
s""" | |||
|(${ExplainUtils.getOpId(this)}) $nodeName ${ExplainUtils.getCodegenId(this)} | |||
|Output: ${producedAttributes.mkString("[", ", ", "]")} | |||
|${ExplainUtils.generateFieldString("Output", producedAttributes)} | |||
|${metadataStr.mkString("\n")} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These changes are only for DSV1. Could we make the corresponding changes when using DSV2? Open the ticket https://issues.apache.org/jira/browse/SPARK-31480
Also, please check the output when the schema is very long. For example, containing 250+ columns.
What changes were proposed in this pull request?
The style of
EXPLAIN FORMATTED
output needs to be improved. We’ve already got some observations/ideas in#27368 (comment)
#27368 (comment)
Observations/Ideas:
Results [4]: …
EXPLAIN EXTENDED
did by callingargString
inTreeNode.simpleString
. This will eliminate most existing minor differences betweenEXPLAIN EXTENDED
andEXPLAIN FORMATTED
.This PR is currently addressing comments 2 & 4, and open for more discussions on improving readability.
Why are the changes needed?
The readability of
EXPLAIN FORMATTED
need to be improved, which will help user better understand the query plan.Does this PR introduce any user-facing change?
Yes,
EXPLAIN FORMATTED
output style changed.How was this patch tested?
Update expect results of test cases in explain.sql