[SPARK-31895][PYTHON][SQL] Support DataFrame.explain(extended: str) case to be consistent with Scala side #28711
Conversation
Test build #123454 has finished for PR 28711 at commit
Thanks for the update! Looks reasonable.
btw, typo in the description?
Oh, I meant this is also consistent with
Thanks @maropu. Merged to master and branch-3.0!
### What changes were proposed in this pull request?

Scala:

```scala
scala> spark.range(10).explain("cost")
```

```
== Optimized Logical Plan ==
Range (0, 10, step=1, splits=Some(12)), Statistics(sizeInBytes=80.0 B)

== Physical Plan ==
*(1) Range (0, 10, step=1, splits=12)
```

PySpark:

```python
>>> spark.range(10).explain("cost")
```

```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/.../spark/python/pyspark/sql/dataframe.py", line 333, in explain
    raise TypeError(err_msg)
TypeError: extended (optional) should be provided as bool, got <class 'str'>
```

In addition, this is consistent with other APIs; for example, `DataFrame.sample` supports both `DataFrame.sample(1.0)` and `DataFrame.sample(False)`.

### Why are the changes needed?

To provide consistent API support across APIs.

### Does this PR introduce _any_ user-facing change?

No; these are changes only in unreleased branches. If this lands in master only, then yes: users will be able to set the mode as `df.explain("...")` in Spark 3.1.

After this PR:

```python
>>> spark.range(10).explain("cost")
```

```
== Optimized Logical Plan ==
Range (0, 10, step=1, splits=Some(12)), Statistics(sizeInBytes=80.0 B)

== Physical Plan ==
*(1) Range (0, 10, step=1, splits=12)
```

### How was this patch tested?

A unit test was added, and the following were manually tested as well to make sure they behave as expected:

```python
spark.range(10).explain(True)
spark.range(10).explain(False)
spark.range(10).explain("cost")
spark.range(10).explain(extended="cost")
spark.range(10).explain(mode="cost")
spark.range(10).explain()
spark.range(10).explain(True, "cost")
spark.range(10).explain(1.0)
```

Closes #28711 from HyukjinKwon/SPARK-31895.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(cherry picked from commit e1d5201)
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
hahaha, I see.
Since Spark 3.0 will support the `DataFrame.explain(extended: str)` case (apache/spark#28711), we can follow it:

```py
>>> df.spark.explain("extended")  # doctest: +ELLIPSIS
== Parsed Logical Plan ==
...
== Analyzed Logical Plan ==
...
== Optimized Logical Plan ==
...
== Physical Plan ==
...
```
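Following the Spark-side change from an accessor such as `df.spark` mostly means delegating the arguments unchanged. A hypothetical sketch (the class below is illustrative, not the Koalas internals):

```python
class SparkAccessorSketch:
    """Hypothetical accessor that forwards explain() to a wrapped
    Spark DataFrame; illustrative only, not Koalas code."""

    def __init__(self, sdf):
        self._sdf = sdf  # the underlying Spark DataFrame

    def explain(self, extended=None, mode=None):
        # Pass both parameters through untouched so the bool form,
        # the positional string form, and mode="..." all keep working
        # once the underlying DataFrame.explain supports them.
        self._sdf.explain(extended=extended, mode=mode)
```

Delegating rather than re-validating keeps the accessor in sync with whatever argument handling the underlying `DataFrame.explain` gains in future Spark versions.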