[SPARK-31895][PYTHON][SQL] Support DataFrame.explain(extended: str) case to be consistent with Scala side #28711

HyukjinKwon · 2020-06-03T01:16:43Z

What changes were proposed in this pull request?

Scala:

scala> spark.range(10).explain("cost")

== Optimized Logical Plan ==
Range (0, 10, step=1, splits=Some(12)), Statistics(sizeInBytes=80.0 B)

== Physical Plan ==
*(1) Range (0, 10, step=1, splits=12)

PySpark:

>>> spark.range(10).explain("cost")

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/.../spark/python/pyspark/sql/dataframe.py", line 333, in explain
    raise TypeError(err_msg)
TypeError: extended (optional) should be provided as bool, got <class 'str'>

In addition, this is consistent with other APIs: for example, DataFrame.sample also supports both DataFrame.sample(1.0) and DataFrame.sample(False).

Why are the changes needed?

To provide consistent API support across the language APIs.

Does this PR introduce any user-facing change?

Nope, it only changes unreleased branches.
If this lands in master only, then yes: users will be able to set the mode as df.explain("...") in Spark 3.1.

After this PR:

>>> spark.range(10).explain("cost")

== Optimized Logical Plan ==
Range (0, 10, step=1, splits=Some(12)), Statistics(sizeInBytes=80.0 B)

== Physical Plan ==
*(1) Range (0, 10, step=1, splits=12)

How was this patch tested?

A unit test was added, and the following calls were also tested manually:

spark.range(10).explain(True)
spark.range(10).explain(False)
spark.range(10).explain("cost")
spark.range(10).explain(extended="cost")
spark.range(10).explain(mode="cost")
spark.range(10).explain()
spark.range(10).explain(True, "cost")
spark.range(10).explain(1.0)
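
The call shapes above can be read as one argument-resolution rule. The following is a minimal sketch of such a rule, not the code from this patch: `resolve_explain_mode` is a hypothetical helper, and the choice of `ValueError` and `TypeError` for the last two calls is an assumption, since the description does not state their expected outcome.

```python
def resolve_explain_mode(extended=None, mode=None):
    """Hypothetical helper: map explain() arguments to a mode name."""
    if extended is not None and mode is not None:
        # Assumption: passing both is rejected rather than silently merged.
        raise ValueError("extended and mode should not be set together.")
    if extended is None and mode is None:
        return "simple"  # df.explain()
    if isinstance(extended, bool):
        # df.explain(True) / df.explain(False)
        return "extended" if extended else "simple"
    if isinstance(extended, str):
        # df.explain("cost") / df.explain(extended="cost")
        return extended
    if isinstance(mode, str):
        # df.explain(mode="cost")
        return mode
    bad = extended if extended is not None else mode
    raise TypeError("extended or mode should be bool or str, got %s" % type(bad))


if __name__ == "__main__":
    # The eight call shapes from the list above, as (args, kwargs) pairs.
    calls = [
        ((True,), {}),
        ((False,), {}),
        (("cost",), {}),
        ((), {"extended": "cost"}),
        ((), {"mode": "cost"}),
        ((), {}),
        ((True, "cost"), {}),
        ((1.0,), {}),
    ]
    for args, kwargs in calls:
        try:
            print(args, kwargs, "->", resolve_explain_mode(*args, **kwargs))
        except (TypeError, ValueError) as e:
            print(args, kwargs, "-> rejected:", type(e).__name__)
```

In this sketch the first six calls resolve to a mode name, while `explain(True, "cost")` and `explain(1.0)` are rejected. Checking `bool` before `str` keeps the pre-existing `explain(True)` behavior unchanged.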

SparkQA · 2020-06-03T01:57:10Z

Test build #123454 has finished for PR 28711 at commit 984f33b.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

maropu

Thanks for the update! Looks reasonable.

maropu · 2020-06-03T02:01:52Z

btw, typo in the description?

This is also consistent with DataFrame.sample case.
                                       ^^^^^

HyukjinKwon · 2020-06-03T03:05:03Z

Oh, I meant this is also consistent with DataFrame.sample in PySpark and Scala side - supporting DataFrame.sample(1.0) and DataFrame.sample(False). Let me clarify.

HyukjinKwon · 2020-06-03T03:06:40Z

Thanks @maropu. Merged to master and branch-3.0!

Closes #28711 from HyukjinKwon/SPARK-31895.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(cherry picked from commit e1d5201)
Signed-off-by: HyukjinKwon <gurwls223@apache.org>

maropu · 2020-06-03T03:28:11Z

hahaha, I see.

Since Spark 3.0 will support the `DataFrame.explain(extended: str)` case (apache/spark#28711), we can follow it.

```py
>>> df.spark.explain("extended")  # doctest: +ELLIPSIS
== Parsed Logical Plan ==
...
== Analyzed Logical Plan ==
...
== Optimized Logical Plan ==
...
== Physical Plan ==
...
```

HyukjinKwon requested a review from maropu June 3, 2020 01:16

probot-autolabeler bot added PYTHON SQL labels Jun 3, 2020

Support DataFrame.explain(extended: str) case to be consistent with Scala side

984f33b

HyukjinKwon force-pushed the SPARK-31895 branch from 8d73c4b to 984f33b Compare June 3, 2020 01:24

maropu approved these changes Jun 3, 2020

HyukjinKwon closed this in e1d5201 Jun 3, 2020

ueshin mentioned this pull request Jun 3, 2020

Support DataFrame.spark.explain(extended: str) case. databricks/koalas#1563

Merged

zero323 mentioned this pull request Jul 18, 2020

[SPARK-31895] Support DataFrame.explain(extended: str) case to be consistent with Scala side zero323/pyspark-stubs#432

Closed

HyukjinKwon deleted the SPARK-31895 branch July 27, 2020 07:44
