Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-31895][PYTHON][SQL] Support DataFrame.explain(extended: str) case to be consistent with Scala side #28711

Closed
wants to merge 1 commit into from

Conversation

HyukjinKwon
Copy link
Member

@HyukjinKwon HyukjinKwon commented Jun 3, 2020

What changes were proposed in this pull request?

Scala:

scala> spark.range(10).explain("cost")
== Optimized Logical Plan ==
Range (0, 10, step=1, splits=Some(12)), Statistics(sizeInBytes=80.0 B)

== Physical Plan ==
*(1) Range (0, 10, step=1, splits=12)

PySpark:

>>> spark.range(10).explain("cost")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/.../spark/python/pyspark/sql/dataframe.py", line 333, in explain
    raise TypeError(err_msg)
TypeError: extended (optional) should be provided as bool, got <class 'str'>

In addition, it is consistent with other codes too, for example, DataFrame.sample also can support DataFrame.sample(1.0) and DataFrame.sample(False).

Why are the changes needed?

To provide the consistent API support across APIs.

Does this PR introduce any user-facing change?

Nope, it's only changes in unreleased branches.
If this lands to master only, yes, users will be able to set mode as df.explain("...") in Spark 3.1.

After this PR:

>>> spark.range(10).explain("cost")
== Optimized Logical Plan ==
Range (0, 10, step=1, splits=Some(12)), Statistics(sizeInBytes=80.0 B)

== Physical Plan ==
*(1) Range (0, 10, step=1, splits=12)

How was this patch tested?

Unittest was added and manually tested as well to make sure:

spark.range(10).explain(True)
spark.range(10).explain(False)
spark.range(10).explain("cost")
spark.range(10).explain(extended="cost")
spark.range(10).explain(mode="cost")
spark.range(10).explain()
spark.range(10).explain(True, "cost")
spark.range(10).explain(1.0)

@SparkQA

This comment has been minimized.

@SparkQA
Copy link

SparkQA commented Jun 3, 2020

Test build #123454 has finished for PR 28711 at commit 984f33b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@maropu maropu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update! Looks reasonable.

@maropu
Copy link
Member

maropu commented Jun 3, 2020

btw, typo in the description?

This is also consistent with DataFrame.sample case.
                                       ^^^^^

@HyukjinKwon
Copy link
Member Author

HyukjinKwon commented Jun 3, 2020

Oh, I meant this is also consistent with DataFrame.sample in PySpark and Scala side - supporting DataFrame.sample(1.0) and DataFrame.sample(False). Let me clarify.

@HyukjinKwon
Copy link
Member Author

Thanks @maropu. Merged to master and branch-3.0!

HyukjinKwon added a commit that referenced this pull request Jun 3, 2020
…ase to be consistent with Scala side

### What changes were proposed in this pull request?

Scala:

```scala
scala> spark.range(10).explain("cost")
```
```
== Optimized Logical Plan ==
Range (0, 10, step=1, splits=Some(12)), Statistics(sizeInBytes=80.0 B)

== Physical Plan ==
*(1) Range (0, 10, step=1, splits=12)
```

PySpark:

```python
>>> spark.range(10).explain("cost")
```
```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/.../spark/python/pyspark/sql/dataframe.py", line 333, in explain
    raise TypeError(err_msg)
TypeError: extended (optional) should be provided as bool, got <class 'str'>
```

In addition, it is consistent with other codes too, for example, `DataFrame.sample` also can support `DataFrame.sample(1.0)` and `DataFrame.sample(False)`.

### Why are the changes needed?

To provide the consistent API support across APIs.

### Does this PR introduce _any_ user-facing change?

Nope, it's only changes in unreleased branches.
If this lands to master only, yes, users will be able to set `mode` as `df.explain("...")` in Spark 3.1.

After this PR:

```python
>>> spark.range(10).explain("cost")
```
```
== Optimized Logical Plan ==
Range (0, 10, step=1, splits=Some(12)), Statistics(sizeInBytes=80.0 B)

== Physical Plan ==
*(1) Range (0, 10, step=1, splits=12)
```

### How was this patch tested?

Unittest was added and manually tested as well to make sure:

```python
spark.range(10).explain(True)
spark.range(10).explain(False)
spark.range(10).explain("cost")
spark.range(10).explain(extended="cost")
spark.range(10).explain(mode="cost")
spark.range(10).explain()
spark.range(10).explain(True, "cost")
spark.range(10).explain(1.0)
```

Closes #28711 from HyukjinKwon/SPARK-31895.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(cherry picked from commit e1d5201)
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
@maropu
Copy link
Member

maropu commented Jun 3, 2020

hahaha, I see.

HyukjinKwon pushed a commit to databricks/koalas that referenced this pull request Jun 4, 2020
Since Spark 3.0 will support `DataFrame.explain(extended: str)` case (apache/spark#28711), we can follow it.

```py
>>> df.spark.explain("extended")  # doctest: +ELLIPSIS
== Parsed Logical Plan ==
...
== Analyzed Logical Plan ==
...
== Optimized Logical Plan ==
...
== Physical Plan ==
 ...
```
@HyukjinKwon HyukjinKwon deleted the SPARK-31895 branch July 27, 2020 07:44
rising-star92 added a commit to rising-star92/databricks-koalas that referenced this pull request Jan 27, 2023
Since Spark 3.0 will support `DataFrame.explain(extended: str)` case (apache/spark#28711), we can follow it.

```py
>>> df.spark.explain("extended")  # doctest: +ELLIPSIS
== Parsed Logical Plan ==
...
== Analyzed Logical Plan ==
...
== Optimized Logical Plan ==
...
== Physical Plan ==
 ...
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants