[SPARK-45755][SQL] Improve Dataset.isEmpty() by applying global limit `1`

### What changes were proposed in this pull request?

This PR makes `Dataset.isEmpty()` apply a global limit 1 to the plan before executing it. With the limit present in the logical plan, `LimitPushDown` can push it down to lower nodes and improve query performance.

Note that we use a global limit 1 here because a local limit cannot be pushed down in the group-only case: https://github.com/apache/spark/blob/89ca8b6065e9f690a492c778262080741d50d94d/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala#L766-L770
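
As a rough illustration of why a limit at the top of the plan helps, here is a minimal, self-contained sketch of limit pushdown on a toy plan tree. It is not Spark's Catalyst code: `Plan`, `Scan`, `Union`, `Distinct`, `Limit`, and `ToyLimitPushDown` are hypothetical names invented for this example, and `Distinct` merely stands in for a group-only aggregate.

```scala
// Toy logical plan, invented for illustration; these are NOT Catalyst classes.
sealed trait Plan
case class Scan(table: String) extends Plan
case class Union(children: List[Plan]) extends Plan
case class Distinct(child: Plan) extends Plan // stand-in for a group-only aggregate
case class Limit(n: Int, child: Plan) extends Plan

object ToyLimitPushDown {
  def pushDown(plan: Plan): Plan = plan match {
    // A de-duplicated result has at least one row iff its input does,
    // so under limit 1 the de-duplication can be skipped entirely.
    case Limit(1, Distinct(child)) => pushDown(Limit(1, child))
    // Each union branch only needs to produce at most n rows.
    case Limit(n, Union(children)) =>
      Limit(n, Union(children.map(c => pushDown(Limit(n, c)))))
    case Limit(n, child) => Limit(n, pushDown(child))
    case Distinct(child) => Distinct(pushDown(child))
    case Union(children) => Union(children.map(pushDown))
    case scan: Scan => scan
  }
}
```

In this toy model, a plan without a top-level limit is left unchanged, so every branch is fully evaluated. Wrapping the same plan in `Limit(1, ...)` lets the rule remove the de-duplication and cap each scan at one row, which is the kind of rewrite the description expects from `LimitPushDown`.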

### Why are the changes needed?

Improve query performance.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manual testing:
```scala
spark.range(300000000).selectExpr("id", "array(id, id % 10, id % 100) as eo").write.saveAsTable("t1")
spark.range(100000000).selectExpr("id", "array(id, id % 10, id % 1000) as eo").write.saveAsTable("t2")
println(spark.sql("SELECT * FROM t1 LATERAL VIEW explode_outer(eo) AS e UNION SELECT * FROM t2 LATERAL VIEW explode_outer(eo) AS e").isEmpty)
```

Before this PR | After this PR
-- | --
<img width="430" alt="image" src="https://github.com/apache/spark/assets/5399861/417adc05-4160-4470-b63c-125faac08c9c"> | <img width="430" alt="image" src="https://github.com/apache/spark/assets/5399861/bdeff231-e725-4c55-9da2-1b4cd59ec8c8">

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #43617 from wangyum/SPARK-45755.

Lead-authored-by: Yuming Wang <yumwang@ebay.com>
Co-authored-by: Yuming Wang <yumwang@apache.org>
Signed-off-by: Jiaan Geng <beliefer@163.com>
2 people authored and beliefer committed Nov 1, 2023
1 parent e6b4fa8 commit c7bba9b
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
@@ -652,7 +652,7 @@ class Dataset[T] private[sql](
    * @group basic
    * @since 2.4.0
    */
-  def isEmpty: Boolean = withAction("isEmpty", select().queryExecution) { plan =>
+  def isEmpty: Boolean = withAction("isEmpty", select().limit(1).queryExecution) { plan =>
     plan.executeTake(1).isEmpty
   }
