[SPARK-45755][SQL] Improve `Dataset.isEmpty()` by applying global limit `1`

### What changes were proposed in this pull request?

This PR makes `Dataset.isEmpty()` execute a global limit 1 first. `LimitPushDown` may then push the global limit 1 down to lower nodes, which improves query performance.

Note that we use a global limit 1 here because a local limit cannot be pushed down in the group-only case: https://github.com/apache/spark/blob/89ca8b6065e9f690a492c778262080741d50d94d/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala#L766-L770

### Why are the changes needed?

To improve query performance. `isEmpty()` only needs to know whether at least one row exists, so pushing the limit down lets Spark avoid computing more of the result than necessary.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manual testing:

```scala
spark.range(300000000).selectExpr("id", "array(id, id % 10, id % 100) as eo").write.saveAsTable("t1")
spark.range(100000000).selectExpr("id", "array(id, id % 10, id % 1000) as eo").write.saveAsTable("t2")
println(spark.sql("SELECT * FROM t1 LATERAL VIEW explode_outer(eo) AS e UNION SELECT * FROM t2 LATERAL VIEW explode_outer(eo) AS e").isEmpty)
```

| Before this PR | After this PR |
| --- | --- |
| <img width="430" alt="image" src="https://github.com/apache/spark/assets/5399861/417adc05-4160-4470-b63c-125faac08c9c"> | <img width="430" alt="image" src="https://github.com/apache/spark/assets/5399861/bdeff231-e725-4c55-9da2-1b4cd59ec8c8"> |

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #43617 from wangyum/SPARK-45755.

Lead-authored-by: Yuming Wang <yumwang@ebay.com>
Co-authored-by: Yuming Wang <yumwang@apache.org>
Signed-off-by: Jiaan Geng <beliefer@163.com>
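The idea behind this change can be approximated at the user level with the public `Dataset` API. The sketch below is a hypothetical helper, not the code changed by this PR. It applies `Dataset.limit(1)` (a global limit) before checking whether any row comes back, and prints the optimized plan so you can inspect where the optimizer placed the limit. The object name `IsEmptyLimitSketch`, the helper `isEmptyViaGlobalLimit`, and the toy data are assumptions for illustration. It assumes Spark SQL is on the classpath and runs in local mode.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object IsEmptyLimitSketch {
  // Hypothetical helper (not Spark's own isEmpty implementation):
  // apply a global limit of 1, then check whether any row comes back.
  // The optimizer is free to push this limit towards the data source.
  def isEmptyViaGlobalLimit(df: DataFrame): Boolean =
    df.limit(1).take(1).isEmpty

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("isEmpty-limit-sketch")
      .getOrCreate()
    try {
      val df = spark.range(1000).selectExpr("id", "id % 10 AS k")
      // union + distinct mirrors the SQL UNION above: a group-only aggregate.
      val unioned = df.union(df).distinct()

      // Print the optimized logical plan to see where the limit ended up.
      println(unioned.limit(1).queryExecution.optimizedPlan)

      println(isEmptyViaGlobalLimit(unioned))                   // false
      println(isEmptyViaGlobalLimit(unioned.filter("id < 0")))  // true
    } finally spark.stop()
  }
}
```

Whether the limit lands below the `distinct` in the printed plan depends on the optimizer rules of the Spark version you run.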