[SPARK-47818][CONNECT][FOLLOW-UP] Introduce plan cache in SparkConnectPlanner to improve performance of Analyze requests #46098

xi-db · 2024-04-17T08:22:24Z

What changes were proposed in this pull request?

In the previous PR (apache#46012), we cached the plans of AnalyzePlan requests. This PR also enables the cache for ExecutePlan requests.

Why are the changes needed?

Some operations, such as spark.sql(), issue ExecutePlan requests. By also caching the plans of these requests, we can improve performance when subsequent plans to be analyzed include the plan returned by ExecutePlan as a subtree.
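To illustrate why caching ExecutePlan requests helps later Analyze requests, here is a minimal Python sketch. It is not the SparkConnectPlanner implementation: `Plan`, `PlanCache`, `Planner`, and the `cache_root` flag are hypothetical names. The sketch assumes that every subtree is looked up in the cache during transformation, and that the root of a request is stored only when that request type opts in (which, per this PR, now includes ExecutePlan as well as AnalyzePlan).

```python
from collections import OrderedDict
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Plan:
    """Hypothetical, simplified unresolved plan node; frozen so it can be a dict key."""
    op: str
    args: Tuple[str, ...] = ()
    children: Tuple["Plan", ...] = ()


class PlanCache:
    """Hypothetical LRU cache mapping a plan (sub)tree to its transformed form."""

    def __init__(self, max_size: int = 16) -> None:
        self.max_size = max_size
        self._entries: "OrderedDict[Plan, str]" = OrderedDict()
        self.hits = 0

    def get(self, plan: Plan) -> Optional[str]:
        result = self._entries.get(plan)
        if result is not None:
            self._entries.move_to_end(plan)  # mark as most recently used
            self.hits += 1
        return result

    def put(self, plan: Plan, result: str) -> None:
        self._entries[plan] = result
        self._entries.move_to_end(plan)
        if len(self._entries) > self.max_size:
            self._entries.popitem(last=False)  # evict the least recently used entry


class Planner:
    def __init__(self, cache: PlanCache) -> None:
        self.cache = cache
        self.transform_calls = 0  # number of nodes actually transformed

    def transform(self, plan: Plan, cache_root: bool) -> str:
        # Every subtree is looked up, so a cached subtree is reused as a whole.
        cached = self.cache.get(plan)
        if cached is not None:
            return cached
        self.transform_calls += 1
        children = [self.transform(c, cache_root=False) for c in plan.children]
        result = f"{plan.op}({', '.join(list(plan.args) + children)})"
        # Only the root of a request is stored, and only if the request type opts in.
        if cache_root:
            self.cache.put(plan, result)
        return result


cache = PlanCache(max_size=8)
planner = Planner(cache)

# A spark.sql()-style request sent as ExecutePlan; cached because ExecutePlan now opts in.
sql_plan = Plan("Sql", ("SELECT * FROM range(10)",))
planner.transform(sql_plan, cache_root=True)

# A later AnalyzePlan request whose plan contains sql_plan as a subtree.
filtered = Plan("Filter", ("id > 5",), (sql_plan,))
before = planner.transform_calls
print(planner.transform(filtered, cache_root=True))  # Filter(id > 5, Sql(SELECT * FROM range(10)))
print(planner.transform_calls - before)  # 1: only the Filter node is transformed
```

In this sketch, if the ExecutePlan request were not cached, the second call would transform both nodes; with the cache, the `Sql` subtree is reused and only the new `Filter` node is processed.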

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing tests.

Was this patch authored or co-authored using generative AI tooling?

No.

vicennial · 2024-04-17T11:49:19Z

cc @ueshin @zhengruifeng

zhengruifeng · 2024-04-17T12:21:25Z

python/pyspark/sql/tests/connect/test_parity_udf_profiler.py

@@ -35,6 +49,7 @@ def action(df):
        with self.sql_conf({"spark.sql.pyspark.udf.profiler": "perf"}):
            _do_computation(self.spark, action=action)

+        # Without the plan cache, UDF ID will be different for each action


also cc @xinrong-meng to check the profiler tests
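The test comment above notes that without the plan cache, the UDF ID is different for each action. A minimal Python sketch of that effect, assuming (hypothetically) that each transformation of a UDF node allocates a fresh ID from a counter; `ResolvedUdf` and both `transform_udf_*` functions are illustrative names, and `functools.lru_cache` merely stands in for the server-side plan cache.

```python
import itertools
from functools import lru_cache

# Hypothetical global counter: each transformed UDF node gets a fresh ID.
_udf_ids = itertools.count()


class ResolvedUdf:
    def __init__(self, name: str) -> None:
        self.name = name
        self.udf_id = next(_udf_ids)


def transform_udf_uncached(name: str) -> ResolvedUdf:
    # Re-transforms the UDF for every action, so each action sees a new ID.
    return ResolvedUdf(name)


@lru_cache(maxsize=16)
def transform_udf_cached(name: str) -> ResolvedUdf:
    # Stand-in for the plan cache: a repeated request reuses the earlier result.
    return ResolvedUdf(name)


ids_without_cache = {transform_udf_uncached("my_udf").udf_id for _ in range(3)}
ids_with_cache = {transform_udf_cached("my_udf").udf_id for _ in range(3)}
print(len(ids_without_cache), len(ids_with_cache))  # 3 1
```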

hvanhovell · 2024-04-24T14:16:52Z

@xi-db please update the PR.

# Conflicts: # connector/connect/server/src/main/scala/org/apache/spark/sql/connect/execution/SparkConnectPlanExecution.scala

xi-db · 2024-04-25T09:59:56Z

> @xi-db please update the PR.

Hi @hvanhovell, the CI is green and it's ready to merge.

hvanhovell · 2024-04-26T17:14:33Z

Merging.

…tPlanner to improve performance of Analyze requests

### What changes were proposed in this pull request?

In [the previous PR](apache#46012), we cache plans of AnalyzePlan requests. We're also enabling it for ExecutePlan in this PR.

### Why are the changes needed?

Some operations like spark.sql() issue ExecutePlan requests. By caching them, we can also improve performance if subsequent plans to be analyzed include the plan returned by ExecutePlan as a subtree.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing tests.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#46098 from xi-db/SPARK-47818-plan-cache-followup.

Authored-by: Xi Lyu <xi.lyu@databricks.com>
Signed-off-by: Herman van Hovell <herman@databricks.com>

Cache plan in SparkConnectPlanExecution

1b6a8ef

github-actions bot added SQL CONNECT labels Apr 17, 2024

Fix failed tests

8cbde04

github-actions bot added the PYTHON label Apr 17, 2024

xi-db changed the title ~~[WIP][SPARK-47818][CONNECT][FOLLOW-UP] Introduce plan cache in SparkConnectPlanner to improve performance of Analyze requests~~ [SPARK-47818][CONNECT][FOLLOW-UP] Introduce plan cache in SparkConnectPlanner to improve performance of Analyze requests Apr 17, 2024

zhengruifeng reviewed Apr 17, 2024

HyukjinKwon approved these changes Apr 24, 2024

hvanhovell approved these changes Apr 24, 2024

Merge branch 'master' into SPARK-47818-plan-cache-followup

f79ee90

# Conflicts: # connector/connect/server/src/main/scala/org/apache/spark/sql/connect/execution/SparkConnectPlanExecution.scala

hvanhovell closed this in 675f5f0 Apr 26, 2024
