Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
[SPARK-48010][SQL] Avoid repeated calls to conf.resolver in resolveEx…
…pression ### What changes were proposed in this pull request? - This PR instead of calling `conf.resolver` for each call in `resolveExpression`, reuses the `resolver` obtained once. ### Why are the changes needed? - Consider a view with large number of columns (~1000s). When looking at the RuleExecutor metrics and flamegraph for a query that only does `DESCRIBE SELECT * FROM large_view`, observed that a large fraction of time is spent in `ResolveReferences` and `ResolveRelations`. Of these, the majority of the driver time went in initializing the `conf` to obtain `conf.resolver` for each of the column in the view. - Since, the same `conf` is used in each of these calls, calling the `conf.resolver` again and again can be avoided by initializing it once and reusing the same resolver. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? - Created a dummy view with 3000 columns. - Observed the `RuleExecutor` metrics using `RuleExecutor.dumpTimeSpent()`. - `RuleExecutor` metrics before this change (after multiple runs) ``` === Metrics of Analyzer/Optimizer Rules === Total number of runs: 1483 Total time: 8.026801698 seconds Rule Effective Time / Total Time Effective Runs / Total Runs org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations 4060159342 / 4062186814 1 / 6 org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences 3789405037 / 3809203288 2 / 6 org.apache.spark.sql.catalyst.analysis.TypeCoercionBase$CombinedTypeCoercionRule 0 / 20741164 0 / 6 org.apache.spark.sql.catalyst.analysis.ResolveTimeZone 17800584 / 19431350 1 / 6 org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveUpCast 15036018 / 15060440 1 / 6 org.apache.spark.sql.catalyst.analysis.UpdateAttributeNullability 0 / 14929810 0 / 7 ``` - `RuleExecutor` metrics after this change (after multiple runs) ``` === Metrics of Analyzer/Optimizer Rules === Total number of runs: 1483 Total time: 2.892630859 seconds Rule Effective Time / Total Time Effective Runs / Total Runs org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations 1490357745 / 1492398446 1 / 6 org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences 1212205822 / 1241729981 2 / 6 org.apache.spark.sql.catalyst.analysis.TypeCoercionBase$CombinedTypeCoercionRule 0 / 23857161 0 / 6 org.apache.spark.sql.catalyst.analysis.ResolveTimeZone 16603250 / 18806065 1 / 6 org.apache.spark.sql.catalyst.analysis.UpdateAttributeNullability 0 / 16749306 0 / 7 org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveUpCast 11158299 / 11183593 1 / 6 ``` ### Was this patch authored or co-authored using generative AI tooling? No Closes apache#46248 from nikhilsheoran-db/SPARK-48010. Authored-by: Nikhil Sheoran <125331115+nikhilsheoran-db@users.noreply.github.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
- Loading branch information