[SPARK-48010][SQL] Avoid repeated calls to conf.resolver in resolveExpression

### What changes were proposed in this pull request?
- Instead of calling `conf.resolver` for every use inside `resolveExpression`, this PR obtains the `resolver` once and reuses it.

### Why are the changes needed?
- Consider a view with a large number of columns (on the order of thousands). For a query that only does `DESCRIBE SELECT * FROM large_view`, the `RuleExecutor` metrics and the flamegraph show that a large fraction of the time is spent in `ResolveReferences` and `ResolveRelations`. Of this, the majority of the driver time went into initializing the `conf` to obtain `conf.resolver` for each column in the view.
- Since the same `conf` is used in each of these calls, the repeated `conf.resolver` calls can be avoided by obtaining the resolver once and reusing it, as the sketch below illustrates.
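
As a rough, self-contained sketch of the pattern (illustrative only: `ResolverHoistingSketch`, `FakeConf`, and the column names are stand-ins, not Spark's actual `SQLConf` or analyzer code), the change amounts to deriving the resolver once instead of re-deriving it from the conf for every candidate column:

```scala
// Minimal sketch of the hoisting pattern; FakeConf is a stand-in for SQLConf.
object ResolverHoistingSketch {
  // A Resolver compares two column names, honoring case sensitivity.
  type Resolver = (String, String) => Boolean

  // In the real analyzer, every `conf.resolver` call first has to reach the
  // conf itself, which is the per-column cost the description above points at.
  class FakeConf(caseSensitive: Boolean) {
    def resolver: Resolver =
      if (caseSensitive) (a, b) => a == b
      else (a, b) => a.equalsIgnoreCase(b)
  }

  def main(args: Array[String]): Unit = {
    val conf = new FakeConf(caseSensitive = false)
    val columns = (1 to 3000).map(i => s"col$i")

    // Before: the resolver is re-derived from the conf for every column.
    val before = columns.filter(c => conf.resolver(c, "COL42"))

    // After: obtain the resolver once and reuse it for all columns.
    val resolver = conf.resolver
    val after = columns.filter(c => resolver(c, "COL42"))

    assert(before == after) // same result, one conf lookup instead of thousands
  }
}
```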

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
- Created a dummy view with 3000 columns.
- Observed the `RuleExecutor` metrics using `RuleExecutor.dumpTimeSpent()` (a reproduction sketch follows the metrics below).
- `RuleExecutor` metrics before this change (after multiple runs)
```
=== Metrics of Analyzer/Optimizer Rules ===
Total number of runs: 1483
Total time: 8.026801698 seconds

Rule                                                                                    Effective Time / Total Time                     Effective Runs / Total Runs

org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations                        4060159342 / 4062186814                         1 / 6
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences                       3789405037 / 3809203288                         2 / 6
org.apache.spark.sql.catalyst.analysis.TypeCoercionBase$CombinedTypeCoercionRule        0 / 20741164                                    0 / 6
org.apache.spark.sql.catalyst.analysis.ResolveTimeZone                                  17800584 / 19431350                             1 / 6
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveUpCast                           15036018 / 15060440                             1 / 6
org.apache.spark.sql.catalyst.analysis.UpdateAttributeNullability                       0 / 14929810                                    0 / 7
```
- `RuleExecutor` metrics after this change (after multiple runs)
```
=== Metrics of Analyzer/Optimizer Rules ===
Total number of runs: 1483
Total time: 2.892630859 seconds

Rule                                                                                    Effective Time / Total Time                     Effective Runs / Total Runs

org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations                        1490357745 / 1492398446                         1 / 6
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences                       1212205822 / 1241729981                         2 / 6
org.apache.spark.sql.catalyst.analysis.TypeCoercionBase$CombinedTypeCoercionRule        0 / 23857161                                    0 / 6
org.apache.spark.sql.catalyst.analysis.ResolveTimeZone                                  16603250 / 18806065                             1 / 6
org.apache.spark.sql.catalyst.analysis.UpdateAttributeNullability                       0 / 16749306                                    0 / 7
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveUpCast                           11158299 / 11183593                             1 / 6
```
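
For context, the measurement above can be approximated locally with something like the sketch below. This is a hedged reconstruction rather than the author's actual setup: the local `SparkSession`, the use of a temporary view (the PR's dummy view may well have been a persistent view, which exercises more of the view-resolution path), the object and view names, and the call to `RuleExecutor.resetMetrics()` to isolate the query are all assumptions; only `RuleExecutor.dumpTimeSpent()` and the `DESCRIBE SELECT * FROM large_view` query come from the description above.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.rules.RuleExecutor

object WideViewDescribeSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("SPARK-48010-repro-sketch")
      .getOrCreate()

    // A view with ~3000 columns, built from the range() table-valued function.
    val colDefs = (1 to 3000).map(i => s"id AS c$i").mkString(", ")
    spark.sql(s"CREATE OR REPLACE TEMPORARY VIEW large_view AS SELECT $colDefs FROM range(1)")

    // Clear accumulated analyzer/optimizer metrics, run only the query under
    // test, then print the per-rule timings.
    RuleExecutor.resetMetrics()
    spark.sql("DESCRIBE SELECT * FROM large_view").collect()
    println(RuleExecutor.dumpTimeSpent())

    spark.stop()
  }
}
```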

### Was this patch authored or co-authored using generative AI tooling?
No

Closes apache#46248 from nikhilsheoran-db/SPARK-48010.

Authored-by: Nikhil Sheoran <125331115+nikhilsheoran-db@users.noreply.github.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
nikhilsheoran-db authored and dongjoon-hyun committed Apr 26, 2024
1 parent 9cf6dc8 commit 6098bd9
Showing 1 changed file with 5 additions and 2 deletions.
@@ -136,6 +136,9 @@ trait ColumnResolutionHelper extends Logging with DataTypeErrorsBase {
       getAttrCandidates: () => Seq[Attribute],
       throws: Boolean,
       includeLastResort: Boolean): Expression = {
+
+    val resolver = conf.resolver
+
     def innerResolve(e: Expression, isTopLevel: Boolean): Expression = withOrigin(e.origin) {
       if (e.resolved) return e
       val resolved = e match {
@@ -149,7 +152,7 @@ trait ColumnResolutionHelper extends Logging with DataTypeErrorsBase {
         case GetViewColumnByNameAndOrdinal(
             viewName, colName, ordinal, expectedNumCandidates, viewDDL) =>
           val attrCandidates = getAttrCandidates()
-          val matched = attrCandidates.filter(a => conf.resolver(a.name, colName))
+          val matched = attrCandidates.filter(a => resolver(a.name, colName))
           if (matched.length != expectedNumCandidates) {
             throw QueryCompilationErrors.incompatibleViewSchemaChangeError(
               viewName, colName, expectedNumCandidates, matched, viewDDL)
@@ -183,7 +186,7 @@ trait ColumnResolutionHelper extends Logging with DataTypeErrorsBase {
         case u @ UnresolvedExtractValue(child, fieldName) =>
           val newChild = innerResolve(child, isTopLevel = false)
           if (newChild.resolved) {
-            ExtractValue(newChild, fieldName, conf.resolver)
+            ExtractValue(newChild, fieldName, resolver)
           } else {
             u.copy(child = newChild)
           }
