Commit

spark.sql.optimizer.runtime.rowLevelOperationGroupFilter.enabled
jaceklaskowski committed Mar 10, 2024
1 parent 9a4122a commit a2af337
Showing 2 changed files with 38 additions and 15 deletions.
8 changes: 6 additions & 2 deletions docs/SQLConf.md
@@ -948,11 +948,15 @@ Used when:

[spark.sql.optimizer.runtime.bloomFilter.expectedNumItems](configuration-properties.md#spark.sql.optimizer.runtime.bloomFilter.expectedNumItems)

## <span id="sessionLocalTimeZone"><span id="SESSION_LOCAL_TIMEZONE"> sessionLocalTimeZone
## <span id="RUNTIME_ROW_LEVEL_OPERATION_GROUP_FILTER_ENABLED"> runtimeRowLevelOperationGroupFilterEnabled { #runtimeRowLevelOperationGroupFilterEnabled }

[spark.sql.optimizer.runtime.rowLevelOperationGroupFilter.enabled](configuration-properties.md#spark.sql.optimizer.runtime.rowLevelOperationGroupFilter.enabled)

## <span id="SESSION_LOCAL_TIMEZONE"> sessionLocalTimeZone { #sessionLocalTimeZone }

[spark.sql.session.timeZone](configuration-properties.md#spark.sql.session.timeZone)

## <span id="SESSION_WINDOW_BUFFER_IN_MEMORY_THRESHOLD"><span id="sessionWindowBufferInMemoryThreshold"> sessionWindowBufferInMemoryThreshold
## <span id="SESSION_WINDOW_BUFFER_IN_MEMORY_THRESHOLD"> sessionWindowBufferInMemoryThreshold { #sessionWindowBufferInMemoryThreshold }

[spark.sql.sessionWindow.buffer.in.memory.threshold](configuration-properties.md#spark.sql.sessionWindow.buffer.in.memory.threshold)

45 changes: 32 additions & 13 deletions docs/configuration-properties.md
@@ -841,34 +841,34 @@ Default: `(undefined)`

Default: `true`

-### runtime.bloomFilter.enabled { #spark.sql.optimizer.runtime.bloomFilter.enabled }
+### runtime.bloomFilter.creationSideThreshold { #spark.sql.optimizer.runtime.bloomFilter.creationSideThreshold }

-**spark.sql.optimizer.runtime.bloomFilter.enabled**
+**spark.sql.optimizer.runtime.bloomFilter.creationSideThreshold**

-Enables a bloom filter on one side of a shuffle join if the other side has a selective predicate (to reduce the amount of shuffle data)
+Size threshold of the bloom filter creation side plan.
+Estimated size needs to be under this value to try to inject a bloom filter.

-Default: `true`
+Default: `10MB`

-Use [SQLConf.runtimeFilterBloomFilterEnabled](SQLConf.md#runtimeFilterBloomFilterEnabled) for the current value
+Use [SQLConf.runtimeFilterCreationSideThreshold](SQLConf.md#runtimeFilterCreationSideThreshold) for the current value

Used when:

-* [InjectRuntimeFilter](logical-optimizations/InjectRuntimeFilter.md) logical optimization is executed
+* [InjectRuntimeFilter](logical-optimizations/InjectRuntimeFilter.md) logical optimization is executed (to [injectBloomFilter](logical-optimizations/InjectRuntimeFilter.md#injectBloomFilter))

-### runtime.bloomFilter.creationSideThreshold { #spark.sql.optimizer.runtime.bloomFilter.creationSideThreshold }
+### runtime.bloomFilter.enabled { #spark.sql.optimizer.runtime.bloomFilter.enabled }

-**spark.sql.optimizer.runtime.bloomFilter.creationSideThreshold**
+**spark.sql.optimizer.runtime.bloomFilter.enabled**

-Size threshold of the bloom filter creation side plan.
-Estimated size needs to be under this value to try to inject bloom filter.
+Enables a bloom filter on one side of a shuffle join if the other side has a selective predicate (to reduce the amount of shuffle data)

-Default: `10MB`
+Default: `true`

-Use [SQLConf.runtimeFilterCreationSideThreshold](SQLConf.md#runtimeFilterCreationSideThreshold) for the current value
+Use [SQLConf.runtimeFilterBloomFilterEnabled](SQLConf.md#runtimeFilterBloomFilterEnabled) for the current value

Used when:

-* [InjectRuntimeFilter](logical-optimizations/InjectRuntimeFilter.md) logical optimization is executed (to [injectBloomFilter](logical-optimizations/InjectRuntimeFilter.md#injectBloomFilter))
+* [InjectRuntimeFilter](logical-optimizations/InjectRuntimeFilter.md) logical optimization is executed
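
A minimal sketch (Scala, assuming a local `SparkSession`; the app name is hypothetical) of setting the two bloom filter properties above and reading the effective values back through the `SQLConf` accessors they reference:

```scala
import org.apache.spark.sql.SparkSession

// Local session for demonstration only (any SparkSession works).
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("runtime-bloom-filter-sketch")
  .getOrCreate()

// The optimization is on by default; raise the creation-side size
// threshold so larger build-side plans still qualify for injection.
spark.conf.set("spark.sql.optimizer.runtime.bloomFilter.enabled", true)
spark.conf.set("spark.sql.optimizer.runtime.bloomFilter.creationSideThreshold", "32MB")

// Effective values via the SQLConf accessors referenced above.
val conf = spark.sessionState.conf
assert(conf.runtimeFilterBloomFilterEnabled)
println(conf.runtimeFilterCreationSideThreshold) // size in bytes
```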

### runtime.bloomFilter.expectedNumItems { #spark.sql.optimizer.runtime.bloomFilter.expectedNumItems }

@@ -900,6 +900,25 @@ Used when:

* `BloomFilterAggregate` is requested to [checkInputDataTypes](expressions/BloomFilterAggregate.md#checkInputDataTypes) and for the [numBits](expressions/BloomFilterAggregate.md#numBits)

### <span id="RUNTIME_ROW_LEVEL_OPERATION_GROUP_FILTER_ENABLED"> runtime.rowLevelOperationGroupFilter.enabled { #spark.sql.optimizer.runtime.rowLevelOperationGroupFilter.enabled }

**spark.sql.optimizer.runtime.rowLevelOperationGroupFilter.enabled**

Enables runtime group filtering for group-based row-level operations.

Data sources that replace groups of data (e.g. files, partitions) may prune entire groups using provided data source filters when planning a row-level operation scan.
However, such filtering is limited as not all expressions can be converted into data source filters and some expressions can only be evaluated by Spark (e.g. subqueries).
Since rewriting groups is expensive, Spark can execute a query at runtime to find what records match the condition of the row-level operation.
The information about matching records will be passed back to the row-level operation scan, allowing data sources to discard groups that don't have to be rewritten.

Default: `true`

Current value: [SQLConf.runtimeRowLevelOperationGroupFilterEnabled](SQLConf.md#runtimeRowLevelOperationGroupFilterEnabled)

Used when:

* `RowLevelOperationRuntimeGroupFiltering` logical optimization is executed
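
A minimal sketch of toggling the property before a group-based row-level operation; `demo.db.events` and its group-replacing connector (e.g. an Iceberg-style copy-on-write table) are hypothetical:

```scala
// Hypothetical: `demo.db.events` is a v2 table whose connector replaces
// whole groups of data (files or partitions) on DELETE/UPDATE/MERGE.
spark.conf.set("spark.sql.optimizer.runtime.rowLevelOperationGroupFilter.enabled", true)

// With group filtering on, Spark can first run a query to find which
// groups actually contain matching records, so the DELETE rewrites only
// those groups rather than every group selected by static filters.
spark.sql("DELETE FROM demo.db.events WHERE event_date = DATE '2024-03-10'")

// Effective value via the SQLConf accessor referenced above.
println(spark.sessionState.conf.runtimeRowLevelOperationGroupFilterEnabled)
```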

### runtimeFilter.number.threshold { #spark.sql.optimizer.runtimeFilter.number.threshold }

**spark.sql.optimizer.runtimeFilter.number.threshold**
