Commit

spark.sql.optimizer.runtime.rowLevelOperationGroupFilter.enabled
jaceklaskowski committed Mar 10, 2024
1 parent 9a4122a commit a2af337
Showing 2 changed files with 38 additions and 15 deletions.
8 changes: 6 additions & 2 deletions docs/SQLConf.md
@@ -948,11 +948,15 @@ Used when:

[spark.sql.optimizer.runtime.bloomFilter.expectedNumItems](configuration-properties.md#spark.sql.optimizer.runtime.bloomFilter.expectedNumItems)

## <span id="sessionLocalTimeZone"><span id="SESSION_LOCAL_TIMEZONE"> sessionLocalTimeZone
## <span id="RUNTIME_ROW_LEVEL_OPERATION_GROUP_FILTER_ENABLED"> runtimeRowLevelOperationGroupFilterEnabled { #runtimeRowLevelOperationGroupFilterEnabled }

[spark.sql.optimizer.runtime.rowLevelOperationGroupFilter.enabled](configuration-properties.md#spark.sql.optimizer.runtime.rowLevelOperationGroupFilter.enabled)

## <span id="SESSION_LOCAL_TIMEZONE"> sessionLocalTimeZone { #sessionLocalTimeZone }

[spark.sql.session.timeZone](configuration-properties.md#spark.sql.session.timeZone)

## <span id="SESSION_WINDOW_BUFFER_IN_MEMORY_THRESHOLD"><span id="sessionWindowBufferInMemoryThreshold"> sessionWindowBufferInMemoryThreshold
## <span id="SESSION_WINDOW_BUFFER_IN_MEMORY_THRESHOLD"> sessionWindowBufferInMemoryThreshold { #sessionWindowBufferInMemoryThreshold }

[spark.sql.sessionWindow.buffer.in.memory.threshold](configuration-properties.md#spark.sql.sessionWindow.buffer.in.memory.threshold)

45 changes: 32 additions & 13 deletions docs/configuration-properties.md
@@ -841,34 +841,34 @@ Default: `(undefined)`

Default: `true`

-### runtime.bloomFilter.enabled { #spark.sql.optimizer.runtime.bloomFilter.enabled }
+### runtime.bloomFilter.creationSideThreshold { #spark.sql.optimizer.runtime.bloomFilter.creationSideThreshold }

-**spark.sql.optimizer.runtime.bloomFilter.enabled**
+**spark.sql.optimizer.runtime.bloomFilter.creationSideThreshold**

-Enables a bloom filter on one side of a shuffle join if the other side has a selective predicate (to reduce the amount of shuffle data)
+Size threshold of the bloom filter creation side plan.
+Estimated size needs to be under this value to try to inject a bloom filter.

-Default: `true`
+Default: `10MB`

-Use [SQLConf.runtimeFilterBloomFilterEnabled](SQLConf.md#runtimeFilterBloomFilterEnabled) for the current value
+Use [SQLConf.runtimeFilterCreationSideThreshold](SQLConf.md#runtimeFilterCreationSideThreshold) for the current value

Used when:

-* [InjectRuntimeFilter](logical-optimizations/InjectRuntimeFilter.md) logical optimization is executed
+* [InjectRuntimeFilter](logical-optimizations/InjectRuntimeFilter.md) logical optimization is executed (to [injectBloomFilter](logical-optimizations/InjectRuntimeFilter.md#injectBloomFilter))

-### runtime.bloomFilter.creationSideThreshold { #spark.sql.optimizer.runtime.bloomFilter.creationSideThreshold }
+### runtime.bloomFilter.enabled { #spark.sql.optimizer.runtime.bloomFilter.enabled }

-**spark.sql.optimizer.runtime.bloomFilter.creationSideThreshold**
+**spark.sql.optimizer.runtime.bloomFilter.enabled**

-Size threshold of the bloom filter creation side plan.
-Estimated size needs to be under this value to try to inject bloom filter.
+Enables a bloom filter on one side of a shuffle join if the other side has a selective predicate (to reduce the amount of shuffle data)

-Default: `10MB`
+Default: `true`

-Use [SQLConf.runtimeFilterCreationSideThreshold](SQLConf.md#runtimeFilterCreationSideThreshold) for the current value
+Use [SQLConf.runtimeFilterBloomFilterEnabled](SQLConf.md#runtimeFilterBloomFilterEnabled) for the current value

Used when:

-* [InjectRuntimeFilter](logical-optimizations/InjectRuntimeFilter.md) logical optimization is executed (to [injectBloomFilter](logical-optimizations/InjectRuntimeFilter.md#injectBloomFilter))
+* [InjectRuntimeFilter](logical-optimizations/InjectRuntimeFilter.md) logical optimization is executed
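
A minimal sketch (Scala, assuming a local `SparkSession`; the app name is hypothetical) of setting the two bloom filter properties above and reading the effective values back through the `SQLConf` accessors they reference:

```scala
import org.apache.spark.sql.SparkSession

// Local session for demonstration only (any SparkSession works).
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("runtime-bloom-filter-sketch")
  .getOrCreate()

// The optimization is on by default; raise the creation-side size
// threshold so larger build-side plans still qualify for injection.
spark.conf.set("spark.sql.optimizer.runtime.bloomFilter.enabled", true)
spark.conf.set("spark.sql.optimizer.runtime.bloomFilter.creationSideThreshold", "32MB")

// Effective values via the SQLConf accessors referenced above.
val conf = spark.sessionState.conf
assert(conf.runtimeFilterBloomFilterEnabled)
println(conf.runtimeFilterCreationSideThreshold) // size in bytes
```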

### runtime.bloomFilter.expectedNumItems { #spark.sql.optimizer.runtime.bloomFilter.expectedNumItems }

@@ -900,6 +900,25 @@ Used when:

* `BloomFilterAggregate` is requested to [checkInputDataTypes](expressions/BloomFilterAggregate.md#checkInputDataTypes) and for the [numBits](expressions/BloomFilterAggregate.md#numBits)

### <span id="RUNTIME_ROW_LEVEL_OPERATION_GROUP_FILTER_ENABLED"> runtime.rowLevelOperationGroupFilter.enabled { #spark.sql.optimizer.runtime.rowLevelOperationGroupFilter.enabled }

**spark.sql.optimizer.runtime.rowLevelOperationGroupFilter.enabled**

Enables runtime group filtering for group-based row-level operations.

Data sources that replace groups of data (e.g. files, partitions) may prune entire groups using provided data source filters when planning a row-level operation scan.
However, such filtering is limited as not all expressions can be converted into data source filters and some expressions can only be evaluated by Spark (e.g. subqueries).
Since rewriting groups is expensive, Spark can execute a query at runtime to find what records match the condition of the row-level operation.
The information about matching records will be passed back to the row-level operation scan, allowing data sources to discard groups that don't have to be rewritten.

Default: `true`

Current value: [SQLConf.runtimeRowLevelOperationGroupFilterEnabled](SQLConf.md#runtimeRowLevelOperationGroupFilterEnabled)

Used when:

* `RowLevelOperationRuntimeGroupFiltering` logical optimization is executed
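
A minimal sketch of toggling the property before a group-based row-level operation; `demo.db.events` and its group-replacing connector (e.g. an Iceberg-style copy-on-write table) are hypothetical:

```scala
// Hypothetical: `demo.db.events` is a v2 table whose connector replaces
// whole groups of data (files or partitions) on DELETE/UPDATE/MERGE.
spark.conf.set("spark.sql.optimizer.runtime.rowLevelOperationGroupFilter.enabled", true)

// With group filtering on, Spark can first run a query to find which
// groups actually contain matching records, so the DELETE rewrites only
// those groups rather than every group selected by static filters.
spark.sql("DELETE FROM demo.db.events WHERE event_date = DATE '2024-03-10'")

// Effective value via the SQLConf accessor referenced above.
println(spark.sessionState.conf.runtimeRowLevelOperationGroupFilterEnabled)
```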

### runtimeFilter.number.threshold { #spark.sql.optimizer.runtimeFilter.number.threshold }

**spark.sql.optimizer.runtimeFilter.number.threshold**
