
Increase dynamic filter limits for fault tolerant execution #16875

Conversation

arhimondr
Contributor

Description

In fault-tolerant execution, dynamic filters are collected before the shuffle, resulting in a higher number of distinct values per driver / operator.

Increasing the limit is safe because the memory used by dynamic filters is tracked.
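For context, here is a minimal sketch of the kind of change involved, modeled on the DynamicFilterConfig excerpt quoted later in this thread. The field initializers (the streaming-mode defaults) are placeholders, not the real values, and the real class contains more fields than shown here.

import io.airlift.units.DataSize;
import static io.airlift.units.DataSize.Unit.KILOBYTE;

public class DynamicFilterConfig
{
    // Placeholder streaming-mode defaults; the actual values are not shown in this thread.
    private int smallPartitionedMaxDistinctValuesPerDriver = 20_000;
    private DataSize smallPartitionedMaxSizePerDriver = DataSize.of(50, KILOBYTE);

    public void applyFaultTolerantExecutionDefaults()
    {
        // Filters are collected before the shuffle under FTE, so each driver
        // sees far more distinct values; raise the per-driver caps.
        smallPartitionedMaxDistinctValuesPerDriver = 100_000;
        smallPartitionedMaxSizePerDriver = DataSize.of(100, KILOBYTE);
    }
}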

Additional context and related issues

#16104
#16110

Release notes

(X) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text:

# Section
* Fix some things. ({issue}`issuenumber`)

@raunaqmorarka
Member

Do we have benchmark results showing improvement?
Increasing the limits may be safe in terms of memory, but a higher distinct-values count can result in increased CPU usage due to TypeSet#add
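To make the CPU concern concrete, here is an illustrative sketch (not Trino's actual collector, which operates on typed Blocks rather than a HashSet of boxed values): every build-side row pays for a hash-set insertion until the distinct-values cap overflows, so raising the cap raises the number of rows that pay that cost before collection is abandoned.

import java.util.HashSet;
import java.util.Set;

// Illustrative model of per-row distinct-value collection cost.
final class DistinctValueCollector
{
    private final int maxDistinctValues;
    private final Set<Long> values = new HashSet<>();
    private boolean overflowed;

    DistinctValueCollector(int maxDistinctValues)
    {
        this.maxDistinctValues = maxDistinctValues;
    }

    void add(long value)
    {
        if (overflowed) {
            return; // once the cap is exceeded, rows no longer pay the insertion cost
        }
        values.add(value); // this per-row hash + insert is the CPU cost in question
        if (values.size() > maxDistinctValues) {
            overflowed = true;
            values.clear();
        }
    }
}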

@arhimondr
Contributor Author

@raunaqmorarka This problem was discovered when running TPC-DS benchmarks on a 10TB partitioned schema.

I noticed that CPU usage is much higher with FTE than with streaming (+20-25%). I started looking into it and realized that dynamic filters are very often not available in FTE.

After increasing the limits, CPU usage went down to a level close to streaming.

Here's a detailed comparison:
df-benchmark.pdf

public void applyFaultTolerantExecutionDefaults()
{
    smallPartitionedMaxDistinctValuesPerDriver = 100_000;
    smallPartitionedMaxSizePerDriver = DataSize.of(100, KILOBYTE);
}
Why not increase *RangeRowLimitPerDriver as well?
That limit could be kept at 2x the distinct-values limit.
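For illustration, the suggested coupling could look like this. The variable names are patterned on the config field above and the *RangeRowLimitPerDriver properties mentioned in this comment; they are illustrative, not taken from the actual class.

int smallPartitionedMaxDistinctValuesPerDriver = 100_000;
// Keep the range-row limit at 2x the distinct-values limit, per the suggestion above.
int smallPartitionedRangeRowLimitPerDriver = 2 * smallPartitionedMaxDistinctValuesPerDriver; // = 200_000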

@raunaqmorarka
Member

The current limits were specifically tuned to get the best results at the 1TB partitioned scale. Streaming mode would probably also improve at the 10TB scale if we drastically raised the limits.
Do we want to tune the defaults of streaming for the 1TB scale and the defaults of FTE for the 10TB scale?
cc: @sopel39

@github-actions

This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua

@github-actions github-actions bot added the stale label Apr 25, 2023
@losipiuk
Member

@arhimondr @raunaqmorarka can we close this one given #17130?

@raunaqmorarka
Member

> @arhimondr @raunaqmorarka can we close this one given #17130?

We need to re-run the FTE sf10k benchmark to find out if the increased limits are sufficient.

@github-actions github-actions bot removed the stale label Apr 27, 2023
@arhimondr
Contributor Author

@raunaqmorarka Working on it

@arhimondr
Contributor Author

@raunaqmorarka I re-ran TPC-DS 10000 and I still see queries that would benefit from higher limits. Opened a new PR: #17831

@arhimondr arhimondr closed this Jun 9, 2023
@arhimondr arhimondr deleted the increase-dynamic-filter-size-for-fte branch June 9, 2023 20:12