-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Increase dynamic filter limits for fault tolerant execution #16875
Increase dynamic filter limits for fault tolerant execution #16875
Conversation
In fault tolerant executions dynamic filters are collected before shuffle resulting in higher number of distinct values per driver / operator. Increasing the limit is safe as the memory used by dynamic filters is tracked.
Do we have benchmark results showing improvement ? |
@raunaqmorarka This problem was discovered when running TPC/DS benchmarks on 10TB partitioned schema. I've noticed that the CPU is much higher with FTE than with streaming (+20-25%). I started looking more into it, and I realized that dynamic filters are very often not available in FTE. After increasing the limits CPU went down to the level close to streaming. Here's a detailed comparison: |
public void applyFaultTolerantExecutionDefaults() | ||
{ | ||
smallPartitionedMaxDistinctValuesPerDriver = 100_000; | ||
smallPartitionedMaxSizePerDriver = DataSize.of(100, KILOBYTE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not increase *RangeRowLimitPerDriver
as well ?
That limit could be kept at 2x the distinct values limit.
Current limits were specifically tuned to get best results for 1TB partitioned scale. Probably the streaming mode would also improve on 10TB scale if we drastically upped the limits. |
This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua |
@arhimondr @raunaqmorarka can we close this one given #17130? |
We need to re-run the FTE sf10k benchmark to find out if the increased limits are sufficient. |
@raunaqmorarka Working on it |
@raunaqmorarka I rerun TPC-DS 10000 and I still see queries that would benefit from higher limits. Opened a new PR: #17831 |
Description
In fault tolerant executions dynamic filters are collected before shuffle resulting in higher number of distinct values per driver / operator.
Increasing the limit is safe as the memory used by dynamic filters is tracked.
Additional context and related issues
#16104
#16110
Release notes
(X) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text: