-
Notifications
You must be signed in to change notification settings - Fork 179
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(swordfish): Optimize grouped aggregations #3534
Conversation
CodSpeed Performance ReportMerging #3534 will degrade performances by 41.94%Comparing Summary
Benchmarks breakdown
|
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #3534 +/- ##
========================================
Coverage 77.81% 77.81%
========================================
Files 716 716
Lines 87987 88241 +254
========================================
+ Hits 68463 68667 +204
- Misses 19524 19574 +50
|
Full TPCH SF10 queries on swordfish before and after this optimization.
|
// This is the threshold for when we should aggregate the unaggregated partitions | ||
const PARTIAL_AGG_THRESHOLD: usize = 10_000; | ||
// This is the maximum number of partitions we can have | ||
const MAX_NUM_PARTITIONS: usize = 16; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we cap this rather than always take num cores?
Not a suggestion, just curious
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because it would be bad for low cardinality aggs if there's too many partitions. But I modified the algorithm to cater for low cardinality aggs, so this is removed.
const HIGH_CARDINALITY_THRESHOLD_RATIO: f64 = 0.8; | ||
|
||
fn new(num_partitions: usize) -> Self { | ||
let inner_states = (0..num_partitions).map(|_| None).collect(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can also do vec![None; num_partitions]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The vec!
macro requires SinglePartitionAggregateState
to impl Clone, which it doesn't because Micropartition
doesn't implement it.
strategy.execute_strategy(inner_states, input, params)?; | ||
} | ||
// Otherwise, determine a strategy based on the input data. | ||
else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So an issue I see here is that this decision is determined at a per partition basis rather than globally.
This means that as a morsel goes into each core, the decision is made independently of one another so you can have multiple different strategies at the same time.
I think we can instead use a Mutex to protect the strategy (shared across all partition) and then we can cache it in the state after we read it once since it's not going to change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, implemented. Tested and found no regressions
} | ||
|
||
// Otherwise, determine the strategy and execute | ||
let decided_strategy = determine_agg_strategy( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it would be cleaner to have this as a method and it performs the caching of the decided_strategy
. that way the code could look like
let decided_strategy = self.determine_agg_strategy();
decided_strategy.execute_strategy(inner_states, input, params)?;
Optimize swordfish grouped aggs for high cardinality groups
Approach
There's 3 strategies for grouped aggs:
N
partitions, then do a partial agg. (good for high cardinality).N
partitions. (good for low cardinality). Can be optimized with Sharded probe table #3556Notes on alternative approaches
Benchmarks
MrPowers Benchmarks results (seconds, lower is better).