-
Notifications
You must be signed in to change notification settings - Fork 169
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix performance regression with stddev
being enabled by default
#840
Conversation
stddev
being enabled by default
fn prep_join_input(plan: Arc<dyn ExecutionPlan>) -> Arc<dyn ExecutionPlan> { | ||
if can_reuse_input_batch(&plan) { | ||
Arc::new(CopyExec::new(plan, CopyMode::UnpackOrDeepCopy)) | ||
} else { | ||
Arc::new(CopyExec::new(plan, CopyMode::UnpackOrClone)) | ||
} | ||
} | ||
|
||
let left = prep_join_input(left); | ||
let right = prep_join_input(right); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
code cleanup: no functional change
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #840 +/- ##
============================================
- Coverage 34.05% 34.04% -0.01%
+ Complexity 879 861 -18
============================================
Files 112 112
Lines 42959 42912 -47
Branches 9488 9473 -15
============================================
- Hits 14629 14609 -20
+ Misses 25330 25312 -18
+ Partials 3000 2991 -9 ☔ View full report in Codecov by Sentry. |
@parthchandra Could I get your opinion on the config changes in this PR? I'm not sure this is the best approach but it was previously not possible to have a default value of |
Would it be better to have have an exclusion list (does not have to be a config, could just be an internal list) which is explicitly checked if EXEC_ALL is enabled? |
stddev
being enabled by defaultstddev
being enabled by default
There is an upstream PR in DataFusion to improve stddev performance |
I am closing this issue because we can disable this expression via a config now that #855 is merged |
Which issue does this PR close?
Related to #824
Rationale for this change
Fix a performance regression and simplify configs for enabling operators and expressions
We now fall back to Spark for
stddev_sample
aggregates.What changes are included in this PR?
stddev
is now disabled by default. There was a recent regression related to configs that had enabled this by default, and this operation is much slower in DataFusion than in Spark.spark.comet.exec.all.enabled
and enable all operators by default. Each operator can be disabled individually be changing it's enabled config to false. These are all documented.How are these changes tested?
Existing tests