reduce clones of LogicalPlan in planner #7775
I am marking this PR as a draft because it seems we need more thread stack space: tpcds_physical_q54 hits a thread stack overflow. |
Looking through the changes, I wonder if we should wrap LogicalPlan into an Arc, similar to the physical version (even though it's not dyn-dispatch). That would save stack space and make cloning very cheap. I think this should also be done for all "child" plans in LogicalPlan that are currently Boxed.
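To illustrate why an Arc-wrapped plan is cheap to clone, here is a minimal sketch. The `Plan` enum and the `build` helper are hypothetical stand-ins made up for this example, not DataFusion's actual `LogicalPlan`.

```rust
use std::sync::Arc;

// Hypothetical, simplified stand-in for a logical plan (not DataFusion's type).
#[derive(Debug)]
enum Plan {
    Scan { table: String },
    Projection { columns: Vec<String>, input: Arc<Plan> },
}

// Builds a two-node plan and returns both nodes so sharing can be observed.
fn build() -> (Arc<Plan>, Arc<Plan>) {
    let scan = Arc::new(Plan::Scan { table: "t".to_string() });
    let proj = Arc::new(Plan::Projection {
        columns: vec!["a".to_string()],
        input: Arc::clone(&scan), // child is shared, not copied
    });
    (scan, proj)
}

fn main() {
    let (scan, proj) = build();

    // Cloning the Arc only bumps a reference count; the tree is not copied.
    let shared = Arc::clone(&proj);
    assert!(Arc::ptr_eq(&proj, &shared));

    // Two owners of the scan node: the local `scan` and the child pointer in `proj`.
    assert_eq!(Arc::strong_count(&scan), 2);

    // The handle itself is pointer-sized, however large the enum is.
    assert_eq!(std::mem::size_of::<Arc<Plan>>(), std::mem::size_of::<usize>());
    println!("{:?}", shared);
}
```

The trade-off is that mutation now needs `Arc::make_mut` or a rebuild of the node, since shared nodes are immutable by default.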
One of the tensions is that if we wrapped the plan in Arc, it is harder to match:
match plan {
    LogicalPlan::Scan(..) => {..}
    LogicalPlan::Project(..) => {..}
    ...
} |
Depends on how you want to match. You can use |
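As an independent illustration of matching through an Arc, one common approach is to dereference to a plain reference first, after which ordinary enum patterns apply. This is a sketch under assumptions: the `Plan` enum and `describe` function are hypothetical and much simpler than DataFusion's `LogicalPlan`.

```rust
use std::sync::Arc;

// Hypothetical, simplified plan type used only for this example.
#[derive(Debug)]
enum Plan {
    Scan(String),
    Projection(Vec<String>, Arc<Plan>),
}

fn describe(plan: &Arc<Plan>) -> String {
    // `&**plan` turns `&Arc<Plan>` into `&Plan`, so the usual patterns work.
    match &**plan {
        Plan::Scan(table) => format!("scan {table}"),
        Plan::Projection(cols, input) => {
            // `input` is bound as `&Arc<Plan>`, so recursion needs no clone.
            format!("project [{}] <- {}", cols.join(", "), describe(input))
        }
    }
}

fn main() {
    let scan = Arc::new(Plan::Scan("t".to_string()));
    let proj = Arc::new(Plan::Projection(
        vec!["a".to_string(), "b".to_string()],
        scan,
    ));
    println!("{}", describe(&proj)); // prints "project [a, b] <- scan t"
}
```

What becomes harder is matching on nested children in a single pattern, because a pattern cannot look through an `Arc`; nested cases need a second `match` on the child.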
Some functions borrow a plan and return a new plan, for example:
pub fn optimize(&self, plan: &LogicalPlan) -> Result<LogicalPlan>
If we wrap plan with |
I think
pub fn optimize(&self, plan: Arc<LogicalPlan>) -> Result<Arc<LogicalPlan>>
If the plan stays the same, you can just pass through the |
I think pub fn optimize(&self, plan: Arc<LogicalPlan>) -> Result<Arc<LogicalPlan>> I agree that sounds like a more sensible plan. |
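The pass-through idea in the `Arc<LogicalPlan>` signature can be sketched as follows. Everything here is an assumption for illustration: the `Plan` enum, the constant-predicate `Filter` variant, the `String` error type, and the free-standing `optimize` rule are hypothetical and not taken from DataFusion.

```rust
use std::sync::Arc;

// Hypothetical plan type: a filter carries a constant boolean predicate.
#[derive(Debug)]
enum Plan {
    Scan(String),
    Filter(bool, Arc<Plan>),
}

type Result<T> = std::result::Result<T, String>;

/// Hypothetical rule: drop filters whose predicate is the constant `true`.
/// Returns the *same* Arc when nothing changed, so no node is cloned.
fn optimize(plan: Arc<Plan>) -> Result<Arc<Plan>> {
    // `None` means "unchanged"; `Some(p)` carries the rewritten plan.
    let rewritten = match &*plan {
        Plan::Filter(true, input) => Some(optimize(Arc::clone(input))?),
        Plan::Filter(false, input) => {
            let new_input = optimize(Arc::clone(input))?;
            if Arc::ptr_eq(&new_input, input) {
                None // child unchanged, keep this node as-is
            } else {
                Some(Arc::new(Plan::Filter(false, new_input)))
            }
        }
        Plan::Scan(_) => None,
    };
    Ok(rewritten.unwrap_or(plan))
}

fn main() -> Result<()> {
    let scan = Arc::new(Plan::Scan("t".to_string()));
    let plan = Arc::new(Plan::Filter(true, Arc::clone(&scan)));
    let out = optimize(plan)?;
    // The trivial filter is removed and the original scan node is reused.
    assert!(Arc::ptr_eq(&out, &scan));
    Ok(())
}
```

With a `&LogicalPlan -> Result<LogicalPlan>` signature, the unchanged case would instead have to clone the plan to return an owned value.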
fyi @sadboy, @schulte-lukas and @wolfram-s |
Hi, just saw this thread. FWIW we (SDF) recently changed all our internal use of What did turn out to have a huge perf impact on our workloads, was the asymptotic behavior of the logical plan constructors. Specifically, many methods in Anyway, tl;dr is that
|
Thank you @sadboy, this is great feedback. I wonder if we could / should make "don't error check" type constructors for this kind of optimization. Perhaps something like:
impl ProjectionExec {
    // Creates a new projection exec without any error checking. Use this only
    // if you know the correct arguments
    pub fn try_new_unchecked(
        expr: Vec<(Arc<dyn PhysicalExpr>, String)>,
        input: Arc<dyn ExecutionPlan>
    ) -> Result<ProjectionExec, DataFusionError> {
        ...
    }
} |
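The split between a validating constructor and an unchecked one can be sketched with a toy type. This is a minimal sketch, assuming a hypothetical `Projection` struct and a hypothetical duplicate-name check; it does not reflect what DataFusion's constructors actually validate.

```rust
// Hypothetical toy type; not DataFusion's ProjectionExec.
#[derive(Debug, PartialEq)]
struct Projection {
    columns: Vec<String>,
}

impl Projection {
    /// Validating constructor: rejects duplicate output names.
    /// The check is quadratic in the number of columns, which is the kind of
    /// cost a trusted caller may want to skip.
    fn try_new(columns: Vec<String>) -> Result<Self, String> {
        for (i, name) in columns.iter().enumerate() {
            if columns[..i].contains(name) {
                return Err(format!("duplicate column name: {name}"));
            }
        }
        Ok(Self::new_unchecked(columns))
    }

    /// Skips validation entirely. Only for callers that already know the
    /// arguments are valid, e.g. a rewrite that preserves the column list.
    fn new_unchecked(columns: Vec<String>) -> Self {
        Self { columns }
    }
}

fn main() {
    let ok = Projection::try_new(vec!["a".to_string(), "b".to_string()]);
    assert!(ok.is_ok());

    let dup = Projection::try_new(vec!["a".to_string(), "a".to_string()]);
    assert!(dup.is_err());

    // The unchecked path accepts anything; correctness is the caller's job.
    let trusted = Projection::new_unchecked(vec!["a".to_string()]);
    assert_eq!(trusted.columns.len(), 1);
}
```

Having the checked constructor delegate to the unchecked one keeps a single place where the struct is actually assembled.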
As a quick and simple solution, that's what I would recommend, yes. More fundamentally, I think the contention arises from the de-facto "dual
As things currently stand, the constructor methods in Ideally, however, I believe these two use cases are different enough that it
Anyway, that's just my $0.02 🙂 (and as I just realized, probably way off topic for |
I think it is a great discussion to have -- I filed #8556 to get it out of this thread (on a closed ticket) into a new issue for hopefully wider discussions |
Rationale for this change
Reduce clones of the logical plan in the planner. This PR may be related to #5637.
Clones of the input plan will be reduced further once #4628 is closed.
What changes are included in this PR?
Speeds up the planner, but makes some tests slower than before because of some additional clones.
Are these changes tested?
yes.
Are there any user-facing changes?
no.