Replace Boxes with Arc in the Expr enum #9577
Comments
Summary for my note:
|
Exactly. I also think that there are pros and cons to these approaches. For recursive types, I think the deep-clone problem is more evident. |
It would be an interesting experiment. It has its own … |
I don't know how to demonstrate whether it is a shallow copy in the Rust playground. The relevant code in the standard library:

```rust
// Excerpt from the standard library's `alloc` crate. It uses the unstable
// specialization feature (`default unsafe fn`), so it does not compile on stable Rust.
#[cfg(not(no_global_oom_handling))]
/// Specialize clones into pre-allocated, uninitialized memory.
/// Used by `Box::clone` and `Rc`/`Arc::make_mut`.
pub(crate) trait WriteCloneIntoRaw: Sized {
    unsafe fn write_clone_into_raw(&self, target: *mut Self);
}

#[cfg(not(no_global_oom_handling))]
impl<T: Clone> WriteCloneIntoRaw for T {
    #[inline]
    default unsafe fn write_clone_into_raw(&self, target: *mut Self) {
        // Having allocated *first* may allow the optimizer to create
        // the cloned value in-place, skipping the local and move.
        unsafe { target.write(self.clone()) };
    }
}

#[cfg(not(no_global_oom_handling))]
impl<T: Copy> WriteCloneIntoRaw for T {
    #[inline]
    unsafe fn write_clone_into_raw(&self, target: *mut Self) {
        // We can always copy in-place, without ever involving a local value.
        unsafe { target.copy_from_nonoverlapping(self, 1) };
    }
}
```

Based on https://stackoverflow.com/questions/31012923/what-is-the-difference-between-copy-and-clone, the docs for `Clone`, and many other comments, `Clone` can perform either a shallow copy or a deep copy.
I think types like `&T`, `Rc`, and `Arc` are shallow copies (cloning copies the pointer only, not the underlying data) |
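One way to check this in the playground is to compare addresses after cloning: a `Box` clone allocates a new value and clones the contents into it, while an `Arc` clone copies the pointer and bumps the reference count. A minimal sketch (the function names are illustrative only):

```rust
use std::sync::Arc;

/// Returns true if a cloned `Box` points at the same heap value as the original.
fn box_clone_shares_allocation() -> bool {
    let original: Box<Vec<i32>> = Box::new(vec![1, 2, 3]);
    // Box::clone allocates a new box and clones the Vec (including its buffer) into it.
    let copy = original.clone();
    std::ptr::eq(&*original, &*copy)
}

/// Returns whether a cloned `Arc` shares the original allocation, plus the
/// strong count observed while both handles are alive.
fn arc_clone_shares_allocation() -> (bool, usize) {
    let original: Arc<Vec<i32>> = Arc::new(vec![1, 2, 3]);
    // Arc::clone copies the pointer and increments the reference count.
    let copy = Arc::clone(&original);
    (Arc::ptr_eq(&original, &copy), Arc::strong_count(&original))
}
```

Calling these from `main` (or a `#[test]`) in the playground gives `false` for the `Box` (deep copy) and `(true, 2)` for the `Arc` (shallow copy).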
I basically agree with this assessment, but I have an alternative proposal for how to improve performance.
Here is an example of what I think is a good pattern (there are no copies except when needed): …
Here is an example of where `Expr` cloning is being used unnecessarily: …
Thus my suggestion is to go through the planner and remove the calls to … This would avoid any changes required for downstream consumers. |
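The "no copies except when needed" idea can be sketched with a toy expression type: a rewrite that takes the expression by value can move children into the result, while one that only has a reference has to clone what it returns. The `Expr` enum and both rewrite functions below are hypothetical simplifications, not DataFusion's actual API.

```rust
/// Hypothetical, heavily simplified stand-in for DataFusion's `Expr`.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(i64),
    Not(Box<Expr>),
}

/// Borrowing rewrite (NOT(NOT(x)) => x): it only has `&Expr`,
/// so every leaf it returns has to be cloned.
fn remove_double_not_by_ref(expr: &Expr) -> Expr {
    match expr {
        Expr::Not(inner) => match &**inner {
            Expr::Not(innermost) => remove_double_not_by_ref(innermost),
            other => Expr::Not(Box::new(remove_double_not_by_ref(other))),
        },
        other => other.clone(),
    }
}

/// Owning rewrite: it consumes the `Expr`, so children are moved out of
/// their boxes instead of being cloned.
fn remove_double_not(expr: Expr) -> Expr {
    match expr {
        Expr::Not(inner) => match *inner {
            Expr::Not(innermost) => remove_double_not(*innermost),
            other => Expr::Not(Box::new(remove_double_not(other))),
        },
        other => other,
    }
}
```

Both produce the same tree; the owning version simply never calls `clone()`, which is why this approach does not require changing the `Box` fields themselves.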
Thanks @alamb for your answer. I also think that removing existing unnecessary |
I'm interested in optimizing these (avoiding clones). By the way, what is this profiling tool? |
I used … I have also used hotspot for Linux (https://github.com/KDAB/hotspot), which has similar capabilities. Maybe I should make a video about "how to profile / interpret stack traces to optimize DataFusion" 🤔 |
BTW, I think #9140 would be a good first start, as the inlist simplifier both uses clones and does (yet another) tree walk. Maybe we could port it over into the main … |
That would be great: video or screenshots, whatever works, and we can attach it to the DF docs |
I will try and do this over the next week or so |
I thought about this challenge last night and wrote up my thoughts here: #9637 |
@jayzhan211 and @comphead here is a video showing what I do to profile datafusion: https://youtu.be/P3dXH61Kr5U -- do you think it is worthwhile adding to the docs? |
I'd say it's great, thanks @alamb. The font is not always clear, but the video gives an understanding of what should be happening. Today or tomorrow I'm planning to add a profiling doc for macOS only, covering how to profile and build flamegraphs, and also include this YouTube link. Unix and Windows contributors can add their parts later. |
@alamb Thanks for your video, I think it is really helpful. |
Is your feature request related to a problem or challenge?
No response
Describe the solution you'd like
According to the following Stack Overflow discussion, `Box`es can deep copy when `.clone()` is called (depending on the `.clone()` implementation of the underlying type). For `Box<Expr>` this is the case. I think this usage might be the reason for some of the deep stack usage seen during planning. See related issues: #9375, #8837.
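For comparison, a minimal sketch of the `Arc` alternative, assuming a toy `Expr` with `Arc` children (a hypothetical stand-in, not DataFusion's actual enum). Cloning a node copies its child pointers instead of recursing into the subtrees; in-place mutation then goes through copy-on-write with `Arc::make_mut`, which clones a child only if another handle still points at it.

```rust
use std::sync::Arc;

/// Hypothetical toy expression tree whose children are `Arc`s instead of `Box`es.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(i64),
    Add(Arc<Expr>, Arc<Expr>),
}

/// Increments a literal on the right-hand side of a top-level `Add`.
/// `Arc::make_mut` returns `&mut Expr`, first cloning the child node if it is
/// still shared (for example, with a tree this one was cloned from).
fn increment_rhs_literal(expr: &mut Expr) {
    if let Expr::Add(_, rhs) = expr {
        if let Expr::Literal(value) = Arc::make_mut(rhs) {
            *value += 1;
        }
    }
}
```

Cloning an `Expr::Add` here copies two pointers no matter how deep the subtrees are; the cost moves to mutation, where only the node being changed is copied and the untouched sibling stays shared.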
I think replacing `Box<Expr>` usages with `Arc<Expr>` in `enum Expr` would improve performance. I am not familiar with the implications of these two approaches elsewhere. I wonder what the community thinks about this change: would it be better, unnecessary, etc.?
Describe alternatives you've considered
No response
Additional context
No response