-
-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(rust): Tighten up error checking on join keys #17517
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #17517 +/- ##
==========================================
- Coverage 80.69% 80.68% -0.01%
==========================================
Files 1484 1484
Lines 195421 195417 -4
Branches 2782 2782
==========================================
- Hits 157695 157679 -16
- Misses 37214 37226 +12
Partials 512 512 ☔ View full report in Codecov by Sentry. |
// Otherwise every expression must be elementwise | ||
// so that we are guaranteed the keys for a join | ||
// are all the same length. | ||
|| aexprs.iter().all(|expr_ir| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can use has_aexpr
here.
I do think we should be stricter here. For instance a head
is not allowed as its length doesn't match the dataframe. Probably allowing only elementwise makes most sense, as it doesn't seem sensible to join by a col(..).sort()
anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do think we should be stricter here.
Should I drop the aexprs.len() <= 1
part of the condition?
Is there a full classification of "elementwise" yet? As you note, sort
answers "yes" to single_aexpr_is_elementwise
, but should probably (?) be rejected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can remove single_aexpr_is_elementwise
and replace it with is_streamable
. It seems redundant. is_streamable
is more correctly implemented as well.
Should I drop the aexprs.len() <= 1 part of the condition?
Yes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
// Otherwise every expression must be elementwise | ||
// so that we are guaranteed the keys for a join | ||
// are all the same length. | ||
|| aexprs.iter().all(|expr_ir| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
}; | ||
single_aexpr_is_elementwise(ae) | ||
}); | ||
let is_elementwise = is_streamable(expr_ir.node(), arena, Context::Default); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: Should one perhaps introduce an aliasing name (is_elementwise
?) to allow for the option that the implementation of is_streamable
(which might encompass more than just elementwise) diverges from the usage here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, might be more future proof indeed. Currently we see it as anything that is elementwise and can operate on batches.
If more than one join key is provided, they should all be elementwise so that all computed join keys for a given table are the same length. We use is_streamable to define if an AExpr is elementwise
A key-coalescing join is only supported if all join keys are column references. If they are not, we turn off coalescing. Join in the case that the user explicitly requested coalescing but is not going to get it.
Use is_streamable instead in the one remaining place it is used (slice pushdown).
After discussion in #17184: