-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PhysicalExpr Orderings with Range Information #10504
Conversation
Update configs.md
I haven't had a chance to review this PR yet @berkaysynnada -- I wonder if you have seen the API in #10117 from @tinfoil-knight |
Yes, I have. It is becoming a nicer and more useful API for the current version of monotonicity. However, I believe we will eventually need to carry this range data along with the expressions to be able to perform deeper analyses and optimizations. My initial motivation was just to solve the CastExpr bug, but then I realized that this bug is part of a larger need. I am, of course, open to iterating over this PR. |
@berkaysynnada kindly reminded me that the type undergoing the refactor disappears in this extended formulation. In that case this PR may supersede the refactor one. I will review this one in detail tomorrow and expand further. |
("a".to_string(), "a".to_string()), | ||
("b".to_string(), "b".to_string()), | ||
("c".to_string(), "c".to_string()), | ||
]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A test bug shows up, fixed here.
Indeed, though I still see /// Calculates the [`SortProperties`] of this function based on its children's properties.
fn monotonicity(&self, _input: &[ExprProperties]) -> Result<SortProperties> {
Ok(SortProperties::Unordered)
} In this PR |
This is a pretty clever idea. I think as long as we ensure that it is easy to implement the common patterns of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reviewed this carefully and sent two commits improving on it. This is a fantastic PR that hits two birds in one stone:
- It fixes the
CAST
bug, - It enables us to perform more optimizations w.r.t. monotonicity by utilizing range information (and obviating the need for a type like
FuncMonotonicity
in the process).
I only have a few asks and then we can merge this:
- The monotonicity definition example in
advanced_udf.rs
seems to be removed. If possible, let's add it back in to serve as an example to users. - Similar comment for
function_factory.rs
. - If all the inputs of a function are
Singleton
, should the outputSortProperties
also beSingleton
? If so, maybe we can do this check once before calling individual monotonicity functions.
@alamb, it'd be good if you could take a quick look before we merge (but not necessary as I reviewed carefully). FYI, defining an "all-increasing" function becomes a one-liner in this approach, so your point is incorporated in this design as well.
Sounds awesome -- I will try and do so tomorrow but if I don't get a chance feel free to merge it and I'll review afterwards. Thank you @ozankabak and @berkaysynnada |
For the most functions, it is so, but there could be edge cases like some time dependent functions or somehow depending randomly generated values. We have discussed it and decided to keep it as it is. I have addressed other minor issues, and this PR is ready to be merged. |
I will go ahead and merge this and we will fix any issues with a quick follow-on PR in case @alamb discovers any when he has time to take a look. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @berkaysynnada -- I looked at the API a little bit more today, and I think we may want to consider renaming ScalarUDF::monotonicity
to better reflect what it does
The overall idea of allowing scalar UDFs to be part of the interval /boundary analysis is really nice
|
||
/// Calculates the [`SortProperties`] of this function based on its | ||
/// children's properties. | ||
fn monotonicity(&self, _inputs: &[ExprProperties]) -> Result<SortProperties> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before we release this PR, I wonder if we should call this function sort_properties
or calculate_sort
, rather than monotonicty
Calling it monotonicty
may cause upgrade pain as the signature (types) changed from
/// This function specifies monotonicity behaviors for User defined scalar functions.
fn monotonicity(&self) -> Result<Option<FuncMonotonicity>> {
Ok(None)
}
It also seems like monotonicity
doesn't accurately reflect what it does any more, though I may misunderstand
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review. I can rename it as calculate_order
if you think it is explanatory enough?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about just order()
or output_ordering()
(to mirror ExecutionPlan::output_ordering)?
// specific language governing permissions and limitations | ||
// under the License. | ||
|
||
use arrow::datatypes::DataType; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't see many (any?) tests for this new code
Specifically, I am imagining tests like "if we broke / introduced a bug in one of these implementations would a test fail"
Did I miss them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right. These functions were not tested with the old API, and they haven't been tested now either. Should we add unit tests for them in this file, or would it be better to cover them in the .slt tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally I would recommend .slt tests (that show sorts/not sorts for exmaple), but I am not sure if you have sufficient bandwidth to do so. Maybe unit tests would be best.
* Self review * Fix null interval accumulation * Refactor monotonicity * Ignore failing tests * Initial impl * Ready for review * Update properties.rs * Update configs.md Update configs.md * cargo doc * Add abs test * Update properties.rs * Update udf.rs * Review Part 1 * Review Part 2 * Minor --------- Co-authored-by: Mehmet Ozan Kabak <ozankabak@gmail.com>
Which issue does this PR close?
Closes #9832.
Rationale for this change
There exists a bug in
CastExpr
orderings, as detailed in the issue. To solve this problem, we need to appendDataType
information of children in theget_ordering
method ofPhysicalExpr
's. However, this problem can be solved in a more clever way offering more functionality.I propose to extend
get_ordering
method toget_properties
such that it can also carry the range information of expressions, which could have an important meaning for expressions when investigating the order information.Having range information would help
ScalarFunctionExpr
's order calculations since many of them have monotonicity pattern on some defined intervals. I have given an example of it for ABS() function.CastExpr
bug is also solved since the range carries the datatype information.What changes are included in this PR?
ScalarFunctionExpr's
in interval arithmetic, we will have alsoevaluate_bounds()
andpropagate_constraints()
methods inScalarUDF
.Are these changes tested?
Yes, in order.slt
Are there any user-facing changes?