-
Notifications
You must be signed in to change notification settings - Fork 752
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add cluster key statistics in block meta #5194
Conversation
The latest updates on your projects. Learn more about Vercel for Git ↗︎
|
Thanks for the contribution! Please review the labels and make any necessary changes. |
@@ -258,6 +273,62 @@ impl DataValue { | |||
} | |||
} | |||
|
|||
impl Eq for DataValue {} | |||
|
|||
impl Ord for DataValue { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if we can define a meaningful total order relation between DataValue
s.
@sundy-li any comments about this? thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sundy-li, shall/can we reuse the impl<S> ChangeIf<S> for CmpMin/Max
for the comparession of DataValue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
aggregate_scalar_state
is generic comparision impls without matching each DataValue
, so it's optimized by performance.
In cluster key
, we already have min/max indexes, so we don't need to copy this indexes into a Column and apply eval_aggr
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
aggregate_scalar_state
is generic comparision impls without matching eachDataValue
, so it's optimized by performance...
got it.
Two other concerns:
- Although we can define our own total order for DataValue (even for weird things like NaN...)
but a PartialOrder may be easier to define (or just use rust's derive mechanism) - The min/max logic is also implemented by the aggregate_min/max
It would be better if these two keep in coherent.
A suggestion:
-
derive
PartialOrd
for DataValue -
implements the min/max like the
aggregate_scalar_state
does
maybe something like this (hope it is doing the same thing as aggregate min max does:-)let min = min_stats .iter() .filter(|x| !x.is_null()) .min_by(|x, y| min.partial_cmp(&r).unwrap_or(Ordering::Equal)) .unwrap_or(DataValue::Null); let max = max_stats .iter() .filter(|x| !x.is_null()) .max_by(|x, y| max.partial_cmp(&r).unwrap_or(Ordering::Equal)) .unwrap_or(DataValue::Null);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logics want to follow:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @dantengsky . This is a great suggestion. I will try it later
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, hope it helps.
As long as the format kept backward compatible, I think we could merge this first and polish it later.
There is a CI error:
https://github.com/datafuselabs/databend/runs/6318429474?check_suite_focus=true#step:3:2198 |
/LGTM |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
I hereby agree to the terms of the CLA available at: https://databend.rs/dev/policies/cla/
Summary
Add cluster key statistics in block meta.
Changelog
Related Issues
Fixes #issue