Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert scalar value to correct type based on arrow data type. #336

Merged
merged 3 commits into from
Jul 26, 2021

Conversation

viirya
Copy link
Contributor

@viirya viirya commented Jul 26, 2021

Description

The description of the main changes of your pull request

This is a follow up of #327 for the comment #327 (comment).
For the max/min column scalar values, we convert it to correct ScarlarValue based on arrow datatype.

Related Issue(s)

Documentation

@viirya
Copy link
Contributor Author

viirya commented Jul 26, 2021

I tried to add a test on date type with delta-0.8.0-date table. But the query like SELECT max(date), min(date) FROM date_table throws an error:

Error: Internal("Unsupported CAST from Date32 to Utf8")

Seems the aggregation on date types is not ready?

I can look at it later.

@houqp
Copy link
Member

houqp commented Jul 26, 2021

Seems the aggregation on date types is not ready?

Yes, datafusion only supports string, numeric and timestamp types for min max aggregation at the moment. We need to update physical_plan/expressions/min_max.rs and physical_plan/aggregates.rs to add date support.

houqp
houqp previously approved these changes Jul 26, 2021
ScalarValue::from(raw_value)
}
ArrowDataType::Int32 => {
let raw_value = i64::try_from(value).unwrap() as i32;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any reason why we prefer downcasting from i64 to i32 over direct i32::try_from(value) call?

Copy link
Contributor Author

@viirya viirya Jul 26, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i32::try_from call only applies on an input of ScalarValue::Int32. For number scalar value, we all use ScalarValue::Int64 to compare to get max/min values. That's said here the input is ScalarValue::Int64 and i32::try_from will throw Cannot convert ... error.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ha, i see. perhaps in the future, we could change to_scalar_value to do the scalar value type cast based on field type so we can avoid this intermediate 64 bits conversion overhead.

@houqp
Copy link
Member

houqp commented Jul 26, 2021

FYI @Dandandan

@viirya
Copy link
Contributor Author

viirya commented Jul 26, 2021

Seems the aggregation on date types is not ready?

Yes, datafusion only supports string, numeric and timestamp types for min max aggregation at the moment. We need to update physical_plan/expressions/min_max.rs and physical_plan/aggregates.rs to add date support.

Thanks for the info. I may take some time looking at it later.

@viirya
Copy link
Contributor Author

viirya commented Jul 26, 2021

@houqp @Dandandan Thanks for the review. Is this good to go?

@houqp houqp merged commit a25fb71 into delta-io:main Jul 26, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants