Consider adding is_distinct_from kernels #960

Closed
alamb opened this issue Nov 20, 2021 · 3 comments · Fixed by #4716
Labels: arrow (Changes to the arrow crate), enhancement (Any new improvement worthy of a entry in the changelog)

Comments

@alamb
Contributor

alamb commented Nov 20, 2021

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Several databases provide IS DISTINCT FROM and IS NOT DISTINCT FROM operators in addition to = and !=.

DataFusion already implements these operators; see apache/datafusion#1117 from @Dandandan and apache/datafusion#1163.

IS DISTINCT FROM differs from != only in how nulls are handled: it treats NULL as a comparable value, so the result is always true or false, never NULL.

From the PostgreSQL manual (https://www.postgresql.org/docs/14/functions-comparison.html):

datatype IS DISTINCT FROM datatype → boolean
Not equal, treating null as a comparable value.
1 IS DISTINCT FROM NULL → t (rather than NULL)
NULL IS DISTINCT FROM NULL → f (rather than NULL)
--
datatype IS NOT DISTINCT FROM datatype → boolean
Equal, treating null as a comparable value.
1 IS NOT DISTINCT FROM NULL → f (rather than NULL)
NULL IS NOT DISTINCT FROM NULL → t (rather than NULL)
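To make these null semantics concrete, here is a minimal sketch in Rust that models SQL NULL as `None`. The function names `is_distinct_from`, `is_not_distinct_from`, and `sql_not_eq` are hypothetical illustrations, not arrow-rs or DataFusion APIs.

```rust
/// SQL `IS DISTINCT FROM` on nullable values, modelling NULL as `None`.
/// Unlike `!=`, the result is never NULL.
fn is_distinct_from<T: PartialEq>(l: Option<T>, r: Option<T>) -> bool {
    match (l, r) {
        // Both NULL: treated as equal, so not distinct.
        (None, None) => false,
        // Both valid: ordinary inequality.
        (Some(a), Some(b)) => a != b,
        // Exactly one NULL: distinct.
        _ => true,
    }
}

/// SQL `IS NOT DISTINCT FROM` is the exact negation.
fn is_not_distinct_from<T: PartialEq>(l: Option<T>, r: Option<T>) -> bool {
    !is_distinct_from(l, r)
}

/// For contrast: SQL `!=` propagates NULL, so its result is itself nullable.
fn sql_not_eq<T: PartialEq>(l: Option<T>, r: Option<T>) -> Option<bool> {
    match (l, r) {
        (Some(a), Some(b)) => Some(a != b),
        _ => None,
    }
}
```

With this model, `is_distinct_from(Some(1), None)` is `true` while `sql_not_eq(Some(1), None)` is `None`, matching the "(rather than NULL)" rows above.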

Describe the solution you'd like
We propose porting the implementations from DataFusion into the arrow-rs crate.

This would mean implementing the kernels is_distinct_from, is_distinct_from_scalar, is_not_distinct_from, and is_not_distinct_from_scalar.
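A rough sketch of the array and scalar variants, assuming columns are modelled as `&[Option<T>]` slices rather than Arrow arrays. The helper names below are hypothetical and only mirror the shape of the four proposed kernels.

```rust
/// Element-wise IS DISTINCT FROM for two equal-length nullable columns.
/// Hypothetical sketch: real Arrow kernels work on arrays and bitmaps.
fn distinct_arrays<T: PartialEq>(l: &[Option<T>], r: &[Option<T>]) -> Vec<bool> {
    assert_eq!(l.len(), r.len(), "arrays must have the same length");
    l.iter()
        .zip(r)
        .map(|(a, b)| match (a, b) {
            (None, None) => false,
            (Some(x), Some(y)) => x != y,
            _ => true,
        })
        .collect()
}

/// Scalar variant: compare every element against one (possibly NULL) value.
fn distinct_scalar<T: PartialEq>(l: &[Option<T>], scalar: Option<&T>) -> Vec<bool> {
    l.iter()
        .map(|a| match (a.as_ref(), scalar) {
            (None, None) => false,
            (Some(x), Some(y)) => x != y,
            _ => true,
        })
        .collect()
}

// The "not distinct" variants are the element-wise negation.
fn not_distinct_arrays<T: PartialEq>(l: &[Option<T>], r: &[Option<T>]) -> Vec<bool> {
    distinct_arrays(l, r).into_iter().map(|d| !d).collect()
}

fn not_distinct_scalar<T: PartialEq>(l: &[Option<T>], scalar: Option<&T>) -> Vec<bool> {
    distinct_scalar(l, scalar).into_iter().map(|d| !d).collect()
}
```

Defining the "not" variants as negations keeps the null handling in one place; an optimized kernel would more likely compute both directly from bitmaps.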

Ideally, start from the implementations in apache/datafusion#1117 and apache/datafusion#1163 and modify them to follow the pattern demonstrated in @Dandandan's PR for eq_bool (#844), namely performing the comparisons in 64-bit chunks where possible.
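The 64-bit chunk idea can be sketched for boolean columns as follows. This assumes the values and validity bitmaps are packed least-significant-bit first into `u64` words (as in Arrow's bitmap layout) and uses plain slices instead of Arrow buffers; `distinct_bool_chunks` is a hypothetical name. Each word yields 64 results from a few bitwise operations: a row is distinct when exactly one side is NULL, or when both are valid and the values differ.

```rust
/// IS DISTINCT FROM for two boolean columns, 64 rows per `u64` word.
/// `*_values` hold the boolean values, `*_valid` the validity bits (1 = non-NULL).
/// Hypothetical layout: plain `&[u64]` slices stand in for Arrow buffers.
fn distinct_bool_chunks(
    l_values: &[u64],
    l_valid: &[u64],
    r_values: &[u64],
    r_valid: &[u64],
) -> Vec<u64> {
    l_values
        .iter()
        .zip(l_valid)
        .zip(r_values.iter().zip(r_valid))
        .map(|((&lv, &lm), (&rv, &rm))| {
            // Rows where exactly one side is NULL are distinct.
            let one_null = lm ^ rm;
            // Rows where both sides are valid and the values differ are distinct.
            // Masking with `lm & rm` ignores whatever bits sit under NULL slots.
            let both_valid_ne = (lm & rm) & (lv ^ rv);
            one_null | both_valid_ne
        })
        .collect()
}
```

Bits past the end of the array in the final word are meaningless and would need masking by the caller; IS NOT DISTINCT FROM is the bitwise complement `!word` of each result, subject to the same masking.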

@alamb alamb added enhancement Any new improvement worthy of a entry in the changelog arrow Changes to the arrow crate labels Nov 20, 2021
@liukun4515
Contributor

I can try it

@comphead
Contributor

comphead commented Dec 9, 2022

Hi @tustvold, would you like me to add and test the code from apache/datafusion#4560 (comment)?

@tustvold
Contributor

tustvold commented Dec 9, 2022

If you could, that would be awesome. One detail I omitted there is handling floats consistently using total ordering. I suspect this will involve comparing primitives using ArrowNativeTypeOp instead of PartialEq.

I have some half-baked ideas on how to do this, so if you just want to add and test the PartialEq based impl, that would also be fine 😀
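A small sketch of the total-ordering concern, using the standard library's `f64::total_cmp` as a stand-in for the ArrowNativeTypeOp comparison mentioned above (treating the two as equivalent is an assumption; the function names are hypothetical). Under `PartialEq`, NaN never equals itself and -0.0 equals +0.0; under total ordering, a NaN compares equal to a NaN with the identical bit pattern and -0.0 orders before +0.0.

```rust
use std::cmp::Ordering;

/// IS DISTINCT FROM for nullable f64 using IEEE 754 total ordering
/// (`f64::total_cmp`), as a stand-in for an ArrowNativeTypeOp-based compare.
fn distinct_f64_total(l: Option<f64>, r: Option<f64>) -> bool {
    match (l, r) {
        (None, None) => false,
        (Some(a), Some(b)) => a.total_cmp(&b) != Ordering::Equal,
        _ => true,
    }
}

/// The same operator using `PartialEq`, for comparison.
fn distinct_f64_partial_eq(l: Option<f64>, r: Option<f64>) -> bool {
    match (l, r) {
        (None, None) => false,
        (Some(a), Some(b)) => a != b,
        _ => true,
    }
}
```

With `PartialEq`, `NaN IS DISTINCT FROM NaN` would be true, which is at odds with treating values as comparable; total ordering makes the operator behave like an equivalence relation, at the cost of distinguishing -0.0 from +0.0 and NaNs with different payloads.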

@tustvold tustvold assigned tustvold and unassigned liukun4515 Aug 18, 2023
tustvold added a commit to tustvold/arrow-rs that referenced this issue Aug 18, 2023
tustvold added a commit that referenced this issue Aug 21, 2023
* Add distinct kernels (#960) (#4438)

* Fixes

* Add tests

* Handle NullArray

* Fix comparisons between scalar and empty array

* Clippy

* Review feedback