Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add comparison kernels for BinaryArray #1108

Closed
tustvold opened this issue Dec 29, 2021 · 9 comments · Fixed by #1238
Closed

Add comparison kernels for BinaryArray #1108

tustvold opened this issue Dec 29, 2021 · 9 comments · Fixed by #1238
Assignees
Labels
enhancement Any new improvement worthy of a entry in the changelog good first issue Good for newcomers

Comments

@tustvold
Copy link
Contributor

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

Currently there is no support for comparing BinaryArray

Describe the solution you'd like

It should be possible to call arrow::compute::eq_dyn, and friends with BinaryArray as arguments and have them not return an error. It may be possible to share code with the existing utf8 implementation.

Additional context

Encountered whilst implementing #1053

@tustvold tustvold added the enhancement Any new improvement worthy of a entry in the changelog label Dec 29, 2021
@alamb alamb added the good first issue Good for newcomers label Jan 13, 2022
@alamb
Copy link
Contributor

alamb commented Jan 13, 2022

This is a pretty good ticket for anyone who wants to mess with the compute kernels -- basically it should be add a small amount of code and a a test

@HaoYang670
Copy link
Contributor

Hi Raphael, I am newcomer and want to contribute to this project.
Is it reasonable to add a function called eq_dyn_binary_scalar and tests in the file comparison.rs?

@tustvold
Copy link
Contributor Author

tustvold commented Jan 17, 2022

Yes it should be largely just a case of adding _binary versions of the _utf8 kernels in that module.

Typically there are three variants of each operator:

  • One with fully qualified array types e.g. lt_eq_utf8
  • One with a fully quality array type and a scalar e.g. lt_eq_utf8_scalar
  • One with a dynamic array type and qualified scalar e.g. lt_eq_dyn_utf8_scalar

Finally there are then two dynamic dispatch functions for each operator

  • One that takes two dynamic array types - e.g. lt_eq_dyn (the dispatch logic is macro-ified)
  • One that takes a dynamic array and a scalar - e.g. lt_eq_dyn_scalar (this doesn't support non-primitive types AFAICT)

I'd perhaps recommend starting with the fully qualified kernels and build up from there with successive PRs. The scalar variants are also optimisations and so could definitely be ignored for an initial cut, they were only added in the last month 😀

@HaoYang670
Copy link
Contributor

Thank you, Raphael! I'd like to try if no one else has been doing it.

@HaoYang670
Copy link
Contributor

Following your advice, I will file 2 PRs to match this feature. The first PR will add support for fully qualified binary array. The second PR will support comparison for dynamic binary array.

@HaoYang670
Copy link
Contributor

Is there something like BinaryDictionaryBuilder in the code, which is needed when testing the eq_dyn_binary_scalar_with_dict?

@HaoYang670
Copy link
Contributor

HaoYang670 commented Jan 24, 2022

BTW, could you please assign this issue to me?
I have no right to self-assign.

@alamb
Copy link
Contributor

alamb commented Jan 24, 2022

Is there something like BinaryDictionaryBuilder in the code, which is needed when testing the eq_dyn_binary_scalar_with_dict?

I do not know of any such thing, sadly

@HaoYang670
Copy link
Contributor

I do not include DictionaryArray in my PR, because I do not find a straight forward way to build a binary dictionary. If it is needed, I will include it later in my PR.

@alamb alamb changed the title Add native comparison kernel support for BinaryArray Add comparison kernels for BinaryArray Feb 3, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Any new improvement worthy of a entry in the changelog good first issue Good for newcomers
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants