Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support string concat || for StringViewArray #11766

Closed
Tracked by #11752
alamb opened this issue Aug 1, 2024 · 6 comments · Fixed by #12063
Closed
Tracked by #11752

Support string concat || for StringViewArray #11766

alamb opened this issue Aug 1, 2024 · 6 comments · Fixed by #12063
Assignees
Labels
enhancement New feature or request

Comments

@alamb
Copy link
Contributor

alamb commented Aug 1, 2024

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

datafusion-cli

Then run

DataFusion CLI v40.0.0
> create table foo as values ('value1', arrow_cast('one', 'Utf8View')), ('value1', arrow_cast('two', 'Utf8View'));
0 row(s) fetched.
> select column2||'ff' from foo;
Internal error: Data type Utf8View not supported for binary operation 'concat_elements' on string arrays.
This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker

Describe the solution you'd like
I would like the query to work

Describe alternatives you've considered
We can probably get the query to work by fixing the coercion logic to coerce to Utf8, but that will be less efficient than implementing native StringView support for concat

The coercion is here:

StringConcat => {
string_concat_coercion(lhs, rhs).map(Signature::uniform).ok_or_else(|| {
plan_datafusion_err!(
"Cannot infer common string type for string concat operation {lhs} {op} {rhs}"
)
})
}

The physical expression implementation starts here

StringConcat => binary_string_array_op!(left, right, concat_elements),

Additional context

@alamb alamb added the enhancement New feature or request label Aug 1, 2024
@alamb alamb changed the title Support || for StringViewArray Support string concat || for StringViewArray Aug 1, 2024
@alamb
Copy link
Contributor Author

alamb commented Aug 1, 2024

If anyone wants to work on this, I would suggest starting with a PR that has some tests and does coercsion (which will make the query run) and then we can implement optimziation later

@alamb alamb transferred this issue from apache/arrow-rs Aug 1, 2024
@dharanad
Copy link
Contributor

dharanad commented Aug 1, 2024

i'm really interested in working on this issue. Feeling little doubtful, but let me try. I'm might seek some help, if not i will unassign myself .

@alamb
Copy link
Contributor Author

alamb commented Aug 1, 2024

Thanks @dharanad --- I would recommend starting with a test in string_view.slt that shows the problem

Then I would recommend trying to change the coercion logic so the arguments are coerced to Utf8 (not Utf8view) to get the query passing

Then in a second PR (or maybe even ticket) we can figure out how to implement the native opreration of concat (that is likely much tricker than coercing)

@dharanad
Copy link
Contributor

dharanad commented Aug 2, 2024

take

@dharanad
Copy link
Contributor

Linked issue : #11881

@alamb I was thinking the right way to solve the issue would to add support for StringViewArray in arrow::compute::kernels::concat_elements by into introducing a function named concat_elements_utf8view.

Do you have any alternative approaches in mind? If so, any pointers would be greatly appreciated.

@alamb
Copy link
Contributor Author

alamb commented Aug 14, 2024

@alamb I was thinking the right way to solve the issue would to add support for StringViewArray in arrow::compute::kernels::concat_elements by into introducing a function named concat_elements_utf8view.

Yes, I agree this would be ideal. I recommend calling it concate_elements_bytes_view to be consistent with

https://docs.rs/arrow/latest/arrow/compute/kernels/concat_elements/fn.concat_elements_bytes.html

I started looking into how to explain wiring this into binary.rs and it was a mess, so I just made a skeleton PR here:
#11995

Perhaps you can use that a template ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants