[Python] StringViewArray.from_buffers does not seem to work as expected #44651
Comments
OK, the problem seems to occur whenever there is a null, because this works:
Even the original works if I cheat and tell from_buffers that the size is one less than expected (avoiding the null at the end):
This seems like a bug :)
I remember having this on my (mental) to-do list when initially implementing support for StringView in Python (#39652), but it seems I never got around to opening an issue or PR for it. I think we have to either override
BTW, if you just remove that check for a moment, does your initial example then work?
Ah, I added a bullet point mentioning it in the TODO list of #39633 ...
Yes, if I remove the check:

diff --git a/python/pyarrow/array.pxi b/python/pyarrow/array.pxi
index eaedbf1e3..4ddbbea32 100644
--- a/python/pyarrow/array.pxi
+++ b/python/pyarrow/array.pxi
@@ -1174,10 +1174,10 @@ cdef class Array(_PandasConvertible):
                              "({0}) did not match the passed number "
                              "({1}).".format(type.num_fields, len(children)))
 
-        if type.num_buffers != len(buffers):
-            raise ValueError("Type's expected number of buffers "
-                             "({0}) did not match the passed number "
-                             "({1}).".format(type.num_buffers, len(buffers)))
+        #if type.num_buffers != len(buffers):
+        #    raise ValueError("Type's expected number of buffers "
+        #                     "({0}) did not match the passed number "
+        #                     "({1}).".format(type.num_buffers, len(buffers)))
 
         for buf in buffers:
             # None will produce a null buffer pointer

then it works as expected:
…on (#44701)

### Rationale for this change

Currently `from_buffers` does not work with StringView in Python because we validate against num_buffers. This only takes into account the mandatory buffers and not the variadic_spec that can be present for both string_view and binary_view.

### What changes are included in this PR?

Take into account whether the type contains a variadic_spec for the non-mandatory buffers, and only check the lower bound on the number of buffers.

### Are these changes tested?

Yes, I've added a couple of tests.

### Are there any user-facing changes?

We are exposing a new method on the Python DataType, `has_variadic_buffers`, which tells us whether the number of buffers expected is only lower-bounded by num_buffers.

* GitHub Issue: #44651

Authored-by: Raúl Cumplido <raulcumplido@gmail.com>
Signed-off-by: Raúl Cumplido <raulcumplido@gmail.com>
Issue resolved by pull request 44701
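The validation rule described in the commit message can be sketched in plain Python as below. This is only an illustration, not the actual pyarrow code: `check_buffer_count` is a hypothetical helper, and the value 2 used in the usage note for the mandatory buffers of a string view type (validity and views) is an assumption.

```python
def check_buffer_count(num_buffers: int, has_variadic_buffers: bool,
                       passed: int) -> None:
    """Raise ValueError if `passed` buffers cannot match the type's layout.

    Hypothetical helper: for types with variadic buffers, `num_buffers`
    is treated as a lower bound; otherwise it must match exactly.
    """
    if has_variadic_buffers:
        if passed < num_buffers:
            raise ValueError(
                "Type's expected number of buffers is at least "
                "({0}), but the passed number was ({1})."
                .format(num_buffers, passed))
    elif passed != num_buffers:
        raise ValueError(
            "Type's expected number of buffers "
            "({0}) did not match the passed number "
            "({1}).".format(num_buffers, passed))
```

Under these assumptions, `check_buffer_count(2, True, 3)` passes (validity, views, and one data buffer), while `check_buffer_count(2, False, 3)` raises the error that the original check produced.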
Describe the usage question you have. Please include as many useful details as possible.
I am trying to create a pyarrow string view array using from_buffers but it does not seem to be correctly supported, or I might not know how to use it:
I tried this basic snippet:
If I try removing the null bitmap buffer to only pass the views buffer + the data buffer, it complains about the buffer size:
Of course, if the buffer is wrong (just change the order to avoid the complaints about size) it fails, but we probably shouldn't get a segmentation fault:
I am unsure if this is just me not knowing how to use from_buffers or if there is an issue with string view.
Component(s)
Python
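For reference, the views and data buffers discussed in this issue can be built by hand. The sketch below is not the snippet from the original report; it uses only the standard library and follows the 16-byte view layout from the Arrow columnar format (strings of at most 12 bytes are stored inline, longer ones store a 4-byte prefix, a data buffer index, and an offset). The helper names `make_view` and `build_buffers` are hypothetical, and a single data buffer is assumed.

```python
import struct


def make_view(value: bytes, buffer_index: int = 0, offset: int = 0) -> bytes:
    """Encode one 16-byte little-endian string-view entry."""
    length = len(value)
    if length <= 12:
        # Inline form: int32 length followed by 12 bytes of zero-padded data.
        return struct.pack("<i12s", length, value)
    # Out-of-line form: int32 length, 4-byte prefix, int32 index of the
    # data buffer, int32 offset into that buffer.
    return struct.pack("<i4sii", length, value[:4], buffer_index, offset)


def build_buffers(values):
    """Return (validity, views, data) as bytes for a list of bytes-or-None."""
    validity = bytearray((len(values) + 7) // 8)
    views = bytearray()
    data = bytearray()
    for i, value in enumerate(values):
        if value is None:
            # Null slot: the validity bit stays 0; the view is zero-filled here.
            views += bytes(16)
            continue
        validity[i // 8] |= 1 << (i % 8)  # validity bits are LSB-first
        views += make_view(value, buffer_index=0, offset=len(data))
        if len(value) > 12:
            data += value  # only long strings occupy the data buffer
    return bytes(validity), bytes(views), bytes(data)


values = [b"short", b"a string longer than twelve bytes", None]
validity, views, data = build_buffers(values)
```

With the fix from #44701 applied, buffers like these (each wrapped with `pa.py_buffer`) would presumably be passed as `[validity, views, data]` to `pa.Array.from_buffers(pa.string_view(), len(values), ...)`; that call is not exercised here.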