ARROW-206: Expose a C++ api to compare ranges of slots between two arrays #80
@emkornfield Honestly, though I'm not certain about this, I have to say that for the Struct case the key point is how to determine whether the Struct bitmap agrees with the child fields' slots, i.e. whether these bitmaps are comparable at all. I don't think your current API exposes anything that helps with this. Again, in terms of PR #66: is it right to compare the slots in the Struct bitmap with the corresponding slots in the child fields' bitmaps? If not, what's the plan to implement the
int32_t start_idx, int32_t end_idx, const std::shared_ptr<Array>& arr) const {
  if (this == arr.get()) { return true; }
  if (this->type_enum() != arr->type_enum()) { return false; }
  auto other = static_cast<ListArray*>(arr.get());
const
This seems like a reasonable API to me. I don't think it's worth worrying too much about the performance of these methods until we see applications where it matters. @fengguangyuan If the struct-level bitmap indicates that a slot is null, there is no need to inspect the child arrays. If the value is not null, the child arrays' bitmaps and values can be inspected as needed.
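A minimal sketch of that rule, using hypothetical simplified types (`Int32Column`, `SimpleStruct`, `StructRangeEquals`) rather than Arrow's actual classes: a slot that is null at the struct level compares equal without looking at the children, and only non-null struct slots descend into the child bitmaps and values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical simplified stand-ins, not Arrow classes.
struct Int32Column {
  std::vector<bool> valid;      // true = slot is non-null
  std::vector<int32_t> values;  // contents are ignored where valid[i] is false
};

struct SimpleStruct {
  std::vector<bool> valid;            // struct-level validity bitmap
  std::vector<Int32Column> children;  // one column per field, same length as valid
};

// Compares slots [start_idx, end_idx) of two structs slot by slot.
// Indices are assumed to be in bounds for both structs and all children.
bool StructRangeEquals(const SimpleStruct& a, const SimpleStruct& b,
                       int32_t start_idx, int32_t end_idx) {
  if (a.children.size() != b.children.size()) return false;
  for (int32_t i = start_idx; i < end_idx; ++i) {
    if (a.valid[i] != b.valid[i]) return false;
    // Null at the struct level: the children are not inspected at all.
    if (!a.valid[i]) continue;
    for (std::size_t f = 0; f < a.children.size(); ++f) {
      const Int32Column& ca = a.children[f];
      const Int32Column& cb = b.children[f];
      if (ca.valid[i] != cb.valid[i]) return false;
      if (ca.valid[i] && ca.values[i] != cb.values[i]) return false;
    }
  }
  return true;
}
```

Under this rule, two structs that are both null at a slot compare equal there even if the child data underneath that slot differs.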
@fengguangyuan The plan for Validate on structs should be that all child arrays have the same length as the struct array and that all child arrays are themselves valid. Null bitmaps are not compared. For equality (to reiterate Wes's point), you only check equality for the child slots that are valid in the parent struct.
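A sketch of that validation plan, again with hypothetical stand-in types (`ChildColumn`, `StructColumn`) and an error string in place of whatever status type the real code returns. It checks each child's length and the child's own consistency, and deliberately never compares the struct bitmap with the child bitmaps.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical child column; its own Validate() is reduced to a length check.
struct ChildColumn {
  std::vector<bool> valid;
  std::vector<int32_t> values;

  std::size_t length() const { return values.size(); }
  // Internally consistent if the bitmap and the values agree in length.
  bool Validate() const { return valid.size() == values.size(); }
};

struct StructColumn {
  std::vector<bool> valid;  // struct-level validity bitmap
  std::vector<ChildColumn> children;

  // Returns an empty string on success, otherwise a description of the problem.
  std::string Validate() const {
    for (std::size_t f = 0; f < children.size(); ++f) {
      if (children[f].length() != valid.size()) {
        return "child " + std::to_string(f) + " has the wrong length";
      }
      if (!children[f].Validate()) {
        return "child " + std::to_string(f) + " is invalid";
      }
      // No comparison of null bitmaps: child validity is independent of
      // the struct-level bitmap.
    }
    return "";
  }
};
```

For example, a struct whose slot 0 is non-null over a null child slot, and whose slot 1 is null over a non-null child slot, still passes this check.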
@@ -48,6 +48,14 @@ bool Array::EqualsExact(const Array& other) const {
  return true;
}

bool Array::RangeEqualsExact(int32_t start_idx, int32_t end_idx, const Array& arr) const {
  if (this == &arr) { return true; }
  for (int i = start_idx; i < end_idx; ++i) {
Are you thinking of the case of start_idx == end_idx or something else?
All else being equal, it seems that an empty set should equal an empty set. Do you disagree?
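To make the empty-range case concrete, here is a sketch of a range comparison over a hypothetical nullable int32 array (`NullableInt32` is illustrative, not Arrow's class), following the three-argument shape of `RangeEqualsExact` in the hunk above. When `start_idx == end_idx` the loop body never runs, so the function returns `true`.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical nullable int32 array, not Arrow's class.
struct NullableInt32 {
  std::vector<bool> valid;      // true = slot is non-null
  std::vector<int32_t> values;  // ignored where valid[i] is false

  // Compares slots [start_idx, end_idx) of *this and arr.
  // Indices are assumed to be in bounds for both arrays.
  bool RangeEquals(int32_t start_idx, int32_t end_idx,
                   const NullableInt32& arr) const {
    if (this == &arr) return true;  // same object, same slots
    for (int32_t i = start_idx; i < end_idx; ++i) {
      if (valid[i] != arr.valid[i]) return false;
      // Values under null slots are not compared.
      if (valid[i] && values[i] != arr.values[i]) return false;
    }
    // Reached immediately for an empty range (start_idx >= end_idx).
    return true;
  }
};
```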
I had to modify the API to take the start index of the other array, to handle edge cases for list arrays. I added some basic sanity unit tests, but I could add more. I feel not-horrible about the coverage (though I think that's just end-of-weekend laziness), so if you would like me to add more, please let me know.
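One plausible illustration of why the other array's start index matters for lists, sketched with hypothetical `Int32Values`/`Int32ListArray` types and a hypothetical `other_start_idx` parameter (neither is Arrow's API): equal lists can sit at different positions in the two arrays' child values, so the recursive child comparison needs its own starting position for `other`.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical nullable int32 values array (the list child).
struct Int32Values {
  std::vector<bool> valid;
  std::vector<int32_t> values;

  // Compares [start_idx, end_idx) of *this with other starting at other_start_idx.
  bool RangeEquals(int32_t start_idx, int32_t end_idx, int32_t other_start_idx,
                   const Int32Values& other) const {
    for (int32_t i = start_idx, o = other_start_idx; i < end_idx; ++i, ++o) {
      if (valid[i] != other.valid[o]) return false;
      if (valid[i] && values[i] != other.values[o]) return false;
    }
    return true;
  }
};

// Hypothetical list<int32> array: slot i spans child[offsets[i], offsets[i + 1]).
struct Int32ListArray {
  std::vector<bool> valid;
  std::vector<int32_t> offsets;  // number of slots + 1 entries
  Int32Values child;

  bool RangeEquals(int32_t start_idx, int32_t end_idx, int32_t other_start_idx,
                   const Int32ListArray& other) const {
    for (int32_t i = start_idx, o = other_start_idx; i < end_idx; ++i, ++o) {
      if (valid[i] != other.valid[o]) return false;
      if (!valid[i]) continue;  // null lists: nothing further to compare
      const int32_t begin = offsets[i];
      const int32_t len = offsets[i + 1] - begin;
      const int32_t other_begin = other.offsets[o];
      if (other.offsets[o + 1] - other_begin != len) return false;
      // The same logical list can start at a different child position in
      // `other`, so the child comparison gets a separate start index.
      if (!child.RangeEquals(begin, begin + len, other_begin, other.child)) {
        return false;
      }
    }
    return true;
  }
};
```

For example, with `a = [[1, 2], null, [3]]` and `b = [[7], [1, 2], [3]]`, `a.RangeEquals(0, 1, 1, b)` is true even though `a[0]` starts at child position 0 and `b[1]` starts at child position 1.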
Thanks for your answers. Currently, though, the equality check is skipped, given its potentially high cost. Personally, I trust your changes will make the API more flexible and robust. Good work. :)
Looks like a good checkpoint to me. One thought was that the
+1, thank you
Just rebased my current changes on this and realised a disadvantage of it: |
@xhochy can you give more details on where this caused you problems?
I used to instantiate a
How does this sound as a resolution (I can open up the JIRA once we agree)?
Sounds good!
…tribution globally (apache#80) * Add an offset for seed to achieve genuine random distribution globally * Filter out rand in projection cache * Evaluate expr with literal input for getting seed value
@wesm The need for this grew out of @fengguangyuan's PR to add a struct type (#66) and struct builder. I considered different APIs before settling on this:
Let me know if you would prefer a different API.
This is WIP because I need to add more unit tests, which I should get to by the end of the weekend. (I also need to think about whether it is worth mirroring EqualsExact in addition to the Equals method.)
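In the code quoted in this thread, the ListArray fragment takes a `std::shared_ptr<Array>`, while the `RangeEqualsExact` hunk takes a `const Array&`. If both flavours are kept, one option is for the pointer version to delegate to the reference version so the per-type logic lives in one place. The sketch below is only an assumption about how that could look: `ArrayBase`, `Int32Array`, `RangeEqualsRef` and `RangeEqualsPtr` are hypothetical names, any semantic differences between Arrow's `Equals` and `EqualsExact` are not modelled, and it uses `dynamic_cast` where the quoted code compares `type_enum()` and then uses `static_cast`.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical base class; not Arrow's Array.
class ArrayBase {
 public:
  virtual ~ArrayBase() = default;

  // Reference-taking comparison: each concrete type implements this once.
  virtual bool RangeEqualsRef(int32_t start_idx, int32_t end_idx,
                              const ArrayBase& other) const = 0;

  // shared_ptr convenience wrapper: rejects a null pointer, then delegates,
  // so the per-type comparison logic is not duplicated.
  bool RangeEqualsPtr(int32_t start_idx, int32_t end_idx,
                      const std::shared_ptr<ArrayBase>& other) const {
    if (!other) return false;
    return RangeEqualsRef(start_idx, end_idx, *other);
  }
};

class Int32Array : public ArrayBase {
 public:
  explicit Int32Array(std::vector<int32_t> values) : values_(std::move(values)) {}

  bool RangeEqualsRef(int32_t start_idx, int32_t end_idx,
                      const ArrayBase& other) const override {
    if (this == &other) return true;
    // Swapped-in type check: dynamic_cast instead of comparing type enums.
    const auto* o = dynamic_cast<const Int32Array*>(&other);
    if (o == nullptr) return false;
    for (int32_t i = start_idx; i < end_idx; ++i) {  // indices assumed in bounds
      if (values_[i] != o->values_[i]) return false;
    }
    return true;
  }

 private:
  std::vector<int32_t> values_;
};
```

Using distinct names for the two flavours avoids C++ name hiding: if a derived class overrode one overload of a shared name, the base class's other overload would be hidden unless re-exposed with a `using` declaration.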
@fengguangyuan let me know if this makes sense to you as a way forward on your PR