-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-32104: [C++] Add support for Run-End encoded data to Arrow #33641
Conversation
|
6156a9f
to
1d4de2a
Compare
|
dd7fd09
to
f5d88b9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a first pass on basic facilities. I didn't look at kernels and REE utils yet.
@@ -1177,6 +1197,7 @@ constexpr bool is_nested(Type::type type_id) { | |||
case Type::STRUCT: | |||
case Type::SPARSE_UNION: | |||
case Type::DENSE_UNION: | |||
case Type::RUN_END_ENCODED: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From a quick glance, I think this is mostly used as 'physically nested' in the codebase, so this is OK. I don't think we have the right abstractions to differentiate between physical/logical and deal with encodings, e.g. the way we handle dictionaries is often to just decode them, and kernel implementations are very much often lost in the weeds of encoding details.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Second part of code review: it doesn't seem clear what the offset
and length
of a run-end encoded array mean. Once this is settled and applied consistently, I'll be able to do a third review pass.
|
7ff1ce7
to
fab46a5
Compare
@lidavidm @wjones127 @westonpace @zeroshade @bkietz Before this being fully ready for review, I want to:
but feel free to skim the code and give higher level architectural feedback before a more detailed review |
6396a80
to
2811015
Compare
Lead-authored-by: Tobias Zagorni <tobias@zagorni.eu> Co-authored-by: Felipe Oliveira Carvalho <felipekde@gmail.com>
Since jemalloc.h does # define malloc je_malloc when JEMALLOC_MANGLE is defined, we can get this error in CI during an unity build /arrow/cpp/src/arrow/vendored/ProducerConsumerQueue.h: In constructor 'arrow_vendored::folly::ProducerConsumerQueue<T>::ProducerConsumerQueue(uint32_t)': /arrow/cpp/src/arrow/vendored/ProducerConsumerQueue.h:82:39: error: 'je_arrow_malloc' is not a member of 'std'; did you mean 'je_arrow_malloc'? 82 | records_(static_cast<T*>(std::malloc(sizeof(T) * size))), | ^~~~~~ jemalloc_ep-prefix/src/jemalloc_ep/dist/include/jemalloc/jemalloc.h:254:32: note: 'je_arrow_malloc' declared here 254 | void JEMALLOC_SYS_NOTHROW *je_malloc(size_t size) | ^~~~~~~~~ /arrow/cpp/src/arrow/vendored/ProducerConsumerQueue.h: In destructor 'arrow_vendored::folly::ProducerConsumerQueue<T>::~ProducerConsumerQueue()': /arrow/cpp/src/arrow/vendored/ProducerConsumerQueue.h:106:10: error: 'je_arrow_free' is not a member of 'std'; did you mean 'je_arrow_free'? 106 | std::free(records_); | ^~~~ jemalloc_ep-prefix/src/jemalloc_ep/dist/include/jemalloc/jemalloc.h:269:43: note: 'je_arrow_free' declared here 269 | JEMALLOC_EXPORT void JEMALLOC_SYS_NOTHROW je_free(void *ptr) | ^~~~~~~
@zeroshade this is now passing all the builds. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
@lidavidm should I rebase and force push so this CI issue goes away or this can be merged as is? |
I kicked the builds, let's see. |
…pache#33641) This PR gathers work from multiple PRs that can be closed after this one is merged: - Closes apache#13752 - Closes apache#13754 - Closes apache#13842 - Closes apache#13882 - Closes apache#13916 - Closes apache#14063 - Closes apache#13970 And the issues associated with those PRs can also be closed: - Fixes apache#20350 - Add RunEndEncodedScalarType - Fixes apache#32543 - Fixes apache#32544 - Fixes apache#32688 - Fixes apache#32731 - Fixes apache#32772 - Fixes apache#32774 * Closes: apache#32104 Lead-authored-by: Felipe Oliveira Carvalho <felipekde@gmail.com> Co-authored-by: Tobias Zagorni <tobias@zagorni.eu> Signed-off-by: Matt Topol <zotthewizard@gmail.com>
Benchmark runs are scheduled for baseline = 157b8f5 and contender = 1264e40. 1264e40 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
…pache#33641) This PR gathers work from multiple PRs that can be closed after this one is merged: - Closes apache#13752 - Closes apache#13754 - Closes apache#13842 - Closes apache#13882 - Closes apache#13916 - Closes apache#14063 - Closes apache#13970 And the issues associated with those PRs can also be closed: - Fixes apache#20350 - Add RunEndEncodedScalarType - Fixes apache#32543 - Fixes apache#32544 - Fixes apache#32688 - Fixes apache#32731 - Fixes apache#32772 - Fixes apache#32774 * Closes: apache#32104 Lead-authored-by: Felipe Oliveira Carvalho <felipekde@gmail.com> Co-authored-by: Tobias Zagorni <tobias@zagorni.eu> Signed-off-by: Matt Topol <zotthewizard@gmail.com>
Related issue number -------------------- See also: - Homebrew/homebrew-core#129859 - apache/arrow#33608 - apache/arrow#33641 Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com>
This PR gathers work from multiple PRs that can be closed after this one is merged:
And the issues associated with those PRs can also be closed: