[C++] Add ExtensionType implementation for 8-bit boolean values #17682
### Rationale for this change

Closes: #17682

Arrow Boolean arrays store values as individual bits, which is a very compact representation but does not match the layout of many systems with which it interoperates. By adding an 8-bit Boolean extension type, zero-copy compatibility with many systems can be improved at the cost of a larger physical representation.

Go implementation: #43323
C++ / Python implementation: #43488

### What changes are included in this PR?

Proposal and documentation for the `Bool8` canonical extension type.

### Are these changes tested?

N/A

### Are there any user-facing changes?

N/A

* GitHub Issue: #17682

Lead-authored-by: Joel Lubinitsky <joellubi@gmail.com>
Co-authored-by: Joel Lubinitsky <33523178+joellubi@users.noreply.github.com>
Co-authored-by: Felipe Oliveira Carvalho <felipekde@gmail.com>
Signed-off-by: Joel Lubinitsky <joellubi@gmail.com>
### Rationale for this change

Go implementation of #43234

### What changes are included in this PR?

- Go implementation of the `Bool8` extension type
- Minor refactor of existing extension builder interfaces

### Are these changes tested?

Yes, unit tests and basic read/write benchmarks are included.

### Are there any user-facing changes?

- A new extension type is added
- Custom extension builders no longer need another builder created and released separately.

* GitHub Issue: #17682

Authored-by: Joel Lubinitsky <joellubi@gmail.com>
Signed-off-by: Joel Lubinitsky <joellubi@gmail.com>
### Rationale for this change

C++ and Python implementations of #43234

### What changes are included in this PR?

- Implement C++ `Bool8Type`, `Bool8Array`, `Bool8Scalar`, and tests
- Implement Python bindings to C++, as well as zero-copy numpy conversion methods
- TODO: docs waiting for rebase on #43458

### Are these changes tested?

Yes

### Are there any user-facing changes?

Bool8 extension type will be available in C++ and Python libraries

* GitHub Issue: #17682

Authored-by: Joel Lubinitsky <joellubi@gmail.com>
Signed-off-by: Felipe Oliveira Carvalho <felipekde@gmail.com>
Some libraries (e.g. NumPy) represent boolean values using an array of int8 or uint8 values of 1's and 0's. Receiving such memory without copying can be a challenge, because Arrow's own Boolean arrays are bit-packed.
Now that we have ExtensionType capabilities, we could define an extension type to distinguish UInt8/Int8-annotated-as-boolean, so that such data can flow through applications.
A discussion about introducing a new logical type didn't go anywhere, so having a custom container that can be used for these specialized applications is one way to unblock the use case. If we develop some endogenous use of such data in C++, we would need to be mindful to sanitize it to bit-packed boolean before sending it to another Arrow application.
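To make the "sanitize to bit-packed boolean" step concrete, here is a minimal pure-Python sketch of converting one-byte-per-value booleans into a bitmap that uses least-significant-bit numbering, as Arrow's bitmaps do. The helper names `pack_bool8` and `unpack_bits` are hypothetical and are not part of any Arrow library, and treating every non-zero byte as true is an assumption of this sketch; a real implementation would operate on Arrow buffers rather than Python `bytes`.

```python
def pack_bool8(values: bytes) -> bytes:
    """Pack one-byte-per-value booleans into an LSB-first bitmap.

    Hypothetical helper for illustration only; not an Arrow API.
    """
    packed = bytearray((len(values) + 7) // 8)  # one bit per value, rounded up
    for i, v in enumerate(values):
        if v != 0:  # assumption: any non-zero byte counts as true
            packed[i // 8] |= 1 << (i % 8)  # value i lives in bit (i % 8) of byte (i // 8)
    return bytes(packed)


def unpack_bits(bitmap: bytes, length: int) -> bytes:
    """Expand an LSB-first bitmap back to one byte (0 or 1) per value."""
    return bytes((bitmap[i // 8] >> (i % 8)) & 1 for i in range(length))


# Ten 8-bit booleans, as a NumPy-style uint8 buffer would hold them.
bool8 = bytes([1, 0, 1, 1, 0, 0, 0, 1, 1, 0])
bitmap = pack_bool8(bool8)

print(len(bool8), "bytes as 8-bit booleans ->", len(bitmap), "bytes bit-packed")
```

The conversion necessarily copies the data, which is the cost the extension type avoids when both sides already agree on the 8-bit layout.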
Reporter: Wes McKinney / @wesm
Note: This issue was originally created as ARROW-1674. Please see the migration documentation for further details.