-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-33984: [C++][Python] DLPack implementation for Arrow Arrays (producer) #38472
Conversation
cc @rok: this is an initial draft for the dlpack support. Will still need to create a separate method for fixed shape tensor array plus add more tests (inspecting the produced PyCapsule with cffi). |
TODO: C++ tests, tests with arrays with offsets |
cpp/src/arrow/dlpack.cc
Outdated
|
||
DLDevice ctx; | ||
ctx.device_id = 0; | ||
ctx.device_type = DLDeviceType::kDLCPU; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if someone hands an array backed by CudaBuffer
buffers?
I think we need to validate these are CPU buffers, or better yet support the cases where we have non-CPU buffers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes sense yes. We planned to support only CPU device at first. But I should not hardcode it like I did but validate they they are in fact CPU buffers and raise if not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a check for CPU device: 6c886fd
It would be easy to add support for other cases in a follow-up I think. Not sure about the tests though =)
@pitrou thank you so much for the reviews! I addressed all your comments so far. |
// Create ManagerCtx with the reference to | ||
// the data of the array | ||
std::shared_ptr<ArrayData> array_ref = arr->data(); | ||
std::unique_ptr<ManagerCtx> ctx(new ManagerCtx); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: I think it's best practice to use std::make_unique
instead of wrapping new
call. Any reason we can't use make_unique
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happy to change to make_unique
. @pitrou any thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that we have the explicit delete
as well, I personally like seeing the new
+delete
combo explicitly, even though make_unique
is exactly equivalent AFAIU
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless that delete
call can then be replaced with a unique_ptr reset()
?
@kkraus14 I am double-checking if I am correct in assuming DLpack only supports byte-packed booleans. Or are bit-packed booleans also supported but not really used? I am not sure what came out of the discussion in dmlc/dlpack#75 and I am finding the docs unclear: Will be happy to hear your thoughts on this. |
@pitrou would you have some time to run through this PR again? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 after a couple minor cleanups. Thank you @AlenkaF !
@github-actions crossbow submit -g cpp |
@github-actions crossbow submit cuda |
Revision: 49a978f Submitted crossbow builds: ursacomputing/crossbow @ actions-bbd691007d |
Revision: 49a978f Submitted crossbow builds: ursacomputing/crossbow @ actions-b8525f4884
|
Thank you @pitrou for all the suggestions and help! Listing possible follow-up issues: |
After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit 6c326db. There was 1 benchmark result indicating a performance regression:
The full Conbench report has more details. It also includes information about 10 possible false positives for unstable benchmarks that are known to sometimes produce them. |
…(producer) (apache#38472) ### Rationale for this change DLPack is selected for Array API protocol so it is important to have it implemented for Arrow/PyArrow Arrays also. This is possible for primitive type arrays (int, uint and float) with no validity buffer. Device support is not in scope of this PR (CPU only). ### What changes are included in this PR? - `ExportArray` and `ExportDevice` methods on Arrow C++ Arrays - `__dlpack__` method on the base PyArrow Array class exposing `ExportArray` method - `__dlpack_device__` method on the base PyArrow Array class exposing `ExportDevice` method ### Are these changes tested? Yes, tests are added to `dlpack_test.cc` and `test_array.py`. ### Are there any user-facing changes? No. * Closes: apache#33984 Lead-authored-by: AlenkaF <frim.alenka@gmail.com> Co-authored-by: Alenka Frim <AlenkaF@users.noreply.github.com> Co-authored-by: Antoine Pitrou <antoine@python.org> Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com> Signed-off-by: Antoine Pitrou <antoine@python.org>
…(producer) (apache#38472) ### Rationale for this change DLPack is selected for Array API protocol so it is important to have it implemented for Arrow/PyArrow Arrays also. This is possible for primitive type arrays (int, uint and float) with no validity buffer. Device support is not in scope of this PR (CPU only). ### What changes are included in this PR? - `ExportArray` and `ExportDevice` methods on Arrow C++ Arrays - `__dlpack__` method on the base PyArrow Array class exposing `ExportArray` method - `__dlpack_device__` method on the base PyArrow Array class exposing `ExportDevice` method ### Are these changes tested? Yes, tests are added to `dlpack_test.cc` and `test_array.py`. ### Are there any user-facing changes? No. * Closes: apache#33984 Lead-authored-by: AlenkaF <frim.alenka@gmail.com> Co-authored-by: Alenka Frim <AlenkaF@users.noreply.github.com> Co-authored-by: Antoine Pitrou <antoine@python.org> Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com> Signed-off-by: Antoine Pitrou <antoine@python.org>
Rationale for this change
DLPack is selected for Array API protocol so it is important to have it implemented for Arrow/PyArrow Arrays also. This is possible for primitive type arrays (int, uint and float) with no validity buffer. Device support is not in scope of this PR (CPU only).
What changes are included in this PR?
ExportArray
andExportDevice
methods on Arrow C++ Arrays__dlpack__
method on the base PyArrow Array class exposingExportArray
method__dlpack_device__
method on the base PyArrow Array class exposingExportDevice
methodAre these changes tested?
Yes, tests are added to
dlpack_test.cc
andtest_array.py
.Are there any user-facing changes?
No.