GH-33984: [C++][Python] DLPack implementation for Arrow Arrays (producer) #38472

Merged
merged 73 commits into from
Dec 19, 2023

Conversation

AlenkaF
Member

@AlenkaF AlenkaF commented Oct 26, 2023

Rationale for this change

DLPack is the data interchange protocol selected by the Array API standard, so Arrow and PyArrow Arrays should implement it as well. Export is possible for primitive-type arrays (int, uint and float) with no validity buffer. Support for non-CPU devices is out of scope for this PR (CPU only).
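
To make the stated scope concrete, here is a minimal sketch of the eligibility rule in plain Python. The function name `can_export_dlpack`, the type-name strings, and the choice to model "no validity buffer" as a zero null count are all assumptions made for illustration; none of them are PyArrow API or code from this PR.

```python
# Hypothetical sketch of the scope described above; not PyArrow API.
# The type names are illustrative labels for int, uint and float types.
SUPPORTED_TYPES = frozenset({
    "int8", "int16", "int32", "int64",
    "uint8", "uint16", "uint32", "uint64",
    "float16", "float32", "float64",
})


def can_export_dlpack(type_name: str, null_count: int) -> bool:
    """Return True if an array of this type is within the PR's stated scope."""
    # DLPack tensors carry no validity bitmap, so nulls cannot be represented.
    # Assumption: "no validity buffer" is modeled here as "zero nulls".
    return type_name in SUPPORTED_TYPES and null_count == 0
```

Under these assumptions, `can_export_dlpack("int64", 0)` is accepted, while a string array or an integer array containing nulls is rejected.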

What changes are included in this PR?

  • ExportArray and ExportDevice methods on Arrow C++ Arrays
  • __dlpack__ method on the base PyArrow Array class exposing ExportArray method
  • __dlpack_device__ method on the base PyArrow Array class exposing ExportDevice method

Are these changes tested?

Yes, tests are added to dlpack_test.cc and test_array.py.

Are there any user-facing changes?

No.

@AlenkaF
Member Author

AlenkaF commented Oct 26, 2023

cc @rok: this is an initial draft for the dlpack support. Will still need to create a separate method for fixed shape tensor array plus add more tests (inspecting the produced PyCapsule with cffi).

@jorisvandenbossche
Member

TODO: C++ tests, tests with arrays with offsets

DLDevice ctx;
ctx.device_id = 0;
ctx.device_type = DLDeviceType::kDLCPU;
Contributor

What happens if someone hands an array backed by CudaBuffer buffers?

I think we need to validate these are CPU buffers, or better yet support the cases where we have non-CPU buffers.

Member Author

This makes sense, yes. We planned to support only the CPU device at first. But rather than hardcoding it as I did, I should validate that they are in fact CPU buffers and raise if not.

Member Author

Added a check for CPU device: 6c886fd
It would be easy to add support for other cases in a follow-up I think. Not sure about the tests though =)
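
The idea discussed in this thread (validate the buffers instead of hardcoding the CPU device) can be sketched in plain Python as follows. `FakeBuffer` and `export_device` are hypothetical stand-ins written for this illustration, not the code added in the commit above; the enum values for `kDLCPU` and `kDLCUDA` follow `dlpack.h`.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Sequence, Tuple


class DLDeviceType(IntEnum):
    # Values as defined in dlpack.h
    kDLCPU = 1
    kDLCUDA = 2


@dataclass
class FakeBuffer:
    """Hypothetical stand-in for an Arrow buffer; only records where it lives."""
    is_cpu: bool


def export_device(buffers: Sequence[FakeBuffer]) -> Tuple[int, int]:
    """Return (device_type, device_id), refusing anything not on the CPU."""
    if not all(buf.is_cpu for buf in buffers):
        # Raise instead of silently reporting kDLCPU for device memory.
        raise TypeError("DLPack export is only supported for CPU buffers here")
    return (int(DLDeviceType.kDLCPU), 0)
```

With this shape, a consumer that calls the device query first gets `(1, 0)` for CPU data and an error for anything else, rather than a CPU label on memory it cannot dereference.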

@AlenkaF
Member Author

AlenkaF commented Dec 12, 2023

@pitrou thank you so much for the reviews! I addressed all your comments so far.

// Create ManagerCtx with the reference to
// the data of the array
std::shared_ptr<ArrayData> array_ref = arr->data();
std::unique_ptr<ManagerCtx> ctx(new ManagerCtx);
Contributor

nitpick: I think it's best practice to use std::make_unique instead of wrapping a new call. Any reason we can't use make_unique here?

Member Author

Happy to change to make_unique. @pitrou any thoughts?

Member

Given that we have the explicit delete as well, I personally like seeing the new+delete combo explicitly, even though make_unique is exactly equivalent AFAIU

Member

Unless that delete call can then be replaced with a unique_ptr reset() ?

@AlenkaF
Member Author

AlenkaF commented Dec 13, 2023

@kkraus14 I am double-checking whether I am correct in assuming that DLPack only supports byte-packed booleans. Or are bit-packed booleans also supported but not really used?

I am not sure what came out of the discussion in dmlc/dlpack#75, and I find the docs unclear: the example in the docs suggests that the underlying storage size of bool is 8 bits, but there is no specific information about bool1.

Will be happy to hear your thoughts on this.
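
For readers unfamiliar with the distinction, the sketch below expands a bit-packed buffer (Arrow's layout for booleans, least-significant bit first) into one byte per value, which is the layout the DLPack docs example appears to assume. The helper name `unpack_bits` is hypothetical, and the conversion copies the data, so it is not a zero-copy export.

```python
def unpack_bits(packed: bytes, length: int, offset: int = 0) -> bytes:
    """Expand an LSB-first bit-packed buffer into one byte (0 or 1) per value."""
    out = bytearray(length)
    for i in range(length):
        bit = offset + i
        # Bit k lives in byte k // 8, at position k % 8 counted from the LSB.
        out[i] = (packed[bit // 8] >> (bit % 8)) & 1
    return bytes(out)


# [True, False, True, True] is stored bit-packed as the single byte 0b00001101.
byte_packed = unpack_bits(bytes([0b00001101]), 4)
```

The `offset` parameter covers sliced arrays, where the first logical value does not start at bit 0 of the buffer.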

@AlenkaF
Member Author

AlenkaF commented Dec 18, 2023

@pitrou would you have some time to run through this PR again?

@AlenkaF AlenkaF changed the title GH-33984: [Python] __dlpack__ implementation (producer) GH-33984: [C++][Python] DLPack implementation for Arrow Arrays (producer) Dec 19, 2023
Member

@pitrou pitrou left a comment

+1 after a couple minor cleanups. Thank you @AlenkaF !

@pitrou
Member

pitrou commented Dec 19, 2023

@github-actions crossbow submit -g cpp

@pitrou
Member

pitrou commented Dec 19, 2023

@github-actions crossbow submit cuda

Revision: 49a978f

Submitted crossbow builds: ursacomputing/crossbow @ actions-bbd691007d

Task Status
test-alpine-linux-cpp GitHub Actions
test-build-cpp-fuzz GitHub Actions
test-conda-cpp GitHub Actions
test-conda-cpp-valgrind Azure
test-cuda-cpp GitHub Actions
test-debian-11-cpp-amd64 GitHub Actions
test-debian-11-cpp-i386 GitHub Actions
test-fedora-38-cpp GitHub Actions
test-ubuntu-20.04-cpp GitHub Actions
test-ubuntu-20.04-cpp-bundled GitHub Actions
test-ubuntu-20.04-cpp-minimal-with-formats GitHub Actions
test-ubuntu-20.04-cpp-thread-sanitizer GitHub Actions
test-ubuntu-22.04-cpp GitHub Actions
test-ubuntu-22.04-cpp-20 GitHub Actions
test-ubuntu-22.04-cpp-no-threading GitHub Actions

Revision: 49a978f

Submitted crossbow builds: ursacomputing/crossbow @ actions-b8525f4884

Task Status
conda-linux-aarch64-cuda-py3 Azure
conda-linux-ppc64le-cuda-py3 Azure
conda-linux-x64-cuda-py3 Azure
conda-win-x64-cuda-py3 Azure
test-cuda-cpp GitHub Actions
test-cuda-python GitHub Actions

@pitrou pitrou merged commit 6c326db into apache:main Dec 19, 2023
33 of 36 checks passed
@AlenkaF AlenkaF deleted the gh-33984-dlpack-producer branch December 19, 2023 18:59
@AlenkaF
Member Author

AlenkaF commented Dec 19, 2023

After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit 6c326db.

There was 1 benchmark result indicating a performance regression:

The full Conbench report has more details. It also includes information about 10 possible false positives for unstable benchmarks that are known to sometimes produce them.

clayburn pushed a commit to clayburn/arrow that referenced this pull request Jan 23, 2024
…(producer) (apache#38472)

### Rationale for this change

DLPack is selected for Array API protocol so it is important to have it implemented for Arrow/PyArrow Arrays also. This is possible for primitive type arrays (int, uint and float) with no validity buffer. Device support is not in scope of this PR (CPU only). 

### What changes are included in this PR?

- `ExportArray` and `ExportDevice` methods on Arrow C++ Arrays
- `__dlpack__` method on the base PyArrow Array class exposing `ExportArray` method
-  `__dlpack_device__` method on the base PyArrow Array class exposing `ExportDevice` method

### Are these changes tested?

Yes, tests are added to `dlpack_test.cc` and `test_array.py`.

### Are there any user-facing changes?

No.

* Closes: apache#33984

Lead-authored-by: AlenkaF <frim.alenka@gmail.com>
Co-authored-by: Alenka Frim <AlenkaF@users.noreply.github.com>
Co-authored-by: Antoine Pitrou <antoine@python.org>
Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
dgreiss pushed a commit to dgreiss/arrow that referenced this pull request Feb 19, 2024