Bump pyo3-arrow to 0.2 (#122)
* Bump pyo3-arrow to 0.2

* Update pyo3-arrow docs
kylebarron authored Aug 13, 2024
1 parent 7a34019 commit f69a65b
Showing 4 changed files with 24 additions and 8 deletions.
2 changes: 1 addition & 1 deletion Cargo.lock

Generated file; diff not rendered.

19 changes: 19 additions & 0 deletions pyo3-arrow/CHANGELOG.md
@@ -0,0 +1,19 @@
# Changelog

## [0.2.0] - 2024-08-12

### Enhancements :magic_wand:

- New `ArrayReader`. It parallels `RecordBatchReader` but is more general, supporting arbitrary Arrow arrays that do not have to represent a record batch.
- New `AnyArray` enum that supports either `Array` or `ArrayReader` input.
- Improved documentation.

### Fixes :bug:

- Validate Schema/Field when constructing new Array/ChunkedArray/Table (#72)
- Convert `Table::new` to `Table::try_new` and ensure that all batches have the same schema. Similar for `Array::new` and `ChunkedArray::new`.
- Reorder args for `Table::new`

## [0.1.0] - 2024-06-27

- Initial release
2 changes: 1 addition & 1 deletion pyo3-arrow/Cargo.toml
@@ -1,6 +1,6 @@
[package]
name = "pyo3-arrow"
-version = "0.1.0"
+version = "0.2.0"
authors = ["Kyle Barron <kylebarron2@gmail.com>"]
edition = "2021"
description = "Arrow integration for pyo3."
9 changes: 3 additions & 6 deletions pyo3-arrow/README.md
@@ -169,11 +169,8 @@ In this case, you must depend on `nanoarrow` and you can use the `to_nanoarrow`

arrow-rs has [some existing Python integration](https://docs.rs/arrow/latest/arrow/pyarrow/index.html), but there are a few reasons why I created `pyo3-arrow`:

- arrow-rs only supports returning data to pyarrow. Pyarrow is a very large dependency (its unpacked Linux wheels are 130MB, not including a required dependency on Numpy) and some projects may wish not to use it. Now that the Arrow PyCapsule interface exists, it's possible to have a modular approach, where a very small library contains core Arrow objects, and works seamlessly with other libraries.
- arrow-rs's Python FFI integration does not support Arrow extension types, because it omits field metadata when constructing an `Arc<dyn Array>`. pyo3-arrow gets around this by storing both an `ArrayRef` (`Arc<dyn Array>`) and a `FieldRef` (`Arc<Field>`) in a `PyArray` struct.
- arrow-rs has no ability to work with an Arrow stream of bare arrays that are not record batches, and so it has no way to interop with a `pyarrow.ChunkedArray` or `polars.Series`.
- In my opinion arrow-rs is too tightly connected to pyo3 and pyarrow. pyo3 releases don't line up with arrow-rs's release cadence, which means it could be a bit of a wait to use the latest pyo3 version with arrow-rs, especially with arrow-rs [waiting longer to release breaking changes](https://github.com/apache/arrow-rs#release-versioning-and-schedule).
- arrow-rs only supports returning data as pyarrow classes. pyarrow is a very large dependency and some projects may wish not to use it. Now that the Arrow PyCapsule interface exists, it's possible to have a modular approach, where a very small library contains core Arrow objects, and works seamlessly with other libraries.
- arrow-rs's Python FFI integration does not support extension types, because it omits field metadata when constructing an `Arc<dyn Array>`. pyo3-arrow gets around this by storing both an `ArrayRef` (`Arc<dyn Array>`) and a `FieldRef` (`Arc<Field>`) in a `PyArray` struct.
- arrow-rs doesn't have a way to interface with `Table` and `ChunkedArray` constructs. It suggests to use a `RecordBatchReader` instead of a `Table`, but regardless arrow-rs has no ability to work with an Arrow stream of bare arrays that are not record batches.

## Scope

pyo3-arrow defines Rust wrappers for Arrow concepts that are ABI stable. This means that some Arrow concepts, like `DataType`, are not implemented. `DataType` is not a concept that can be shared across Arrow implementations. (It can be shared as part of a `Field` or `Schema`, but not on its own).
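
The changelog entry about moving from `Table::new` to `Table::try_new` describes a fallible constructor that rejects batches whose schema differs from the table's. A minimal sketch of that pattern follows, using only the standard library. The `Field`, `Batch`, and `Table` types, the `field` helper, and the argument order are hypothetical stand-ins for illustration; they are not the pyo3-arrow or arrow-rs types, and the real implementation may validate differently.

```rust
use std::collections::BTreeMap;

// Hypothetical stand-ins for Arrow concepts; NOT the pyo3-arrow or arrow-rs API.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: String,
    // Extension type information travels in field metadata,
    // so it takes part in the equality check as well.
    metadata: BTreeMap<String, String>,
}

#[derive(Debug, Clone, PartialEq)]
struct Batch {
    schema: Vec<Field>,
    num_rows: usize,
}

#[derive(Debug)]
struct Table {
    batches: Vec<Batch>,
    schema: Vec<Field>,
}

impl Table {
    /// Fallible constructor: returns an error if any batch's schema
    /// differs from the schema declared for the table.
    fn try_new(batches: Vec<Batch>, schema: Vec<Field>) -> Result<Self, String> {
        for (i, batch) in batches.iter().enumerate() {
            if batch.schema != schema {
                return Err(format!("batch {i} does not match the table schema"));
            }
        }
        Ok(Self { batches, schema })
    }

    /// Total number of rows across all batches.
    fn num_rows(&self) -> usize {
        self.batches.iter().map(|b| b.num_rows).sum()
    }

    /// The schema shared by every batch.
    fn schema(&self) -> &[Field] {
        &self.schema
    }
}

/// Hypothetical helper that builds a field with no metadata.
fn field(name: &str, data_type: &str) -> Field {
    Field {
        name: name.to_string(),
        data_type: data_type.to_string(),
        metadata: BTreeMap::new(),
    }
}

fn main() {
    let schema = vec![field("id", "int64")];
    let good = Batch { schema: schema.clone(), num_rows: 3 };
    let bad = Batch { schema: vec![field("id", "utf8")], num_rows: 2 };

    // Matching schemas are accepted.
    let table = Table::try_new(vec![good.clone()], schema.clone()).unwrap();
    assert_eq!(table.num_rows(), 3);
    assert_eq!(table.schema().len(), 1);

    // A batch with a different column type is rejected instead of
    // silently producing an inconsistent table.
    assert!(Table::try_new(vec![good, bad], schema).is_err());
}
```

Returning `Result` rather than panicking lets a caller surface a schema mismatch as an ordinary error, which is presumably the motivation for the `new` to `try_new` rename.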
