pyo3-arrow docs edits (#123)
kylebarron authored Aug 13, 2024
1 parent f69a65b commit 95d9952
Showing 1 changed file with 11 additions and 1 deletion.
12 changes: 11 additions & 1 deletion pyo3-arrow/README.md
@@ -128,6 +128,7 @@ You must depend on the `arro3-core` Python package; then you can use the `to_arr
| `PyField` | `arro3.core.Field` |
| `PySchema` | `arro3.core.Schema` |
| `PyArray` | `arro3.core.Array` |
| `PyArrayReader` | `arro3.core.ArrayReader` |
| `PyRecordBatch` | `arro3.core.RecordBatch` |
| `PyChunkedArray` | `arro3.core.ChunkedArray` |
| `PyTable` | `arro3.core.Table` |
@@ -149,6 +150,8 @@ In this case, you must depend on `pyarrow` and you can use the `to_pyarrow` meth
| `PyTable` | `pyarrow.Table` |
| `PyRecordBatchReader` | `pyarrow.RecordBatchReader` |

`pyarrow` does not have the equivalent of a `PyArrayReader`, but if the materialized data fits in memory, you can convert a `PyArrayReader` to a `PyChunkedArray` and pass that to `pyarrow`.

#### Using `nanoarrow`

[`nanoarrow`](https://arrow.apache.org/nanoarrow/latest/index.html) is an alternative Python library for working with Arrow data. It's similar in goals to arro3, but is written in C instead of Rust. Additionally, it has a smaller type system than `pyarrow` or `arro3`, with logical arrays and record batches both represented by the `nanoarrow.Array` class.
@@ -161,10 +164,18 @@ In this case, you must depend on `nanoarrow` and you can use the `to_nanoarrow`
| `PySchema` | `nanoarrow.Schema` |
| `PyArray` | `nanoarrow.Array` |
| `PyRecordBatch` | `nanoarrow.Array` |
| `PyArrayReader` | `nanoarrow.ArrayStream` |
| `PyChunkedArray` | `nanoarrow.ArrayStream` |
| `PyTable` | `nanoarrow.ArrayStream` |
| `PyRecordBatchReader` | `nanoarrow.ArrayStream` |

## Version compatibility

| pyo3-arrow | pyo3 | arrow-rs |
| ---------- | ---- | -------- |
| 0.1 | 0.21 | 52 |
| 0.2 | 0.21 | 52 |

## Why not use arrow-rs's Python integration?

arrow-rs has [some existing Python integration](https://docs.rs/arrow/latest/arrow/pyarrow/index.html), but there are a few reasons why I created `pyo3-arrow`:
@@ -173,4 +184,3 @@ arrow-rs has [some existing Python integration](https://docs.rs/arrow/latest/arr
- arrow-rs's Python FFI integration does not support Arrow extension types, because it omits field metadata when constructing an `Arc<dyn Array>`. pyo3-arrow gets around this by storing both an `ArrayRef` (`Arc<dyn Array>`) and a `FieldRef` (`Arc<Field>`) in a `PyArray` struct.
- arrow-rs has no ability to work with an Arrow stream of bare arrays that are not record batches, and so it has no way to interop with a `pyarrow.ChunkedArray` or `polars.Series`.
- In my opinion arrow-rs is too tightly connected to pyo3 and pyarrow. pyo3 releases don't line up with arrow-rs's release cadence, which means it could be a bit of a wait to use the latest pyo3 version with arrow-rs, especially with arrow-rs [waiting longer to release breaking changes](https://github.com/apache/arrow-rs#release-versioning-and-schedule).
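
The extension-type point in the list above can be illustrated with a small, dependency-free sketch. It is not pyo3-arrow's actual code: `Field`, `Array`, `Int64Array`, and `ArrayWithField` are simplified stand-ins invented here so the example compiles with only the standard library, and `geoarrow.point` is just an illustrative extension name. The one detail taken from the Arrow specification is the `ARROW:extension:name` metadata key. The idea shown is that the extension name lives in the field metadata, so it survives only if the field travels alongside the array.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Simplified stand-in for arrow's `Field`: a name plus key/value metadata.
#[derive(Debug, Clone, PartialEq)]
pub struct Field {
    pub name: String,
    pub metadata: HashMap<String, String>,
}

/// Simplified stand-in for arrow's `Array` trait: only the length is modeled.
pub trait Array {
    fn len(&self) -> usize;
}

/// Simplified stand-in for a concrete array type.
pub struct Int64Array(pub Vec<i64>);

impl Array for Int64Array {
    fn len(&self) -> usize {
        self.0.len()
    }
}

pub type ArrayRef = Arc<dyn Array>;
pub type FieldRef = Arc<Field>;

/// Hypothetical wrapper that keeps the field next to the array data,
/// so metadata such as the extension name is not dropped.
pub struct ArrayWithField {
    array: ArrayRef,
    field: FieldRef,
}

impl ArrayWithField {
    pub fn new(array: ArrayRef, field: FieldRef) -> Self {
        Self { array, field }
    }

    /// Number of elements in the wrapped array.
    pub fn len(&self) -> usize {
        self.array.len()
    }

    /// The extension type name, if the field metadata carries one.
    pub fn extension_name(&self) -> Option<&str> {
        self.field
            .metadata
            .get("ARROW:extension:name")
            .map(String::as_str)
    }
}

/// Builds a three-element array tagged with an illustrative extension name.
pub fn example() -> ArrayWithField {
    let mut metadata = HashMap::new();
    metadata.insert(
        "ARROW:extension:name".to_string(),
        "geoarrow.point".to_string(),
    );
    let field = Arc::new(Field {
        name: "geometry".to_string(),
        metadata,
    });
    let array: ArrayRef = Arc::new(Int64Array(vec![1, 2, 3]));
    ArrayWithField::new(array, field)
}
```

Calling `example().extension_name()` returns the name stored in the field metadata. A bare `ArrayRef` in this model has nowhere to keep that information, which is the limitation the list item describes.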
