Improved docstrings #114

Merged
merged 1 commit into from
Aug 8, 2024
375 changes: 310 additions & 65 deletions arro3-core/python/arro3/core/_core.pyi

Large diffs are not rendered by default.

52 changes: 49 additions & 3 deletions arro3-core/python/arro3/core/types.py
@@ -4,20 +4,66 @@


class ArrowSchemaExportable(Protocol):
"""A C-level reference to an Arrow Schema or Field."""
"""
An object with an `__arrow_c_schema__` method.

Supported objects include:

- arro3 `Schema`, `Field`, or `DataType` objects.
- pyarrow `Schema`, `Field`, or `DataType` objects.

Such an object implements the [Arrow C Data
Interface](https://arrow.apache.org/docs/format/CDataInterface.html) via the
[Arrow PyCapsule
Interface](https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html).
This allows for zero-copy Arrow data interchange across libraries.
"""

def __arrow_c_schema__(self) -> object: ...


class ArrowArrayExportable(Protocol):
"""A C-level reference to an Arrow Array or RecordBatch."""
"""
An object with an `__arrow_c_array__` method.

Supported objects include:

- arro3 `Array` or `RecordBatch` objects.
- pyarrow `Array` or `RecordBatch` objects.

Such an object implements the [Arrow C Data
Interface](https://arrow.apache.org/docs/format/CDataInterface.html) via the
[Arrow PyCapsule
Interface](https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html).
This allows for zero-copy Arrow data interchange across libraries.
"""

def __arrow_c_array__(
self, requested_schema: object | None = None
) -> Tuple[object, object]: ...


class ArrowStreamExportable(Protocol):
"""A C-level reference to an Arrow RecordBatchReader, Table, or ChunkedArray."""
"""
An object with an `__arrow_c_stream__` method.

Supported objects include:

- arro3 `Table`, `RecordBatchReader`, `ChunkedArray`, or `ArrayReader` objects.
- Polars `Series` or `DataFrame` objects (polars v1.2 or higher).
- pyarrow `RecordBatchReader`, `Table`, or `ChunkedArray` objects (pyarrow v14 or
higher).
- pandas `DataFrame` objects (pandas v2.2 or higher).
- ibis `Table` objects.

For an up-to-date list of supported objects, see [this
issue](https://github.com/apache/arrow/issues/39195#issuecomment-2245718008).

Such an object implements the [Arrow C Stream
interface](https://arrow.apache.org/docs/format/CStreamInterface.html) via the
[Arrow PyCapsule
Interface](https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html).
This allows for zero-copy Arrow data interchange across libraries.
"""

def __arrow_c_stream__(self, requested_schema: object | None = None) -> object: ...
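The three protocols above are structural types: any object that defines the matching dunder method satisfies them, regardless of its base classes. A minimal sketch of that idea, using only the standard library, might look like the following. `FakeSchema` and `exports_schema` are hypothetical names introduced for illustration, and the protocol is redeclared locally with `@runtime_checkable` (an assumption, since the diff above does not show that decorator) so that `isinstance` works. The returned object is a placeholder, not a real PyCapsule.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class ArrowSchemaExportable(Protocol):
    """Local redeclaration of the protocol, for illustration only."""

    def __arrow_c_schema__(self) -> object: ...


class FakeSchema:
    """Hypothetical class: exposes the dunder method but returns a placeholder."""

    def __arrow_c_schema__(self) -> object:
        # A real implementation would return a PyCapsule wrapping an ArrowSchema.
        return object()


def exports_schema(obj: object) -> bool:
    # runtime_checkable protocols only check that the method exists,
    # not its signature or return type.
    return isinstance(obj, ArrowSchemaExportable)


print(exports_schema(FakeSchema()))  # True
print(exports_schema("not arrow"))   # False
```

The same pattern would apply to `__arrow_c_array__` and `__arrow_c_stream__`; only the method name changes.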
8 changes: 8 additions & 0 deletions docs/api/core/array-reader.md
@@ -0,0 +1,8 @@
# ArrayReader

::: arro3.core.ArrayReader
options:
filters:
- "!^_"
- "^__arrow"
members:
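The `filters` option above is a list of regular expressions in which a leading `!` marks an exclusion. As a rough model of how such a list could select members, the sketch below assumes that the last matching filter wins and that unmatched names are kept when at least one exclusion is present. This is an assumption about the documentation tool's behaviour, written with a hypothetical `keep_member` helper rather than the tool's own code.

```python
import re

# The filter list from the options above.
FILTERS = ["!^_", "^__arrow"]


def keep_member(name: str, filters: list[str] = FILTERS) -> bool:
    """Hypothetical helper: decide whether a member name survives the filters."""
    keep = None
    saw_exclude = False
    for pattern in filters:
        exclude = pattern.startswith("!")
        saw_exclude = saw_exclude or exclude
        regex = re.compile(pattern[1:] if exclude else pattern)
        if regex.search(name):
            # Assumption: the last matching filter wins.
            keep = not exclude
    if keep is None:
        # Assumption: with at least one exclusion filter, unmatched names are kept.
        return saw_exclude
    return keep


for member in ["field", "_private", "__repr__", "__arrow_c_stream__"]:
    print(member, keep_member(member))
```

Under these assumptions, private and dunder names are hidden while the `__arrow_c_*__` methods remain documented.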
2 changes: 1 addition & 1 deletion docs/api/core/types.md
@@ -5,4 +5,4 @@
filters:
- "!^_"
- "^__arrow"
members:
show_if_no_docstring: true
3 changes: 2 additions & 1 deletion mkdocs.yml
@@ -22,11 +22,12 @@ nav:
- API Reference:
- arro3.core:
- api/core/array.md
- api/core/array-reader.md
- api/core/chunked-array.md
- api/core/datatype.md
- api/core/field.md
- api/core/record-batch-reader.md
- api/core/record-batch.md
- api/core/record-batch-reader.md
- api/core/schema.md
- api/core/table.md
- api/core/types.md
13 changes: 0 additions & 13 deletions pyo3-arrow/src/array.rs
@@ -26,9 +26,6 @@ use crate::interop::numpy::from_numpy::from_numpy;
use crate::interop::numpy::to_numpy::to_numpy;
use crate::{PyDataType, PyField};

/// A Python-facing Arrow array.
///
/// This is a wrapper around an [ArrayRef] and a [FieldRef].
#[pyclass(module = "arro3.core._core", name = "Array", subclass)]
pub struct PyArray {
array: ArrayRef,
@@ -195,8 +192,6 @@ impl PyArray {
Ok(Self::new(array, Field::new("", data_type, true).into()))
}

/// An implementation of the Array interface, for interoperability with numpy and other
/// array libraries.
#[pyo3(signature = (dtype=None, copy=None))]
#[allow(unused_variables)]
pub fn __array__(
@@ -208,13 +203,6 @@ impl PyArray {
to_numpy(py, &self.array)
}

/// An implementation of the [Arrow PyCapsule
/// Interface](https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html).
/// This dunder method should not be called directly, but enables zero-copy
/// data transfer to other Python libraries that understand Arrow memory.
///
/// For example, you can call [`pyarrow.array()`][pyarrow.array] to convert this array
/// into a pyarrow array, without copying memory.
#[allow(unused_variables)]
pub fn __arrow_c_array__<'py>(
&'py self,
@@ -305,7 +293,6 @@ impl PyArray {
Ok(PyArray::new(new_array, self.field.clone()).to_arro3(py)?)
}

/// Copy this array to a `numpy` NDArray
pub fn to_numpy(&self, py: Python) -> PyResult<PyObject> {
self.__array__(py, None, None)
}
17 changes: 0 additions & 17 deletions pyo3-arrow/src/array_reader.rs
@@ -15,9 +15,6 @@ use crate::ffi::{ArrayIterator, ArrayReader};
use crate::input::AnyArray;
use crate::{PyArray, PyChunkedArray, PyField};

/// A Python-facing Arrow array reader.
///
/// This is a wrapper around a [ArrayReader].
#[pyclass(module = "arro3.core._core", name = "ArrayReader", subclass)]
pub struct PyArrayReader(pub(crate) Option<Box<dyn ArrayReader + Send>>);

@@ -102,13 +99,6 @@ impl Display for PyArrayReader {

#[pymethods]
impl PyArrayReader {
/// An implementation of the [Arrow PyCapsule
/// Interface](https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html).
/// This dunder method should not be called directly, but enables zero-copy
/// data transfer to other Python libraries that understand Arrow memory.
///
/// For example, you can call [`pyarrow.table()`][pyarrow.table] to convert this array
/// into a pyarrow table, without copying memory.
#[allow(unused_variables)]
pub fn __arrow_c_stream__<'py>(
&'py mut self,
@@ -136,23 +126,17 @@ impl PyArrayReader {
self.to_string()
}

/// Returns `true` if this reader has already been consumed.
#[getter]
pub fn closed(&self) -> bool {
self.0.is_none()
}

/// Construct this from an existing Arrow object.
///
/// It can be called on anything that exports the Arrow stream interface
/// (`__arrow_c_stream__`), such as a `Table` or `ArrayReader`.
#[classmethod]
pub fn from_arrow(_cls: &Bound<PyType>, input: AnyArray) -> PyArrowResult<Self> {
let reader = input.into_reader()?;
Ok(Self::new(reader))
}

/// Construct this object from a bare Arrow PyCapsule.
#[classmethod]
pub fn from_arrow_pycapsule(
_cls: &Bound<PyType>,
@@ -184,7 +168,6 @@ impl PyArrayReader {
data.extract()
}

/// Access the field of this reader
#[getter]
pub fn field(&self, py: Python) -> PyResult<PyObject> {
PyField::new(self.field_ref()?).to_arro3(py)
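The `closed` getter in this file reports whether the reader's inner `Option` is empty. A plain-Python sketch of that one-shot pattern is shown below, assuming that exporting the stream is what consumes the reader. `OneShotReader` and `export_stream` are hypothetical names, and lists of integers stand in for Arrow arrays.

```python
from typing import Iterator, List, Optional


class OneShotReader:
    """Hypothetical stand-in for a stream reader that can be consumed once."""

    def __init__(self, chunks: Iterator[List[int]]) -> None:
        # Mirrors the Option in the Rust struct: None means "already consumed".
        self._inner: Optional[Iterator[List[int]]] = chunks

    @property
    def closed(self) -> bool:
        return self._inner is None

    def export_stream(self) -> Iterator[List[int]]:
        if self._inner is None:
            raise ValueError("reader has already been consumed")
        # Take the inner iterator out, leaving None behind.
        inner, self._inner = self._inner, None
        return inner


reader = OneShotReader(iter([[1, 2], [3]]))
print(reader.closed)   # False
stream = reader.export_stream()
print(reader.closed)   # True
print(list(stream))    # [[1, 2], [3]]
```

Handing ownership of the underlying stream to the consumer avoids two readers advancing the same iterator, at the cost of making the original object single-use.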
19 changes: 0 additions & 19 deletions pyo3-arrow/src/chunked.rs
@@ -19,9 +19,6 @@ use crate::input::AnyArray;
use crate::interop::numpy::to_numpy::chunked_to_numpy;
use crate::{PyArray, PyDataType, PyField};

/// A Python-facing Arrow chunked array.
///
/// This is a wrapper around a [FieldRef] and a `Vec` of [ArrayRef].
#[pyclass(module = "arro3.core._core", name = "ChunkedArray", subclass)]
pub struct PyChunkedArray {
chunks: Vec<ArrayRef>,
@@ -241,8 +238,6 @@ impl PyChunkedArray {
))
}

/// An implementation of the Array interface, for interoperability with numpy and other
/// array libraries.
#[pyo3(signature = (dtype=None, copy=None))]
#[allow(unused_variables)]
pub fn __array__(
@@ -259,14 +254,6 @@ impl PyChunkedArray {
chunked_to_numpy(py, chunk_refs.as_slice())
}

/// An implementation of the [Arrow PyCapsule
/// Interface](https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html).
/// This dunder method should not be called directly, but enables zero-copy
/// data transfer to other Python libraries that understand Arrow memory.
///
/// For example (as of the upcoming pyarrow v16), you can call
/// [`pyarrow.chunked_array()`][pyarrow.chunked_array] to convert this array into a
/// pyarrow array, without copying memory.
#[allow(unused_variables)]
pub fn __arrow_c_stream__<'py>(
&'py self,
@@ -292,16 +279,11 @@ impl PyChunkedArray {
self.to_string()
}

/// Construct this from an existing Arrow object.
///
/// It can be called on anything that exports the Arrow stream interface
/// (`__arrow_c_stream__`). All batches will be materialized in memory.
#[classmethod]
pub fn from_arrow(_cls: &Bound<PyType>, input: AnyArray) -> PyArrowResult<Self> {
input.into_chunked_array()
}

/// Construct this object from a bare Arrow PyCapsule
#[classmethod]
pub fn from_arrow_pycapsule(
_cls: &Bound<PyType>,
@@ -400,7 +382,6 @@ impl PyChunkedArray {
Ok(PyChunkedArray::new(sliced_chunks, self.field.clone()).to_arro3(py)?)
}

/// Copy this array to a `numpy` NDArray
pub fn to_numpy(&self, py: Python) -> PyResult<PyObject> {
self.__array__(py, None, None)
}
12 changes: 0 additions & 12 deletions pyo3-arrow/src/datatypes.rs
@@ -104,13 +104,6 @@ impl Display for PyDataType {

#[pymethods]
impl PyDataType {
/// An implementation of the [Arrow PyCapsule
/// Interface](https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html).
/// This dunder method should not be called directly, but enables zero-copy
/// data transfer to other Python libraries that understand Arrow memory.
///
/// For example, you can call [`pyarrow.field()`][pyarrow.field] to convert this array
/// into a pyarrow field, without copying memory.
pub fn __arrow_c_schema__<'py>(
&'py self,
py: Python<'py>,
@@ -126,16 +119,11 @@ impl PyDataType {
self.to_string()
}

/// Construct this from an existing Arrow object.
///
/// It can be called on anything that exports the Arrow schema interface
/// (`__arrow_c_schema__`).
#[classmethod]
pub fn from_arrow(_cls: &Bound<PyType>, input: Self) -> Self {
input
}

/// Construct this object from a bare Arrow PyCapsule
#[classmethod]
pub fn from_arrow_pycapsule(
_cls: &Bound<PyType>,
23 changes: 0 additions & 23 deletions pyo3-arrow/src/field.rs
@@ -15,9 +15,6 @@ use crate::ffi::to_python::to_schema_pycapsule;
use crate::input::MetadataInput;
use crate::PyDataType;

/// A Python-facing Arrow field.
///
/// This is a wrapper around a [FieldRef].
#[pyclass(module = "arro3.core._core", name = "Field", subclass)]
pub struct PyField(FieldRef);

@@ -104,13 +101,6 @@ impl PyField {
Ok(PyField::new(field.into()))
}

/// An implementation of the [Arrow PyCapsule
/// Interface](https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html).
/// This dunder method should not be called directly, but enables zero-copy
/// data transfer to other Python libraries that understand Arrow memory.
///
/// For example, you can call [`pyarrow.field()`][pyarrow.field] to convert this array
/// into a pyarrow field, without copying memory.
pub fn __arrow_c_schema__<'py>(
&'py self,
py: Python<'py>,
@@ -126,16 +116,11 @@ impl PyField {
self.to_string()
}

/// Construct this from an existing Arrow object.
///
/// It can be called on anything that exports the Arrow schema interface
/// (`__arrow_c_schema__`).
#[classmethod]
pub fn from_arrow(_cls: &Bound<PyType>, input: Self) -> Self {
input
}

/// Construct this object from a bare Arrow PyCapsule
#[classmethod]
pub fn from_arrow_pycapsule(
_cls: &Bound<PyType>,
@@ -147,13 +132,10 @@ impl PyField {
Ok(Self::new(Arc::new(field)))
}

/// Test if this field is equal to the other
// TODO: add option to check field metadata
pub fn equals(&self, other: PyField) -> bool {
self.0 == other.0
}

/// The schema's metadata.
// Note: we can't return HashMap<Vec<u8>, Vec<u8>> because that will coerce keys and values to
// a list, not bytes
#[getter]
@@ -168,25 +150,21 @@ impl PyField {
Ok(d)
}

/// The schema's metadata where keys and values are `str`, not `bytes`.
#[getter]
pub fn metadata_str(&self) -> HashMap<String, String> {
self.0.metadata().clone()
}

/// The field name.
#[getter]
pub fn name(&self) -> String {
self.0.name().clone()
}

/// The field nullability.
#[getter]
pub fn nullable(&self) -> bool {
self.0.is_nullable()
}

/// Create new field without metadata, if any
pub fn remove_metadata(&self, py: Python) -> PyResult<PyObject> {
PyField::new(
self.0
@@ -198,7 +176,6 @@ impl PyField {
.to_arro3(py)
}

/// Create new field without metadata, if any
#[getter]
pub fn r#type(&self, py: Python) -> PyResult<PyObject> {
PyDataType::new(self.0.data_type().clone()).to_arro3(py)
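The `metadata` and `metadata_str` getters in this file expose the same key/value pairs, once as `bytes` and once as `str`. The relationship between the two views can be sketched in plain Python as follows, assuming UTF-8 encoding. The helper names `metadata_as_bytes` and `metadata_as_str` are hypothetical and not part of the library.

```python
from typing import Dict


def metadata_as_bytes(metadata_str: Dict[str, str]) -> Dict[bytes, bytes]:
    """Hypothetical helper: view string metadata as bytes keys and values."""
    return {k.encode("utf-8"): v.encode("utf-8") for k, v in metadata_str.items()}


def metadata_as_str(metadata: Dict[bytes, bytes]) -> Dict[str, str]:
    """Hypothetical helper: the inverse, assuming all entries are valid UTF-8."""
    return {k.decode("utf-8"): v.decode("utf-8") for k, v in metadata.items()}


meta = {"unit": "metres"}
print(metadata_as_bytes(meta))                   # {b'unit': b'metres'}
print(metadata_as_str(metadata_as_bytes(meta)))  # {'unit': 'metres'}
```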