Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add additional documentation and examples to ArrayAccessor #6141

Merged
merged 1 commit into from
Aug 1, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
79 changes: 75 additions & 4 deletions arrow-array/src/array/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -437,13 +437,84 @@ impl<'a, T: Array> Array for &'a T {

/// A generic trait for accessing the values of an [`Array`]
///
/// This trait helps write specialized implementations of algorithms for
/// different array types. Specialized implementations allow the compiler
/// to optimize the code for the specific array type, which can lead to
/// significant performance improvements.
///
/// # Example
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This example is adapted from apache/datafusion#11676

/// For example, to write three different implementations of a string length function
/// for [`StringArray`], [`LargeStringArray`], and [`StringViewArray`], you can write
///
/// ```
/// # use std::sync::Arc;
/// # use arrow_array::{ArrayAccessor, ArrayRef, ArrowPrimitiveType, OffsetSizeTrait, PrimitiveArray};
/// # use arrow_buffer::ArrowNativeType;
/// # use arrow_array::cast::AsArray;
/// # use arrow_array::iterator::ArrayIter;
/// # use arrow_array::types::{Int32Type, Int64Type};
/// # use arrow_schema::{ArrowError, DataType};
/// /// This function takes a dynamically typed `ArrayRef` and calls
/// /// calls one of three specialized implementations
/// fn character_length(arg: ArrayRef) -> Result<ArrayRef, ArrowError> {
/// match arg.data_type() {
/// DataType::Utf8 => {
/// // downcast the ArrayRef to a StringArray and call the specialized implementation
/// let string_array = arg.as_string::<i32>();
/// character_length_general::<Int32Type, _>(string_array)
/// }
/// DataType::LargeUtf8 => {
/// character_length_general::<Int64Type, _>(arg.as_string::<i64>())
/// }
/// DataType::Utf8View => {
/// character_length_general::<Int32Type, _>(arg.as_string_view())
/// }
/// _ => Err(ArrowError::InvalidArgumentError("Unsupported data type".to_string())),
/// }
/// }
///
/// /// A generic implementation of the character_length function
/// /// This function uses the `ArrayAccessor` trait to access the values of the array
/// /// so the compiler can generated specialized implementations for different array types
/// ///
/// /// Returns a new array with the length of each string in the input array
/// /// * Int32Array for Utf8 and Utf8View arrays (lengths are 32-bit integers)
/// /// * Int64Array for LargeUtf8 arrays (lengths are 64-bit integers)
/// ///
/// /// This is generic on the type of the primitive array (different string arrays have
/// /// different lengths) and the type of the array accessor (different string arrays
/// /// have different ways to access the values)
/// fn character_length_general<'a, T: ArrowPrimitiveType, V: ArrayAccessor<Item = &'a str>>(
/// array: V,
/// ) -> Result<ArrayRef, ArrowError>
/// where
/// T::Native: OffsetSizeTrait,
/// {
/// let iter = ArrayIter::new(array);
/// // Create a Int32Array / Int64Array with the length of each string
/// let result = iter
/// .map(|string| {
/// string.map(|string: &str| {
/// T::Native::from_usize(string.chars().count())
/// .expect("should not fail as string.chars will always return integer")
/// })
/// })
/// .collect::<PrimitiveArray<T>>();
///
/// /// Return the result as a new ArrayRef (dynamically typed)
/// Ok(Arc::new(result) as ArrayRef)
/// }
/// ```
///
/// # Validity
///
/// An [`ArrayAccessor`] must always return a well-defined value for an index that is
/// within the bounds `0..Array::len`, including for null indexes where [`Array::is_null`] is true.
/// An [`ArrayAccessor`] must always return a well-defined value for an index
/// that is within the bounds `0..Array::len`, including for null indexes where
/// [`Array::is_null`] is true.
///
/// The value at null indexes is unspecified, and implementations must not rely on a specific
/// value such as [`Default::default`] being returned, however, it must not be undefined
/// The value at null indexes is unspecified, and implementations must not rely
/// on a specific value such as [`Default::default`] being returned, however, it
/// must not be undefined
pub trait ArrayAccessor: Array {
/// The Arrow type of the element being accessed.
type Item: Send + Sync;
Expand Down
26 changes: 18 additions & 8 deletions arrow/src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -40,8 +40,10 @@
//! assert_eq!(array.values(), &[1, 0, 3])
//! ```
//!
//! It is also possible to write generic code. For example, the following is generic over
//! all primitively typed arrays
//! It is also possible to write generic code for different concrete types.
//! For example, since the following function is generic over all primitively
//! typed arrays, when invoked the Rust compiler will generate specialized implementations
//! with optimized code for each concrete type.
//!
//! ```rust
//! # use std::iter::Sum;
Expand All @@ -60,7 +62,10 @@
//! assert_eq!(sum(&TimestampNanosecondArray::from(vec![1, 2, 3])), 6);
//! ```
//!
//! And the following is generic over all arrays with comparable values
//! And the following uses [`ArrayAccessor`] to implement a generic function
//! over all arrays with comparable values.
//!
//! [`ArrayAccessor`]: array::ArrayAccessor
//!
//! ```rust
//! # use arrow::array::{ArrayAccessor, ArrayIter, Int32Array, StringArray};
Expand All @@ -81,10 +86,11 @@
//!
//! # Type Erasure / Trait Objects
//!
//! It is often the case that code wishes to handle any type of array, without necessarily knowing
//! its concrete type. This use-case is catered for by a combination of [`Array`]
//! and [`DataType`](datatypes::DataType), with the former providing a type-erased container for
//! the array, and the latter identifying the concrete type of array.
//! It is common to write code that handles any type of array, without necessarily
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to make this less verbose so it was easier to read.

//! knowing its concrete type. This is done using the [`Array`] trait and using
//! [`DataType`] to determine the appropriate `downcast_ref`.
//!
//! [`DataType`]: datatypes::DataType
//!
//! ```rust
//! # use arrow::array::{Array, Float32Array};
Expand All @@ -96,14 +102,18 @@
//!
//! fn impl_dyn(array: &dyn Array) {
//! match array.data_type() {
//! // downcast `dyn Array` to concrete `StringArray`
//! DataType::Utf8 => impl_string(array.as_any().downcast_ref().unwrap()),
//! // downcast `dyn Array` to concrete `Float32Array`
//! DataType::Float32 => impl_f32(array.as_any().downcast_ref().unwrap()),
//! _ => unimplemented!()
//! }
//! }
//! ```
//!
//! To facilitate downcasting, the [`AsArray`](crate::array::AsArray) extension trait can be used
//! You can use the [`AsArray`] extension trait to facilitate downcasting:
//!
//! [`AsArray`]: crate::array::AsArray
//!
//! ```rust
//! # use arrow::array::{Array, Float32Array, AsArray};
Expand Down
Loading