
added PyarrowTableResult #830

Merged
merged 1 commit into main from feat/pyarrow-result-builder on Apr 22, 2024

Conversation

@zilto (Collaborator) commented Apr 17, 2024

You can pass to.SAVER(dependencies=["NODE_NAME"], combine=PyarrowTableResult()) to convert the specified node's output to a pyarrow.Table before materialization. The initial motivation was to support more than pd.DataFrame and pyarrow.Table with the dlt DataSaver plugin. More generally, it can be useful for platform teams that want a "single way to store parquet files" that is independent of the specific API of any one library (e.g., pandas, polars).

see #829 for more details

Changes

  • added h_pyarrow and tests
  • updated the dlt plugin example notebook

How I tested this

  • added 2 tests

Notes

Checklist

  • PR has an informative and human-readable title (this will be pulled into the release notes)
  • Changes are limited to a single goal (no scope creep)
  • Code passed the pre-commit check & code is left cleaner/nicer than when first encountered.
  • Any change in functionality is tested
  • New functions are documented (with a description, list of inputs, and expected output)
  • Placeholder code is flagged / future TODOs are captured in comments
  • Project documentation has been updated if adding/changing functionality.

Comment on lines +15 to +21
for example:
- pandas
- polars
- dask
- vaex
- ibis
- duckdb results
@skrawcz (Collaborator) commented Apr 22, 2024

it would be nice to be stricter on types...

@skrawcz (Collaborator) commented Apr 22, 2024

e.g.

def input_types(self) -> List[Type[Type]]:
    """Gives the applicable types to this result builder.
    This is optional for backwards compatibility, but is recommended.

    :return: A list of types that this can apply to.
    """
    _types = []
    try:
        import pandas as pd  # illustrative; repeat for polars, dask, etc.
        _types.append(pd.DataFrame)
    except ImportError:
        pass
    return _types

@zilto (Collaborator, Author) commented Apr 22, 2024

In that case, the real check is whether the object implements __dataframe__(), which is done through pyarrow.interchange.from_dataframe() inside build_result(). The PyarrowTableResult serves a slightly different role of "universal adapter", which helps us avoid maintaining an explicit list of types (which is bound to grow). I opted not to include input_types() if it was only going to return Any.

@skrawcz skrawcz merged commit 26bc1cc into main Apr 22, 2024
23 checks passed
@skrawcz skrawcz deleted the feat/pyarrow-result-builder branch April 22, 2024 03:32