[Data] Fixing DelegatingBlockBuilder to avoid re-serializing objects multiple times #48509

Merged 2 commits on Nov 13, 2024
13 changes: 1 addition & 12 deletions python/ray/data/_internal/delegating_block_builder.py
@@ -1,10 +1,8 @@
 import collections
 from typing import Any, Mapping, Optional

-from ray.air.util.tensor_extensions.arrow import ArrowConversionError
 from ray.data._internal.arrow_block import ArrowBlockBuilder
 from ray.data._internal.block_builder import BlockBuilder
-from ray.data._internal.pandas_block import PandasBlockBuilder
 from ray.data.block import Block, BlockAccessor, BlockType, DataBatch


@@ -23,17 +21,8 @@ def _inferred_block_type(self) -> Optional[BlockType]:
     def add(self, item: Mapping[str, Any]) -> None:
         assert isinstance(item, collections.abc.Mapping), item

-        import pyarrow
-
         if self._builder is None:
-            try:
-                check = ArrowBlockBuilder()
-                check.add(item)
-                check.build()
-                self._builder = ArrowBlockBuilder()
-            except (TypeError, pyarrow.lib.ArrowInvalid, ArrowConversionError):
-                # Can also handle nested Python objects, which Arrow cannot.
-                self._builder = PandasBlockBuilder()

Comment on lines -26 to -36 (Contributor Author):

These fallbacks are no longer necessary, since Arrow now supports ArrowPythonObjectType, which allows falling back to serializing rows as native Python objects (using pickle).

+            self._builder = ArrowBlockBuilder()

         self._builder.add(item)

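For illustration only, the cost of the removed probe can be sketched with a stand-in builder. This is a minimal sketch, not Ray's code: CountingBuilder, build_with_probe, build_direct, and count_calls are hypothetical names, and pickling inside build() merely stands in for whatever conversion work the real ArrowBlockBuilder performs.

```python
import pickle
from typing import Any, Callable, List, Mapping


class CountingBuilder:
    """Hypothetical stand-in for a block builder (not Ray's ArrowBlockBuilder)."""

    # Shared counter of how many values have been pickled by any builder.
    serialize_calls = 0

    def __init__(self) -> None:
        self._items: List[Mapping[str, Any]] = []

    def add(self, item: Mapping[str, Any]) -> None:
        # Adding only buffers the row; the costly work is deferred to build().
        self._items.append(item)

    def build(self) -> List[bytes]:
        blobs = []
        for item in self._items:
            CountingBuilder.serialize_calls += 1
            blobs.append(pickle.dumps(dict(item)))
        return blobs


def build_with_probe(item: Mapping[str, Any]) -> List[bytes]:
    """Removed pattern: trial-build a throwaway builder before the real one."""
    check = CountingBuilder()
    check.add(item)
    check.build()  # result is discarded, but the item was still serialized
    builder = CountingBuilder()
    builder.add(item)
    return builder.build()


def build_direct(item: Mapping[str, Any]) -> List[bytes]:
    """New pattern: use a single builder with no trial run."""
    builder = CountingBuilder()
    builder.add(item)
    return builder.build()


def count_calls(fn: Callable[[Mapping[str, Any]], List[bytes]], item: Mapping[str, Any]) -> int:
    # Reset the shared counter, run one strategy, and report its serializations.
    CountingBuilder.serialize_calls = 0
    fn(item)
    return CountingBuilder.serialize_calls


item = {"id": 1, "payload": {"nested": [1, 2, 3]}}
probe_calls = count_calls(build_with_probe, item)
direct_calls = count_calls(build_direct, item)
print(probe_calls, direct_calls)  # prints "2 1"
```

Under these assumptions the probe serializes the first item twice, once in the throwaway builder and once in the real one, while the direct path serializes it once and produces the same output.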