
Deprecate and remove from_legacy_dataframe usage #1168

Merged: 9 commits into dask:main, Nov 19, 2024

Conversation

@phofl (Collaborator) commented Nov 15, 2024

goes with dask/dask#11529

@phofl phofl closed this Nov 15, 2024
@phofl phofl reopened this Nov 15, 2024
@jrbourbeau (Member) left a comment:

Thanks @phofl

)
def test_dataframe_mode_split_every(pdf, df, split_every, expect_tasks):
assert_eq(df.to_legacy_dataframe().mode(split_every=split_every), pdf.mode())
@jrbourbeau (Member):

It's unclear to me why to_legacy_dataframe was needed here originally. Maybe some sort of workaround that's no longer needed?

@jrbourbeau (Member):

To be clear, this is totally fine, just a non-blocking, not very important question

@phofl (Collaborator, Author):

No clue; I was as confused as you are about this. Probably a mistake made when the test was added at some point.
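The likely reason the wrapper was harmless either way: an assert_eq-style test helper materializes lazy inputs before comparing, so wrapping one side in a conversion step changes nothing about the comparison. A minimal pure-Python sketch (the `Lazy` class and `assert_eq` here are hypothetical stand-ins, not the dask implementations):

```python
# Toy illustration of why an explicit conversion step before an
# assert_eq-style comparison is usually redundant: the helper
# computes lazy inputs first, so wrapped and unwrapped forms
# reduce to the same concrete value.

class Lazy:
    """Toy lazy container: holds a thunk, computes on demand."""
    def __init__(self, thunk):
        self._thunk = thunk

    def compute(self):
        return self._thunk()

def assert_eq(a, b):
    """Materialize any lazy side, then compare (toy version)."""
    if isinstance(a, Lazy):
        a = a.compute()
    if isinstance(b, Lazy):
        b = b.compute()
    assert a == b, f"{a!r} != {b!r}"

lazy = Lazy(lambda: [1, 2, 2, 3])
assert_eq(lazy, [1, 2, 2, 3])  # no conversion step needed
```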

Outdated review threads on .github/workflows/dask_test.yaml and .github/workflows/test.yaml were marked resolved.
@@ -114,7 +113,6 @@ def read_json(
path_converter=path_converter,
**kwargs,
)
return from_legacy_dataframe(df)
@jrbourbeau (Member):

Nice. It looks like we can do this now since the changes over in dask/dask#11529 make some methods (e.g. dd.read_json) more dask-expr aware

@phofl (Collaborator, Author):

Yep
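The shape of the read_json change can be sketched in miniature: once the reader produces the expression-backed frame natively, the trailing `from_legacy_dataframe(...)` wrapper is dead weight and can be dropped. All names below are hypothetical toy stand-ins, not the dask functions:

```python
# Hypothetical before/after sketch of dropping a legacy-conversion
# wrapper at an I/O boundary once the reader is "expr-aware".

def from_legacy_dataframe(df):
    # legacy shim: wrap an old-style collection (toy version)
    return {"wrapped": df}

def read_json_old(records):
    df = {"data": records}           # pretend legacy read result
    return from_legacy_dataframe(df)  # extra hop through the shim

def read_json_new(records):
    # reader returns the new-style collection directly; no shim
    return {"data": records}

assert read_json_old([1, 2]) == {"wrapped": {"data": [1, 2]}}
assert read_json_new([1, 2]) == {"data": [1, 2]}
```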

@@ -172,7 +170,7 @@ def to_json(
from dask.dataframe.io.json import to_json

return to_json(
df.to_legacy_dataframe(),
df,
@jrbourbeau (Member):

Over in orc.py this is df.optimize() -- should we do something similar here? Or is the .optimize() maybe not needed over there?

@phofl (Collaborator, Author):

This calls to_delayed, which triggers the optimization. I'd rather collect those calls in a single place.
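The design being described, sketched with hypothetical toy classes (not the dask-expr API): rather than sprinkling explicit `.optimize()` calls at each I/O entry point, the optimization runs once inside the `to_delayed()` conversion boundary that every such path funnels through.

```python
# Sketch of funneling graph optimization through one choke point.

class Expr:
    """Toy expression collection with a one-shot optimize pass."""
    def __init__(self, ops):
        self.ops = ops
        self.optimized = False

    def optimize(self):
        # e.g. fuse/simplify the expression graph (elided here)
        self.optimized = True
        return self

    def to_delayed(self):
        # single choke point: optimize, then convert to partitions
        self.optimize()
        return [("partition", op) for op in self.ops]

def to_json(df):
    # caller does not need df.optimize(); to_delayed handles it
    return df.to_delayed()

df = Expr(["read", "filter"])
parts = to_json(df)
assert df.optimized and len(parts) == 2
```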

Comment on lines +1420 to +1431
if lengths is True:
lengths = tuple(self.map_partitions(len, enforce_metadata=False).compute())

arr = self.values

chunks = self._validate_chunks(arr, lengths)
arr._chunks = chunks

if meta is not None:
arr._meta = meta

return arr
@jrbourbeau (Member):

Just noting that this is the same implementation as over in dask/dask
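The lengths/chunks handshake in the snippet above can be mimicked in plain Python (a simplified, hypothetical sketch, not the dask implementation): when `lengths is True`, per-partition lengths are computed eagerly and become the known chunk sizes of the resulting array; otherwise chunk sizes stay unknown.

```python
# Simplified mimic of the lengths -> chunks logic: an eager pass
# over the partitions (like map_partitions(len).compute()) yields
# one length per partition, which become the array's chunk sizes.

def to_dask_array_sketch(partitions, lengths=None):
    if lengths is True:
        # eager pass over the data to learn partition sizes
        lengths = tuple(len(p) for p in partitions)
    if lengths is not None:
        # validation step, in the spirit of _validate_chunks
        if len(lengths) != len(partitions):
            raise ValueError("one length per partition expected")
        chunks = (tuple(lengths),)
    else:
        # unknown chunk sizes; dask spells these as NaN
        chunks = ((float("nan"),) * len(partitions),)
    return chunks

parts = [[1, 2, 3], [4, 5]]
assert to_dask_array_sketch(parts, lengths=True) == ((3, 2),)
```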

Comment on lines +1441 to +1447
if is_extension_array_dtype(self._meta.values):
warnings.warn(
"Dask currently has limited support for converting pandas extension dtypes "
f"to arrays. Converting {self._meta.values.dtype} to object dtype.",
UserWarning,
)
return self.map_partitions(methods.values)
@jrbourbeau (Member):

Same here
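The warn-and-fall-back pattern in that hunk is worth isolating: detect the unsupported dtype, emit a `UserWarning` naming the concrete dtype being downcast, and proceed rather than fail. A self-contained sketch with a hypothetical dtype check standing in for `is_extension_array_dtype`:

```python
import warnings

# Simplified version of the warning path above: when a value
# container reports an "extension" dtype, warn that conversion
# falls back to object dtype instead of failing silently.

def values_with_warning(data, dtype):
    if dtype.startswith("ext:"):  # stand-in for is_extension_array_dtype
        warnings.warn(
            "Limited support for converting extension dtypes to arrays. "
            f"Converting {dtype} to object dtype.",
            UserWarning,
        )
        dtype = "object"
    return list(data), dtype

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    out, dt = values_with_warning([1, 2], "ext:Int64")

assert dt == "object"
assert any(issubclass(w.category, UserWarning) for w in caught)
```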

phofl and others added 2 commits November 19, 2024 10:43
Co-authored-by: James Bourbeau <jrbourbeau@users.noreply.github.com>
Co-authored-by: James Bourbeau <jrbourbeau@users.noreply.github.com>
@phofl phofl merged commit ef6f27f into dask:main Nov 19, 2024
1 check passed
@phofl phofl deleted the legacy-dataframe branch November 19, 2024 09:44
Labels: none yet. Projects: none yet. 2 participants.