Deprecate and remove from_legacy_dataframe usage #1168
Conversation
Thanks @phofl
)
def test_dataframe_mode_split_every(pdf, df, split_every, expect_tasks):
    assert_eq(df.to_legacy_dataframe().mode(split_every=split_every), pdf.mode())
It's unclear to me why to_legacy_dataframe was needed here originally. Maybe some sort of workaround that's no longer needed?
To be clear, this is totally fine, just a non-blocking, not very important question
No clue, I was as confused as you are about this. Probably a mistake when adding the test at some point
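For context, a minimal sketch of how the assertion presumably reads once the conversion is dropped (an assumption, not the PR's exact diff; pdf, df, split_every, and expect_tasks are the test's existing fixtures/parameters):

# Hypothetical sketch: the same check without the legacy conversion, assuming
# assert_eq can compare the dask-expr collection to pandas directly.
from dask.dataframe.utils import assert_eq

def test_dataframe_mode_split_every(pdf, df, split_every, expect_tasks):
    assert_eq(df.mode(split_every=split_every), pdf.mode())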
@@ -114,7 +113,6 @@ def read_json(
        path_converter=path_converter,
        **kwargs,
    )
-    return from_legacy_dataframe(df)
Nice. It looks like we can do this now since the changes over in dask/dask#11529 make some methods (e.g. dd.read_json) more dask-expr aware.
Yep
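As a rough illustration of that point (hypothetical file pattern, not from this PR): if dd.read_json already returns a query-planning collection, the wrapper can simply be dropped.

# Hypothetical illustration: dd.read_json is assumed to return a dask-expr
# (query-planning) collection directly after dask/dask#11529, so wrapping the
# result in from_legacy_dataframe is no longer needed.
import dask.dataframe as dd

df = dd.read_json("records-*.json")  # hypothetical path pattern
print(type(df))  # expected to be the query-planning DataFrame collection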
@@ -172,7 +170,7 @@ def to_json(
    from dask.dataframe.io.json import to_json

    return to_json(
-        df.to_legacy_dataframe(),
+        df,
Over in orc.py this is df.optimize() -- should we do something similar here? Or is the .optimize() maybe not needed over there?
This calls to_delayed, which triggers the optimization. I'd rather collect them in a single place
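A small sketch of that reasoning (an assumption based on the comment above, not on the dask-expr source):

# Hypothetical sketch: converting the collection to delayed partitions is
# expected to optimize the expression graph as part of to_delayed, so an
# explicit df.optimize() beforehand would be redundant.
import pandas as pd
import dask.dataframe as dd

df = dd.from_pandas(pd.DataFrame({"a": range(10)}), npartitions=2)
parts = df.to_delayed()  # optimization assumed to happen inside this call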
        if lengths is True:
            lengths = tuple(self.map_partitions(len, enforce_metadata=False).compute())

        arr = self.values

        chunks = self._validate_chunks(arr, lengths)
        arr._chunks = chunks

        if meta is not None:
            arr._meta = meta

        return arr
Just noting that this is the same implementation as over in dask/dask
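For reference, a small usage sketch of the code path above (hypothetical data; passing lengths=True computes per-partition sizes so the resulting array gets concrete chunks):

# Hypothetical usage: lengths=True triggers the map_partitions(len) pass above,
# giving the dask array concrete chunk sizes instead of unknown (nan) chunks.
import pandas as pd
import dask.dataframe as dd

ddf = dd.from_pandas(pd.DataFrame({"a": range(10)}), npartitions=3)
arr = ddf.to_dask_array(lengths=True)
print(arr.chunks)  # e.g. ((4, 3, 3), (1,)) rather than ((nan, nan, nan), (1,))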
        if is_extension_array_dtype(self._meta.values):
            warnings.warn(
                "Dask currently has limited support for converting pandas extension dtypes "
                f"to arrays. Converting {self._meta.values.dtype} to object dtype.",
                UserWarning,
            )
        return self.map_partitions(methods.values)
Same here
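A quick illustration of when that branch would fire (hypothetical frame, assuming a nullable Int64 column):

# Hypothetical illustration: accessing .values on an extension-dtype column is
# expected to hit the branch above, warn, and fall back to object dtype.
import pandas as pd
import dask.dataframe as dd

pdf = pd.DataFrame({"a": pd.array([1, 2, 3], dtype="Int64")})
ddf = dd.from_pandas(pdf, npartitions=1)
arr = ddf["a"].values  # should emit the UserWarning described above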
Co-authored-by: James Bourbeau <jrbourbeau@users.noreply.github.com>
goes with dask/dask#11529