only interchange necessary columns #4286

MarcoGorelli · 2023-07-19T16:52:49Z

Trying to address this comment: #3901 (comment)

If it's not a wide-form plot, then only interchange the columns that are needed.
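Here is a minimal sketch of this idea. It is not the code in `_core.py`. The hypothetical helper `needed_columns` collects the column names that a long-form plot references. For wide-form input it returns `None`, which means "interchange everything". For illustration only, it assumes that a list-valued `x` or `y` signals wide-form data, and that only plain strings in the hypothetical `COLUMN_ARGS`/`LIST_ARGS` tuples name columns.

```python
from typing import Any, Dict, List, Optional

# Hypothetical subset of px arguments that may name a single column.
COLUMN_ARGS = ("x", "y", "color", "symbol", "size", "text",
               "hover_name", "facet_row", "facet_col", "animation_frame")
# Hypothetical subset of px arguments that may name several columns.
LIST_ARGS = ("hover_data", "custom_data")


def needed_columns(args: Dict[str, Any], all_columns: List[str]) -> Optional[List[str]]:
    """Return the columns a long-form plot references, or None for wide-form.

    None means "interchange every column", since a wide-form plot may use
    columns that are not named explicitly in the arguments.
    """
    # Assumption: a list-valued x or y signals a wide-form plot.
    if isinstance(args.get("x"), list) or isinstance(args.get("y"), list):
        return None
    known = set(all_columns)
    needed: List[str] = []
    for key in COLUMN_ARGS + LIST_ARGS:
        value = args.get(key)
        if value is None:
            continue
        names = value if isinstance(value, (list, tuple)) else [value]
        for name in names:
            # Only plain strings matching an existing column are kept;
            # raw arrays, dicts and unknown names are ignored in this sketch.
            if isinstance(name, str) and name in known and name not in needed:
                needed.append(name)
    return needed
```

For example, `needed_columns({"x": "a", "y": "b", "color": "a"}, ["a", "b", "c"])` returns `["a", "b"]`, so column `c` never needs to be interchanged.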

MarcoGorelli · 2023-07-19T19:14:38Z

🤔 bit confused by the CI failure, it fails on "install chrome driver"?

alexcjohnson · 2023-07-19T19:29:54Z

@LiamConnors would you mind making another "pin chrome" PR in this repo? (Not in this PR but so this and other PRs can update and succeed again!)

nicolaskruchten · 2023-07-19T20:12:08Z

Wow, I'm excited that someone is biting the bullet on this one, thank you!

Would be nice for this to be reused for the jankier to_pandas() path as well if possible too, for the shorter term :)

LiamConnors · 2023-07-19T20:12:29Z

> @LiamConnors would you mind making another "pin chrome" PR in this repo? (Not in this PR but so this and other PRs can update and succeed again!)

Opened a PR here to fix it: #4288 @alexcjohnson

MarcoGorelli · 2023-07-19T20:22:47Z

> Would be nice for this to be reused for the jankier to_pandas() path as well if possible too, for the shorter term :)

Sure, but there's no guarantee of what the API to do that would be, right? Before having called to_pandas, the object could in theory be anything, with any API to select columns by name. (I might be missing something though, sorry)

alexcjohnson · 2023-07-19T20:51:44Z

packages/python/plotly/plotly/express/_core.py

-                df_pandas = df_not_pandas.to_pandas()
-            args["data_frame"] = df_pandas
+            args["data_frame"] = df_not_pandas.__dataframe__()
+            columns = args["data_frame"].column_names()


alexcjohnson · 2023-07-19

Can we be 100% sure anything returned by __dataframe__() will have column_names and select_columns_by_name methods? If there's any chance an object will come in with either of these missing, we should fall back on interchanging the whole thing up front.

MarcoGorelli · 2023-07-19

yup! you can check the spec here: https://data-apis.org/dataframe-protocol/latest/API.html

alexcjohnson · 2023-07-19

Oh I know they're in the spec, but I also know not everyone follows a spec to the letter 😉

MarcoGorelli · 2023-07-19

this can help highlight shortcomings in their implementation then 😉 I tried it out with polars and it works fine there

alexcjohnson · 2023-07-19

I guess we'll know how to respond when we see:
AttributeError: 'MyDataFrame' object has no attribute 'select_columns_by_name'
And in principle you're right that it's not our problem, but we'll be the ones responding to the issue and having to tell our users "don't use this dataframe directly until they fix it." Whereas if we caught this case explicitly we could emit a warning like "This dataframe only partially implements the dataframe interchange protocol. Falling back on a slower full-copy algorithm" so it wouldn't affect usage in px, only performance, and it would be clear where the issue needs to be raised.

MarcoGorelli · 2023-07-20

Thanks for explaining. OK, I've added a condition so it'll only use select_columns_by_name if that attribute is present.

alexcjohnson · 2023-07-20

Nice. I took a look at also adding a fallback for missing column_names and that would be pretty awkward... but if someone has a partial implementation of the protocol presumably column_names is an easy piece so would get included early, whereas select_columns_by_name could be trickier. So let's leave it as you have it now. Thanks!

MarcoGorelli · 2023-07-21

yeah, if they don't have column_names then from_dataframe wouldn't work either, as it uses that internally:

https://github.com/pandas-dev/pandas/blob/92792ec063031ae41443dabeb9d12f8daaac3ef1/pandas/core/interchange/from_dataframe.py#L112

nicolaskruchten · 2023-07-19T21:27:06Z

> Sure, but there's no guarantee of what the API to do that would be, right? Before having called to_pandas, the object could in theory be anything, with any API to select columns by name. (I might be missing something though, sorry)

Heh, no, I think it's me that's forgotten that this is exactly why we have the data-interchange protocol, you're right ;)


alexcjohnson

💃 Great work @MarcoGorelli, lovely tests!

only interchange necessary columns

d12dc1f

MarcoGorelli marked this pull request as ready for review July 19, 2023 17:13

alexcjohnson reviewed Jul 19, 2023


Merge branch 'master' into dont-convert-everything

2741606

MarcoGorelli and others added 3 commits July 20, 2023 08:11

fallback if no select_columns_by_name is present

78ead34

Merge branch 'dont-convert-everything' of github.com:MarcoGorelli/plotly.py into dont-convert-everything

fa1962a

include dataframe interchange PRs in changelog

f189576

alexcjohnson approved these changes Jul 20, 2023


alexcjohnson merged commit e430257 into plotly:master Jul 21, 2023
4 checks passed

LiamConnors mentioned this pull request Jul 25, 2023

Next release docs updates + upgrade plotly.js version #4294

Merged

18 tasks

anmyachev mentioned this pull request Jul 27, 2023

Consider implementing select_columns_by_name function, which is part of the DataFrame API modin-project/modin#6422

Closed

guyrosin mentioned this pull request Aug 13, 2023

Consider necessary columns from complex arguments when interchanging dataframes #4324

Merged

5 tasks

MarcoGorelli mentioned this pull request Aug 23, 2023

polars support holoviz/hvplot#1023

Closed
