Feature request: Bypass dataframes #85

malmaud · 2019-05-15T17:52:30Z

Thanks for this fantastic package!

I'm wondering if in the spirit of this package being useful for creating exploratory graphs with minimum boilerplate, it makes sense to be able to pass in arrays in lieu of dataframe column names for the plotting function fields.

For example, let

px.scatter(x=[1,2],y=[3,4])

be equivalent to

import pandas
import plotly_express as px
df=pandas.DataFrame({'x': [1,2], 'y': [3,4]})
px.scatter(df, x='x', y='y')

With the current Plotly API, I think the simplest alternative is

from plotly import graph_objs as go
from plotly.offline import plot  # assumed source of plot()
plot([go.Scatter(x=[1,2],y=[3,4])])

which feels verbose compared to the px alternative mentioned above and which would require a complete refactor if I decide my data has become rich enough that I want to start using a dataframe-based representation.

As it stands, the necessity of constructing a dataframe can actually make plotly express more verbose than using plotly directly. It would be great for px to be the one-stop-shop package for exploratory plotting in Python.

Seaborn, which is stated as an inspiration for this package, does support this syntax for the majority of its plotting functions.

Thanks!


nicolaskruchten · 2019-05-15T18:08:10Z

In this case, what would the names be in the resulting chart? just x, y, color etc?

nicolaskruchten · 2019-05-15T18:08:39Z

PS: thanks for the detailed suggestion :)

nicolaskruchten · 2019-05-15T18:16:26Z

(see also #37)

malmaud · 2019-05-15T18:29:00Z

Yes, exactly, the names would just be x, y, etc. I believe that is what seaborn does and it seems like a reasonable default.

#37 does seem similar but seems to be asking for something even more ambitious - the ability to mix dataframe column names and direct arrays/pandas series as feature values in a single call.

For this feature, the implementation seemed like it could be quite simple given the functionality that already exists: just detect if one of the features is an array-like object instead of a string-like object, in which case a dataframe could be constructed on the fly and the rest of the code path could be used without alteration.
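
To make the idea concrete, here is a rough sketch of the detection step. It is only an illustration under stated assumptions: `columns_from_arrays` is a hypothetical helper, not something that exists in px, and the hand-off to pandas and `px.scatter` is shown only as a comment.

```python
def columns_from_arrays(**fields):
    """Hypothetical helper (not part of px): collect array-like arguments.

    Returns (columns, names). `columns` maps each argument name to its
    values; `names` maps each argument name to the column name to plot.
    """
    columns = {}
    names = {}
    for arg, value in fields.items():
        if value is None:
            continue  # argument was not supplied
        if isinstance(value, str):
            # A string can only be a column name, which needs a data frame.
            raise TypeError(
                f"{arg}={value!r} looks like a column name, "
                "but no data frame was given"
            )
        columns[arg] = list(value)  # accept any iterable of values
        names[arg] = arg  # the column is simply named after the argument
    return columns, names


columns, names = columns_from_arrays(x=[1, 2], y=[3, 4], color=None)
# Assumed next step, reusing the existing code path:
#   df = pandas.DataFrame(columns)
#   px.scatter(df, **names)
```

With this approach the axis titles fall out as `x` and `y`, since the generated columns are named after the arguments.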

nicolaskruchten · 2019-05-16T02:23:51Z

Hmmm, interesting proposal!

My biggest concern would be providing very clear error messages that explain why we can't construct a data frame when that happens. I can imagine people trying to pass in all sorts of stuff and being confused about why it's not working. Arrays of the wrong dimensions come to mind as a likely thing.

If we do this and not #37 yet (i.e. this as a stepping-stone to #37), then we'd have to make the data_frame argument optional, and we could use the presence or absence of this argument as the implicit flag to switch between the two modes: if you pass in a data_frame then you cannot use arrays, but if you don't pass in a data_frame then you must. In a second pass we could then try to blend the two :)
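
A minimal sketch of what that switch and its error messages might look like. The function name `check_arguments` and the exact wording of the messages are assumptions for illustration, not the eventual implementation, and the array branch assumes values that support `len()`.

```python
def check_arguments(data_frame=None, **fields):
    """Hypothetical validation for the two modes (not part of px).

    Returns "columns" when a data frame is given, "arrays" otherwise.
    """
    supplied = {k: v for k, v in fields.items() if v is not None}

    if data_frame is not None:
        # Data frame mode: every argument must be a column name.
        bad = [k for k, v in supplied.items() if not isinstance(v, str)]
        if bad:
            raise ValueError(
                "A data_frame was passed, so these arguments must be "
                f"column names, not arrays: {', '.join(bad)}"
            )
        return "columns"

    # Array mode: every argument must be array-like, all of equal length.
    bad = [k for k, v in supplied.items() if isinstance(v, str)]
    if bad:
        raise ValueError(
            "No data_frame was passed, so these arguments must be "
            f"arrays, not column names: {', '.join(bad)}"
        )
    lengths = {k: len(v) for k, v in supplied.items()}
    if len(set(lengths.values())) > 1:
        detail = ", ".join(f"{k}: {n}" for k, n in lengths.items())
        raise ValueError(
            f"All arrays must have the same length; got {detail}"
        )
    return "arrays"
```

For example, `check_arguments(x=[1, 2], y=[3, 4, 5])` would raise a `ValueError` that names each argument and its length, rather than failing later inside the data frame constructor.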

I'll have to think about the downsides of this approach, if any, a little bit more, but should we decide to go ahead, do you think this is something you'd be interested in contributing a PR for @malmaud ? :)

malmaud · 2019-05-16T21:38:23Z

Sure, I can just contribute the PR now so we can play with it - I don't think it will be too hard. No harm if you don't end up merging it.

nicolaskruchten · 2019-09-11T13:10:45Z

This will be implemented as part of plotly/plotly.py#1767 ... thanks for the input and patience :)

malmaud mentioned this issue May 17, 2019

Support implicit dataframe argument. #87

Closed

malmaud mentioned this issue Jul 29, 2019

Plotly express feature request: Bypass dataframes plotly/plotly.py#1696

Closed

nicolaskruchten mentioned this issue Aug 1, 2019

Support using DataFrame index for colors, labels, x, y, etc. #126

Closed

nicolaskruchten closed this as completed Sep 11, 2019
