This repository has been archived by the owner on Jun 3, 2024. It is now read-only.

Feature request: Bypass dataframes #85

Closed
malmaud opened this issue May 15, 2019 · 7 comments

Comments

@malmaud
malmaud commented May 15, 2019

Thanks for this fantastic package!

I'm wondering whether, in the spirit of this package being useful for creating exploratory graphs with minimal boilerplate, it would make sense to accept arrays in lieu of dataframe column names for the plotting function arguments.

E.g., let

px.scatter(x=[1,2],y=[3,4])

be equivalent to

df=pandas.DataFrame({'x': [1,2], 'y': [3,4]})
px.scatter(df, x='x', y='y')

With the current Plotly API, I think the simplest alternative is

from plotly import graph_objs as go
from plotly.offline import plot
plot([go.Scatter(x=[1,2],y=[3,4])])

which feels verbose compared to the px alternative mentioned above and which would require a complete refactor if I decide my data has become rich enough that I want to start using a dataframe-based representation.

As it stands, the necessity of constructing a dataframe can actually make plotly express more verbose than using plotly directly. It would be great for px to be the one-stop-shop package for exploratory plotting in Python.

Seaborn, which is stated as an inspiration for this package, does support this syntax for the majority of its plotting functions.

Thanks!

@nicolaskruchten
Contributor

In this case, what would the names be in the resulting chart? Just x, y, color, etc.?

@nicolaskruchten
Contributor

PS: thanks for the detailed suggestion :)

@nicolaskruchten
Contributor

(see also #37)

@malmaud
Author

malmaud commented May 15, 2019

Yes, exactly, the names would just be x, y, etc. I believe that is what seaborn does and it seems like a reasonable default.

#37 does seem similar but seems to be asking for something even more ambitious - the ability to mix dataframe column names and direct arrays/pandas series as feature values in a single call.

For this feature, the implementation seems like it could be quite simple given the functionality that already exists: just detect whether one of the features is an array-like object instead of a string-like object, in which case a dataframe could be constructed on the fly and the rest of the code path could be used without alteration.
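A minimal sketch of that idea, assuming pandas is available. The helper name frame_from_arrays and its return shape are hypothetical, made up for illustration, and are not part of px:

```python
import pandas as pd


def frame_from_arrays(**fields):
    """Hypothetical helper: turn array-like keyword arguments into a DataFrame.

    Returns the DataFrame plus a mapping from field name to column name,
    so the existing column-name code path could be reused unchanged.
    """
    columns = {}
    for name, value in fields.items():
        if value is None:
            # Field not supplied by the caller; skip it.
            continue
        if isinstance(value, str):
            # A string looks like a column name, which needs a data frame.
            raise TypeError(
                f"{name}={value!r} looks like a column name, "
                "but no data frame was given"
            )
        # Use the field name itself (x, y, color, ...) as the column name.
        columns[name] = list(value)
    data_frame = pd.DataFrame(columns)
    names = {name: name for name in columns}
    return data_frame, names


df, names = frame_from_arrays(x=[1, 2], y=[3, 4], color=None)
```

Here df has columns x and y, and names maps each field to the column of the same name, which matches the default labels discussed above.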

@nicolaskruchten
Contributor

Hmmm, interesting proposal!

My biggest concern would be providing very clear error messages that explain why whenever we can't construct a data frame. I can imagine people trying to pass in all sorts of stuff and being confused about why it's not working. Arrays of mismatched lengths come to mind as a likely case.

If we do this and not #37 yet (i.e. this as a stepping-stone to #37) then we'd have to make the data_frame argument optional, and so we could use the presence or absence of this argument as the implicit flag to switch between the two modes: if you pass in a data_frame then you cannot use arrays, but if you don't pass in a data_frame then you must. In a second pass we could then try to blend the two :)
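To make the two-mode rule and the error-message concern concrete, here is a rough sketch in plain Python. The function name check_plot_arguments, the mode labels, and the message wording are all hypothetical, not px behavior:

```python
def check_plot_arguments(data_frame=None, **fields):
    """Hypothetical validator: pick a mode from the presence of data_frame.

    Returns "columns" when a data frame is given (fields must be column
    names) and "arrays" when it is not (fields must be equal-length arrays).
    """
    given = {name: value for name, value in fields.items() if value is not None}

    if data_frame is not None:
        for name, value in given.items():
            if not isinstance(value, str):
                raise ValueError(
                    f"'{name}' must be a column name when data_frame is given, "
                    f"got {type(value).__name__}"
                )
        return "columns"

    lengths = {}
    for name, value in given.items():
        if isinstance(value, str):
            raise ValueError(
                f"'{name}' is the string {value!r}, but no data_frame was given "
                "to look that column up in"
            )
        try:
            lengths[name] = len(value)
        except TypeError:
            raise ValueError(
                f"'{name}' must be array-like, got {type(value).__name__}"
            ) from None

    if len(set(lengths.values())) > 1:
        # Report every field's length so the mismatch is easy to spot.
        raise ValueError(f"all arrays must have the same length, got {lengths}")
    return "arrays"
```

With this sketch, check_plot_arguments(x=[1, 2], y=[3, 4]) selects the array mode, while passing x=[1, 2, 3] with y=[3, 4] raises a ValueError that lists the length of each field.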

I'll have to think a little more about the downsides of this approach, if any, but should we decide to go ahead, do you think this is something you'd be interested in contributing a PR for, @malmaud? :)

@malmaud
Author

malmaud commented May 16, 2019

Sure, I can just contribute the PR now so we can play with it - I don't think it will be too hard. No harm if you don't end up merging it.

@nicolaskruchten
Contributor

This will be implemented as part of plotly/plotly.py#1767 ... thanks for the input and patience :)


2 participants